Writing Job Results into AWS S3 / Riak CS

This article explains how to write job results directly to AWS S3 / Riak CS.

Table of Contents

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.
  • For Riak CS bucket with upload permission.
  • For AWS: the IAM User with s3:PutObject and s3:AbortMultipartUpload permissions. We recommend that the permissions of the IAM User used for this feature to have no other permissions.

Usage



For on-demand jobs, just add the --result option to the td query command. After the job is finished, the results are written to the S3 bucket with the given name and path.

$ td query \
  --result 's3://accesskey:secretkey@/bucketname/path/to/file.csv.gz?compression=gz' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
Untitled-3
The access key and secret key must be URL encoded.

For security reasons, you may want to use AWS IAM to manage storage write access permissions.

You can specify compression option (only gz is allowed at this moment) in —result URL to compress the result. Without compression parameter it generates uncompressed data.

$ td query \
  --result 's3://accesskey:secretkey@/bucketname/path/to/file.csv' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
Untitled-3
The access key and secret key must be URL encoded.

The default result output format is CSV RFC 4180. Output in TSV format is also supported. For both CSV and TSV formats these are the available options to customize the final format of the files written into the destination:

Name Description Restrictions CSV default TSV default
format Main setting to specify the file format csv csv (Use ‘tsv’ to select the TSV format)
delimiter Used to specify the delimiter character , (comma) \t (tab)
quote Used to specify the quote character not available for TSV format “ (double quote) (no character)
escape Specifies the character used to escape other special characters not available for TSV format “ (double quote) (no character)
null Used to specify how a ‘null’ value is displayed (empty string) \N (backslash capital n)
newline Used to specify the EOL (End-Of-Line) representation \r\n (CRLF) \r\n (CRLF)
header Can be used to suppress the column header column header printed. Use ‘false’ to suppress column header printed. Use ‘false’ to suppress

This is the a default sample output in CSV format when no customization is requested:

code,cnt
"200",4981
"302",
"404",17
"500",2

When the format=tsv, delimiter=“, and null=NULL options are specified:

$ td query \
  --result 's3://accesskey:secretkey@/bucket_name/path/to/file.tsv?format=tsv&delimiter=%22&null=empty' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
Untitled-3
The access key and secret key must be URL encoded.

the output will change to:

"code" "cnt"
"200" 4981
"302" NULL
"404" 17
"500" 2

Last modified: Oct 20 2015 02:50:38 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.