Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance. Amazon S3 provides features for data organization and configuration of access controls for your business, organization, and compliance requirements.

This TD export integration allows you to write job results from Treasure Data directly to Amazon S3.

What can you do with this Integration?

  • Create buckets: Create and name a bucket that stores data.
  • Store data: Store a virtually unlimited amount of data in a bucket.


Differences between Amazon S3 Export Integration v2 and Amazon S3 Export Integration v1

Review the information in the following table to understand the differences and potential advantages between v2 and v1.

Feature | Amazon S3 v2 | Amazon S3 v1
Server-side Encryption with Customer Master Key (CMK) stored in AWS Key Management Service | X |
Support for Quote Policy for output data format | X |
Support for the Assume Role authentication method | X |

Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.

  • For AWS, an IAM user:

    • with s3:PutObject and s3:AbortMultipartUpload permissions.

    • with kms:Decrypt and kms:GenerateDataKey* permissions when selecting the sse-kms setting.
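
The permissions listed above could be expressed as an IAM policy document. The following Python sketch builds one as a plain dictionary and prints it as JSON. The function name, the bucket name, and the resource scoping (all objects in one bucket, one KMS key) are assumptions made for illustration; adjust them to your own account and security requirements.

```python
import json
from typing import Optional


def build_export_policy(bucket: str, kms_key_arn: Optional[str] = None) -> dict:
    """Build an IAM policy document with the permissions listed in the prerequisites.

    `bucket` and `kms_key_arn` are placeholders supplied by the caller.
    """
    statements = [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
            # Object-level actions apply to the keys inside the bucket.
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }
    ]
    if kms_key_arn is not None:
        # Only needed when the sse-kms setting is selected.
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt", "kms:GenerateDataKey*"],
                "Resource": kms_key_arn,
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


policy = build_export_policy("your_bucket_name")
print(json.dumps(policy, indent=2))
```

Pass a KMS key ARN as the second argument to include the KMS statement.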

Requirements and Limitations

  • The default query result limit for export to S3 is 100 GB. You can configure the part size setting up to 5000 (MB), which raises the file limit to about 5 TB.

  • The default export format is CSV RFC 4180.

  • Output in TSV and JSONL formats is also supported.
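
The size limits above appear consistent with Amazon S3's multipart upload limits (at most 10,000 parts per upload and a maximum object size of 5 TiB). The following Python sketch shows the arithmetic under that assumption. The function name is hypothetical, and the relationship between part size and the export limit is inferred from the numbers on this page rather than confirmed by it.

```python
S3_MAX_PARTS = 10_000               # S3 limit on parts per multipart upload
S3_MAX_OBJECT_MB = 5 * 1024 * 1024  # S3 maximum object size (5 TiB), expressed in MiB


def max_export_size_mb(part_size_mb: int = 10) -> int:
    """Estimate the largest export (in MB) for a given part size.

    Assumes the export is one multipart upload, so the size is bounded by
    part size times the part count, and by the S3 object size cap.
    """
    if not 5 <= part_size_mb <= 5000:
        raise ValueError("part size must be between 5 and 5000 MB")
    return min(part_size_mb * S3_MAX_PARTS, S3_MAX_OBJECT_MB)


print(max_export_size_mb(10))    # 100000 MB, roughly 100 GB (the default limit)
print(max_export_size_mb(5000))  # 5242880 MB, capped at the 5 TiB object size
```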

Static IP Address of Treasure Data

The static IP address of Treasure Data is the access point and source of the linkage for this Integration. To determine the static IP address, contact your Customer Success representative or Technical support.

About S3 Server-Side Encryption

Use the Server-Side Encryption bucket policy if you require server-side encryption for all objects that are stored in your bucket. When you have server-side encryption enabled, you don't have to turn on the SSE option. However, job results might fail if you have bucket policies that reject HTTP requests without encryption information.

When you enable AWS KMS for server-side encryption in Amazon S3:

  • If you don't input the KMS Key ID, the default KMS key is created or used.

  • If you input the KMS Key ID, you must choose a symmetric CMK (asymmetric CMKs are not supported).

  • The AWS KMS CMK must be in the same Region as the bucket.
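
For context, a bucket policy that rejects requests without encryption information typically denies s3:PutObject when the x-amz-server-side-encryption header is absent. The Python sketch below builds such a policy as a dictionary. It follows the commonly documented AWS pattern and is not taken from this page; the function name and bucket name are placeholders. If your bucket has a policy like this, the export must send encryption information (that is, the SSE option must be turned on) for job results to succeed.

```python
import json


def build_deny_unencrypted_policy(bucket: str) -> dict:
    """Build a bucket policy that denies uploads lacking an SSE header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # The Null condition is true when the header is missing.
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }


print(json.dumps(build_deny_unencrypted_policy("your_bucket_name"), indent=2))
```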

About File Formats for S3

For the CSV, TSV, and JSONL formats, the following table lists the options you can use to customize the final format of the files written into the destination:

Name | Description | Restrictions | CSV default | TSV default | JSONL
Format | The main setting to specify the file format. | (none) | csv | csv (Use ‘tsv’ to select the TSV format) | Use JSONL to select JSONL format
Delimiter | Use to specify the delimiter character. | (none) | , (comma) | \t (tab) | Parameter ignored
Quote policy | Use to determine field type to quote. | (none) | MINIMAL | MINIMAL | Parameter ignored
Quote | Use to specify the quote character. | Not available for TSV format | " (double quote) | (no character) | Parameter ignored
Escape | Specifies the character used to escape other special characters. | Not available for TSV format | " (double quote) | (no character) | Parameter ignored
Null | Use to specify how a ‘null’ value is displayed. | (none) | (empty string) | \N (backslash capital n) | Parameter ignored
Newline | Use to specify the EOL (End-Of-Line) representation. | (none) | \r\n (CRLF) | \r\n (CRLF) | \r\n (CRLF)
Header | Can be used to suppress the column header. | (none) | The column header is printed. Use ‘false’ to suppress. | The column header is printed. Use ‘false’ to suppress. | Parameter ignored

The following example shows a default sample output in CSV format when no customization is requested:

code,cnt
200,4981
302,
404,17
500,2


When the format=tsv, delimiter=|, and null=NULL options are specified, the output changes to:

code|cnt
200|4981
302|NULL
404|17
500|2


When format=jsonl is specified, the output changes to:

{"code": 200, "cnt": 4981}
{"code": 302, "cnt": null}
{"code": 404, "cnt": 17}
{"code": 500, "cnt": 2}
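
To see how the delimiter, null, and newline options shape the output, the sample outputs above can be reproduced locally. This is a minimal sketch using Python's standard csv and json modules, not Treasure Data's implementation. The function names and the sample rows are assumptions made for illustration.

```python
import csv
import io
import json

HEADER = ["code", "cnt"]
ROWS = [(200, 4981), (302, None), (404, 17), (500, 2)]


def render_delimited(rows, header, delimiter=",", null="", newline="\r\n"):
    """Render rows as delimited text, replacing None with the null string."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        lineterminator=newline,
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writerow(header)
    for row in rows:
        writer.writerow([null if value is None else value for value in row])
    return buf.getvalue()


def render_jsonl(rows, header, newline="\r\n"):
    """Render rows as one JSON object per line; None becomes JSON null."""
    return "".join(json.dumps(dict(zip(header, row))) + newline for row in rows)


print(render_delimited(ROWS, HEADER))                            # CSV defaults
print(render_delimited(ROWS, HEADER, delimiter="|", null="NULL"))  # custom options
print(render_jsonl(ROWS, HEADER))
```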



Use the TD Console to Create a Connection

In Treasure Data, you must create and configure the data connection before running your query. As part of the data connection, you provide authentication to access the integration.

...

  1. Open TD Console.
  2. Navigate to Integrations Hub > Catalog.
  3. Search for S3 and select Amazon S3.
  4. Select Create Authentication.
  5. Type the credentials to authenticate:

    Parameter | Description
    Endpoint | S3 service endpoint override. You can find region and endpoint information in the AWS documentation (Ex. s3.ap-northeast-1.amazonaws.com). When specified, it overrides the region setting.
    Region | AWS Region
    Authentication Method | One of the methods listed below
    Access Key ID | AWS S3 issued
    Secret Access Key | AWS S3 issued

    Authentication methods:

    • basic: Uses access_key_id and secret_access_key to authenticate. See AWS Programmatic access.
      • Access Key ID
      • Secret access key
    • session (Recommended): Uses temporary-generated access_key_id, secret_access_key, and session_token.
      • Access Key ID
      • Secret access key
      • Secret token
    • assume_role: Uses role access. See AWS AssumeRole.
      • TD's Instance Profile
      • Account ID
      • Your Role Name
      • External ID
      • Duration In Seconds
    • anonymous: Not supported.

    Info: To create an authentication with the assume_role authentication method:

      1. Create a new authentication with the assume_role authentication method.
      2. Create your AWS IAM role.

  6. Select Continue.
  7. Type a name for your connection.
  8. Select Done.


Define your Query

  1. Complete the instructions in Creating a Destination Integration.
  2. Navigate to Data Workbench > Queries.

  3. Select a query for which you would like to export data.

  4. Run the query to validate the result set.

  5. Select Export Results.

  6. Select an existing integration authentication.


Specify the Result Export Target

  1. Select Export Results.
  2. Select an existing authentication or create a new authentication for the external service to be used for output. Choose one of the following:

    • Use Existing Integration
    • Create a New Integration

(Optional) Specify information for Export to Amazon S3.

Field | Description
Is user directory Root? | If selected, the user directory is treated as the root directory (ex. ‘/home/treasure-data’ as ‘/’).
Path prefix | The file path where the file will be stored.
Rename file after upload finish | If selected, SFTP result output renames the file on the remote SFTP server from “.xxx.tmp” to “.xxx” after all the data is transferred. Some MA tools try to import data when a file with a specific name exists on the SFTP server. The temp name option is useful for such cases.
Format | The format of the exported files: csv (comma separated) or tsv (tab separated).
Compression | The compression format of the exported files: None, GZ, or bzip2.
Header line? | The header line with column names as the first line.
Delimiter | The delimiter character: Default, comma (,), Tab, or the "|" character.
Quote policy | The policy for a quote: ALL, MINIMAL (adds the quote character only to fields that contain the delimiter, quote, or any of the characters in lineterminator), or NONE.
Null string | How a null value in the query result is displayed: Default, empty string, \N, NULL, or null.
End-of-line character | The EOL (end-of-line) character: CRLF, LF, or CR.
Temp filesize threshold | The maximum file size (in bytes) of a local temp file. When the temp file reaches the threshold, the file flushes to a remote file. If you encounter the error `channel is broken`, reduce the value of this option to resolve the error.


Create an Activation Using an Integration

See Create an Activation.


Integration Export Parameters for S3 

  1. Define any additional Export Results details and review the integration parameters.
    For example, your Export Results screen might be different, or you might not have additional details to fill out.
  2. Select Done.
  3. Run your query.
  4. Validate that your data moved to the destination you specified.


Parameter | Data Type | Required? | Supported in V1? | Description
Server-side Encryption | String | | yes, only sse-s3 | Supported values: sse-s3 (Server-side Encryption Mode), sse-kms (new SSE Mode)
Server-side Encryption Algorithm | String | | yes | Supported value: AES256
KMS Key ID | String | | no | Symmetric AWS KMS Key ID. If there is no input for the KMS Key ID, the default KMS Key is created or used.
Bucket | String | yes | yes | Provide the S3 bucket name (Ex. your_bucket_name).
Path | String | yes | yes | Specify the S3 filename (object key), and include an extension (Ex. test.csv).
Format | String | | yes | Format of the exported file: csv, tsv, jsonl
Compression | String | | yes | The compression format of the exported files (Ex. None or gz)
Header | Boolean | | yes | Include a header in the exported file.
Delimiter | String | | yes | Use to specify the delimiter character (Ex. , (comma))
String for NULL values | String | | yes | Placeholder to insert for null values (Ex. Empty String)
End-of-line character | String | | yes | Specify the EOL (End-Of-Line) representation (Ex. CRLF, LF)
Quote Policy | String | | no | Use to determine field type to quote. Supported values: ALL (quote all fields), MINIMAL (only quote those fields which contain the delimiter, quote, or any of the characters in the lineterminator), NONE (never quote fields; when the delimiter occurs in the field, escape with the escape char). Default value: MINIMAL
Quote character (Optional) | Char | | yes | The character used for quotes in the exported file (Ex. "). Only quote those fields which contain the delimiter, quote, or any of the characters in the lineterminator. If the input is more than 1 character, the default value will be used.
Escape character (Optional) | Char | | yes | The escape character used in the exported file. If the input is more than 1 character, the default value will be used.
Part Size (MB) (Optional) | Integer | | no | The part size in multipart upload. Default 10, min 5, max 5000.
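
The three Quote Policy values behave much like the quoting modes in Python's standard csv module, so their effect can be previewed locally. The sketch below is an analogy under that assumption, not the integration's implementation. The function name, the sample row, and the backslash escape character are chosen for readability only (the escape default documented for CSV on this page is the double quote).

```python
import csv
import io

# Assumed mapping from the Quote Policy values to Python csv quoting modes.
POLICIES = {
    "ALL": csv.QUOTE_ALL,
    "MINIMAL": csv.QUOTE_MINIMAL,
    "NONE": csv.QUOTE_NONE,
}


def write_row(row, quote_policy="MINIMAL", escape_char="\\"):
    """Write a single row with the given quote policy and return the text."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=",",
        quotechar='"',
        quoting=POLICIES[quote_policy],
        escapechar=escape_char,
        lineterminator="\r\n",
    )
    writer.writerow(row)
    return buf.getvalue()


row = ["plain", "a,b"]
for policy in ("ALL", "MINIMAL", "NONE"):
    print(policy, repr(write_row(row, policy)))
```

With ALL every field is quoted, with MINIMAL only the field containing the delimiter is quoted, and with NONE the delimiter inside the field is escaped instead.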

...

SELECT * FROM www_access


(Optional) Schedule Query Export Jobs

You can use Scheduled Jobs with Result Export to periodically write the output result to a target destination that you specify.

...

Within Treasure Workflow, you can specify the use of this data connector to export data.

Learn more at Using Workflows to Export Data with the TD Toolbelt.

S3 (v2) Configuration Keys

...

_export:
  td:
    database: td.database

# Run the query and export its result through the s3v2_conn authentication.
+s3v2_test_export_task:
  td>: export_s3v2_test.sql
  database: ${td.database}
  result_connection: s3v2_conn
  # All result_settings keys share the same indentation (spaces only, no tabs).
  result_settings:
    bucket: my-bucket
    path: /path/to/target.csv
    sse_type: sse-s3
    format: csv
    compression: gz
    header: false
    delimiter: default
    null_value: empty
    newline: LF
    quote_policy: MINIMAL
    escape: '"'
    quote: '"'
    part_size: 20

(Optional) Export Integration Using the CLI

To output the result of a single query to an S3 bucket, add the --result option to the td query command. After the job is finished, the results are written into your S3 bucket.
You can specify detailed export settings for S3 through the --result parameter.

...

Creating authentication with Assume Role is only supported through the console. Attempting to create it through the TD CLI will result in an error.

Example CLI Command for S3 (v2)

td query \
--result '{"type":"s3_v2","auth_method":"basic","region":"us-east-2","access_key_id": "************","secret_access_key":"***************","bucket":"bucket_name","path":"path/to/file.csv","format":"csv","compression":"none","header":true,"delimiter":"default","null_value":"default","newline":"CRLF","quote_policy":"NONE","part_size":10}' \
-w -d testdb \
"SELECT 1 as col" -T presto

...
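
Writing the --result JSON by hand inside shell quotes is error-prone. As an alternative, the following Python sketch assembles the same td query command shown in the CLI example, serializing the settings with json.dumps and quoting each argument with shlex.quote. The function name and the credential placeholders are assumptions; the setting keys and flags are the ones used in the example command.

```python
import json
import shlex


def build_td_query_command(result_settings, database, query, engine="presto"):
    """Return a shell-safe td query command string for an S3 (v2) export."""
    # Compact JSON keeps the --result argument on one line.
    result_json = json.dumps(result_settings, separators=(",", ":"))
    args = ["td", "query", "--result", result_json, "-w", "-d", database, query, "-T", engine]
    return " ".join(shlex.quote(arg) for arg in args)


settings = {
    "type": "s3_v2",
    "auth_method": "basic",
    "region": "us-east-2",
    "access_key_id": "YOUR_ACCESS_KEY_ID",          # placeholder
    "secret_access_key": "YOUR_SECRET_ACCESS_KEY",  # placeholder
    "bucket": "bucket_name",
    "path": "path/to/file.csv",
    "format": "csv",
    "compression": "none",
    "header": True,
    "delimiter": "default",
    "null_value": "default",
    "newline": "CRLF",
    "quote_policy": "NONE",
    "part_size": 10,
}

command = build_td_query_command(settings, "testdb", "SELECT 1 as col")
print(command)
```

Prefer reading the credentials from environment variables or a secrets store rather than embedding them in a script.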