# Amazon S3 Export Integration V1

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers scalability, data availability, security, and performance. You can use it to store and protect any amount of data for use cases such as data lakes, websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics. Amazon S3 provides features for organizing data and configuring access controls to meet your business, organizational, and compliance requirements.

This TD export integration allows you to write job results from Treasure Data directly to Amazon S3.

## What can you do with this Integration?

- **Storing data**: Store a virtually unlimited amount of data in a bucket.

## Prerequisites

- Basic knowledge of Treasure Data, including the [TD Toolbelt](https://toolbelt.treasuredata.com/).
- For AWS: an IAM user with the `s3:PutObject` and `s3:AbortMultipartUpload` permissions. We recommend granting no other permissions to the IAM user used for this connection.

## Requirements and Limitations

- The query result limit for export to S3 is 100GB. If the query result exceeds the limit, you see the following message in the log: `The number of chunks for multipart upload is exceeded.` In that case, split the result across multiple queries.
- The default export format is [CSV RFC 4180](http://www.ietf.org/rfc/rfc4180.txt).
- Output in TSV format is also supported.

## Static IP Address of Treasure Data Integration

If your security policy requires IP whitelisting, you must add Treasure Data's IP addresses to your allowlist to ensure a successful connection. You can find the complete list of static IP addresses, organized by region, at the following link: [https://api-docs.treasuredata.com/en/overview/ip-addresses-integrations-result-workers/](https://api-docs.treasuredata.com/en/overview/ip-addresses-integrations-result-workers/)

## About S3 Server-Side Encryption

You can encrypt upload data with [AWS S3 Server-Side Encryption](http://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html). You don't need to prepare an encryption key; data is encrypted on the server side using the 256-bit Advanced Encryption Standard (AES-256).

Use a Server-Side Encryption bucket policy if you require server-side encryption for all objects that are stored in your bucket. When the bucket itself has server-side encryption enabled, you don't have to turn on the **use_sse** option. However, job results might fail if your bucket policy rejects HTTP requests that carry no encryption information; in that case, enable **use_sse** as in the following example:

```bash
td query \
  --result 's3://accesskey:secretkey@/bucketname/path/to/file.csv?use_sse=true&sse_algorithm=AES256' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
```

## About File Formats for S3

The default export format is [CSV RFC 4180](http://www.ietf.org/rfc/rfc4180.txt). Output in TSV format is also supported. For both CSV and TSV formats, the following table lists the options you can use to customize the final format of the files written to the destination:

| Name | Description | Restrictions | CSV default | TSV default | JSONL |
| --- | --- | --- | --- | --- | --- |
| format | The main setting to specify the file format. | | `csv` | `csv` (use `tsv` to select the TSV format) | Use `jsonl` to select the JSONL format. |
| delimiter | Use to specify the delimiter character. | | `,` (comma) | `\t` (tab) | Parameter ignored |
| quote | Use to specify the quote character. | Not available for TSV format | `"` (double quote) | (no character) | Parameter ignored |
| escape | Specifies the character used to escape other special characters. | Not available for TSV format | `"` (double quote) | (no character) | Parameter ignored |
| null | Use to specify how a null value is displayed. | | (empty string) | `\N` (backslash, capital N) | Parameter ignored |
| newline | Use to specify the EOL (end-of-line) representation. | | CRLF | CRLF | |
| header | Can be used to suppress the column header. | | The column header is printed. Use `false` to suppress it. | The column header is printed. Use `false` to suppress it. | Parameter ignored |

The following example shows a default sample output in CSV format when no customization is requested:

```
code,cnt
"200",4981
"302",
"404",17
"500",2
```

When the `format=tsv`, `delimiter="` (URL-encoded as `%22`), and `null=NULL` options are specified:

```bash
td query \
  --result 's3://accesskey:secretkey@/bucket_name/path/to/file.tsv?format=tsv&delimiter=%22&null=NULL' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
```

The access key and secret key must be [URL encoded](http://en.wikipedia.org/wiki/Percent-encoding).

The output changes to:

```
"code" "cnt"
"200" 4981
"302" NULL
"404" 17
"500" 2
```
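JSONL (one JSON object per line) can be selected the same way. The following is a minimal sketch assuming the `format=jsonl` value follows the format option listed in the table above; the access key, secret key, and bucket path are placeholders:

```bash
# Export the aggregation as JSONL; the delimiter, quote, escape, and null
# options are ignored for this format (see the table above).
td query \
  --result 's3://accesskey:secretkey@/bucket_name/path/to/file.jsonl?format=jsonl' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
```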
# Use the TD Console to Create a Connection

In Treasure Data, you must create and configure the data connection before running your query. As part of the data connection, you provide authentication to access the integration.

## Create a New Authentication

1. Open **TD Console**.
2. Navigate to **Integrations Hub** > **Catalog**.
3. Search for AWS S3.
4. Select **Create Authentication**. ![](/assets/amazons3.76925fb4451d1607d39ea898d7e17d50476ebaeb9ef9b2152962f3207b76b9f5.ecf98b31.png)
5. The New Authentication dialog opens. You need an access key ID and a secret access key to authenticate using credentials. ![](/assets/screenshot-2021-10-28-12.40.19.db4fe96762e85ab10bfcc61d02f499bd94d3fbfcc9325dfc0a40492799b9144d.ecf98b31.png)
6. Set the following parameters.

| **Parameter** | **Description** |
| --- | --- |
| **Endpoint** | The S3 endpoint. You can find region and endpoint information in the [AWS documentation](http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region). (Ex. [*s3-ap-northeast-1.amazonaws.com*](http://s3-ap-northeast-1.amazonaws.com)) |
| **Authentication Method** | |
| **basic** | Uses access_key_id and secret_access_key to authenticate. See [AWS Programmatic access](https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html). Requires: Access Key ID, Secret Access Key. |
| **anonymous** | Uses anonymous access. This authentication method can access only public files. |
| **session (Recommended)** | Uses a temporarily generated access_key_id, secret_access_key, and session_token. (This authentication method is only available for data import; for now, it can't be used for data export.) Requires: Access Key ID, Secret Access Key, Session Token. |
| **Access Key ID** | Issued by AWS S3 |
| **Secret Access Key** | Issued by AWS S3 |

7. Select **Continue**.
8. Name your new AWS S3 connection.
9. Select **Done**.

## Define your Query

1. Complete the instructions in [Creating a Destination Integration](https://docs.treasuredata.com/smart/project-product-documentation/creating-a-destination-integration).
2. Navigate to **Data Workbench > Queries**.
3. Select a query for which you would like to export data.
4. Run the query to validate the result set.
5. Select **Export Results**.
6. Select an existing integration authentication.
![](/assets/amazon-s3-export-integration-v1-2024-06-19-1.9adb829424614e86cf3483ba1168465ed5a14da5e198ae77b6a11f7a7a0da247.ecf98b31.png)

7. Define any additional Export Results details. Review the integration parameters for your export integration; your Export Results screen might look different, or you might not have additional details to fill out.
8. Select **Done**.
9. Run your query.
10. Validate that your data moved to the destination you specified.

## Specify the Result Export Target

1. Select **Export Results**. ![](/assets/image2021-9-7_15-10-56.ee7ed43caab64adefafcc22595462fd8068c974c4f47b5959a7babd7d99972b8.ecf98b31.png)
2. Select an existing authentication or create a new authentication for the external service to be used for output. Choose one of the following:

**Use Existing Integration**

![](/assets/image2021-9-7_15-28-30.d271866c7c3cea4dab234b61bea815a69b186746c80435855b4b86d1f77cc30e.ecf98b31.png)

**Create a New Integration**

![](/assets/image2021-9-7_15-30-17.3285b5d5c406c0a80239f6fb997dba38329830a15dd556d1b57b0b43ca1818be.ecf98b31.png)
![](/assets/image2021-9-7_15-33-54.40fc7ad84a59b94dc3c08c45ae41d10835d9dd527acff5841e81d82eb87ecf38.ecf98b31.png)

**(Optional) Specify information for Export to Amazon S3**

![](/assets/s3_v1_export_settings.3a38230cb120be2f82519bfc35782ec845e76d945dfe3d9339385863ef441e17.ecf98b31.png)

| Field | Description |
| --- | --- |
| Use AWS S3 Server-Side Encryption | If selected, choose AES256 as the **Server-Side Encryption algorithm**. |
| Bucket | Provide the S3 bucket name. |
| Path | Specify the path of the exported file, including the file name. |
| Part Size | Specify the target part size for multipart upload. Default: 10 (MB), min: 10, max: 5000. |
| Format | Format of the exported file: csv, tsv, or jsonl. |
| Compression | The compression format of the exported files: None or gz. |
| Include header line? | If selected, a header line with the column names is written as the first line. |
| Delimiter | The delimiter character: Default, `,` (comma), Tab, or `\|` (pipe). |
| String for null cells | How null values of the query result are displayed: Default, empty string, `\N`, NULL, or null. |
| End-of-line character | The EOL (end-of-line) character: CRLF, LF, or CR. |
| Quote character | The character used for quotes in the exported file. Only fields that contain the delimiter, the quote character, or any end-of-line character are quoted. |
| Escape character | The escape character used in the exported file. |

## Integration Parameters for S3

Define the following transfer parameters; a CLI equivalent of these settings is sketched after this list:

![](/assets/image2020-12-7_15-10-9.1d1b28978caf00dc67d1d977dcb97d6ae2d6da04eb4f4944fbd2bc71c8a110b0.ecf98b31.png)

- **If the `Use AWS S3 Server-Side Encryption` box is checked:**
  - **Server-Side Encryption algorithm** *(Ex. AES256)*.
- **Bucket**: Provide the S3 bucket name (Ex. your_bucket_name).
- **Path**: Specify a prefix for target keys (Ex. logs/data_).
- **Format**: Format of the exported files (Ex. *csv (comma-separated) or tsv (tab-separated)*).
- **Compression**: The compression format of the exported files *(Ex. None or gz)*.
- **Delimiter**: Use to specify the delimiter character *(Ex. `,` (comma))*.
- **String for null cells**: Placeholder to insert for null values *(Ex. Empty String)*.
- **End-of-line character**: Specify the EOL (end-of-line) representation *(Ex. CRLF)*.
- **Quote Character (Optional)**: The character used for quotes in the exported file (Ex. "). Only fields that contain the delimiter, the quote character, or any character in the line terminator are quoted.
- **Escape character (Optional)**: The escape character used in the exported file.
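When exporting from the CLI, these same settings are expressed as query-string options on the result URL. The following is a minimal sketch combining options shown elsewhere on this page (`format`, `compression`, `null`, `use_sse`, `sse_algorithm`); the credentials, bucket, and path are placeholders:

```bash
# Export gzip-compressed CSV with server-side encryption and NULL as the
# placeholder for null cells. Each option mirrors a console parameter above.
td query \
  --result 's3://accesskey:secretkey@/your_bucket_name/logs/data_.csv.gz?format=csv&compression=gz&null=NULL&use_sse=true&sse_algorithm=AES256' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
```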
### Example Query

For example:

```sql
SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code
```

Verify the results in the Amazon S3 bucket that you specified while entering the transfer details.

## Activate a Segment in Audience Studio

You can also send segment data to the target platform by creating an activation in Audience Studio.

1. Navigate to **Audience Studio**.
2. Select a parent segment.
3. Open the target segment, right-click, and then select **Create Activation**.
4. In the **Details** panel, enter an activation name and configure the activation according to the previous section on Configuration Parameters.
5. Customize the activation output in the **Output Mapping** panel. ![](/assets/ouput.b2c7f1d909c4f98ed10f5300df858a4b19f71a3b0834df952f5fb24018a5ea78.8ebdf569.png)
   - Attribute Columns
     - Select **Export All Columns** to export all columns without making any changes.
     - Select **+ Add Columns** to add specific columns for the export. The Output Column Name pre-populates with the same Source column name; you can update the Output Column Name. Continue to select **+ Add Columns** to add new columns for your activation output.
   - String Builder
     - Select **+ Add string** to create strings for export. Select from the following values:
       - String: Choose any value; use text to create a custom value.
       - Timestamp: The date and time of the export.
       - Segment Id: The segment ID number.
       - Segment Name: The segment name.
       - Audience Id: The parent segment number.
6. Set a **Schedule**. ![](/assets/snippet-output-connector-on-audience-studio-2024-08-28.a99525173709da1eb537f839019fa7876ffae95045154c8f2941b030022f792c.8ebdf569.png)
   - Select the values to define your schedule and optionally include email notifications.
7. Select **Create**.

If you need to create an activation for a batch journey, review [Creating a Batch Journey Activation](/products/customer-data-platform/journey-orchestration/batch/creating-a-batch-journey-activation).

See also: [Achieving Time Partitioning in S3 of Data Exported using Bulk Export](/int/achieving-time-partitioning-in-s3-of-data-exported-using-bulk-export)

## (Optional) Schedule Query Export Jobs

You can use Scheduled Jobs with Result Export to periodically write the output result to a target destination that you specify. Treasure Data's scheduler feature supports periodic query execution to achieve high availability.

When a cron schedule contains conflicting specifications, the specification that requests more frequent execution is followed and the other is ignored. For example, if the cron schedule is `'0 0 1 * 1'`, the 'day of month' and 'day of week' specifications are discordant: the former requires the job to run at midnight (00:00) on the first day of each month, while the latter requires it to run every Monday at midnight (00:00). The latter specification is followed.
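You can also create the schedule from the command line. The following is a minimal sketch using TD Toolbelt's `sched:create` subcommand; the schedule name, cron expression, database, and result URL are placeholders, and you should confirm the exact argument and option names with `td help sched:create` for your Toolbelt version:

```bash
# Create a scheduled job that runs daily at midnight and exports the
# result to S3. The access key and secret key must be URL encoded.
td sched:create daily_s3_export "0 0 * * *" \
  -d testdb \
  --result 's3://accesskey:secretkey@/bucketname/path/to/file.csv' \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
```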
#### Scheduling your Job Using TD Console

1. Navigate to **Data Workbench > Queries**.
2. Create a new query or select an existing query.
3. Next to **Schedule**, select None. ![](/assets/image2021-1-15_17-28-51.f1b242f6ecc7666a0097fdf37edd1682786ec11ef80eff68c66f091bc405c371.0f87d8d4.png)
4. In the drop-down, select one of the following schedule options: ![](/assets/image2021-1-15_17-29-47.45289a1c99256f125f4d887e501e204ed61f02223fde0927af5f425a89ace0c0.0f87d8d4.png)

| Drop-down Value | Description |
| --- | --- |
| Custom cron... | Review [Custom cron... details](#custom-cron-details). |
| @daily (midnight) | Run once a day at midnight (00:00 am) in the specified time zone. |
| @hourly (:00) | Run every hour at 00 minutes. |
| None | No schedule. |

#### Custom cron... Details

![](/assets/image2021-1-15_17-30-23.0f94a8aa5f75ea03e3fec0c25b0640cd59ee48d1804a83701e5f2372deae466c.0f87d8d4.png)

| **Cron Value** | **Description** |
| --- | --- |
| `0 * * * *` | Run once an hour. |
| `0 0 * * *` | Run once a day at midnight. |
| `0 0 1 * *` | Run once a month at midnight on the morning of the first day of the month. |
| `""` | Create a job that has no scheduled run time. |

```
 *    *    *    *    *
 -    -    -    -    -
 |    |    |    |    |
 |    |    |    |    +----- day of week (0 - 6) (Sunday=0)
 |    |    |    +---------- month (1 - 12)
 |    |    +--------------- day of month (1 - 31)
 |    +-------------------- hour (0 - 23)
 +------------------------- min (0 - 59)
```

The following named entries can be used:

- Day of week: sun, mon, tue, wed, thu, fri, sat.
- Month: jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec.

A single space is required between each field. The value of each field can be composed of:

| Field Value | Example | Example Description |
| --- | --- | --- |
| A single value, within the limits displayed above for each field. | | |
| A wildcard `'*'` to indicate no restriction based on the field. | `'0 0 1 * *'` | Configures the schedule to run at midnight (00:00) on the first day of each month. |
| A range `'2-5'`, indicating the range of accepted values for the field. | `'0 0 1-10 * *'` | Configures the schedule to run at midnight (00:00) on each of the first 10 days of each month. |
| A list of comma-separated values `'2,3,4,5'`, indicating the list of accepted values for the field. | `'0 0 1,11,21 * *'` | Configures the schedule to run at midnight (00:00) on the 1st, 11th, and 21st day of each month. |
| A periodicity indicator `'*/5'` to express how often, based on the field's valid range of values, a schedule is allowed to run. | `'30 */2 1 * *'` | Configures the schedule to run on the 1st of every month, every 2 hours starting at 00:30. `'0 0 */5 * *'` configures the schedule to run at midnight (00:00) every 5 days starting on the 5th of each month. |
| A comma-separated list of any of the above except the `'*'` wildcard is also supported, e.g. `'2,*/5,8-10'`. | `'0 0 5,*/10,25 * *'` | Configures the schedule to run at midnight (00:00) on every 5th, 10th, 20th, and 25th day of each month. |

5. (Optional) You can delay the start time of a query by enabling **Delay execution**.

### Execute the Query

Save the query with a name and run it, or just run the query. Upon successful completion of the query, the query result is automatically exported to the specified destination.

Scheduled jobs that continuously fail due to configuration errors may be disabled on the system side after several notifications.

# (Optional) Configure Export Results in Workflow

Within Treasure Workflow, you can specify the use of this data connector to export data. See:

- [About Using Workflows to Export Data with TD Toolbelt](https://docs.treasuredata.com/display/PD/About+Using+Workflows+to+Export+Data+with+TD+Toolbelt) for more information on using data connectors in a workflow to export data.
- [Treasure Boxes](https://github.com/treasure-data/treasure-boxes/tree/e5d13703022cb6a3f608f9bd0d9ccba07f93229f/scenarios/result_export/export_result_s3) for an example workflow.
- [About Workflow Secrets Management](https://docs.treasuredata.com/smart/project-product-documentation/about-workflow-secret-management) to learn how to configure secrets to mask credentials in your workflow.

```
timezone: UTC

_export:
  td:
    database: sample_datasets

+td-result-into-s3:
  td>: queries/sample.sql
  result_connection: your_connections_name
  result_settings:
    bucket: your_bucket
    path: /path/file_${moment(session_time).format("YYYYMMDD")}.csv.gz
    compression: 'gz'
    header: true
    newline: \r\n
    "null": "hoge"
```
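To run this export workflow periodically, you can add a `schedule:` block at the top of the workflow file. The following is a minimal sketch assuming standard Digdag scheduling syntax (`cron>:`); the cron expression and all names are placeholders taken from the example above:

```
# Runs the export workflow every day at midnight UTC.
timezone: UTC

schedule:
  cron>: 0 0 * * *

_export:
  td:
    database: sample_datasets

+td-result-into-s3:
  td>: queries/sample.sql
  result_connection: your_connections_name
  result_settings:
    bucket: your_bucket
    path: /path/file_${moment(session_time).format("YYYYMMDD")}.csv.gz
```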
# (Optional) Export Integration Using the CLI

If the TD Console is not available or does not meet your needs, you can use the CLI to issue queries and output the results.

## Required

The access key and secret key must be [URL encoded](http://en.wikipedia.org/wiki/Percent-encoding).

## Define the Query Export in CLI

To output the result of a single query to an S3 bucket, add the `--result` option to the `td query` command. After the job is finished, the results are written to the S3 bucket with the given name and path:

```bash
td query \
  --result 's3://accesskey:secretkey@/bucketname/path/to/file.csv' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
```

For security reasons, you may want to use [AWS IAM](http://aws.amazon.com/iam/) to manage storage write and access permissions.

You can specify the compression option (only gz is allowed at this moment) in the `--result` URL to compress the result. Without the compression parameter, the result is uncompressed:

```bash
td query \
  --result 's3://accesskey:secretkey@/bucketname/path/to/file.csv.gz?compression=gz' \
  -w -d testdb \
  "SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
```
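If your access key or secret key contains reserved characters such as `/` or `+`, percent-encode it before embedding it in the result URL. The following is a minimal sketch using Python's standard `urllib.parse.quote` from the shell; `TD_S3_SECRET_KEY` is a placeholder environment variable:

```bash
# Percent-encode a secret key so it can be embedded in the --result URL.
# safe='' also encodes '/' so path-like characters don't break the URL.
ENCODED_SECRET=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$TD_S3_SECRET_KEY")
echo "$ENCODED_SECRET"
```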