Repro Export Integration

Learn more about Repro Import Integration.

You can use the Repro Export Integration connector to export files from Treasure Data to your Repro Amazon S3 buckets, with parameters you can customize for each export.

Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.
  • An S3 bucket and its region ID.

Using the TD Console to Create Your Connection

Create a New Connection

When you create a data connection, you must provide authentication to access the integration. In Treasure Data, you configure the authentication first and then specify the source information.

  1. Open TD Console.
  2. Navigate to Integrations Hub > Catalog.
  3. Search for and select Repro.
  4. In the dialog that opens, enter a name for your connection and click Done.

Configure Export Results in Your Data Connection

In this step, you create or reuse a query and configure the data connection in it. You might also need to define the column mapping in the query.

Configure the Connection by Specifying the Parameters

  1. Open the TD console.
  2. Navigate to Data Workbench > Queries.
  3. Select the query that you plan to use to export data.
  4. Click Export Results at the top of your query editor. The Choose Integration dialog opens. You can either use an existing connection to export the results or create a new one.

Use an existing connection

  1. Type the connection name in the search box to filter.
  2. Select your connection.

Create a New Repro Connection

  1. Fill in the field values to create the new connection.
  2. Enter the required credentials for your new connection. Set the following parameters.
  • Use AWS S3 Server-Side Encryption (optional): Whether to use S3 server-side encryption.
  • Server-Side Encryption algorithm (optional): The algorithm used for encryption.
  • Bucket (required): The name of the S3 bucket.
  • File Path (required): The full path of the file, including the file name and extension, for example production/<app_id>/user-list/filename.csv.gz.
  • Format (required): The export file format (csv); the matching extension is required in the file path.
  • Header line (required): Whether the exported file contains the column names as its first row.
  • Null String (optional): The value that replaces null values in the file.
  • End-of-line character (optional): The character that marks the end of a line in the file.
  • Quote Policy (required): The quote policy for the file.
  • Compression (required): Compresses the file as gz; the matching extension is required in the file path.
  • PGP Encryption: If enabled, the file is encrypted using the public key before being uploaded.
  • Public Key: The public key used to encrypt the file before it is uploaded.
  • Key Identifier: The key ID of the encryption subkey used to secure the file. The master key is excluded from the encryption process.
  • Armor: Whether to use ASCII armor.
  • Compression Type: The compression algorithm applied to the file before it is encrypted and uploaded. Note: the file is compressed before encryption, so when you decrypt it, the file returns to its compressed format, such as .gz or .bz2.

Here is a sample configuration:
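The values below are illustrative placeholders only (the bucket name, app ID, and file name are assumptions, not values from your account):

Bucket:        my-repro-export-bucket
File Path:     production/app_12345/user-list/user_list.csv.gz
Format:        csv
Header line:   true
Quote Policy:  MINIMAL
Compression:   gz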

Example of a Query to Populate Repro

From Treasure Data, run the following query with export results into a connection for Repro:

Code Example

SELECT
 an_email_column AS EMAIL,
 another_phone_column AS PHONE
FROM
 your_table;
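
If your source table can contain rows with a missing email address or duplicate contacts, a variant such as the following is one way to clean the result before exporting it. This is a sketch that reuses the placeholder table and column names from the query above:

SELECT DISTINCT
 an_email_column AS EMAIL,
 another_phone_column AS PHONE
FROM
 your_table
WHERE
 an_email_column IS NOT NULL
 AND an_email_column <> '';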

(Optional) Schedule Query Export Jobs

You can use Scheduled Jobs with Result Export to periodically write the output result to a target destination that you specify.

Treasure Data's scheduler feature supports periodic query execution to achieve high availability.

If a schedule contains conflicting specifications, the specification that requests more frequent execution is followed and the other is ignored.

For example, if the cron schedule is '0 0 1 * 1', the 'day of month' and 'day of week' specifications conflict: the former requires the job to run on the first day of each month at midnight (00:00), while the latter requires it to run every Monday at midnight (00:00). Because the weekly schedule runs more often, the latter specification is followed.

Scheduling your Job Using TD Console

  1. Navigate to Data Workbench > Queries

  2. Create a new query or select an existing query.

  3. Next to Schedule, select None.

  4. In the drop-down, select one of the following schedule options:

    • Custom cron...: Review the Custom cron... details below.
    • @daily (midnight): Run once a day at midnight (00:00) in the specified time zone.
    • @hourly (:00): Run every hour at minute 00.
    • None: No schedule.

Custom cron... Details

  • 0 * * * * : Run once an hour.
  • 0 0 * * * : Run once a day at midnight.
  • 0 0 1 * * : Run once a month at midnight on the first day of the month.
  • "" (empty value): Create a job that has no scheduled run time.
 *    *    *    *    *
 -    -    -    -    -
 |    |    |    |    |
 |    |    |    |    +----- day of week (0 - 6) (Sunday=0)
 |    |    |    +---------- month (1 - 12)
 |    |    +--------------- day of month (1 - 31)
 |    +-------------------- hour (0 - 23)
 +------------------------- min (0 - 59)

The following named entries can be used:

  • Day of Week: sun, mon, tue, wed, thu, fri, sat.
  • Month: jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec.
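
For example, assuming standard cron semantics, '0 9 * * mon' runs every Monday at 09:00, and '0 0 1 jan *' runs at midnight (00:00) on January 1.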

A single space is required between each field. The values for each field can be composed of:

  • A single value, within the limits displayed above for each field.
  • A wildcard '*' to indicate no restriction based on the field. Example: '0 0 1 * *' configures the schedule to run at midnight (00:00) on the first day of each month.
  • A range '2-5', indicating the range of accepted values for the field. Example: '0 0 1-10 * *' configures the schedule to run at midnight (00:00) on each of the first 10 days of each month.
  • A list of comma-separated values '2,3,4,5', indicating the list of accepted values for the field. Example: '0 0 1,11,21 * *' configures the schedule to run at midnight (00:00) on the 1st, 11th, and 21st day of each month.
  • A periodicity indicator '*/5' to express how often, based on the field's valid range of values, a schedule is allowed to run. Example: '30 */2 1 * *' configures the schedule to run on the 1st of every month, every 2 hours starting at 00:30; '0 0 */5 * *' configures the schedule to run at midnight (00:00) every 5 days, starting on the 5th of each month.
  • A comma-separated list of any of the above except the '*' wildcard, for example '2,*/5,8-10'. Example: '0 0 5,*/10,25 * *' configures the schedule to run at midnight (00:00) on the 5th, 10th, 20th, and 25th day of each month.
  5. (Optional) You can delay the start time of a query by enabling Delay execution.

Execute the Query

Save the query with a name and run it, or just run the query. Upon successful completion of the query, the query result is automatically exported to the specified destination.

Scheduled jobs that continuously fail due to configuration errors may be disabled on the system side after several notifications.


Optional: Configure Export Results in Workflow

Within Treasure Workflow, you can specify the use of this data connector to export data.

timezone: UTC

_export:
  td:
    database: sample_datasets

+td-result-into-target:
  td>: queries/sample.sql
  result_connection: your_connections_name
  result_settings:
    type: repro
    bucket: bucket_name
    region: ap-northeast-2
    use_sse: true
    sse_algorithm: AES256
    auth_method: basic
    session_token: session_token
    path: /td-export-repro/file_output.csv
    access_key_id: access_id
    secret_access_key: secret_key
    formatter:
      type: csv
      delimiter: "\t"
      newline: CRLF
      newline_in_field: LF
      charset: UTF-8
      quote_policy: MINIMAL
      quote: '"'
      escape: '\'
      null_string: '\N'
      default_timezone: UTC
    encoders:
      type: gzip

For more information on using data connectors in a workflow to export data, see the Treasure Workflow documentation.
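
If you manage this workflow with the TD Toolbelt, a typical sequence might look like the following sketch. The project and workflow names (repro_export_project, repro_export) are placeholders, and the commands assume the Toolbelt's digdag-based workflow subcommands:

# Push the workflow project (the directory containing the .dig file and queries/sample.sql)
$ td workflow push repro_export_project

# Start the workflow immediately
$ td workflow start repro_export_project repro_export --session now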

References