You can use the Treasure Data connector to write job results to a Treasure Data database. This connector is typically used to export data from one Treasure Data account to another.

For sample workflows that write job results from one TD account to another, see Treasure Boxes.

We strongly recommend that you do not use this connector to copy data from one table to another table within the same account. Instead, use an INSERT INTO or CREATE TABLE AS SELECT query for the best performance:

  • Presto: CREATE TABLE AS SELECT or INSERT INTO

  • Hive: INSERT INTO or INSERT OVERWRITE
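
For example, a same-account copy can be run directly from the TD Toolbelt. This is a minimal sketch: the target table code_counts is a hypothetical name, www_access is the sample table used elsewhere on this page, and -T presto selects the Presto engine.

$ td query -T presto -w -d testdb \
  "CREATE TABLE code_counts AS SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"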


Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.

Basic Usage

You can configure a connection from TD Console or from the CLI.

Create a Connection in TD Console

You can configure a connection from within TD Console. To complete an account-to-account transfer, begin from the target Treasure Data account. You must copy the API Key information from the target account profile.

Retrieve the API Key Information

  1. Navigate to the TD Console > My Settings. You must have admin permissions.


  2. Select API Keys.


  3. Validate access, if needed.


  4. When validation is complete and you return to the profile, you can see both a master key and a write-only key. Highlight the Write-only API key. Selecting it automatically copies it to your clipboard.



If your data transfer updates existing data, for example with mode=replace, you must use the master key instead of the write-only key.

Create a New Authentication

When you configure a data connection, you provide authentication to access the integration. In Treasure Data, you configure the authentication and then specify the source information.

  1. Open TD Console.

  2. Navigate to Integrations Hub > Catalog.

  3. Search and select the Treasure Data tile.


  4. Select Create. Enter the write-only API key from the target account. In API Hostname, enter the server to use for export, or leave the field blank to use the default server for your region.

  5. Select Continue. Name your connection and select Create Connection.

If you are writing data from one region to another, e.g. from AWS US to AWS JP, the API Hostname field is required.

Specify Output in your Treasure Data Connector

To create a transfer, create or reuse a query and configure it to export results through the data connection.

  1. Open the TD Console.

  2. Navigate to Data Workbench > Queries. Highlight the query that you plan to use to export data.

  3. Select Export Results in the top right of the query editor. For example, you might export the results of a query such as:

    SELECT code, COUNT(1) FROM www_access GROUP BY code
  4. The Choose Integration dialog opens.


  5. Select Use Existing Integration and search for the name of the saved authentication.

Specifying the Parameters

  1. After selecting the authentication, select Next. The Export Results dialog opens.


  2. Specify the parameters and select Done.

Parameters

  • Database name (required): The name of an existing database.

  • Table name (required): The name of the table. If the table does not exist, it is created.

  • Legacy mode (optional): Whether to use the legacy Treasure Data connector instead of the new connector.

  • Mode (required): Choose the Append or Replace method for importing the data.

      • Append (default): Query results are appended to the table. If the table does not exist, it is created.

      • Replace: Replaces the entire content of an existing table with the result output of the query. If the table does not exist, a new table is created.

  • Time (optional): A user-defined time value, applied if there is no time column in the query result.



For On-Demand Jobs from the CLI

For on-demand jobs, add the --result option to the td query command. When the job finishes, the results are written to output_table within the testdb database.

$ td query --result 'td://@/testdb/output_table' \
  -w -d testdb \
  "SELECT code, COUNT(1) FROM www_access GROUP BY code"      

For Scheduled Jobs

For scheduled jobs, add the --result option when scheduling the job. Every time the job runs, the results are written into output_table.

$ td result:create mydb td://@/testdb
$ td sched:create hourly_count_example "0 * * * *" \
  -d testdb "select count(*) from www_access" \
  --result mydb:output_table
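
Before the schedule fires, you can confirm that both pieces are registered. This assumes the standard TD Toolbelt listing commands:

$ td result:list   # the saved result connection mydb should appear here
$ td sched:list    # the schedule hourly_count_example should appear here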


Two Ways to Modify Data

You can modify the target table in two ways:

  • append

  • replace

td://@/testdb/output_table              # append
td://@/testdb/output_table?mode=append  # append
td://@/testdb/output_table?mode=replace # replace

Append

This is the default mode. The query results are appended to the table. If the table does not exist yet, a new table is created. This operation is not idempotent: running the same job repeatedly appends another copy of the results each time, so it does not produce the same final state. This mode is also not atomic.
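
For example, exporting the same result twice in append mode leaves two copies of the result set in the table:

$ td query --result 'td://@/testdb/output_table?mode=append' \
  -w -d testdb \
  "SELECT code, COUNT(1) FROM www_access GROUP BY code"
# Running the identical command a second time appends a second
# copy of the same rows to output_table instead of replacing them.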

Replace

If the table already exists, the rows of the existing table are replaced with the query results. If the table does not exist yet, a new table will be created.

We achieve atomicity (so that a consumer of the table always has consistent data) by performing the following three steps in a single transaction.

  1. Create a temporary table.

  2. Write to the temporary table.

  3. Replace the existing table with the temporary table atomically.
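
The connector performs these steps for you in a single transaction. As a rough illustration only, not what the connector literally runs, a manual equivalent using TD Toolbelt commands might look like the following; output_table_tmp is a hypothetical staging table:

$ td query -T presto -w -d testdb \
  "CREATE TABLE output_table_tmp AS SELECT code, COUNT(1) AS cnt FROM www_access GROUP BY code"
$ td table:swap testdb output_table output_table_tmp    # exchange table contents atomically
$ td table:delete testdb output_table_tmp               # drop the superseded contents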

