Dropbox Export Integration

This article explains how to export job results directly from Treasure Data to your Dropbox account.

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.
  • A Dropbox account
  • Authorized Treasure Data account access

Use the TD Console to Create Your Connection

Create a New Connection

When you configure a data connection, you provide authentication to access the integration. In Treasure Data, you configure the authentication and then specify the source information.

  1. Open the TD Console.
  2. Navigate to the Integrations Hub > Catalog.
  3. Click the search icon on the far-right of the Catalog screen, and enter Dropbox.
  4. Hover over the Dropbox connector and select Create Authentication.
  5. In the dialog that opens, authenticate with OAuth2. Each user must manually connect their Treasure Data account to their own Dropbox account. To authenticate, select Click here to connect to a new account.
  6. Log into your Dropbox account in the popup window and grant access to the Treasure Data app. You only need to do this the first time.
  7. You are redirected back to TD Console. Repeat the first step (Create a New Connection) and choose your new OAuth connection.
  8. Name your new Dropbox connection. Select Done.

Configure Output Results to the Data Connection

In this step, you create or reuse a query. In the query, you configure the data connection.

  1. Navigate to Data Workbench > Queries.
  2. Select the query that you plan to use to export data OR create a new query, if needed.
  3. Select Export Results to specify the results export target.

Specify the Result Export Target

  1. After you select Export Results, a dialog opens.
  2. Type the connection name in the search box to filter, and then select your Dropbox connection.
  3. Configure the following parameters.

| Parameter | Description |
|---|---|
| Path Root Modes | Path root header mode. Available options: home, root, namespace. Refer to path-root-header-modes. |
| Namespace Id | Shown and required when the namespace path root mode is selected. |
| Folder Path | Directory of the folder in Dropbox. |
| File Name | Destination file name, without extension. |
| Format | File extension. |
| Encoders | None: no encoder is applied (default). GZ: the file is compressed using gzip before being uploaded. BZIP2: the file is compressed using bzip2 before being uploaded. PGP Encryption: the file is encrypted using the public key before being uploaded. |
| Public Key | Required when Encoder is PGP Encryption. The public key used to encrypt the file before it is uploaded. |
| Key Identifier | Required when Encoder is PGP Encryption. Specifies the Key ID of the encryption subkey used to secure the file. The master key is excluded from the encryption process. |
| Armor | Optional. Whether to use ASCII armor. |
| Compression Type | Optional. Defines the compression algorithm applied to the file before it is encrypted and uploaded to Dropbox. Note: Ensure that the file is compressed before it is encrypted and uploaded. When you decrypt it, the file returns to a compressed format such as .gz or .bz2. |
| Header line | Select if the exported data has the column names as the header line. |
| Null String | The value used to represent NULL values. Available options: Default (an empty string ('') is used for the CSV format and \N is used for the TSV format); Empty string; \N; NULL; null. |
| End-of-line character | The character at the end of lines. Available options are CRLF, LF, and CR. |
| Quote Policy | Available options: ALL (all values are enclosed in double quotes ("")); MINIMAL (any value that contains an embedded quote (") is written with a consecutive pair of quotes (""); the MINIMAL quote policy is applied to CSV); NONE (no escaping is applied to embedded quotes; by default, NONE is applied to the TSV file format). |
| Max Retries | The number of retries attempted after a Dropbox server error or network error before the upload is aborted. |
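
As a rough illustration of how the Null String, End-of-line character, and Quote Policy settings shape each exported line, the following Python sketch renders a single row. It is not the connector's implementation: the function name render_row is hypothetical, and using Python's csv.QUOTE_ALL and csv.QUOTE_MINIMAL as stand-ins for the ALL and MINIMAL policies is an assumption, so the connector's exact output may differ.

```python
import csv
import io


def render_row(row, quote_policy="MINIMAL", null_string="", newline="\r\n"):
    """Render one row as a delimited line (hypothetical helper).

    None values are replaced by null_string before quoting is applied.
    """
    values = [null_string if value is None else str(value) for value in row]
    if quote_policy == "NONE":
        # NONE: values are written as-is, with no quoting or escaping.
        return ",".join(values) + newline
    # Assumption: Python's csv quoting modes approximate ALL and MINIMAL.
    quoting = csv.QUOTE_ALL if quote_policy == "ALL" else csv.QUOTE_MINIMAL
    buffer = io.StringIO()
    csv.writer(buffer, quoting=quoting, lineterminator=newline).writerow(values)
    return buffer.getvalue()


sample = ["kate1@gmail.com", None, 'say "hi"']
for policy in ("ALL", "MINIMAL", "NONE"):
    # repr() makes the end-of-line characters visible.
    print(policy, repr(render_row(sample, quote_policy=policy)))
```

Comparing the three printed lines shows where quotes are added and how the embedded quote is doubled under ALL and MINIMAL but left untouched under NONE.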

Execute the Query

  1. Select Save to save the query with a name and run it, or select Run to run the query.
  2. After the query runs successfully, the query result is automatically exported to the specified Dropbox destination.

Example of a Query

SELECT email, first_name, last_name, region, age, gender, website FROM (
 VALUES ('kate1@gmail.com', 'Kate', 'Tiny', 'Asia', '41 and above', 'female', 'google.com'),
 ('ronan@gmail.com', 'R', 'P', 'Americas', '21-30', 'male', 'google.com'),
 ('michelle@gmail.com', 'M', 'C', 'EMEA', '31-40', 'male', 'facebook.com')
) tbl (email, first_name, last_name, region, age, gender, website);

(Optional) Schedule Query Export Jobs

You can use Scheduled Jobs with Result Export to periodically write the output result to a target destination that you specify.

Treasure Data's scheduler feature supports periodic query execution to achieve high availability.

When two fields of a cron schedule conflict, the field that requests more frequent execution is followed and the other is ignored.

For example, in the cron schedule '0 0 1 * 1', the 'day of month' and 'day of week' fields conflict: the former requires the job to run on the first day of each month at midnight (00:00), while the latter requires it to run every Monday at midnight (00:00). The latter is followed.
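
To see why the 'day of week' field counts as the more frequent one in this example, the following Python sketch counts how many days in a year each field would match. The function names are hypothetical, and measuring frequency as matching days per year is an assumption made for illustration; how the scheduler resolves the conflict internally is not described here.

```python
from datetime import date, timedelta


def count_matching_days(year, matches):
    """Count the days in `year` for which `matches(day)` is true."""
    day = date(year, 1, 1)
    total = 0
    while day.year == year:
        if matches(day):
            total += 1
        day += timedelta(days=1)
    return total


def cron_weekday(day):
    """Convert Python's Monday=0 weekday to cron's Sunday=0 numbering."""
    return (day.weekday() + 1) % 7


def more_frequent_field(year, day_of_month, day_of_week):
    """Return the name of the field that matches more days in `year`.

    A tie falls back to "day of month"; that choice is arbitrary.
    """
    by_month_day = count_matching_days(year, lambda d: d.day == day_of_month)
    by_week_day = count_matching_days(
        year, lambda d: cron_weekday(d) == day_of_week
    )
    return "day of week" if by_week_day > by_month_day else "day of month"


# '0 0 1 * 1': day of month = 1, day of week = 1 (Monday)
print(more_frequent_field(2024, day_of_month=1, day_of_week=1))
```

A year contains only 12 first-of-month days but more than 50 Mondays, so the weekly field is the one that runs more often.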

Scheduling your Job Using TD Console

  1. Navigate to Data Workbench > Queries.

  2. Create a new query or select an existing query.

  3. Next to Schedule, select None.

  4. In the drop-down, select one of the following schedule options:

    | Drop-down Value | Description |
    |---|---|
    | Custom cron... | Review Custom cron... details. |
    | @daily (midnight) | Run once a day at midnight (00:00 am) in the specified time zone. |
    | @hourly (:00) | Run every hour at 00 minutes. |
    | None | No schedule. |

Custom cron... Details

| Cron Value | Description |
|---|---|
| 0 * * * * | Run once an hour. |
| 0 0 * * * | Run once a day at midnight. |
| 0 0 1 * * | Run once a month at midnight on the morning of the first day of the month. |
| "" | Create a job that has no scheduled run time. |

 *    *    *    *    *
 -    -    -    -    -
 |    |    |    |    |
 |    |    |    |    +----- day of week (0 - 6) (Sunday=0)
 |    |    |    +---------- month (1 - 12)
 |    |    +--------------- day of month (1 - 31)
 |    +-------------------- hour (0 - 23)
 +------------------------- min (0 - 59)

The following named entries can be used:

  • Day of Week: sun, mon, tue, wed, thu, fri, sat.
  • Month: jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec.

A single space is required between each field. The values for each field can be composed of:

| Field Value | Example | Example Description |
|---|---|---|
| A single value, within the limits displayed above for each field. | | |
| A wildcard '*' to indicate no restriction based on the field. | '0 0 1 * *' | Configures the schedule to run at midnight (00:00) on the first day of each month. |
| A range '2-5', indicating the range of accepted values for the field. | '0 0 1-10 * *' | Configures the schedule to run at midnight (00:00) on the first 10 days of each month. |
| A list of comma-separated values '2,3,4,5', indicating the list of accepted values for the field. | '0 0 1,11,21 * *' | Configures the schedule to run at midnight (00:00) every 1st, 11th, and 21st day of each month. |
| A periodicity indicator '*/5' to express how often, based on the field's valid range of values, a schedule is allowed to run. | '30 */2 1 * *' | Configures the schedule to run on the 1st of every month, every 2 hours starting at 00:30. '0 0 */5 * *' configures the schedule to run at midnight (00:00) every 5 days starting on the 5th of each month. |
| A comma-separated list of any of the above except the '*' wildcard is also supported: '2,*/5,8-10'. | '0 0 5,*/10,25 * *' | Configures the schedule to run at midnight (00:00) every 5th, 10th, 20th, and 25th day of each month. |

  5. (Optional) To delay the start time of a query, enable Delay execution.
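
The field value forms described under Custom cron... Details (single value, wildcard, range, list, and periodicity) can be sketched as a small Python function that expands one field into the values it accepts. The function name expand_field is hypothetical and named entries such as mon or jan are not handled. The treatment of '*/N' as "multiples of N" is an assumption taken from this article's examples ('*/5' on day of month starting on the 5th); other cron implementations may start from the lowest value in the range instead.

```python
def expand_field(expr, low, high):
    """Expand one numeric cron field into the sorted list of accepted values.

    `low` and `high` are the inclusive limits of the field, for example
    1 and 31 for day of month or 0 and 23 for hour.
    """
    values = set()
    for part in expr.split(","):
        if part == "*":
            # Wildcard: no restriction on this field.
            values.update(range(low, high + 1))
        elif part.startswith("*/"):
            step = int(part[2:])
            # Assumption from this article's examples: multiples of the step.
            values.update(v for v in range(low, high + 1) if v % step == 0)
        elif "-" in part:
            start, end = (int(piece) for piece in part.split("-"))
            values.update(range(start, end + 1))
        else:
            values.add(int(part))
    out_of_range = sorted(v for v in values if not low <= v <= high)
    if out_of_range:
        raise ValueError(f"values outside {low}-{high}: {out_of_range}")
    return sorted(values)


print(expand_field("1-10", 1, 31))     # day of month in '0 0 1-10 * *'
print(expand_field("1,11,21", 1, 31))  # day of month in '0 0 1,11,21 * *'
print(expand_field("*/2", 0, 23))      # hour in '30 */2 1 * *'
```

Expanding each of the five fields this way and checking a timestamp against all of them is one simple way to reason about when a custom schedule would fire.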

Activate a Segment in Audience Studio

You can also send segment data to the target platform by creating an activation in the Audience Studio.

  1. Navigate to Audience Studio.
  2. Select a parent segment.
  3. Open the target segment, right-click, and then select Create Activation.
  4. In the Details panel, enter an Activation name and configure the activation according to the previous section on Configuration Parameters.
  5. Customize the activation output in the Output Mapping panel.

  • Attribute Columns
    • Select Export All Columns to export all columns without making any changes.
    • Select + Add Columns to add specific columns for the export. The Output Column Name is pre-populated with the Source column name, and you can change it. Continue to select + Add Columns to add more columns to your activation output.
  • String Builder
    • Select + Add string to create strings for export. Select from the following values:
      • String: Choose any value; use text to create a custom value.
      • Timestamp: The date and time of the export.
      • Segment Id: The segment ID number.
      • Segment Name: The segment name.
      • Audience Id: The parent segment number.
  6. Set a Schedule.
    • Select the values to define your schedule and optionally include email notifications.
  7. Select Create.

If you need to create an activation for a batch journey, review Creating a Batch Journey Activation.

Optional: Configure Export Results in Workflow

Within Treasure Workflow, you can specify the use of this data connector to output data.

timezone: UTC

_export:
  td:
    database: sample_datasets

+td-result:
  td>: queries/sample.sql
  result_connection: your_connection_name
  result_settings:
    access_token: ###
    path_root_mode: namespace   # one of: namespace, home, root (default: home)
    namespace_id: 1234567
    file_name: file01
    folder_path: /abc
    replace_existing: false
    format: csv
    compression: gz
    header_line: true
    null_string: default
    newline: CRLF
    quote_policy: null
    retry_count: 5
    retry_initial_wait_millis: 1000
    max_retry_wait_millis: 30000

For more information, see the documentation on using data connectors in the workflow to export data.

Use the CLI to Create Your Connection

Install the td Command

Install the Treasure Data Toolbelt.

For On-demand Jobs

Add the Dropbox result output destination by using the -r / --result option for the td query command:

td query -d sample_datasets -w 'SELECT host, path FROM www_access' --type presto -r '{"type":"dropbox", "access_token":"your_token","folder_path":"/abc","file_name":"test_file","replace_existing":null,"format":"csv","compression":"gz","header_line":true,"null_string":"default","newline":"CRLF","quote_policy":null,"retry_count":"5","retry_initial_wait_millis":"1000","max_retry_wait_millis":"300000"}'

For Scheduled Jobs

Add the Dropbox result output destination by using the -r / --result option for the td sched:create command:

td sched:create every_6_mins "*/6 * * * *" -d test_db -w 'SELECT id, via FROM table1' --type presto -r '{"type":"dropbox", "access_token":"your_token","folder_path":"/abc","file_name":"test_file","replace_existing":null,"format":"csv","compression":"gz","header_line":true,"null_string":"default","newline":"CRLF","quote_policy":null,"retry_count":"5","retry_initial_wait_millis":"1000","max_retry_wait_millis":"300000"}'
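
The JSON passed to the -r option in the commands above is easy to get wrong when quoted by hand. As a minimal sketch, the following Python snippet builds the same kind of argument list from a dictionary so that the JSON is always valid. The helper name build_td_query_command is hypothetical; the setting names and placeholder values are the ones used in the examples above, and the snippet only prints the command rather than running it.

```python
import json
import shlex


def build_td_query_command(database, sql, result_settings, engine="presto"):
    """Return the argument list for a td query call with a Dropbox result target."""
    # Serialize the settings once so quoting and escaping are handled by json.
    result_json = json.dumps({"type": "dropbox", **result_settings})
    return [
        "td", "query",
        "-d", database,
        "-w", sql,
        "--type", engine,
        "-r", result_json,
    ]


settings = {
    "access_token": "your_token",  # placeholder, as in the examples above
    "folder_path": "/abc",
    "file_name": "test_file",
    "format": "csv",
    "compression": "gz",
    "header_line": True,
    "null_string": "default",
    "newline": "CRLF",
}
command = build_td_query_command(
    "sample_datasets", "SELECT host, path FROM www_access", settings
)
# shlex.join produces a shell-safe string for copy and paste.
print(shlex.join(command))
```

The returned list can also be handed to subprocess.run directly, which avoids shell quoting altogether.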

Appendix

FAQ for Export into Dropbox

Q: What are the Dropbox file size limits?

  • All files uploaded to your Dropbox must be smaller than your storage space. For example, if your account has a storage quota of 2 GB, you can upload one 2 GB file or many files that add up to 2 GB. If you are over your storage quota, Dropbox stops syncing.
  • Files uploaded through the API must be 350 GB or smaller.

Q: How do I identify the root and home namespaces?

  • To access files on users’ namespaces, first you need to retrieve the namespace IDs from the API. This is done with the endpoint /users/get_current_account.
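
As a minimal sketch of that lookup, the following Python snippet calls the endpoint with the standard library and extracts the two namespace IDs. The full URL and the response field names (root_info, root_namespace_id, home_namespace_id) reflect the Dropbox HTTP API as understood at the time of writing and should be treated as assumptions to verify against the current Dropbox API reference. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed full URL for the /users/get_current_account endpoint.
ACCOUNT_URL = "https://api.dropboxapi.com/2/users/get_current_account"


def fetch_current_account(access_token):
    """Call /users/get_current_account and return the decoded JSON response."""
    request = urllib.request.Request(
        ACCOUNT_URL,
        data=b"null",  # the endpoint takes no arguments
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def namespace_ids(account):
    """Extract the root and home namespace IDs from the account response."""
    root_info = account["root_info"]
    return {
        "root": root_info["root_namespace_id"],
        "home": root_info["home_namespace_id"],
    }


# Usage (requires a valid token, so it is not executed here):
# ids = namespace_ids(fetch_current_account("your_token"))
# print(ids["root"], ids["home"])
```

The value returned for the namespace you want to use is what you would enter in the Namespace Id parameter when the namespace path root mode is selected.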