
Google Cloud Storage Export Integration

You can write job results directly to your Google Cloud Storage. For the Import Integration, see Google Cloud Storage Import Integration.

Prerequisites

  • Basic knowledge of Treasure Data, including TD Toolbelt.
  • A Google Cloud Platform account with the permissions described below

Static IP Address of Treasure Data Integration

If your security policy requires IP whitelisting, you must add Treasure Data's IP addresses to your allowlist to ensure a successful connection.

Please find the complete list of static IP addresses, organized by region, at the following link:
https://api-docs.treasuredata.com/en/overview/ip-addresses-integrations-result-workers/

Obtain the Required Google Cloud Platform Credentials

To use this feature, you need the following:

  • Google Project ID
  • JSON credential for a service account with the following roles on the destination bucket:
    • Storage Object Creator, required to create objects in the GCS bucket.
    • Storage Object Viewer, required to list objects in the GCS bucket.

Obtain the Destination Bucket in Google Cloud Storage

List the Cloud Storage buckets in your project to find or confirm the destination bucket. Buckets are listed lexicographically by name.

To list the buckets in a project:

  1. Open the Cloud Storage browser in the Google Cloud Console.
  2. Optionally, use filtering to narrow the results in your list.

Buckets that are part of the currently selected project appear in the browser list.
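
If you prefer the command line, the following sketch lists buckets with the gcloud CLI, assuming it is installed and authenticated (MY_PROJECT is a placeholder):

# List all bucket names in the project.
$ gcloud storage buckets list --project=MY_PROJECT --format="value(name)"

# Equivalent with the legacy gsutil tool:
$ gsutil ls -p MY_PROJECT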

Optionally Create the Destination Bucket in Google Cloud Storage

To create a new storage bucket:

  1. Open the Cloud Storage browser in the Google Cloud Console.
  2. Select Create bucket to open the bucket creation form.

  3. Enter your bucket information and select Continue to complete each step:
    • Specify a Name, subject to the bucket name requirements.

    • Select a Location type and Location where the bucket data will be permanently stored.

    • Select a Default storage class for the bucket. This class is assigned by default to all objects uploaded to the bucket.

      The Monthly cost estimate panel in the right pane estimates the bucket's monthly costs based on your selected storage class and location, as well as your expected data size and operations.

    • Select an Access control model to determine how you control access to the bucket's objects.

    • Optionally, you can add bucket labels, set a retention policy, and choose an encryption method.

  4. Select Create.
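
Alternatively, you can create the bucket from the command line; a minimal sketch with placeholder bucket and project names (my-td-export-bucket, MY_PROJECT) and example location and storage-class values:

# Create a bucket with an explicit location and default storage class.
$ gcloud storage buckets create gs://my-td-export-bucket \
    --project=MY_PROJECT \
    --location=US \
    --default-storage-class=STANDARD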

Obtain the Google JSON Credentials

The integration with Google Cloud Storage is based on server-to-server API authentication.

The Service Account used to generate the JSON credentials must have the Storage Object Creator and Storage Object Viewer roles for the destination bucket.

  1. Visit your Google Developer Console.

  2. Select Credentials under APIs & auth in the left menu.

  3. Select Service account.

  4. Select the JSON key type, which is Google's recommended configuration. The browser automatically downloads the key.
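
If you manage credentials from the command line instead, the following sketch generates a key and grants the two required roles on the bucket (the service account, bucket, and file names are placeholders):

# Generate a JSON key for an existing service account.
$ gcloud iam service-accounts keys create td-export-key.json \
    --iam-account=td-export@MY_PROJECT.iam.gserviceaccount.com

# Grant the required roles on the destination bucket only.
$ gcloud storage buckets add-iam-policy-binding gs://my-td-export-bucket \
    --member=serviceAccount:td-export@MY_PROJECT.iam.gserviceaccount.com \
    --role=roles/storage.objectCreator
$ gcloud storage buckets add-iam-policy-binding gs://my-td-export-bucket \
    --member=serviceAccount:td-export@MY_PROJECT.iam.gserviceaccount.com \
    --role=roles/storage.objectViewer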

Use the TD Console to Create Your Connection

Create a New Authentication

In Treasure Data, you must create and configure the data connection before running your query. As part of the data connection, you provide authentication to access the integration.

  1. Open TD Console.
  2. Navigate to Integrations Hub > Catalog.
  3. Search for and select Google Cloud Storage.
  4. Select Create Authentication.
  5. Type the credentials to authenticate.
  6. Type a name for your connection.
  7. Select Continue.

Define your Query

  1. Complete the instructions in Creating a Destination Integration.
  2. Navigate to Data Workbench > Queries.
  3. Select a query for which you would like to export data.
  4. Run the query to validate the result set.
  5. Select Export Results.
  6. Select an existing integration authentication.
  7. Define any additional Export Results details, reviewing the integration parameters described below. Your Export Results screen might differ, or you might not have additional details to fill out.
  8. Select Done.
  9. Run your query.
  10. Validate that your data moved to the destination you specified.

Integration Parameters for Google Cloud Storage

Parameter             | Values                                                     | Description
bucket                |                                                            | Destination Google Cloud Storage bucket name (string, required).
path_prefix           |                                                            | Object path prefix, including the filename (string, required). Example: /path/to/filename.csv.
content_type          |                                                            | MIME type of the output file (string, optional). Default: application/octet-stream.
format                | csv, tsv                                                   | Output file format (string, required).
compression           | none, gz, bzip2, zip_builtin, zlib_builtin, bzip2_builtin | Compression applied to the exported file (string, optional). Default: none.
header_line           | true, false                                                | Write the header line with column names as the first line (boolean, optional). Default: true.
delimiter             | , (comma), \t (tab), or another single-byte character     | Column delimiter for the output file (string, optional).
null_string           |                                                            | Substitution string for NULL values (string, optional). Default: empty string for CSV, \N for TSV.
end_of_line_character | CRLF, LF, CR                                               | Line termination character (string, optional). Default: CRLF.
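
On the CLI, the same settings are passed as a JSON argument to the -r option; note that the CLI uses newline where the console uses end_of_line_character (see the Options table below). A minimal one-off sketch with placeholder database, table, and bucket names (the json_keyfile value is elided here; see the full sample under Exporting Data to Google Cloud Storage from the CLI):

$ td query -d dataconnector_db -w \
  "SELECT id, account, purchase, comment, time FROM data_connectors" \
  -r '{"type":"gcs","bucket":"samplebucket","path_prefix":"/output/test.csv","format":"csv","compression":"gz","header_line":true,"delimiter":",","null_string":"","newline":"CRLF","json_keyfile":"..."}'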

Example Query

SELECT 
  c0 AS EMAIL
FROM 
  e_1000 
WHERE c0 != 'email'

Validating Export Results

Upon successful completion of the query, the results are automatically uploaded to the specified Google Cloud Storage destination.

Activate a Segment in Audience Studio

You can also send segment data to the target platform by creating an activation in the Audience Studio.

  1. Navigate to Audience Studio.
  2. Select a parent segment.
  3. Open the target segment, right-click, and then select Create Activation.
  4. In the Details panel, enter an Activation name and configure the activation according to the Integration Parameters section above.
  5. Customize the activation output in the Output Mapping panel.

  • Attribute Columns
    • Select Export All Columns to export all columns without making any changes.
    • Select + Add Columns to add specific columns for the export. The Output Column Name pre-populates with the same Source column name. You can update the Output Column Name. Continue to select + Add Columns to add new columns for your activation output.
  • String Builder
    • Select + Add string to create strings for export. Select from the following values:
      • String: Choose any value; use text to create a custom value.
      • Timestamp: The date and time of the export.
      • Segment Id: The segment ID number.
      • Segment Name: The segment name.
      • Audience Id: The parent segment number.
  6. Set a Schedule.

  • Select the values to define your schedule and optionally include email notifications.
  7. Select Create.

If you need to create an activation for a batch journey, review Creating a Batch Journey Activation.

Exporting Data to Google Cloud Storage from the CLI

The following command creates a scheduled query that sends query results to Google Cloud Storage.

  • Specify your JSON keys using the following sample syntax.
  • Use a backslash to break a line without breaking the code syntax.

'{"type":"gcs","bucket":"samplebucket","path_prefix":"/output/test.csv","format":"csv","compression":"","header_line":false,"delimiter":",","null_string":"","newline":"CRLF","json_keyfile":"{\"private_key_id\": \"ABCDEFGHIJ\", \"private_key\": \"-----BEGIN PRIVATE KEY-----\\nABCDEFGHIJ\\nABCDEFGHIJ\\n-----END PRIVATE KEY-----\\n\", \"client_email\": \"ABCDEFGHIJ@developer.gserviceaccount.com\", \"client_id\": \"ABCDEFGHIJ.apps.googleusercontent.com\", \"type\": \"service_account\"}"}'

For example,

$ td sched:create scheduled_gcs "10 6 * * *" \
-d dataconnector_db "SELECT id,account,purchase,comment,time FROM data_connectors" \
-r '{"type":"gcs","bucket":"samplebucket","path_prefix":"/output/test.csv","format":"csv","compression":"","header_line":false,"delimiter":",","null_string":"","newline":"CRLF",  "json_keyfile":"{\"private_key_id\": \"ABCDEFGHIJ\", \"private_key\": \"-----BEGIN PRIVATE KEY-----\\nABCDEFGHIJ\\ABCDEFGHIJ\\n-----END PRIVATE KEY-----\\n\", \"client_email\": \"ABCDEFGHIJ@developer.gserviceaccount.com\", \"client_id\": \"ABCDEFGHIJ.apps.googleusercontent.com\", \"type\": \"service_account\"}"}'

Options

Option       | Values
format       | csv or tsv
compression  | "" or gz
null_string  | "" or \N
newline      | CRLF, CR, or LF
json_keyfile | Escape newline \n with a backslash

Other Configurations

  • The Result Export can be scheduled to upload data to a target destination periodically.
  • All import and export integrations can be added to a TD Workflow. The td> workflow operator can be used to export a query result to a specified connector, as shown in the sketch below. For more information, see Reference for Treasure Data Operators.
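
For example, a minimal workflow sketch (the file name, database, SQL file, and connection name my_gcs_authentication are placeholder assumptions; result_connection must match an authentication created in the TD Console):

$ cat > export_to_gcs.dig <<'EOF'
+export_results:
  td>: queries/export.sql
  database: dataconnector_db
  result_connection: my_gcs_authentication
  result_settings:
    bucket: samplebucket
    path_prefix: /output/test.csv
    format: csv
    header_line: true
EOF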

References

The Embulk-encoder-Encryption document

FAQ for the GCS Data Connector

Note: Compress your file before encrypting and uploading it.

  1. When you decrypt using non-built-in encryption, the file returns to a compressed format such as .gz or .bz2.

  2. When you decrypt using built-in encryption, the file returns to raw data.
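
For example, after decrypting a non-built-in-encrypted export (the decryption tooling depends on the method you configured), you still need to decompress the recovered file; a sketch assuming gz compression and a placeholder file name:

# Decompress the file recovered by decryption.
$ gunzip test.csv.gz        # or: bunzip2 test.csv.bz2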