Learn more about Google Cloud Storage Import Integration.

You can write job results directly to your Google Cloud Storage.

For sample workflows of how to export job results to Google Cloud Storage, view Treasure Boxes.

Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.

  • A Google Cloud Platform account with the permissions described in the following sections.

Obtain the Required Google Cloud Platform Credentials

To use this feature, you need the following:

  • Google Project ID

  • JSON Credential

  • The Storage Object Creator role, which is required to create an Object in the GCS bucket.

  • The Storage Object Viewer role, which is required to list Objects in the GCS bucket.

Obtain the Destination Bucket in Google Cloud Storage

List the Cloud Storage buckets in your project to find the destination bucket. Buckets appear in the list in lexicographic order by name.

To list the buckets in a project:

  1. Open the Cloud Storage browser in the Google Cloud Console.

  2. Optionally, use filtering to narrow the results in your list.

Buckets that are part of the currently selected project appear in the browser list.

Optionally Create the Destination Bucket in Google Cloud Storage

To create a new storage bucket:

  1. Open the Cloud Storage browser in the Google Cloud Console.
  2. Click Create bucket to open the bucket creation form.
  3. Enter your bucket information and click Continue to complete each step:
    • Specify a Name, subject to the bucket name requirements.
    • Select a Location type and Location where the bucket data will be permanently stored.
    • Select a Default storage class for the bucket. The default storage class is assigned by default to all objects uploaded to the bucket.

      The Monthly cost estimate panel in the right pane estimates the bucket's monthly costs based on your selected storage class and location, as well as your expected data size and operations.

    • Select an Access control model to determine how you control access to the bucket's objects.
    • Optionally, you can add bucket labels, set a retention policy, and choose an encryption method.
  4. Click Create.
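
The name you enter in step 3 has to satisfy the bucket name requirements. As a rough illustration, the following Python sketch checks a candidate name against a simplified subset of those rules before you type it into the form. The function name is hypothetical, and the check deliberately ignores some rules (for example, longer names that contain dots, and names that look like IP addresses), so treat a passing result as a hint rather than a guarantee.

```python
import re

# Simplified subset of the Cloud Storage bucket naming rules (an assumption
# of this sketch): 3-63 characters, lowercase letters, digits, dashes,
# underscores, and dots, starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes a basic bucket-name sanity check."""
    if not _BUCKET_NAME.fullmatch(name):
        return False  # wrong length, bad characters, or bad first/last character
    if name.startswith("goog") or "google" in name:
        return False  # reserved prefix or substring
    return True


print(looks_like_valid_bucket_name("my-export-bucket"))
print(looks_like_valid_bucket_name("My_Bucket"))
```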

Obtain the Google JSON Credentials

The integration with Google Cloud Storage is based on server-to-server API authentication.

The Service Account used to generate the JSON Credentials must have the Storage Object Creator and Storage Object Viewer roles for the destination bucket.

  1. Visit your Google Developer Console.

  2. Select Credentials under APIs & auth at the left menu.

  3. Select Service account.

  4. Select the JSON key type, which is Google’s recommended configuration. The key is automatically downloaded by the browser.
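
Before pasting the downloaded key into TD Console, it can help to confirm that the file is a service account key. The following Python sketch is one way to do that, assuming the key contains the usual `type`, `project_id`, `private_key`, and `client_email` fields. The function name is hypothetical, and the check does not verify that the key is active or that the account has the required roles.

```python
import json

# Fields this sketch expects to find in a service account key file.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(json_text: str) -> list[str]:
    """Return a list of problems found in the key text; empty if none."""
    key = json.loads(json_text)
    problems = [f"missing field: {name}" for name in REQUIRED_KEYS if not key.get(name)]
    kind = key.get("type")
    if kind and kind != "service_account":
        # For example, a user credential downloaded by mistake.
        problems.append(f"unexpected type: {kind}")
    return problems


sample = json.dumps({"type": "authorized_user", "project_id": "my-project"})
for problem in check_service_account_key(sample):
    print(problem)
```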



Use the TD Console to Create Your Connection

Create a New Connection

In Treasure Data, you must create and configure the data connection prior to running your query. As part of the data connection, you provide authentication to access the integration.

1. Open TD Console.
2. Navigate to Integrations Hub > Catalog.
3. Search for and select Google Cloud Storage.

4. Select Create Authentication.
5. Type the credentials to authenticate.

6. Type a name for your connection.
7. Select Done.



Define your Query


  1. Complete the instructions in Creating a Destination Integration.
  2. Navigate to Data Workbench > Queries.

  3. Select a query for which you would like to export data.

  4. Run the query to validate the result set.

  5. Select Export Results.

  6. Select an existing integration authentication.
  7. Define any additional Export Results details. Review the integration parameters for your export integration.
    Your Export Results screen might look different, or you might not have additional details to fill out.
  8. Select Done.

  9. Run your query.

  10. Validate that your data moved to the destination you specified.


Integration Parameters for Google Cloud Storage


Parameter               Values                       Description
Bucket
Path Prefix
Format                  CSV, TSV
Compression             None, gz, bzip2
Header Line                                          Write the header line with the column name as the first line.
Delimiter               , (comma), tab, | (pipe)     Indicate whether a comma, tab, or pipe is used to separate the data in the file.
Null String                                          Defaults to empty string for csv, '\N' for tsv.
End of Line Character   CRLF, LF, CR
Application Name                                     Arbitrary client name associated with API requests. For example, Treasure Data GCS Output.
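
To make the effect of these parameters concrete, the following Python sketch serializes a few rows according to a delimiter, header line, null string, end-of-line character, and compression setting. It is only an illustration built on the standard library, not the integration's actual implementation, and the function and argument names are hypothetical.

```python
import bz2
import csv
import gzip
import io

# Map the End of Line Character values to the characters they stand for.
NEWLINES = {"CRLF": "\r\n", "LF": "\n", "CR": "\r"}


def render_export(columns, rows, delimiter=",", header_line=True,
                  null_string="", newline="CRLF", compression=None):
    """Serialize rows roughly the way the parameters above describe."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator=NEWLINES[newline])
    if header_line:
        writer.writerow(columns)  # column names as the first line
    for row in rows:
        # Replace missing values with the configured null string.
        writer.writerow([null_string if value is None else value for value in row])
    data = buf.getvalue().encode("utf-8")
    if compression == "gz":
        return gzip.compress(data)
    if compression == "bzip2":
        return bz2.compress(data)
    return data


# TSV output with '\N' for nulls and LF line endings.
print(render_export(["email", "name"], [["a@example.com", None]],
                    delimiter="\t", null_string="\\N", newline="LF"))
```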

Example Query

SELECT c0 AS EMAIL 
FROM e_1000 WHERE c0 != 'email'

Validating Export Results

Upon successful completion of the query, the results are automatically exported to the specified Google Cloud Storage destination.
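
One way to confirm that the file arrived is to list the objects under your path prefix. The following Python sketch assumes the third-party google-cloud-storage package and a service account key with the Storage Object Viewer role. The function names are hypothetical, and the key path, bucket name, and prefix are placeholders that you pass in.

```python
def exported_object_names(client, bucket_name, path_prefix):
    """Return the names of objects in the bucket that start with `path_prefix`.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    blobs = client.list_blobs(bucket_name, prefix=path_prefix)
    return sorted(blob.name for blob in blobs)


def list_with_service_account(key_path, bucket_name, path_prefix):
    """Run the same check with a real client built from a JSON key file."""
    # Imported lazily so the helper above works without the package installed.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client.from_service_account_json(key_path)
    return exported_object_names(client, bucket_name, path_prefix)
```

Calling `list_with_service_account` with your key file, bucket, and prefix returns the matching object names. An empty list suggests that the export did not write to that location.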



Optionally Schedule the Query Export Jobs

You can use Scheduled Jobs with Result Export to periodically write the output result to a target destination that you specify.


1. Navigate to Data Workbench > Queries.
2. Create a new query or select an existing query.
3. Next to Schedule, select None.

4. In the drop-down, select one of the following schedule options.

Drop-down Value      Description
Custom cron...       Review Custom cron... details.
@daily (midnight)    Run once a day at midnight (00:00) in the specified time zone.
@hourly (:00)        Run every hour at 00 minutes.
None                 No schedule.

Custom cron... Details

Cron Value    Description
0 * * * *     Run once an hour
0 0 * * *     Run once a day at midnight
0 0 1 * *     Run once a month at midnight on the morning of the first day of the month
""            Create a job that has no scheduled run time.

 *    *    *    *    *
 -    -    -    -    -
 |    |    |    |    |
 |    |    |    |    +----- day of week (0 - 6) (Sunday=0)
 |    |    |    +---------- month (1 - 12)
 |    |    +--------------- day of month (1 - 31)
 |    +-------------------- hour (0 - 23)
 +------------------------- min (0 - 59)

The following named entries can be used:

  • Day of Week: sun, mon, tue, wed, thu, fri, sat

  • Month: jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec

A single space is required between each field. The values for each field can be composed of:

  • A single value, within the limits displayed above for each field.

  • A wildcard ‘*’ to indicate no restriction based on the field.
    Example: ‘0 0 1 * *’ configures the schedule to run at midnight (00:00) on the first day of each month.

  • A range ‘2-5’, indicating the range of accepted values for the field.
    Example: ‘0 0 1-10 * *’ configures the schedule to run at midnight (00:00) on the first 10 days of each month.

  • A list of comma-separated values ‘2,3,4,5’, indicating the list of accepted values for the field.
    Example: ‘0 0 1,11,21 * *’ configures the schedule to run at midnight (00:00) every 1st, 11th, and 21st day of each month.

  • A periodicity indicator ‘*/5’ to express how often, based on the field’s valid range of values, a schedule is allowed to run.
    Example: ‘30 */2 1 * *’ configures the schedule to run on the 1st of every month, every 2 hours starting at 00:30. ‘0 0 */5 * *’ configures the schedule to run at midnight (00:00) every 5 days starting on the 5th of each month.

  • A comma-separated list of any of the above except the ‘*’ wildcard, for example ‘2,*/5,8-10’.
    Example: ‘0 0 5,*/10,25 * *’ configures the schedule to run at midnight (00:00) every 5th, 10th, 20th, and 25th day of each month.

5. (Optional) To delay the start time of a query, enable Delay execution.
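
The field rules for a custom cron value can be checked before you save a schedule. The following Python sketch validates the shape of an expression: five fields separated by single spaces, values within the limits shown in the diagram, the named entries for month and day of week, and no ‘*’ wildcard inside a comma-separated list. It is a hypothetical helper, not the scheduler's own parser. It does not compute run times, and it does not accept the empty value `""`.

```python
MONTHS = "jan feb mar apr may jun jul aug sep oct nov dec".split()
DAYS = "sun mon tue wed thu fri sat".split()

FIELDS = [  # (name, lowest value, highest value, named entries)
    ("min", 0, 59, {}),
    ("hour", 0, 23, {}),
    ("day of month", 1, 31, {}),
    ("month", 1, 12, {name: i for i, name in enumerate(MONTHS, start=1)}),
    ("day of week", 0, 6, {name: i for i, name in enumerate(DAYS)}),
]


def _value(token, low, high, names):
    """Convert one token to a number and check that it is within the limits."""
    number = names[token.lower()] if token.lower() in names else int(token)
    if not low <= number <= high:
        raise ValueError(f"{token!r} is outside {low}-{high}")
    return number


def validate_cron(expression):
    """Return True if the expression follows the field rules, else raise ValueError."""
    fields = expression.split(" ")
    if len(fields) != 5:
        raise ValueError("expected 5 fields separated by single spaces")
    for text, (name, low, high, names) in zip(fields, FIELDS):
        parts = text.split(",")
        for part in parts:
            if part == "*":
                if len(parts) > 1:
                    raise ValueError(f"{name}: '*' cannot appear in a list")
            elif part.startswith("*/"):
                if not part[2:].isdigit() or int(part[2:]) < 1:
                    raise ValueError(f"{name}: bad step {part!r}")
            elif "-" in part:
                start, end = part.split("-", 1)
                if _value(start, low, high, names) > _value(end, low, high, names):
                    raise ValueError(f"{name}: reversed range {part!r}")
            else:
                _value(part, low, high, names)
    return True


print(validate_cron("0 0 1,11,21 * *"))
```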

Execute the Query

Save the query with a name and run it, or just run the query. Upon successful completion of the query, the query result is automatically exported to the specified Google Cloud Storage destination.


Scheduled jobs that continuously fail due to configuration errors may be disabled on the system side after several notifications.



Optionally Configure Export Results in Workflow

Within Treasure Workflow, you can specify the use of this data connector to export data.

Learn more at Using Workflows to Export Data with the TD Toolbelt.

Example Workflow for Google Cloud Storage


timezone: UTC

+td-result-output-gcs:
  td>:
  query: SELECT * FROM www_access
  database: sample_datasets
  result_connection: YOUR_GCS_CONNECTION_NAME
  result_settings:
    bucket: BUCKET_NAME
    path_prefix: /filename.csv.gz
    format: csv
    header_line: true
    delimiter: ","
    null_string: ""
    newline: CRLF
    compression: 'gz'