
Google BigQuery Import Integration CLI

If you prefer, you can use the connector via TD Toolbelt.

Set up the TD Toolbelt on the CLI.
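
Toolbelt setup depends on your platform; as a rough sketch, a RubyGems-based installation and account setup might look like this (the exact commands may differ for packaged installers):

# Install the TD Toolbelt via RubyGems
$ gem install td

# Register your Treasure Data credentials with the CLI
$ td account -f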

Create Configuration File

Create a configuration YAML file, referred to here as "config.yml".

Example (config.yml)

in:
  type: bigquery
  project_id: my-project
  auth_method: json_key
  json_keyfile:
    content: |
      {
        "type": "service_account",
        "project_id": "xxxxxx",
        ...
       }
  import_type: table
  dataset: my_dataset
  table: my_table
  incremental: true
  incremental_columns: [id]
  export_to_gcs: true
  temp_dataset: temp
  temp_table: temp_table
  gcs_bucket: my-bucket
  gcs_path_prefix: data-connector/result-
out:
  type: td

GCP Authentication

JSON Key

Specify "auth_method: json_key" and put the JSON content of your service account key into "json_keyfile.content".

auth_method: json_key
json_keyfile:
  content: |
    {
      "type": "service_account",
      "project_id": "xxxxxx",
      ...
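
If you don't have a service account key yet, one way to create one is with the gcloud CLI (the service account email and output file below are placeholders):

# Create a new JSON key for an existing service account
$ gcloud iam service-accounts keys create key.json \
    --iam-account=my-sa@my-project.iam.gserviceaccount.com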

OAuth

To use an account authorized by your own OAuth 2 application, specify "auth_method: oauth2", "client_id", "client_secret", and "refresh_token".

auth_method: oauth2
client_id: 000000000000-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com
client_secret: yyyyyyyyyyyyyyyyyyyyyyyy
refresh_token: zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
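
The refresh token comes from completing Google's OAuth 2 authorization-code flow (requesting offline access and a BigQuery scope) with your application. As a rough sketch, the final code-for-token exchange might look like the following, where the authorization code and redirect URI are placeholders:

$ curl -s https://oauth2.googleapis.com/token \
    -d grant_type=authorization_code \
    -d code=AUTHORIZATION_CODE \
    -d redirect_uri=YOUR_REDIRECT_URI \
    -d client_id=000000000000-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com \
    -d client_secret=yyyyyyyyyyyyyyyyyyyyyyyy
# The JSON response contains "refresh_token" if the authorization request asked for offline access.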

Import Type

Table Loading

For table loading, specify "import_type: table", "dataset", and "table".

import_type: table
dataset: my_dataset
table: my_table

Query Loading

For query loading, specify "import_type: query" and "query".

import_type: query
query: |-
  SELECT
    id, first_name, last_name, created_at
  FROM
    my_dataset.my_table
  WHERE first_name = "Treasure"

You can optionally specify "query_option". "use_legacy_sql" defaults to false and "use_query_cache" defaults to true.

query: SELECT ...
query_option:
  use_legacy_sql: false
  use_query_cache: true 

Data Location

If necessary, you can specify the BigQuery data location with "location".

location: asia-northeast1

Incremental Loading

To enable it, specify "incremental: true" and "incremental_columns"

incremental: true
incremental_columns: [id]
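
Conceptually, incremental loading remembers the largest values seen in the incremental columns and only fetches newer rows on subsequent runs. A rough sketch of the effective condition (not the exact SQL the connector issues), assuming the previous run stopped at id 1000:

-- subsequent runs conceptually restrict and order by the incremental column
SELECT * FROM my_dataset.my_table WHERE id > 1000 ORDER BY id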

Import Large Dataset

To enable it, specify "export_to_gcs: true", then add "temp_dataset", "temp_table", "gcs_bucket" and "gcs_path_prefix"

export_to_gcs: true
temp_dataset: temp
temp_table: temp_table
gcs_bucket: my-bucket
gcs_path_prefix: data-connector/result-
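
The GCS bucket must already exist and be writable by the account you authenticate with. If you need to create one, a gsutil sketch (the bucket name is the placeholder from the example above):

$ gsutil mb gs://my-bucket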

(Optional) Preview

Run the td connector:preview command to validate your configuration file.

$ td connector:preview config.yml
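
You can also run a one-time import before scheduling anything with td connector:issue; the database and table names below are the same placeholders used in the next step:

$ td connector:issue config.yml \
    --database td_sample_db --table td_sample_table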

Create New Connector Session

Run td connector:create.

The following example creates a daily import session using the BigQuery connector.

$ td connector:create daily_bigquery_import \
    "10 0 * * *" td_sample_db td_sample_table config.yml

Data Partition Key

Connector sessions need at least one timestamp column in the result data to use as the data partition key; by default, the first timestamp column is chosen. Use the "--time-column" option if you want to specify a column explicitly.

$ td connector:create --time-column created_at \
    daily_bigquery_import ...

If your result data doesn't have any timestamp column, add a "time" column with the add_time filter configuration as follows.

in:
  type: bigquery
  ...
filters:
- type: add_time
  from_value:
    mode: upload_time
  to_column:
    name: time
out:
  type: td