# Google BigQuery Import Integration V2

[Learn more about Google BigQuery Export Integration V2](/int/google-bigquery-export-integration-v2).

The Google BigQuery V2 integration enables the import of data from your BigQuery tables or query results into Treasure Data.

## Prerequisites

- Basic knowledge of Treasure Data
- Basic knowledge of Google Cloud Platform (BigQuery, Cloud Storage, and IAM)

## Limitations

- OAuth is no longer supported for this connector. Only JSON keyfile is supported.
- If your dataset is in a location other than the US or EU multi-region, you must specify the location. Otherwise, your job in TD fails with the error `Cannot find job_id xxxxx`.
- This connector does not support importing [external tables](https://cloud.google.com/bigquery/external-data-sources).
- To import from a view, use the query import type instead of the table import type.
- Export to GCS cannot be used if the table or query result contains a field of the RANGE data type. Otherwise, the job fails with the error `Unsupported type for JSON export: RANGE`.
- When using the query import type against a partitioned table with the **Require partition filter** option enabled, the SQL statement must include the partition field in its WHERE clause. If it does not, the error `Cannot query over table 'xxxxx' without a filter over column(s) '{partition field}' that can be used for partition elimination` is returned.
- When using the query import type with Legacy SQL, a source table that contains fields of type `INTERVAL`, `TIMESTAMP(12)`, `JSON`, or `RANGE` is not supported and returns the error `Querying tables with INTERVAL, TIMESTAMP(12), JSON, or RANGE type is not supported in Legacy SQL`.
- Incremental loading is not supported for a partitioned table with the **Require partition filter** option enabled when the import type is table, but it is supported when the import type is query.

## Permissions and Roles

To use this data connector, the authorized account (service account) must have the following permissions or IAM roles.

| **Category** | **Required permissions** | **Minimum IAM roles** |
| --- | --- | --- |
| **To use table loading** | bigquery.tables.get, bigquery.tables.getData | BigQuery Data Viewer |
| **To use query loading** | bigquery.jobs.create | BigQuery Job User |
| **To use "Import Large Dataset"** | bigquery.tables.export, bigquery.tables.delete, storage.buckets.get, storage.objects.list, storage.objects.create, storage.objects.delete, storage.objects.get | BigQuery Data Editor, Storage Legacy Bucket Writer, Storage Legacy Object Reader |

For more details about IAM permissions and roles, see the Google Cloud documentation for [BigQuery](https://cloud.google.com/bigquery/docs/access-control) and [Cloud Storage](https://cloud.google.com/storage/docs/access-control/iam-roles).

## Use TD Console

### Create a New Authentication

1. Go to **Integrations Hub** > **Catalog**.
2. Search for and select **Google BigQuery**.
3. A dialog opens. Only **JSON keyfile** is supported.
   ![](/assets/screenshot-2025-06-19-at-19.36.07.72bff3697a33d042c422ef4851450b60f0c04ea69082df8ca3d3924e42c846f9.743d8bf4.png)
4. Paste the JSON string of your service account key into the "JSON keyfile" field.
5. See [the Google Cloud document](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) to create a new service account key, or see the sketch below.
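If you have not yet created a service account and key, the following sketch shows one way to do it with the gcloud CLI. The project ID `my-project` and service account name `td-import` are placeholders; grant additional roles from the table above if you plan to use query loading or Import Large Dataset.

```bash
# Create a dedicated service account for the import (name is a placeholder).
gcloud iam service-accounts create td-import \
  --display-name="Treasure Data BigQuery import"

# Grant the minimum role for table loading; add roles such as
# roles/bigquery.jobUser or roles/bigquery.dataEditor as needed.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:td-import@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Create a JSON key and save it locally; paste the contents of key.json
# into the "JSON keyfile" field in TD Console.
gcloud iam service-accounts keys create key.json \
  --iam-account="td-import@my-project.iam.gserviceaccount.com"
```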
### Validate your Query

If you want to load a SQL result, select "Query statement", then input an SQL query into "SQL statement". Before creating a transfer, confirm that your query is valid in the BigQuery web UI:
[https://cloud.google.com/bigquery/quickstart-web-ui](https://cloud.google.com/bigquery/quickstart-web-ui)

### Create a New Transfer

After creating the connection, you are automatically taken to the Authentications tab. Look for the connection you created and select Source.

#### Fetch from

Configure the data source to import. Input the ID of your Google Cloud Platform project into "Project ID". Input the name of the dataset that contains your table.

#### Import Types

Select a type of import: either loading a whole table (**table loading**) or loading a SQL result (**query loading**).

#### Table Loading

If you want to load a whole table, select "Table", then provide the "Table name" that you want to export. If you want to load a materialized view, select "Query Statement" instead.

![](/assets/screenshot-2025-06-19-at-19.45.09.e970ef1f4a21796c734e018a235d92aaf68068158b17b6041490dab27aaca6f3.743d8bf4.png)

#### Query Loading

If you want to load a SQL result, select "Query statement", then input an SQL query into "SQL statement".

![](/assets/screenshot-2025-06-19-at-19.46.50.00c1474f54b84589a4ad38635d7da6d358a48a2b586b80a8356c27a183403ed0.743d8bf4.png)

The default SQL dialect is [Standard SQL](https://cloud.google.com/bigquery/docs/reference/standard-sql/). Check **Use Legacy SQL** if you want to use [Legacy SQL](https://cloud.google.com/bigquery/docs/reference/legacy-sql).

By default, this connector uses [cached results](https://cloud.google.com/bigquery/docs/cached-results) under specific conditions. Uncheck **Use Cached Results** if you want to disable caching.

User Defined Function List: the user-defined functions (UDFs) used in your query.

![](/assets/screenshot-2025-06-19-at-20.20.11.64913b55b52cfa6d9211513592169ef61dd410afcb41757d319fd39ff24d1700.113453a0.png)

- For Legacy SQL, follow: [https://cloud.google.com/bigquery/docs/user-defined-functions-legacy#register](https://cloud.google.com/bigquery/docs/user-defined-functions-legacy#register)
- For Standard SQL, follow: [https://cloud.google.com/bigquery/docs/user-defined-functions#sql-udf-structure](https://cloud.google.com/bigquery/docs/user-defined-functions#sql-udf-structure)

The query can fail with an error if the UDF already exists.

### Data Location

You must specify the location if your data is in a location other than the US or EU multi-region, for example when your data is in the asia-northeast1 region. See the [Google Cloud document](https://cloud.google.com/bigquery/docs/dataset-locations) for more details about locations.

### Incremental Loading

Incremental loading loads only records that are new since the last execution, by using an increasing, unique column such as an auto-increment ID column or a timestamp column for the created date. To enable it, check **Incremental Loading**, then specify the column to increment on in "Incremental Column Name". Only the INTEGER and TIMESTAMP types are supported as incremental columns.

#### How This Works

This connector records the "last record", which is the latest record ordered by the incremental column. In the next execution, it loads records by running a query built from the last record using the following rule.

With table loading, all fields are selected with the WHERE clause:

```sql
SELECT * FROM `${dataset}.${table}`
WHERE ${incremental_column} > ${value_of_last_record}
```

With query loading, the raw query is wrapped with the WHERE clause:
```sql
SELECT * FROM (${query}) embulk_incremental_
WHERE ${incremental_column} > ${value_of_last_record}
```

### Import Large Dataset

When you load a large dataset (more than 500MB as a benchmark), we recommend that you use the "Import Large Dataset" option. This option exports the data as GCS (Google Cloud Storage) objects and loads the data in multiple tasks, so loading is faster. To enable this option, check **Import Large Dataset**, then specify "GCS bucket" and "GCS path prefix".

#### How This Works

- When a query is run (query loading, or table loading with incremental loading), the query result is exported to a temporary BigQuery table in the dataset specified in your configuration.
- The temporary table is then exported to Google Cloud Storage as gzipped [JSON Lines](http://jsonlines.org/) files under the configured bucket and path prefix, for example *gs://my-bucket/data-connector/result-[12-digit number].jsonl.gz*. The number of files depends on the size of the result data.
- With table loading without incremental loading, all the data in the source table is exported directly to GCS.
- After completion, the temporary table and the GCS objects are deleted.

#### Data Location of Temp Resources

- The GCS bucket must be in the same location as the tables unless the dataset is set to "US". You can export data from a US-based dataset to a Cloud Storage bucket in another region. See the Google Cloud [export limitations document](https://cloud.google.com/bigquery/docs/exporting-data#export_limitations) for more details, and the sketch below for one way to check both locations from the command line.
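If you are unsure whether your dataset and your GCS bucket are in compatible locations, you can check both from the command line. This is a sketch using the bq and gsutil CLIs; `my-project`, `my_dataset`, and `my-bucket` are placeholder names.

```bash
# Show the dataset metadata; the "location" field reports the dataset location.
bq show --format=prettyjson my-project:my_dataset

# Show the bucket metadata; the "Location constraint" field reports the bucket location.
gsutil ls -L -b gs://my-bucket
```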
### Data Preview

You can see a [preview](/products/customer-data-platform/integration-hub/batch/import/previewing-your-source-data) of your data before running the import by selecting Generate Preview. Data preview is optional, and you can safely skip to the next page of the dialog if you choose to.

1. Select **Next**. The Data Preview page opens.
2. If you want to preview your data, select **Generate Preview**.
3. Verify the data.

### Data Placement

For data placement, select the target database and table where you want your data placed and indicate how often the import should run.

1. Select **Next**. Under Storage, create a new database or select an existing one, and create a new table or select an existing one, for where you want to place the imported data.
2. Select **Database** > **Select an existing** or **Create New Database**.
3. Optionally, type a database name.
4. Select **Table** > **Select an existing** or **Create New Table**.
5. Optionally, type a table name.
6. Choose the method for importing the data:
   - **Append** (default): data import results are appended to the table. If the table does not exist, it is created.
   - **Always Replace**: replaces the entire content of an existing table with the result output of the query. If the table does not exist, a new table is created.
   - **Replace on New Data**: replaces the entire content of an existing table with the result output only when there is new data.
7. Select the **Timestamp-based Partition Key** column. If you want to set a partition key seed other than the default key, you can specify a long or timestamp column as the partitioning time. As the default time column, it uses upload_time with the add_time filter.
8. Select the **Timezone** for your data storage.
9. Under **Schedule**, you can choose when and how often you want to run this query.

#### Run once

1. Select **Off**.
2. Select **Scheduling Timezone**.
3. Select **Create & Run Now**.

#### Repeat Regularly

1. Select **On**.
2. Select the **Schedule**. The UI provides four options: *@hourly*, *@daily*, *@monthly*, or a custom *cron* expression.
3. You can also select **Delay Transfer** and add a delay to the execution time.
4. Select **Scheduling Timezone**.
5. Select **Create & Run Now**.

After your transfer has run, you can see the results of your transfer in **Data Workbench** > **Databases**.

## Data Conversion

BigQuery data types are automatically converted to the corresponding Treasure Data types, as indicated in the following table. If the schema of the table or query result includes unsupported types, you receive errors.

| BigQuery | Treasure Data |
| --- | --- |
| STRING | string |
| BYTES | string |
| INTERVAL | string |
| RANGE | string |
| GEOGRAPHY | string |
| INTEGER | long |
| FLOAT | double |
| NUMERIC | string |
| BIGNUMERIC | string |
| BOOLEAN | long (true is 1, false is 0) |
| TIMESTAMP | string (yyyy-MM-dd HH:mm:ss.SSS) |
| DATE | string |
| TIME | string |
| DATETIME | string |
| RECORD | string (as JSON) |
| REPEATED (PRIMITIVE or RECORD) | string (as JSON) |

## Quotas and Limits

Any quotas and limits of [BigQuery and Cloud Storage](https://cloud.google.com/bigquery/quotas) apply to your GCP project.

## Use BigQuery Connector via CLI

If you prefer, you can use the connector via [TD Toolbelt](https://toolbelt.treasuredata.com/). Set up [TD Toolbelt on the CLI](https://api-docs.treasuredata.com/en/tools/cli/quickstart/).

### Create Configuration File

Create a configuration YAML file, referred to as "config.yml" here.

Example: import from a table with export to GCS and incremental loading:

```yaml
in:
  type: bigquery_v2
  json_keyfile:
    content: |
      { xxxxxxxx }
  project_id: xxxx
  dataset: xxx
  import_type: table
  table: xxxx
  location: US
  export_to_gcs: true
  gcs_bucket: xxxx
  gcs_path_prefix: xxxx
  incremental: true
  incremental_column: xxx
```

Example: import from a query with user-defined functions, export to GCS, and incremental loading:

```yaml
in:
  type: bigquery_v2
  json_keyfile:
    content: |
      { xxxxxxxx }
  project_id: xxxx
  dataset: xxx
  import_type: query
  query: xxxxxxx
  udf:
    - function: |
        CREATE TEMP FUNCTION addTwo(x INT64)
        RETURNS INT64
        AS (
          x + 2
        )
    - function: |
        xxxxxxx
  location: US
  export_to_gcs: true
  gcs_bucket: xxxx
  gcs_path_prefix: xxxx
  incremental: true
  incremental_column: xxx
```
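Before pasting the key into `json_keyfile`, you may want to confirm that the key is valid and that the service account can read the dataset. This is a sketch using the gcloud and bq CLIs; `key.json`, `my-project`, and `my_dataset` are placeholder names.

```bash
# Authenticate the local gcloud/bq CLIs as the service account from the key file.
gcloud auth activate-service-account --key-file=key.json

# List the tables in the target dataset to confirm read access.
bq ls --project_id=my-project my_dataset
```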
### Parameters

| Name | Description | Type | Value | Default Value | Required |
| --- | --- | --- | --- | --- | --- |
| type | Connector type | string | bigquery_v2 | N/A | Yes |
| json_keyfile | Google service account JSON key | Object with a content property (see the examples above) | N/A | N/A | Yes |
| project_id | BigQuery project ID | string | N/A | N/A | Yes |
| dataset | BigQuery dataset | string | N/A | N/A | Yes |
| import_type | Source import type | string | Supported values: table, query | table | Yes |
| table | Table name | string | N/A | N/A | Yes, if import_type is table |
| query | SQL statement | string | N/A | N/A | Yes, if import_type is query |
| udf | User Defined Function List | Array of function entries (see the example above) | N/A | N/A | No |
| use_legacy_sql | Use legacy SQL dialect | boolean | true/false | false | No |
| use_query_cache | Use cached results | boolean | true/false | true | No |
| location | Location (region) of your dataset | string | Supported values: UNSPECIFIED and the values listed at https://cloud.google.com/bigquery/docs/dataset-location | UNSPECIFIED | No |
| incremental | Whether to enable incremental loading | boolean | true/false | false | No |
| incremental_column | Column name for incremental loading | string | N/A | N/A | Yes, if incremental is true |
| export_to_gcs | Whether to use the export to GCS feature | boolean | true/false | true | No |
| gcs_bucket | GCS bucket to export results to | string | N/A | N/A | Yes, if export_to_gcs is true |
| gcs_path_prefix | Prefix of the file path of GCS files | string | N/A | N/A | Yes, if export_to_gcs is true |

### (Optional) Preview

Run the [td connector:preview](https://api-docs.treasuredata.com/en/tools/cli/api/#td-connector-preview) command to validate your configuration file:

```bash
td connector:preview config.yml
```

### Create New Connector Session

Run [td connector:create](https://api-docs.treasuredata.com/en/tools/cli/api/#td-connector-create). The following example creates a daily import session with the BigQuery connector.

```bash
$ td connector:create daily_bigquery_import \
    "10 0 * * *" td_sample_db td_sample_table config.yml
Name     : daily_bigquery_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table
Config
---
in:
  ...
```

### Data Partition Key

Connector sessions need at least one timestamp column in the result data to be used as the data partition key, and the first timestamp column is chosen as the key by default. Use the "**--time-column**" option if you want to explicitly specify a column.

```bash
$ td connector:create --time-column created_at \
    daily_bigquery_import ...
```

If your result data doesn't have any timestamp column, add the "**time**" column by adding the filter configuration as follows:

```yaml
in:
  type: bigquery_v2
  ...
filters:
  - type: add_time
    from_value:
      mode: upload_time
    to_column:
      name: time
out:
  type: td
```
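If you want to run the import once from the CLI instead of creating a scheduled session, one option is the `td connector:issue` command. This is a sketch assuming the same placeholder database, table, and time column names used above; check `td connector:issue --help` for the options available in your Toolbelt version.

```bash
# Issue a one-time import job using config.yml into the target database and table.
# --time-column is optional; it plays the same role as in td connector:create.
td connector:issue config.yml \
  --database td_sample_db \
  --table td_sample_table \
  --time-column created_at
```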