Incremental loading is the activity of loading only new or updated records from a source into Treasure Data. Incremental loads are useful because they run efficiently when compared to full loads, and particularly for large data sets.

Incremental loading is available for many of the Treasure Data integrations. In some cases, it is a simple checkbox choice and in others, after you select incremental loading you are provided with other fields that must be specified. 

Limitations, Supported, Suggestions

  • For some integrations, if you choose incremental loading, you might need to make sure that there is an index on the columns to avoid a full table scan.
  • Only Timestamp, Datetime, and numerical columns are supported as incremental_columns.
  • For the raw query, the incremental_columns is required because it won't be able to detect the Primary keys for a complex query.

About Incremental Loading for Integrations

Treasure Data Incremental loading has 4 patterns (3 types of data connector + 1 workflow td_load operator.), then the 3 data connector loading examples are as follows:

  • Cloud storage service (e.g. AWS S3, GCS and etc.)

    • Lexicographic order of file name

  • Query (e.g. MySQL, BigQuery and etc.)

    • Date time

  • Variable period (Google Analytics, etc)

    • Use start_date for loading

Incremental Loading for Connectors

If incremental loading is selected, data for the connector is loaded incrementally.

This mode is useful when you want to fetch just the object targets that have changed since the previously scheduled run.

For example, in the UI:

Database integrations, such as MySQL, BigQuery, and SQL server, require column or field names to load incremental data. For example:

Learn more About Database-based Integrations.

  • No labels