Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Excerpt

Incremental loading is the activity of loading only new or updated records from a source into Treasure Data. Incremental loads are useful because they run efficiently when compared to full loads, and particularly for large data sets.

Incremental loading is available for many of the Treasure Data integrations. In some cases, it is a simple checkbox choice and in others, after you select incremental loading you are provided with other fields that must be specified. 

Limitations, Supported, Suggestions

  • For some integrations, if you choose incremental loading, you might need to make sure that there is an index on the columns to avoid a full table scan.
  • Only Timestamp, Datetime, and numerical columns are supported as incremental_columns.
  • For the raw query, the incremental_columns is required because it won't be able to detect the Primary keys for a complex query.

About Incremental Loading for Integrations

Treasure Data Incremental loading has 4 patterns (3 types of data connector + 1 workflow td_load operator.), then the 3 data connector loading examples are as follows:

  • Cloud storage service (e.g. AWS S3, GCS and etc.)

    • Lexicographic order of file name

  • Query (e.g. MySQL, BigQuery and etc.)

    • Date time

  • Variable period (Google Analytics, etc)

    • Use start_date for loading

Incremental Loading for Connectors

If incremental loading is selected, data for the connector is loaded incrementally.

This mode is useful when you want to fetch just the object targets that have changed since the previously scheduled run.

For example, in the UI:

Database integrations, such as MySQL, BigQuery, and SQL server, require column or field names to load incremental data. For example:

Learn more About Database-based Integrations.

Incremental Loading for Data Extensions

Treasure Data supports incremental loading for Data Extensions that have a date field.

If incremental loading is selected, the integration loads records according to the range specified by the from_date and the fetch_days for the specified date field.

The following list provides details about all available options:

  • target: Target object. For example, a Customer or Invoice that accepts a search operation. (string, required)

  • email: Email address. (string, required)

  • password: Password. (string, required)

  • account: Account ID. (string, required)

  • role: Role ID. (string, default: nil)

  • sandbox: Use sandbox environment if true. (bool, default: false)

  • from_datetime: Fetch data after this time. (string, default: nil)

Example

For example, you set up an incremental job with
Start Time = 2020-11-01T01:43:47.900Z and End Time = 2020-12-01T01:43:47.900Z

  • After the first run, the next start time will be calculated base on the precision setting.

    • HOURLY next start time will be 2020-12-01T02:43:47.900Z

    • DAILY next start time will be 2020-12-02T01:43:47.900Z

    • MONTHLY next start time will be 2021-01-01T01:43:47.900Z

  • And the End Time will be the current execution time.