Skip to content
Last updated

About Database Based Integrations

If you decide to use incremental loading, you should determine if you need to create an index on the columns to avoid a full table scan. For example, the following index could be created:

CREATE INDEX embulk_incremental_loading_index ON TABLE (updated_at, id);

In some cases, the recommendation is to leave incremental_columns unset and let the integration automatically find an AUTO_INCREMENT primary key.

If incremental loading is not chosen, the integration fetches all the records of the specified integration object type, regardless of when they were last updated. This mode is best combined with writing data into a destination table using the ‘replace’ mode.

When using incremental loadingfor some integrations the following behavior occurs:

If incremental_columns: [updated_at, id] option is set, the query is as follows:

SELECT *FROM (...original query IS here...)ORDER BY updated_at, id

When bulk data loading finishes successfully, it outputs last_record: parameter as config-diff so that the next execution uses it.
At the next execution, when last_record: is also set, and generates an additional WHERE clause to load records larger than the last record. For example, if last_record: ["2017-01-01T00:32:12.000000", 5291] is set:

SELECT *FROM (...original query IS here...)WHERE updated_at > '2017-01-01T00:32:12.000000'  OR (updated_at = '2017-01-01T00:32:12.000000'      AND id > 5291)  ORDER BY updated_at,           id

Then, it updates last_record: so that the next execution uses the updated last_record.

For other integrations, the following behavior occurs:

SELECT ...FROM `target_table`WHERE ((`col1` > ?))

The bind variable (? literal in the query above) includes the maximum value of the previous execution. Larger values are recognized as the difference as to the target column (col1).