Scheduling A Data Connector Job Execution From The CLI

Scheduled execution supports additional configuration parameters that control the behavior of the data connector during its periodic attempts to fetch data from the integration source:

  • incremental: Controls the load mode, which governs how the data connector fetches data from the integration, based on a native timestamp or numeric field associated with each object.
  • incremental_column: Defines the base column used for incremental import into Treasure Data. Only one column can be defined for this field. Suggested values are created, createdTimestamp, updated, and updatedTimestamp.

Here’s an example of a load file that uses incremental mode:

in:
  type: intg_type
  data_center: US1
  authentication_mode: key_secret
  application_key: your_application_user_key
  secret_key: your_application_secret_key
  api_key: your_api_key
  data_source: account
  batch_size: 1000
  query: SELECT * FROM table_name
  incremental: true
  incremental_column: created
filters:
- type: add_time
  from_value:
    mode: upload_time
  to_column:
    name: time
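
Because incremental mode depends on both settings being present, it can help to check the load file before scheduling it. The following is a minimal sketch, not part of the td CLI: check_incremental is a hypothetical helper, and it assumes the settings appear on their own lines in the simple key: value form shown above.

```shell
# check_incremental is a hypothetical helper, not part of the td CLI.
# It succeeds only if the file enables incremental mode and names
# exactly one incremental column.
check_incremental() {
  file="$1"

  # Require a line of the form "incremental: true".
  if ! grep -Eq '^[[:space:]]*incremental:[[:space:]]*true[[:space:]]*$' "$file"; then
    echo "incremental is not enabled in $file" >&2
    return 1
  fi

  # Count incremental_column lines whose value is a single token
  # (no spaces or commas, so "created, updated" is not counted).
  count=$(grep -Ec '^[[:space:]]*incremental_column:[[:space:]]*[^[:space:],]+[[:space:]]*$' "$file")
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one incremental_column in $file" >&2
    return 1
  fi
}

# Example: check_incremental load.yml
```

The check is purely textual and does not parse YAML, so it only covers load files laid out like the example above.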

Create the Schedule

A new schedule can be created using the td connector:create command. The following are required: the name of the schedule, a cron-style schedule, the database and table where the data will be stored, and the data connector configuration file.

For example, you can create a scheduled job using the command td connector:create to run daily:

td connector:create connector_name @daily \
connector_database connector_table load.yml

It’s also recommended to specify the --time-column option, because Treasure Data’s storage is partitioned by time (see also data partitioning).

td connector:create daily_import \
"10 0 * * *" \
td_sample_db td_sample_table load.yml \
--time-column created_at

The cron parameter also accepts three special options: @hourly, @daily, and @monthly.

By default, the schedule is set up in the UTC timezone. To use a different timezone, pass the -t or --timezone option. The --timezone option supports only extended timezone formats such as 'Asia/Tokyo' and 'America/Los_Angeles'. Timezone abbreviations such as PST and CST are *not* supported and may lead to unexpected schedules.
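
Since an abbreviation may silently produce an unexpected schedule, a small guard in front of td connector:create can catch the mistake early. This is a sketch under stated assumptions: is_extended_tz and create_daily_import are hypothetical helpers, and the "contains a slash" test is only a rough heuristic for the Region/City form, not a full timezone validation. The schedule name, database, table, and options are the ones used in the examples above.

```shell
# is_extended_tz is a hypothetical helper, not part of the td CLI.
# Heuristic: accept UTC and names in Region/City form; reject the rest,
# which includes abbreviations such as PST or CST.
is_extended_tz() {
  case "$1" in
    UTC) return 0 ;;
    */*) return 0 ;;
    *)   return 1 ;;
  esac
}

# Create the daily_import schedule only if the timezone name passes the check.
create_daily_import() {
  tz="$1"
  if ! is_extended_tz "$tz"; then
    echo "unsupported timezone name: $tz" >&2
    return 1
  fi
  td connector:create daily_import "10 0 * * *" \
    td_sample_db td_sample_table load.yml \
    --time-column created_at --timezone "$tz"
}

# Example: create_daily_import 'Asia/Tokyo'
```

With this guard, create_daily_import PST stops with an error before td is invoked.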

List All Schedules

You can see the list of currently scheduled entries by running the command td connector:list.

td connector:list

Show Schedule Settings and History

td connector:show shows the execution setting of a schedule entry.

td connector:show daily_import
Name     : daily_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table
Config
---
in:
  type: s3
  access_key_id: access_key_id
  secret_access_key: secret_access_key
  endpoint: endpoint
  bucket: bucket
  path_prefix: path_prefix
  parser:
    charset: UTF-8
    ...

td connector:history shows the execution history of a schedule entry. To investigate the results of an individual run, use td job jobid, where jobid is a value from the JobID column.

td connector:history daily_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-18 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-17 00:10:07 +0000 | 161      |
| 577914 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-16 00:10:03 +0000 | 152      |
| 577872 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-15 00:10:04 +0000 | 163      |
| 577810 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-14 00:10:04 +0000 | 164      |
| 577766 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-13 00:10:04 +0000 | 155      |
| 577710 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-12 00:10:05 +0000 | 156      |
| 577610 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-11 00:10:04 +0000 | 157      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
8 rows in set
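
When the history is long, the runs worth investigating can be picked out of the table with a small filter. The sketch below assumes the table layout shown above; failed_job_ids is a hypothetical helper, and it treats any Status other than success as a failure, since the output above only shows the success value.

```shell
# failed_job_ids is a hypothetical helper, not part of the td CLI.
# It reads td connector:history table output on stdin and prints the
# JobID of every row whose Status column is not "success".
failed_job_ids() {
  awk -F'|' '
    # Data rows have a purely numeric JobID in the second field;
    # border lines, the header, and the row count are skipped.
    $2 ~ /^ *[0-9]+ *$/ {
      id = $2
      status = $3
      gsub(/ /, "", id)
      gsub(/ /, "", status)
      if (status != "success") print id
    }'
}

# Example: td connector:history daily_import | failed_job_ids
```

Each printed ID can then be passed to td job to inspect that run.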

Delete Schedule

td connector:delete removes the schedule.

td connector:delete daily_import