You can schedule periodic data connector execution for incremental file import. We configure our scheduler carefully to ensure high availability.

A scheduled import picks up all files that match the specified prefix and then filters them on one of the following fields, depending on the configuration:

  • If use_modified_time is disabled, the last path is saved for the next execution. On the second and subsequent runs, the connector only imports files that come after the last path in alphabetical order.

  • If use_modified_time is enabled, the time at which the job is executed is saved for the next execution. On the second and subsequent runs, the connector imports only files that were modified after that saved time, processing them in alphabetical order.
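
The two conditions above can be modeled with a short Python sketch. It only illustrates the selection rule as described here and is not the connector's actual implementation. The names RemoteFile and select_new_files, and the last_path and last_run arguments, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical stand-in for one object in the bucket."""
    path: str
    modified: datetime


def select_new_files(
    files: List[RemoteFile],
    prefix: str,
    use_modified_time: bool,
    last_path: Optional[str] = None,
    last_run: Optional[datetime] = None,
) -> List[RemoteFile]:
    """Return the files one run would import, sorted alphabetically by path."""
    # Only files under the configured prefix are candidates.
    candidates = sorted(
        (f for f in files if f.path.startswith(prefix)),
        key=lambda f: f.path,
    )
    if use_modified_time:
        # Keep files modified after the saved execution time, if one exists.
        if last_run is not None:
            candidates = [f for f in candidates if f.modified > last_run]
    elif last_path is not None:
        # Keep files that sort after the saved last path.
        candidates = [f for f in candidates if f.path > last_path]
    return candidates


if __name__ == "__main__":
    utc = timezone.utc
    files = [
        RemoteFile("path/to/sample_201501.csv.gz", datetime(2015, 1, 31, tzinfo=utc)),
        RemoteFile("path/to/sample_201502.csv.gz", datetime(2015, 2, 28, tzinfo=utc)),
    ]
    for f in select_new_files(files, "path/to/", False, last_path=files[0].path):
        print(f.path)
```

On a first run neither saved value exists, so the sketch returns every file under the prefix.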

Create a Schedule Using the TD Toolbelt

A new schedule can be created using the td connector:create command.

$ td connector:create daily_import "10 0 * * *" \
    td_sample_db td_sample_table load.yml

It’s also recommended to specify the --time-column option, because Treasure Data’s storage is partitioned by time (see also data partitioning).

$ td connector:create daily_import "10 0 * * *" \
    td_sample_db td_sample_table load.yml \
    --time-column created_at

The `cron` parameter also accepts three special options: `@hourly`, `@daily`, and `@monthly`.

By default, the schedule is set up in the UTC timezone. To use a different timezone, pass the -t or --timezone option. The --timezone option supports only extended timezone formats such as 'Asia/Tokyo' or 'America/Los_Angeles'. Timezone abbreviations such as PST or CST are *not* supported and may lead to unexpected schedules.
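
If you generate schedules from a script, a small wrapper can assemble the td connector:create arguments and reject timezone abbreviations before they reach the CLI. The following Python sketch is one way to do that. The function name build_create_command is hypothetical, and the timezone check is a rough heuristic that assumes an extended timezone always has the form Area/Location; it does not verify that the zone actually exists.

```python
import re
import shlex
from typing import List, Optional

# Assumption: an extended timezone name looks like Area/Location,
# for example Asia/Tokyo or America/Los_Angeles.
_EXTENDED_TZ = re.compile(r"^[A-Za-z_]+(?:/[A-Za-z0-9_+\-]+)+$")


def build_create_command(
    name: str,
    cron: str,
    database: str,
    table: str,
    config: str,
    time_column: Optional[str] = None,
    tz: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `td connector:create`."""
    if tz is not None and not _EXTENDED_TZ.match(tz):
        # Abbreviations such as PST or CST fail this check.
        raise ValueError(f"use an extended timezone such as Asia/Tokyo, not {tz!r}")
    argv = ["td", "connector:create", name, cron, database, table, config]
    if time_column is not None:
        argv += ["--time-column", time_column]
    if tz is not None:
        argv += ["--timezone", tz]  # omit to keep the default (UTC)
    return argv


if __name__ == "__main__":
    argv = build_create_command(
        "daily_import", "10 0 * * *", "td_sample_db", "td_sample_table",
        "load.yml", time_column="created_at", tz="Asia/Tokyo",
    )
    # Print a shell-quoted command line; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Returning a list rather than a single string keeps the cron expression as one argument, so no shell quoting is needed when the list is passed to subprocess.run.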

List All Schedules

You can see the list of currently scheduled entries by running the command td connector:list.

$ td connector:list
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------+
| Name         | Cron         | Timezone | Delay | Database     | Table           | Config                                   |
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------+
| daily_import | 10 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"in"=>{"type"=>"s3", "access_key_id"... |
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------+

Show Schedule Settings and History

td connector:show displays the execution settings of a schedule entry. The placeholders in the Config section of the output are:

<access_key_id>
  The AWS access key ID used to authenticate with S3.

<secret_access_key>
  The AWS secret access key that is paired with the access key ID.

<endpoint>
  The S3 service endpoint that the connector connects to.
  Example value: s3.amazonaws.com

<bucket>
  The S3 bucket that contains the files to import.
  Example value: https://my-bucket.s3.us-west-2.amazonaws.com.

<path_prefix>
  The prefix that target keys must start with.
  Example values:
  logging/
  path/to/sample_201501.csv.gz, path/to/sample_201502.csv.gz, …, path/to/sample_201505.csv.gz


$ td connector:show daily_import
Name     : daily_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table
Config
---
in:
  type: s3
  access_key_id: <access_key_id>
  secret_access_key: <secret_access_key>
  endpoint: <endpoint>
  bucket: <bucket>
  path_prefix: <path_prefix>
  parser:
    charset: UTF-8
    ...

td connector:history shows the execution history of a schedule entry. To investigate the results of each individual run, use td job <jobid>.

$ td connector:history daily_import
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-18 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-17 00:10:07 +0000 | 161      |
| 577914 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-16 00:10:03 +0000 | 152      |
| 577872 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-15 00:10:04 +0000 | 163      |
| 577810 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-14 00:10:04 +0000 | 164      |
| 577766 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-13 00:10:04 +0000 | 155      |
| 577710 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-12 00:10:05 +0000 | 156      |
| 577610 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-11 00:10:04 +0000 | 157      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
8 rows in set
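
To decide which runs deserve a closer look, the table above can be parsed in a script. The following Python sketch assumes the output keeps the bordered layout shown here, with no `|` characters inside cell values. The helper names parse_td_table and non_success_job_ids are hypothetical.

```python
from typing import Dict, List

# Two rows copied from the sample output above.
SAMPLE = """\
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| JobID  | Status  | Records | Database     | Table           | Priority | Started                   | Duration |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
| 578066 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-18 00:10:05 +0000 | 160      |
| 577968 | success | 10000   | td_sample_db | td_sample_table | 0        | 2015-04-17 00:10:07 +0000 | 161      |
+--------+---------+---------+--------------+-----------------+----------+---------------------------+----------+
2 rows in set
"""


def parse_td_table(text: str) -> List[Dict[str, str]]:
    """Turn a bordered td table into a list of dicts keyed by column name."""
    header = None
    rows: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue  # skip borders, blank lines, and the "N rows in set" footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data-like line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def non_success_job_ids(rows: List[Dict[str, str]]) -> List[str]:
    """Return the JobID of every run whose Status is not 'success'."""
    return [row["JobID"] for row in rows if row["Status"] != "success"]


if __name__ == "__main__":
    rows = parse_td_table(SAMPLE)
    for job_id in non_success_job_ids(rows):
        print(f"inspect with: td job {job_id}")
```

Each ID returned by non_success_job_ids can then be passed to td job <jobid> to inspect that run.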

Delete Schedule

td connector:delete removes the schedule.

$ td connector:delete daily_import


