Install the most current TD Toolbelt.
$ td --version
0.11.10
Create a seed file (seed.yml) that describes the input and output:
in:
  type: sftp_ddp
  file_names:
    - dmp_20180926_123456789.dat
out:
  mode: append
exec: {}
Use connector:guess. This command reads the source file and infers (guesses) the file format.
$ td connector:guess seed.yml -o load.yml
If you open load.yml, you’ll see the guessed file format definitions. This example is trying to load CSV files.
in:
  type: sftp_ddp
  file_names:
    - dmp_20180926_10130066a050f4e68b46d5b052abaedb05dd2f.dat
out:
  mode: append
exec: {}
Then, you can preview how the system will parse the file by using the preview command.
$ td connector:preview load.yml
+------------+---------------------------------+
| recordInfo | rawData                         |
+------------+---------------------------------+
|            | 1 lines were processed correctly|
| 1111111111 | Line 1111111111 could not find  |
+------------+---------------------------------+
Submit the load job
$ td connector:issue load.yml --database td_sample_db --table td_sample_table
The connector:issue command assumes that you have already created a database (td_sample_db) and a table (td_sample_table). If the database or the table does not exist in TD, the connector:issue command fails. Either create the database and table manually, or use the --auto-create-table option with the td connector:issue command to create them automatically:
$ td connector:issue load.yml --database td_sample_db --table td_sample_table --time-column created_at --auto-create-table
You can schedule periodic Data Connector execution for incremental SFTP DDP file import. We manage our scheduler to ensure high availability. By using this feature, you no longer need a cron daemon on your local data center.
A new schedule can be created using the td connector:create command. The following are required: the name of the schedule, the cron-style schedule, the database and table where the data will be stored, and the Data Connector configuration file.
$ td connector:create \
daily_import \
"10 0 * * *" \
td_sample_db \
td_sample_table \
load.yml
The cron parameter also accepts three special options: @hourly, @daily, and @monthly.
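To make the cron-style argument concrete, the following shell sketch splits an expression into the five standard cron fields (minute, hour, day of month, month, day of week). The function name describe_cron is hypothetical and is not part of TD Toolbelt; it only illustrates that "10 0 * * *" reads as 00:10 every day.

```shell
# describe_cron is a hypothetical helper, not a TD Toolbelt command.
# It prints the five standard cron fields of an expression.
describe_cron() {
  # Run in a subshell with globbing disabled so "*" is not expanded to file names.
  (
    set -f
    # Intentionally unquoted: split the expression on whitespace.
    set -- $1
    if [ "$#" -ne 5 ]; then
      echo "expected 5 fields, got $#" >&2
      exit 1
    fi
    echo "minute=$1 hour=$2 day_of_month=$3 month=$4 day_of_week=$5"
  )
}

describe_cron "10 0 * * *"
```

The sketch handles only five-field expressions; the special options such as @daily are single tokens and would be rejected by it.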
By default, the schedule is set up in the UTC timezone. You can set the schedule in a different timezone using the -t or --timezone option. Note that the --timezone option supports only extended timezone formats like 'Asia/Tokyo', 'America/Los_Angeles', etc. Timezone abbreviations like PST or CST are *not* supported and may lead to unexpected schedules.
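Because abbreviations are not supported, it may help to check the shape of a timezone name before creating a schedule. The sketch below is one way to do that; is_extended_tz is a hypothetical helper, not a TD Toolbelt feature, and it only checks for an "Area/Location" shape (or UTC) rather than validating the name against a timezone database. It prints the td connector:create command instead of running it.

```shell
# is_extended_tz is a hypothetical pre-check, not a TD Toolbelt feature.
# It accepts "Area/Location" names (and UTC) and rejects bare abbreviations.
is_extended_tz() {
  case "$1" in
    UTC) return 0 ;;
    */*) return 0 ;;
    *)   return 1 ;;
  esac
}

tz="Asia/Tokyo"
if is_extended_tz "$tz"; then
  # Print the command instead of running it, so the sketch works without td installed.
  echo "td connector:create daily_import \"10 0 * * *\" td_sample_db td_sample_table load.yml -t $tz"
else
  echo "refusing timezone abbreviation: $tz" >&2
fi
```

With tz set to PST or CST, the check fails and no command is printed.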
You can see the list of currently scheduled entries by running the command td connector:list.
$ td connector:list
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------------+
| Name         | Cron         | Timezone | Delay | Database     | Table           | Config                                         |
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------------+
| daily_import | 10 0 * * *   | UTC      | 0     | td_sample_db | td_sample_table | {"in"=>{"type"=>"sftp_ddp", "access_key_id"....|
+--------------+--------------+----------+-------+--------------+-----------------+------------------------------------------------+
td connector:show shows the execution setting of a schedule entry.
$ td connector:show daily_import
Name     : daily_import
Cron     : 10 0 * * *
Timezone : UTC
Delay    : 0
Database : td_sample_db
Table    : td_sample_table
Config
---
in:
  type: sftp_ddp
  file_names:
    - dmp_20180926_10130066a050f4e68b46d5b052abaedb05dd2f.dat
out:
  mode: append
exec: {}
td connector:delete removes the schedule.
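As an illustration, removing a schedule could be wrapped as in the sketch below. The function remove_schedule is hypothetical and not part of TD Toolbelt. It assumes that td connector:delete takes the schedule name as its argument and that td connector:show exits with a non-zero status when the schedule does not exist; verify both against your installed version.

```shell
# remove_schedule is a hypothetical wrapper, not part of TD Toolbelt.
# Assumption: connector:delete takes the schedule name as its only argument.
remove_schedule() {
  name="$1"
  if [ -z "$name" ]; then
    echo "usage: remove_schedule <schedule-name>" >&2
    return 2
  fi
  # Show the schedule first so the settings being removed are visible in the log.
  # Assumption: connector:show fails (non-zero exit) for an unknown schedule.
  td connector:show "$name" || return 1
  td connector:delete "$name"
}
```

Calling remove_schedule daily_import would then display and delete the schedule created in the earlier example.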