You can run Treasure Data from the command line using these commands.
| Command | Example |
|---|---|
| Basic Commands | td |
| Database Commands | td db:create <db> |
| Table Commands | td table:list [db] |
| Query Commands | td query [sql] |
| Import Commands | td import:list |
| Bulk Import Commands | td bulk_import:list |
| Result Commands | td result:list |
| Schedule Commands | td sched:list |
| Schema Commands | td schema:show <db> <table> |
| Connector Commands | td connector:guess [config] |
| User Commands | td user:list |
| Workflow Commands | td workflow init |
| Job Commands | td job:show <job_id> |
You can use the following commands to enable basic functions in Treasure Data.
Show list of options in Treasure Data.
Usage
td
| Options | Description |
|---|---|
-c, --config PATH | Path to the configuration file (default: ~/.td/td.conf) |
-k, --apikey KEY | Use this API key instead of reading the config file |
-e, --endpoint API_SERVER | Specify the URL for API server to use (default: https://api.treasuredata.com). The URL must contain a scheme (http:// or https:// prefix) to be valid. |
--insecure | Insecure access: disable SSL/TLS verification. Insecure mode is disabled by default. |
-v, --verbose | Verbose mode |
-r, --retry-post-requests | Retry on failed post requests. Warning: can cause resource duplication, such as duplicated job submissions. |
--version | Show version |
Usage
td <command>
| Options | Description |
|---|---|
db | create/delete/list databases |
table | create/delete/list/import/export/tail tables |
query | issue a query |
job | show/kill/list jobs |
import | manage bulk import sessions (Java-based fast processing) |
bulk_import | manage bulk import sessions (Old Ruby-based implementation) |
result | create/delete/list result URLs |
sched | create/delete/list schedules that run a query periodically |
schema | create/delete/modify schemas of tables |
connector | manage connectors |
workflow | manage workflows |
status | show scheds, jobs, tables and results |
apikey | show/set API key |
server | show status of the Treasure Data server |
sample | create a sample log file |
help | show help messages |
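The global options above can be combined with any subcommand. A minimal sketch (the API key, config path, and database name below are placeholders; running these requires a configured Treasure Data account):

```shell
# Point the client at a specific endpoint and API key for one invocation,
# instead of reading ~/.td/td.conf (YOUR_API_KEY is a placeholder).
td -e https://api.treasuredata.com -k YOUR_API_KEY db:list

# Use an alternate config file, with verbose output.
td -c /path/to/td.conf -v table:list example_db
```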
You can create, delete, and view lists of databases from the command line.
Create a database.
Usage
td db:create <db>
Example
td db:create example_db
Delete a database.
Usage
td db:delete <db>
| Options | Description |
|---|---|
-f, --force | clear tables and delete the database |
Example
td db:delete example_db
Show list of databases.
Usage
td db:list
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td db:list
td dbs
You can create, list, show, and organize table structure using the command line.
- td table:list
- td table:show
- td table:create
- td table:delete
- td table:import
- td table:export
- td table:swap
- td table:rename
- td table:tail
- td table:expire
Show list of tables.
Usage
td table:list [db]
| Options | Description |
|---|---|
-n, --num_threads VAL | number of threads to get list in parallel |
--show-bytes | show estimated table size in bytes |
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td table:list
td table:list example_db
td tables
Describe information in a table.
Usage
td table:show <db> <table>
| Options | Description |
|---|---|
-v | show more attributes |
Example
td table example_db table1
Create a table.
Usage
td table:create <db> <table>
| Options | Description |
|---|---|
-T, --type TYPE | set table type (log) |
--expire-days DAYS | set table expire days |
--include-v BOOLEAN | set include_v flag |
--detect-schema BOOLEAN | set detect schema flag |
Example
td table:create example_db table1
Delete a table.
Usage
td table:delete <db> <table>
| Options | Description |
|---|---|
-f, --force | never prompt |
Example
td table:delete example_db table1
Parse and import files to a table.
Usage
td table:import <db> <table> <files...>
| Options | Description |
|---|---|
--format FORMAT | file format (default: apache) |
--apache | same as --format apache; apache common log format |
--syslog | same as --format syslog; syslog |
--msgpack | same as --format msgpack; msgpack stream format |
--json | same as --format json; LF-separated json format |
-t, --time-key COL_NAME | time key name for json and msgpack format (e.g. 'created_at') |
--auto-create-table | create the table and database if they don't exist |
Example
td table:import example_db table1 --apache access.log
td table:import example_db table1 --json -t time - < test.json
% is a recognized environment variable, so you must use '%%' to set it.
td import:prepare --format csv --column-header \
--time-column 'date' --time-format '%%Y-%%m-%%d' test.csv
Dump logs in a table to the specified storage.
Usage
td table:export <db> <table>
| Options | Description |
|---|---|
-w, --wait | wait until the job is completed |
-f, --from TIME | export data that is newer than or equal to TIME |
-t, --to TIME | export data that is older than TIME |
-b, --s3-bucket NAME | name of the destination S3 bucket (required) |
-p, --prefix PATH | path prefix of the file on S3 |
-k, --aws-key-id KEY_ID | AWS access key id to export data (required) |
-s, --aws-secret-key SECRET_KEY | AWS secret access key to export data (required) |
-F, --file-format FILE_FORMAT | file format for exported data. Available formats are tsv.gz (tab-separated values per line) and jsonl.gz (JSON record per line). The json.gz and line-json.gz formats are the default and still available, but only for backward compatibility; their use is discouraged because they have far lower performance. |
-O, --pool-name NAME | specify resource pool by name |
-e, --encryption ENCRYPT_METHOD | export with server side encryption with the ENCRYPT_METHOD |
-a ASSUME_ROLE_ARN, --assume-role | export with assume role with ASSUME_ROLE_ARN as role arn |
Example
td table:export example_db table1 \
--s3-bucket mybucket -k KEY_ID -s SECRET_KEY
Swap the names of two tables.
Usage
td table:swap <db> <table1> <table2>
Example
td table:swap example_db table1 table2
Rename an existing table.
Usage
td table:rename <db> <from_table> <dest_table>
| Options | Description |
|---|---|
--overwrite | replace existing dest table |
Example
td table:rename example_db table1 table2
Get recently imported logs.
Usage
td table:tail <db> <table>
| Options | Description |
|---|---|
-n, --count N | number of logs to get |
-P, --pretty | pretty print |
Example
td table:tail example_db table1
td table:tail example_db table1 -n 30
Expire data in a table after the specified number of days. Set to 0 to disable expiration.
Usage
td table:expire <db> <table> <expire_days>
Example
td table:expire example_db table1 30
You can issue queries from the command line.
Issue a query
Usage
td query [sql]
| Options | Description |
|---|---|
-d, --database DB_NAME | use the database (required) |
-w, --wait[=SECONDS] | wait for finishing the job (for seconds) |
-G, --vertical | use vertical table to show results |
-o, --output PATH | write result to the file |
-f, --format FORMAT | format of the result to write to the file (tsv, csv, json, msgpack, and msgpack.gz) |
-r, --result RESULT_URL | write result to the URL (see also result:create subcommand) It is suggested for this option to be used with the -x / --exclude option to suppress printing of the query result to stdout or -o / --output to dump the query result into a file. |
-u, --user NAME | set user name for the result URL |
-p, --password | ask password for the result URL |
-P, --priority PRIORITY | set priority |
-R, --retry COUNT | automatic retrying count |
-q, --query PATH | use file instead of inline query |
-T, --type TYPE | set query type (hive, trino(presto)) |
--sampling DENOMINATOR | OBSOLETE - enable random sampling to reduce records 1/DENOMINATOR |
-l, --limit ROWS | limit the number of result rows shown when not outputting to file |
-c, --column-header | output of the columns' header when the schema is available for the table (only applies to json, tsv and csv formats) |
-x, --exclude | do not automatically retrieve the job result |
-O, --pool-name NAME | specify resource pool by name |
--domain-key DOMAIN_KEY | optional user-provided unique ID. You can include this ID with your create request to ensure idempotence. |
--engine-version ENGINE_VERSION | specify query engine version by name |
Example
td query -d example_db -w -r rset1 "select count(*) from table1"
td query -d example_db -w -r rset1 -q query.txt
You can import and organize data from the command line using these commands.
- td import:list
- td import:show
- td import:create
- td import:jar_version
- td import:jar_update
- td import:prepare
- td import:upload
- td import:auto
- td import:perform
- td import:error_records
- td import:commit
- td import:delete
- td import:freeze
- td import:unfreeze
- td import:config
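Taken together, a typical bulk import run chains these subcommands in order. A sketch under assumed placeholder names (mysess, example_db, table1); it requires a configured td client:

```shell
# 1. Create a bulk import session bound to a target table.
td import:create mysess example_db table1

# 2. Convert source files into part files, then upload them to the session.
td import:prepare logs/*.csv --format csv --column-header -o parts/
td import:upload mysess parts/*

# 3. Validate and convert on the server, inspect failures, then commit.
td import:perform mysess -w
td import:error_records mysess   # rows that did not pass validation
td import:commit mysess -w

# 4. Clean up the session.
td import:delete mysess
```

`td import:auto` collapses steps 2–4 into a single command, as described below.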
List bulk import sessions
Usage
td import:list
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td import:list
Show list of uploaded parts.
Usage
td import:show <name>
Example
td import:show
Create a new bulk import session for the table
Usage
td import:create <name> <db> <table>
Example
td import:create logs_201201 example_db event_logs
Show import jar version
Usage
td import:jar_version
Example
td import:jar_version
Update import jar to the latest version
Usage
td import:jar_update
Example
td import:jar_update
Convert files into part file format
Usage
td import:prepare <files...>
| Options | Description |
|---|---|
-f, --format FORMAT | source file format [csv, tsv, json, msgpack, apache, regex, mysql]; default=csv |
-C, --compress TYPE | compressed type [gzip, none, auto]; default=auto detect |
-T, --time-format FORMAT | specifies the strftime format of the time column. The format slightly differs from Ruby's Time#strftime format in that the '%:z' and '%::z' timezone options are not supported. |
-e, --encoding TYPE | encoding type [UTF-8, etc.] |
-o, --output DIR | output directory. default directory is 'out'. |
-s, --split-size SIZE_IN_KB | size of each part (default: 16384) |
-t, --time-column NAME | name of the time column |
--time-value TIME,HOURS | time column's value. If the data doesn't have a time column, users can auto-generate the time column's value in 2 ways: Fixed time value with --time-value TIME: where TIME is a Unix time in seconds since Epoch. The time column value is constant and equal to TIME seconds. E.g. '--time-value 1394409600' assigns the equivalent of timestamp 2014-03-10T00:00:00 to all records imported. Incremental time value with --time-value TIME,HOURS: where TIME is the Unix time in seconds since Epoch and HOURS is the maximum range of the timestamps in hours. This mode can be used to assign incremental timestamps to subsequent records. Timestamps will be incremented by 1 second each record. If the number of records causes the timestamp to overflow the range (timestamp >= TIME + HOURS * 3600), the next timestamp will restart at TIME and continue from there. E.g. '--time-value 1394409600,10' will assign timestamp 1394409600 to the first record, timestamp 1394409601 to the second, 1394409602 to the third, and so on until the 36000th record which will have timestamp 1394445600 (1394409600 + 10 * 3600). The timestamp assigned to the 36001st record will be 1394409600 again and the timestamp will restart from there. |
--primary-key NAME:TYPE | pair of name and type of primary key declared in your item table |
--prepare-parallel NUM | prepare in parallel (default: 2; max 96) |
--only-columns NAME,NAME,... | only columns |
--exclude-columns NAME,NAME,... | exclude columns |
--error-records-handling MODE | error records handling mode [skip, abort]; default=skip |
--invalid-columns-handling MODE | invalid columns handling mode [autofix, warn]; default=warn |
--error-records-output DIR | write error records; default directory is 'error-records'. |
--columns NAME,NAME,... | column names (use --column-header instead if the first line has column names) |
--column-types TYPE,TYPE,... | column types [string, int, long, double] |
--column-type NAME:TYPE | column type [string, int, long, double]. A pair of column name and type can be specified like 'age:int' |
-S, --all-string | disable automatic type conversion |
--empty-as-null-if-numeric | the empty string values are interpreted as null values if columns are numerical types. |
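The incremental --time-value arithmetic above can be sanity-checked with plain shell arithmetic. This sketch assumes 0-based record numbering and a wrap at the range end, matching the overflow rule stated in the option description:

```shell
# Base timestamp and range from the --time-value example
# (1394409600 = 2014-03-10T00:00:00 UTC, range of 10 hours).
TIME=1394409600
HOURS=10

# Upper bound of the generated range: TIME + HOURS * 3600
echo $((TIME + HOURS * 3600))         # 1394445600

# With 1-second increments wrapping at the range end, record N (0-based)
# gets TIME + (N modulo the range width in seconds):
N=36000
echo $((TIME + N % (HOURS * 3600)))   # 1394409600 (wrapped back to TIME)
```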
CSV/TSV Specific Options
| Options | Description |
|---|---|
--column-header | first line includes column names |
--delimiter CHAR | delimiter CHAR; default="," at csv, "\t" at tsv |
--escape CHAR | escape CHAR; default=\ |
--newline TYPE | newline [CRLF, LF, CR]; default=CRLF |
--quote CHAR | quote [DOUBLE, SINGLE, NONE]; if csv format, default=DOUBLE. if tsv format, default=NONE |
MySQL Specific Options
| Options | Description |
|---|---|
--db-url URL | JDBC connection URL |
--db-user NAME | user name for MySQL account |
--db-password PASSWORD | password for MySQL account |
REGEX Specific Options
| Options | Description |
|---|---|
--regex-pattern PATTERN | pattern to parse line. When 'regex' is used as source file format, this option is required |
Example
td import:prepare logs/*.csv --format csv \
--columns date_code,uid,price,count --time-value 1394409600,10 -o parts/
td import:prepare mytable --format mysql \
--db-url jdbc:mysql://localhost/mydb --db-user myuser --db-password mypass
td import:prepare "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" \
--format csv --column-header --time-column date_time -o parts/
Upload or re-upload files into a bulk import session
Usage
td import:upload <session name> <files...>
| Options | Description |
|---|---|
--retry-count NUM | number of times the upload process will automatically retry; default: 10 |
--auto-create DATABASE.TABLE | automatically create a bulk import session using the specified database and table names. If you use the 'auto-create' option, you must not specify a session name as the first argument. |
--auto-perform | perform bulk import job automatically |
--auto-commit | commit bulk import job automatically |
--auto-delete | delete bulk import session automatically |
--parallel NUM | upload in parallel (default: 2; max 8) |
-f, --format FORMAT | source file format [csv, tsv, json, msgpack, apache, regex, mysql]; default=csv |
-C, --compress TYPE | compressed type [gzip, none, auto]; default=auto detect |
-T, --time-format FORMAT | specifies the strftime format of the time column. The format slightly differs from Ruby's Time#strftime format in that the '%:z' and '%::z' timezone options are not supported. |
-e, --encoding TYPE | encoding type [UTF-8, etc.] |
-o, --output DIR | output directory. default directory is 'out'. |
-s, --split-size SIZE_IN_KB | size of each part (default: 16384) |
-t, --time-column NAME | name of the time column |
--time-value TIME,HOURS | time column's value. If the data doesn't have a time column, users can auto-generate the time column's value in 2 ways: Fixed time value with --time-value TIME: where TIME is a Unix time in seconds since Epoch. The time column value is constant and equal to TIME seconds. E.g. '--time-value 1394409600' assigns the equivalent of timestamp 2014-03-10T00:00:00 to all records imported. Incremental time value with --time-value TIME,HOURS: where TIME is the Unix time in seconds since Epoch and HOURS is the maximum range of the timestamps in hours. This mode can be used to assign incremental timestamps to subsequent records. Timestamps will be incremented by 1 second each record. If the number of records causes the timestamp to overflow the range (timestamp >= TIME + HOURS * 3600), the next timestamp will restart at TIME and continue from there. E.g. '--time-value 1394409600,10' will assign timestamp 1394409600 to the first record, timestamp 1394409601 to the second, 1394409602 to the third, and so on until the 36000th record which will have timestamp 1394445600 (1394409600 + 10 * 3600). The timestamp assigned to the 36001st record will be 1394409600 again and the timestamp will restart from there. |
--primary-key NAME:TYPE | pair of name and type of primary key declared in your item table |
--prepare-parallel NUM | prepare in parallel (default: 2; max 96) |
--only-columns NAME,NAME,... | only columns |
--exclude-columns NAME,NAME,... | exclude columns |
--error-records-handling MODE | error records handling mode [skip, abort]; default=skip |
--invalid-columns-handling MODE | invalid columns handling mode [autofix, warn]; default=warn |
--error-records-output DIR | write error records; default directory is 'error-records'. |
--columns NAME,NAME,... | column names (use --column-header instead if the first line has column names) |
--column-types TYPE,TYPE,... | column types [string, int, long, double] |
--column-type NAME:TYPE | column type [string, int, long, double]. A pair of column name and type can be specified like 'age:int' |
-S, --all-string | disable automatic type conversion |
--empty-as-null-if-numeric | the empty string values are interpreted as null values if columns are numerical types. |
CSV/TSV Specific Options
| Options | Description |
|---|---|
--column-header | first line includes column names |
--delimiter CHAR | delimiter CHAR; default="," at csv, "\t" at tsv |
--escape CHAR | escape CHAR; default=\ |
--newline TYPE | newline [CRLF, LF, CR]; default=CRLF |
--quote CHAR | quote [DOUBLE, SINGLE, NONE]; if csv format, default=DOUBLE. if tsv format, default=NONE |
MySQL Specific Options
| Options | Description |
|---|---|
--db-url URL | JDBC connection URL |
--db-user NAME | user name for MySQL account |
--db-password PASSWORD | password for MySQL account |
REGEX Specific Options
| Options | Description |
|---|---|
--regex-pattern PATTERN | pattern to parse line. When 'regex' is used as source file format, this option is required |
Example
td import:upload mysess parts/* --parallel 4
td import:upload mysess parts/*.csv --format csv --columns time,uid,price,count --time-column time -o parts/
td import:upload parts/*.csv --auto-create mydb.mytbl --format csv --columns time,uid,price,count --time-column time -o parts/
td import:upload mysess mytable --format mysql --db-url jdbc:mysql://localhost/mydb --db-user myuser --db-password mypass
td import:upload "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" --format csv --column-header --time-column date_time -o parts/
Automatically upload or re-upload files into a bulk import session. It is the functional equivalent of the 'upload' command with the 'auto-perform', 'auto-commit', and 'auto-delete' options. Unlike 'upload', however, it does not enable the 'auto-create' option by default; if you want 'auto-create', you must declare it explicitly as a command option.
Usage
td import:auto <session name> <files...>
| Options | Description |
|---|---|
--retry-count NUM | number of times the upload process will automatically retry; default: 10 |
--auto-create DATABASE.TABLE | automatically create a bulk import session using the specified database and table names. If you use the 'auto-create' option, you must not specify a session name as the first argument. |
--parallel NUM | upload in parallel (default: 2; max 8) |
-f, --format FORMAT | source file format [csv, tsv, json, msgpack, apache, regex, mysql]; default=csv |
-C, --compress TYPE | compressed type [gzip, none, auto]; default=auto detect |
-T, --time-format FORMAT | specifies the strftime format of the time column. The format slightly differs from Ruby's Time#strftime format in that the '%:z' and '%::z' timezone options are not supported. |
-e, --encoding TYPE | encoding type [UTF-8, etc.] |
-o, --output DIR | output directory. default directory is 'out'. |
-s, --split-size SIZE_IN_KB | size of each part (default: 16384) |
-t, --time-column NAME | name of the time column |
--time-value TIME,HOURS | time column's value. If the data doesn't have a time column, users can auto-generate the time column's value in 2 ways: Fixed time value with --time-value TIME: where TIME is a Unix time in seconds since Epoch. The time column value is constant and equal to TIME seconds. E.g. '--time-value 1394409600' assigns the equivalent of timestamp 2014-03-10T00:00:00 to all records imported. Incremental time value with --time-value TIME,HOURS: where TIME is the Unix time in seconds since Epoch and HOURS is the maximum range of the timestamps in hours. This mode can be used to assign incremental timestamps to subsequent records. Timestamps will be incremented by 1 second each record. If the number of records causes the timestamp to overflow the range (timestamp >= TIME + HOURS * 3600), the next timestamp will restart at TIME and continue from there. E.g. '--time-value 1394409600,10' will assign timestamp 1394409600 to the first record, timestamp 1394409601 to the second, 1394409602 to the third, and so on until the 36000th record which will have timestamp 1394445600 (1394409600 + 10 * 3600). The timestamp assigned to the 36001st record will be 1394409600 again and the timestamp will restart from there. |
--primary-key NAME:TYPE | pair of name and type of primary key declared in your item table |
--prepare-parallel NUM | prepare in parallel (default: 2; max 96) |
--only-columns NAME,NAME,... | only columns |
--exclude-columns NAME,NAME,... | exclude columns |
--error-records-handling MODE | error records handling mode [skip, abort]; default=skip |
--invalid-columns-handling MODE | invalid columns handling mode [autofix, warn]; default=warn |
--error-records-output DIR | write error records; default directory is 'error-records'. |
--columns NAME,NAME,... | column names (use --column-header instead if the first line has column names) |
--column-types TYPE,TYPE,... | column types [string, int, long, double] |
--column-type NAME:TYPE | column type [string, int, long, double]. A pair of column name and type can be specified like 'age:int' |
-S, --all-string | disable automatic type conversion |
--empty-as-null-if-numeric | the empty string values are interpreted as null values if columns are numerical types. |
CSV/TSV Specific Options
| Options | Description |
|---|---|
--column-header | first line includes column names |
--delimiter CHAR | delimiter CHAR; default="," at csv, "\t" at tsv |
--escape CHAR | escape CHAR; default=\ |
--newline TYPE | newline [CRLF, LF, CR]; default=CRLF |
--quote CHAR | quote [DOUBLE, SINGLE, NONE]; if csv format, default=DOUBLE. if tsv format, default=NONE |
MySQL Specific Options
| Options | Description |
|---|---|
--db-url URL | JDBC connection URL |
--db-user NAME | user name for MySQL account |
--db-password PASSWORD | password for MySQL account |
REGEX Specific Options
| Options | Description |
|---|---|
--regex-pattern PATTERN | pattern to parse line. When 'regex' is used as source file format, this option is required |
Example
td import:auto mysess parts/* --parallel 4
td import:auto mysess parts/*.csv --format csv --columns time,uid,price,count --time-column time -o parts/
td import:auto parts/*.csv --auto-create mydb.mytbl --format csv --columns time,uid,price,count --time-column time -o parts/
td import:auto mysess mytable --format mysql --db-url jdbc:mysql://localhost/mydb --db-user myuser --db-password mypass
td import:auto "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" --format csv --column-header --time-column date_time -o parts/
Start to validate and convert uploaded files
Usage
td import:perform <name>
| Options | Description |
|---|---|
-w, --wait | wait for finishing the job |
-f, --force | force start performing |
-O, --pool-name NAME | specify resource pool by name |
Example
td import:perform logs_201201
Show records which did not pass validations
Usage
td import:error_records <name>
Example
td import:error_records logs_201201
Start to commit a performed bulk import session
Usage
td import:commit <name>
| Options | Description |
|---|---|
-w, --wait | wait for finishing the commit |
Example
td import:commit logs_201201
Delete a bulk import session
Usage
td import:delete <name>
Example
td import:delete logs_201201
Pause any further data uploads to a bulk import session (reject subsequent uploads).
Usage
td import:freeze <name>
Example
td import:freeze logs_201201
Unfreeze a bulk import session
Usage
td import:unfreeze <name>
Example
td import:unfreeze logs_201201
Create a guess config from arguments
Usage
td import:config <files...>
| Options | Description |
|---|---|
-o, --out FILE_NAME | output file name for connector:guess |
-f, --format FORMAT | source file format [csv, tsv, mysql]; default=csv |
--db-url URL | Database Connection URL |
--db-user NAME | user name for database |
--db-password PASSWORD | password for database |
--columns COLUMNS | not supported |
--column-header COLUMN-HEADER | not supported |
--time-column TIME-COLUMN | not supported |
--time-format TIME-FORMAT | not supported |
Example
td import:config "s3://<s3_access_key>:<s3_secret_key>@/my_bucket/path/to/*.csv" -o seed.
You can create and organize bulk imports from the command line.
- td bulk_import:list
- td bulk_import:show <name>
- td bulk_import:create <name> <db> <table>
- td bulk_import:prepare_parts <files...>
- td bulk_import:upload_parts <name> <files...>
- td bulk_import:delete_parts <name> <ids...>
- td bulk_import:perform <name>
- td bulk_import:error_records <name>
- td bulk_import:commit <name>
- td bulk_import:delete <name>
- td bulk_import:freeze <name>
- td bulk_import:unfreeze <name>
For instructions on how to use the bulk import commands, refer to the Bulk Import API Tutorial.
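The old Ruby-based flow mirrors the Java-based one above. A sketch using the session and file names from the examples in this section (requires a configured td client):

```shell
# Create a session, prepare and upload part files, then perform and commit.
td bulk_import:create logs_201201 example_db event_logs
td bulk_import:prepare_parts logs/*.csv --format csv \
  --columns time,uid,price,count --time-column "time" -o parts/
td bulk_import:upload_parts logs_201201 parts/*
td bulk_import:perform logs_201201 -w
td bulk_import:commit logs_201201 -w

# Clean up the session once the data is committed.
td bulk_import:delete logs_201201
```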
List bulk import sessions
Usage
td bulk_import:list
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td bulk_import:list
Show a list of uploaded parts
Usage
td bulk_import:show <name>
Example
td bulk_import:show logs_201201
Create a new bulk import session for the table
Usage
td bulk_import:create <name> <db> <table>
Example
td bulk_import:create logs_201201 example_db event_logs
Convert files into part file format
Usage
td bulk_import:prepare_parts <files...>
| Options | Description |
|---|---|
-f, --format NAME | source file format [csv, tsv, msgpack, json] |
-h, --columns NAME,NAME,... | column names (use --column-header instead if the first line has column names) |
-H, --column-header | first line includes column names |
-d, --delimiter REGEX | delimiter between columns (default: (?-mix:\t|,)) |
--null REGEX | null expression for the automatic type conversion (default: (?i-mx:\A(?:null||-|\N)\z)) |
--true REGEX | true expression for the automatic type conversion (default: (?i-mx:\A(?:true)\z)) |
--false REGEX | false expression for the automatic type conversion (default: (?i-mx:\A(?:false)\z)) |
-S, --all-string | disable automatic type conversion |
-t, --time-column NAME | name of the time column |
-T, --time-format FORMAT | strftime(3) format of the time column |
--time-value TIME | value of the time column |
-e, --encoding NAME | text encoding |
-C, --compress NAME | compression format name [plain, gzip] (default: auto detect) |
-s, --split-size SIZE_IN_KB | size of each part (default: 16384) |
-o, --output DIR | output directory |
Example
td bulk_import:prepare_parts logs/*.csv --format csv \
--columns time,uid,price,count --time-column "time" -o parts/
Upload or re-upload files into a bulk import session
Usage
td bulk_import:upload_parts <name> <files...>
| Options | Description |
|---|---|
-P, --prefix NAME | add prefix to parts name |
-s, --use-suffix COUNT | use COUNT number of . (dots) in the source file name in the parts name |
--auto-perform | perform bulk import job automatically |
--parallel NUM | perform uploading in parallel (default: 2; max 8) |
-O, --pool-name NAME | specify resource pool by name |
Example
td bulk_import:upload_parts logs_201201 parts/* --parallel 4
Delete uploaded files from a bulk import session
Usage
td bulk_import:delete_parts <name> <ids...>
| Options | Description |
|---|---|
-P, --prefix NAME | add prefix to parts name |
Example
td bulk_import:delete_parts logs_201201 01h 02h 03h
Start to validate and convert uploaded files
Usage
td bulk_import:perform <name>
| Options | Description |
|---|---|
-w, --wait | wait for finishing the job |
-f, --force | force start performing |
-O, --pool-name NAME | specify resource pool by name |
Example
td bulk_import:perform logs_201201
Show records which did not pass validations
Usage
td bulk_import:error_records <name>
Example
td bulk_import:error_records logs_201201
Start to commit a performed bulk import session
Usage
td bulk_import:commit <name>
| Options | Description |
|---|---|
-w, --wait | wait for finishing the commit |
Example
td bulk_import:commit logs_201201
Delete a bulk import session
Usage
td bulk_import:delete <name>
Example
td bulk_import:delete logs_201201
Block the upload to a bulk import session
Usage
td bulk_import:freeze <name>
Example
td bulk_import:freeze logs_201201
Unfreeze a frozen bulk import session
Usage
td bulk_import:unfreeze <name>
Example
td bulk_import:unfreeze logs_201201
You can use the command line to list, create, show, and delete results.
Show list of result URLs
Usage
td result:list
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td result:list
td results
Describe information of a result URL.
Usage
td result:show <name>
Example
td result name
Create a result URL
Usage
td result:create <name> <URL>
| Options | Description |
|---|---|
-u, --user NAME | set user name for authentication |
-p, --password | ask password for authentication |
Example
td result:create name mysql://my-server/mydb
Delete a result URL.
Usage
td result:delete <name>
Example
td result:delete name
You can use the command line to schedule, update, delete, and list queries.
- td sched:list
- td sched:create
- td sched:delete
- td sched:update
- td sched:history
- td sched:run
- td sched:result
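These subcommands cover a schedule's full lifecycle. A sketch using assumed placeholder names (hourly_count, example_db, table1); running it requires a configured td client:

```shell
# Create an hourly schedule that counts rows in a table.
td sched:create hourly_count "0 * * * *" -d example_db \
  "select count(*) from table1"

# Inspect past runs, then move it to an every-2-hours cadence in JST.
td sched:history hourly_count
td sched:update hourly_count -s "0 */2 * * *" -t "Asia/Tokyo"

# Retire the schedule when it is no longer needed.
td sched:delete hourly_count
```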
Show list of schedules
Usage
td sched:list
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td sched:list
td scheds
Create a schedule
Usage
td sched:create <name> <cron> [sql]
| Options | Description |
|---|---|
-d, --database DB_NAME | use the database (required) |
-t, --timezone TZ | name of the timezone. Only extended timezones like 'Asia/Tokyo' and 'America/Los_Angeles' are supported (no 'PST', 'PDT', etc.). When a timezone is specified, the cron schedule is interpreted in that timezone; otherwise, it is interpreted in UTC. E.g. the cron schedule '0 12 * * *' executes daily at 12 PM UTC without the timezone option, and at 12 PM Pacific time with the -t / --timezone 'America/Los_Angeles' option |
-D, --delay SECONDS | delay time of the schedule |
-r, --result RESULT_URL | write result to the URL (see also result:create subcommand) |
-u, --user NAME | set user name for the result URL |
-p, --password | ask password for the result URL |
-P, --priority PRIORITY | set priority |
-q, --query PATH | use file instead of inline query |
-R, --retry COUNT | automatic retrying count |
-T, --type TYPE | set query type (hive) |
Example
td sched:create sched1 "0 * * * *" -d example_db \
"select count(*) from table1" -r rset1
td sched:create sched1 "0 * * * *" \
-d example_db -q query.txt -r rset2
Delete a schedule
Usage
td sched:delete <name>
Example
td sched:delete sched1
Modify a schedule
Usage
td sched:update <name>
| Options | Description |
|---|---|
-n, --newname NAME | change the schedule's name |
-s, --schedule CRON | change the schedule |
-q, --query SQL | change the query |
-d, --database DB_NAME | change the database |
-r, --result RESULT_URL | change the result target (see also result:create subcommand) |
-t, --timezone TZ | name of the timezone. Only extended timezones like 'Asia/Tokyo' and 'America/Los_Angeles' are supported (no 'PST', 'PDT', etc.). When a timezone is specified, the cron schedule is interpreted in that timezone; otherwise, it is interpreted in UTC. E.g. the cron schedule '0 12 * * *' executes daily at 12 PM UTC without the timezone option, and at 12 PM Pacific time with the -t / --timezone 'America/Los_Angeles' option |
-D, --delay SECONDS | change the delay time of the schedule |
-P, --priority PRIORITY | set priority |
-R, --retry COUNT | automatic retrying count |
-T, --type TYPE | set query type (hive) |
--engine-version ENGINE_VERSION | specify the query engine version by name |
Example
td sched:update sched1 -s "0 */2 * * *" -d my_db -t "Asia/Tokyo" -D 3600
Show history of scheduled queries
Usage
td sched:history <name> [max]
| Options | Description |
|---|---|
-p, --page PAGE | skip N pages |
-s, --skip N | skip N schedules |
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td sched:history sched1 --page 1
Run scheduled queries for the specified time
Usage
td sched:run <name> <time>
| Options | Description |
|---|---|
-n, --num N | number of jobs to run |
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
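A note on -n: running a schedule for N jobs covers N consecutive schedule times starting at the given time. Assuming sched1 has an hourly cron schedule ('0 * * * *') — an assumption made here for illustration — the six schedule times covered by the example below can be sketched as:

```python
from datetime import datetime, timedelta

# Hypothetical: with an hourly schedule, `td sched:run sched1
# "2013-01-01 00:00:00" -n 6` would cover these six schedule times.
start = datetime(2013, 1, 1, 0, 0)
times = [start + timedelta(hours=i) for i in range(6)]
for t in times:
    print(t.strftime("%Y-%m-%d %H:%M:%S"))
# 2013-01-01 00:00:00 through 2013-01-01 05:00:00
```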
Example
td sched:run sched1 "2013-01-01 00:00:00" -n 6
Show the status and result of the last job run. --last [N] shows the result of the Nth job before the last. The other options are identical to those of the job:show command.
Usage
td sched:result <name>
| Options | Description |
|---|---|
-v, --verbose | show logs |
-w, --wait | wait for the job to finish |
-G, --vertical | use vertical table to show results |
-o, --output PATH | write the result to the file |
-l, --limit ROWS | limit the number of result rows shown when not outputting to a file |
-c, --column-header | output the column header when the schema is available for the table (only applies to tsv and csv formats) |
-x, --exclude | do not automatically retrieve the job result |
--null STRING | null expression in csv or tsv |
-f, --format FORMAT | format of the result to write to the file (tsv, csv, json, msgpack, and msgpack.gz) |
--last [Number] | show the result N jobs before the last (default: 1) |
Example
td sched:result NAME
td sched:result NAME --last
td sched:result NAME --last 3
Use the command line to work with the schema of a table.
Show schema of a table
Usage
td schema:show <db> <table>
Example
td schema:show example_db table1
Set new schema on a table
Usage
td schema:set <db> <table> [columns...]
Example
td schema:set example_db table1 user:string size:int
Add new columns to a table
Usage
td schema:add <db> <table> <columns...>
Example
td schema:add example_db table1 user:string size:int
Remove columns from a table
Usage
td schema:remove <db> <table> <columns...>
Example
td schema:remove example_db table1 user size
You can use the command line to control several elements related to connectors.
- td connector:guess
- td connector:preview
- td connector:issue
- td connector:list
- td connector:create
- td connector:show
- td connector:update
- td connector:delete
- td connector:history
- td connector:run
Run guess to generate a connector configuration file. Using the connector's credentials, this command examines the data and attempts to determine the file type, delimiter character, and column names. This "guess" is then written to the configuration file for the connector. This command is useful for file-based connectors.
Usage
td connector:guess [config]
| Options | Description |
|---|---|
--type[=TYPE] | (obsoleted) |
--access-id ID | (obsoleted) |
--access-secret SECRET | (obsoleted) |
--source SOURCE | (obsoleted) |
-o, --out FILE_NAME | output file name for connector:preview |
-g, --guess NAME,NAME,... | specify a comma-separated list of guess plugins to use |
Example
td connector:guess seed.yml -o config.yml
Example seed.yml
in:
  type: s3
  bucket: my-s3-bucket
  endpoint: s3-us-west-1.amazonaws.com
  path_prefix: path/prefix/to/import/
  access_key_id: ABCXYZ123ABCXYZ123
  secret_access_key: AbCxYz123aBcXyZ123
out:
  mode: append
Show a subset of possible data that the data connector fetches
Usage
td connector:preview <config>
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td connector:preview td-load.yml
Run a connector execution one time only
Usage
td connector:issue <config>
| Options | Description |
|---|---|
--database DB_NAME | destination database |
--table TABLE_NAME | destination table |
--time-column COLUMN_NAME | data partitioning key |
-w, --wait | wait for the job to finish |
--auto-create-table | create the table and database if they do not exist |
Example
td connector:issue td-load.yml
Show a list of connector sessions
Usage
td connector:list
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td connector:list
Create a new connector session
Usage
td connector:create <name> <cron> <database> <table> <config>
| Options | Description |
|---|---|
--time-column COLUMN_NAME | data partitioning key |
-t, --timezone TZ | name of the timezone. Only extended timezone names such as 'Asia/Tokyo' or 'America/Los_Angeles' are supported (not abbreviations such as 'PST' or 'PDT'). When a timezone is specified, the cron schedule is interpreted in that timezone; otherwise it is interpreted in UTC. For example, the cron schedule '0 12 * * *' executes daily at 12:00 UTC (5 AM Los Angeles time during daylight saving) without the timezone option, and at 12 PM Los Angeles time with -t / --timezone 'America/Los_Angeles' |
-D, --delay SECONDS | delay time of the schedule |
Example
td connector:create connector1 "0 * * * *" \
connector_database connector_table td-load.yml
Show the execution settings for a connector, such as name, timezone, delay, database, and table
Usage
td connector:show <name>
Example
td connector:show connector1
Modify a connector session
Usage
td connector:update <name> [config]
| Options | Description |
|---|---|
-n, --newname NAME | change the schedule's name |
-d, --database DB_NAME | change the database |
-t, --table TABLE_NAME | change the table |
-s, --schedule [CRON] | change the schedule or leave blank to remove the schedule |
-z, --timezone TZ | name of the timezone. Only extended timezone names such as 'Asia/Tokyo' or 'America/Los_Angeles' are supported (not abbreviations such as 'PST' or 'PDT'). When a timezone is specified, the cron schedule is interpreted in that timezone; otherwise it is interpreted in UTC. For example, the cron schedule '0 12 * * *' executes daily at 12:00 UTC (5 AM Los Angeles time during daylight saving) without the timezone option, and at 12 PM Los Angeles time with -z / --timezone 'America/Los_Angeles' |
-D, --delay SECONDS | change the delay time of the schedule |
-T, --time-column COLUMN_NAME | change the name of the time column |
-c, --config CONFIG_FILE | update the connector configuration |
--config-diff CONFIG_DIFF_FILE | update the connector config_diff |
Example
td connector:update connector1 -c td-bulkload.yml -s '@daily' ...
Delete a connector session
Usage
td connector:delete <name>
Example
td connector:delete connector1
Show the job history of a connector session
Usage
td connector:history <name>
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td connector:history connector1
Run a connector session for the specified time
Usage
td connector:run <name> [time]
| Options | Description |
|---|---|
-w, --wait | wait for the job to finish |
Example
td connector:run connector1 "2016-01-01 00:00:00"
You can use the command line to control several elements related to users.
- td user:list
- td user:show
- td user:create
- td user:delete
- td user:apikey:list
- td user:apikey:add
- td user:apikey:remove
Show a list of users.
Usage
td user:list
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Example
td user:list
td user:list -f csv
Show a user.
Usage
td user:show <name>
Example
td user:show "Roberta Smith"
Create a user. As part of the user creation process, you will be prompted to provide a password for the user.
Usage
td user:create <first_name> --email <email_address>
Example
td user:create "Roberta" --email "roberta.smith@acme.com"
Delete a user.
Usage
td user:delete <email_address>
Example
td user:delete roberta.smith@acme.com
Show API keys for a user.
| Options | Description |
|---|---|
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Usage
td user:apikey:list <email_address>
Example
td user:apikey:list roberta.smith@acme.com
td user:apikey:list roberta.smith@acme.com -f csv
Add an API key to a user.
Usage
td user:apikey:add <email_address>
Example
td user:apikey:add roberta.smith@acme.com
Remove an API key from a user.
Usage
td user:apikey:remove <email_address> <apikey>
Example
td user:apikey:remove roberta.smith@acme.com 1234565/abcdefg
You can create or modify workflows from the CLI using the following commands. The command wf can be used interchangeably with workflow.
Reset the workflow module
Usage
td workflow:reset
Update the workflow module
Usage
td workflow:update [version]
Show workflow module version
Usage
td workflow:version
You can use the following commands to locally initiate changes to workflows.
Usage
td workflow <command> [options...]
| Options | Description |
|---|---|
init <dir> | create a new workflow project |
r[un] <workflow.dig> | run a workflow |
c[heck] | show workflow definitions |
sched[uler] | run a scheduler server |
migrate(run|check) | migrate database |
selfupdate | update CLI to the latest version |
To manage secrets in local mode, use the following command:
td workflow secrets --local
You can use the following commands to initiate changes to workflows from the server.
Usage
td workflow <command> [options...]
| Options | Description |
|---|---|
server | start server |
You can use the following commands to initiate changes to workflows from the client.
Usage
td workflow <command> [options...]
| Options | Description |
|---|---|
push <project-name> | create and upload a new revision |
download <project-name> | pull an uploaded revision |
start <project-name> <name> | start a new session attempt of a workflow |
retry <attempt-id> | retry a session |
kill <attempt-id> | kill a running session attempt |
backfill <schedule-id> | start sessions of a schedule for past times |
backfill <project-name> <name> | start sessions of a schedule for past times |
reschedule <schedule-id> | skip sessions of a schedule to a future time |
reschedule <project-name> <name> | skip sessions of a schedule to a future time |
projects [name] | show projects |
workflows [project-name] [name] | show registered workflow definitions |
schedules | show registered schedules |
disable <schedule-id> | disable a workflow schedule |
disable <project-name> | disable all workflow schedules in a project |
disable <project-name> <name> | disable a workflow schedule |
enable <schedule-id> | enable a workflow schedule |
enable <project-name> | enable all workflow schedules in a project |
enable <project-name> <name> | enable a workflow schedule |
sessions | show sessions for all workflows |
sessions <project-name> | show sessions for all workflows in a project |
sessions <project-name> <name> | show sessions for a workflow |
session <session-id> | show a single session |
attempts | show attempts for all sessions |
attempts <session-id> | show attempts for a session |
attempt <attempt-id> | show a single attempt |
tasks <attempt-id> | show tasks of a session attempt |
delete <project-name> | delete a project |
secrets --project <project-name> | manage secrets |
version | show client and server version |
| Parameter | Description |
|---|---|
-L, --log PATH | output log messages to a file (default: -) |
-l, --log-level LEVEL | log level (error, warn, info, debug or trace) |
-X KEY=VALUE | add a performance system config |
-c, --config PATH.properties | Configuration file (default: ~/.config/digdag/config) |
--version | show client version |
client options:
| Parameter | Description |
|---|---|
-e, --endpoint URL | Server endpoint |
-H, --header KEY=VALUE | Additional headers |
--disable-version-check | Disable server version check |
--disable-cert-validation | Disable certificate verification |
--basic-auth <user:pass> | Add an Authorization header with the provided username and password |
You can view status and results of jobs, view lists of jobs and delete jobs using the CLI.
Show status and results of a job.
Usage
td job:show <job_id>
Example
td job:show 1461
| Options | Description |
|---|---|
-v, --verbose | show logs |
-w, --wait | wait for the job to finish |
-G, --vertical | use vertical table to show results |
-o, --output PATH | write results to the file |
-l, --limit ROWS | limit the number of result rows shown when not outputting to file |
-c, --column-header | output the column header when the schema is available for the table (only applies to tsv and csv formats) |
-x, --exclude | do not automatically retrieve the job result |
--null STRING | null expression in csv or tsv |
-f, --format FORMAT | format of the result to write to the file (tsv, csv, json, msgpack, and msgpack.gz) |
Show status progress of a job.
Usage
td job:status <job_id>
Example
td job:status 1461
Show a list of jobs.
Usage
td job:list [max]
[max] is the number of jobs to show.
Example
td job:list 10
| Options | Description |
|---|---|
-p, --page PAGE | skip N pages |
-s, --skip N | skip N jobs |
-R, --running | show only running jobs |
-S, --success | show only succeeded jobs |
-E, --error | show only failed jobs |
--slow [SECONDS] | show slow queries (default threshold: 3600 seconds) |
-f, --format FORMAT | format of the result rendering (tsv, csv, json or table. default is table) |
Kill or cancel a job.
Usage
td job:kill <job_id>
Example
td job:kill 1461