Job Management

When you issue a Data Input Transfer or a query on Treasure Data, a job is created. The status and result of each job can be viewed in the console or with CLI commands.

This article explains how to issue, list, and view the details of jobs, both via the GUI and via the CLI.

Job Types

Job Type           Definition
Presto             A query issued using the Presto engine
Hive               A query issued using the Hive engine
BulkLoadJob        A job created from an Input Transfer
Bulk Import Job    A job created from the Bulk Import command in TD Toolbelt
Result Export      A job issued from a query with a Result Export configured
PartialDeleteJob   A job created from the Partial Delete command in TD Toolbelt
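
For query jobs, the engine is chosen when the job is issued. As a sketch, the TD Toolbelt’s -T/--type flag selects the engine for ‘td query’ (check ‘td help query’ for the exact flags your Toolbelt version accepts):

$ td query -T presto -w -d testdb "SELECT COUNT(1) FROM www_access"  # Presto job
$ td query -T hive -w -d testdb "SELECT COUNT(1) FROM www_access"    # Hive job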

GUI-Based Job Management

View Jobs

Visit https://console.treasuredata.com/app/jobs to see all current and past jobs for your account.

Show Job Details and Job Results

In the GUI, click the job of interest to view its logs and, if applicable, the Presto/Hive job results.

CLI-Based Job Management

Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.

Issue a SQL Query Job

The ‘td query’ subcommand issues jobs. Jobs are executed in parallel on the cloud: the degree of parallelism is chosen automatically based on your plan, your remaining capacity, and the size of our cluster at that time.

$ td query -w -d testdb "SELECT COUNT(1) FROM www_access"
Job 702 is started.

If you issue a query WITHOUT the -w option, the command will exit immediately after submitting the job; it will not wait for the job to complete.

$ td query -d testdb "SELECT COUNT(1) FROM www_access"
Job 704 is started.
Use 'td job 704' to show the status.
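
Because the command returns immediately, a script that needs the result must check back on the job. A minimal shell sketch is to poll the ‘td job’ subcommand (described below) until the status line reports a terminal state; the set of terminal statuses assumed here (success, error, killed) should be verified against your own jobs’ output:

$ until td job 704 | grep -qE 'Status *: *(success|error|killed)'; do sleep 10; done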

You can write the job results to a local file in CSV format instead of printing them to STDOUT.

$ td query -o test.csv --format csv -w -d testdb 'SELECT COUNT(1) FROM www_access'
Status     : success
Result     :
written to test.csv in csv format
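
The same pattern works for other output formats, for example JSON; the accepted --format values are an assumption here, so check ‘td help query’ for the list your Toolbelt supports:

$ td query -o test.json --format json -w -d testdb 'SELECT COUNT(1) FROM www_access'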

Treasure Data supports both the Presto and Hive query engines (see Job Types above). The example above uses Hive.

Show Jobs

The ‘td jobs’ subcommand lists your submitted jobs. The most recent 20 jobs are shown by default. The command shows the JobID, status, starting time, elapsed time, and the submitted query for each job.

$ td jobs
+-------+---------+-------------------------+----------------+------------------------------------+
| JobID | Status  | Start                   | Elapsed        | Query                              |
+-------+---------+-------------------------+----------------+------------------------------------+
| 784   | error   | 2011-08-01 18:48:31 UTC |       3sec     | SELECT COUNT(1) FROM www_access    |
| 783   | success | 2011-08-01 06:37:15 UTC |      23sec     | SELECT COUNT(1) FROM www_access    |
| 782   | success | 2011-08-01 02:28:15 UTC |      23sec     | SELECT COUNT(1) FROM www_access    |
| 781   | success | 2011-07-31 20:18:30 UTC | 1h  28m  30sec | SELECT COUNT(time) FROM wikistats3 |
| 780   | success | 2011-07-31 19:56:26 UTC | 1h  24m  9sec  | SELECT COUNT(time) FROM wikistats3 |
| 779   | success | 2011-07-31 19:12:20 UTC | 1h  6m  4sec   | SELECT COUNT(time) FROM wikistats3 |
| 778   | success | 2011-07-31 18:48:58 UTC | 1h  7m  21sec  | SELECT COUNT(time) FROM wikistats3 |
| 777   | success | 2011-07-31 18:06:19 UTC | 1h  5m  55sec  | SELECT COUNT(time) FROM wikistats3 |
| 776   | success | 2011-07-31 17:43:42 UTC | 1h  5m  7sec   | SELECT COUNT(time) FROM wikistats3 |
| 775   | success | 2011-07-31 16:59:39 UTC | 1h  6m  35sec  | SELECT COUNT(time) FROM wikistats3 |
| 774   | success | 2011-07-31 16:37:07 UTC | 1h  6m  28sec  | SELECT COUNT(time) FROM wikistats3 |
| 773   | success | 2011-07-31 15:53:44 UTC | 1h  5m  48sec  | SELECT COUNT(time) FROM wikistats3 |
| 772   | success | 2011-07-31 15:32:45 UTC | 1h  4m  12sec  | SELECT COUNT(time) FROM wikistats3 |
| 771   | success | 2011-07-31 14:46:50 UTC | 1h  6m  46sec  | SELECT COUNT(time) FROM wikistats3 |
| 770   | success | 2011-07-31 14:27:44 UTC | 1h  4m  54sec  | SELECT COUNT(time) FROM wikistats3 |
| 769   | success | 2011-07-31 13:40:47 UTC | 1h  5m  54sec  | SELECT COUNT(time) FROM wikistats3 |
| 768   | success | 2011-07-31 13:18:33 UTC | 1h  9m  7sec   | SELECT COUNT(time) FROM wikistats3 |
| 767   | success | 2011-07-31 12:35:07 UTC | 1h  5m  34sec  | SELECT COUNT(time) FROM wikistats3 |
| 766   | success | 2011-07-31 12:13:32 UTC | 1h  4m  54sec  | SELECT COUNT(time) FROM wikistats3 |
| 765   | success | 2011-07-31 11:28:02 UTC | 1h  7m  1sec   | SELECT COUNT(time) FROM wikistats3 |
+-------+---------+-------------------------+----------------+------------------------------------+
20 rows in set

To see older jobs, specify the page number with the -p option.

$ td jobs -p 0 # The most recent 20 jobs
$ td jobs -p 1 # Jobs 21-40
$ td jobs -p 2 # Jobs 41-60
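
Since each page holds 20 jobs, page N covers jobs 20*N+1 through 20*N+20, counting back from the most recent. The pager also combines with standard shell tools when searching for a particular job, for example:

$ td jobs -p 1 | grep wikistats3  # search jobs 21-40 for a query string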

Show Job Details and Job Results

The ‘td job’ subcommand shows detailed information about the job specified by the job_id argument: its status, query content, and result.

$ td job 783
JobID      : 783
Status     : success
Query      : SELECT COUNT(1) FROM www_access
Result     :
+-----+
| 0   |
+-----+
| 300 |
+-----+
1 row in set
Use '-v' option to show detailed messages.

The ‘-v’ option will show additional information, including execution logs.
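
For example (the detailed output varies per job and is omitted here):

$ td job -v 783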

Job Priority

Jobs sometimes need to run at higher or lower priority; for example, ad-hoc queries should not prevent mission-critical scheduled daily jobs from running. The td query -P <priority> <query> command lets you specify the job priority, from -2 (VERY LOW) to 2 (VERY HIGH), where 0 (NORMAL) is the default.

# VERY HIGH
$ td query -P  2 <query>
# HIGH
$ td query -P  1 <query>
# NORMAL (Default)
$ td query -P  0 <query>
# LOW
$ td query -P -1 <query>
# VERY LOW
$ td query -P -2 <query>

The VERY HIGH and HIGH settings submit jobs to the Hadoop cluster at a higher priority than jobs at NORMAL (or LOW/VERY LOW) priority. Because cluster resources are assigned to higher-priority jobs first, these settings also help prevent normal-priority jobs from running in parallel with, and slowing down, high-priority ones.
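
Since the motivating case is protecting scheduled jobs, a priority can also be attached when a schedule is created. A hedged sketch using ‘td sched:create’; the -P flag and argument order here are assumptions, so check ‘td help sched:create’:

# Create a daily schedule that runs at HIGH priority (sketch; verify flags)
$ td sched:create daily_count "0 0 * * *" -d testdb "SELECT COUNT(1) FROM www_access" -P 1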
