Job Management

Queries submitted to the cloud are called jobs. You can view the status and result of each job with command-line (CLI) commands.

It's also possible to schedule and run queries directly on the console. Go to https://console.treasure-data.com for more information.

This article explains how to issue jobs, list them, and view the details of each job.

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.

Issue Jobs

The ‘td query’ subcommand issues jobs. Jobs are executed in parallel on the cloud: the degree of parallelism is chosen automatically based on your plan, your remaining capacity, and the size of our cluster at that time.

$ td query -w -d testdb "SELECT COUNT(1) FROM www_access"
Job 702 is started.

If you issue a query WITHOUT the -w option, the command will exit immediately after submitting the job; it will not wait for the job to complete.

$ td query -d testdb "SELECT COUNT(1) FROM www_access"
Job 704 is started.
Use 'td job 704' to show the status.
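
When a job is submitted without -w, a script usually needs the job ID so it can check on the job later. The following is a minimal sketch, assuming the submission message has the form "Job <id> is started." as in the output above and that it is written to standard output. The function name job_id_from_submit is hypothetical and is not part of the toolbelt.

```shell
# Hypothetical helper: read `td query` output on stdin and print the job ID.
# Assumes the message format "Job <id> is started." shown above.
job_id_from_submit() {
  sed -n 's/^Job \([0-9][0-9]*\) is started\.$/\1/p' | head -n 1
}
```

For example, the ID could be captured and then passed to 'td job':

$ id=$(td query -d testdb "SELECT COUNT(1) FROM www_access" | job_id_from_submit)
$ td job "$id"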

You can write the job results to a local file in CSV format instead of printing them to STDOUT.

$ td query -o test.csv --format csv -w -d testdb 'SELECT COUNT(1) FROM www_access'
Status     : success
Result     :
written to test.csv in csv format

We currently support the Hive and Pig query languages. The example above uses Hive.

Show Jobs

The ‘td jobs’ subcommand lists your submitted jobs. The most recent 20 jobs are shown by default. The command shows the JobID, status, starting time, elapsed time, and the submitted query for each job.

$ td jobs
+-------+---------+-------------------------+----------------+------------------------------------+
| JobID | Status  | Start                   | Elapsed        | Query                              |
+-------+---------+-------------------------+----------------+------------------------------------+
| 784   | error   | 2011-08-01 18:48:31 UTC |       3sec     | SELECT COUNT(1) FROM www_access    |
| 783   | success | 2011-08-01 06:37:15 UTC |      23sec     | SELECT COUNT(1) FROM www_access    |
| 782   | success | 2011-08-01 02:28:15 UTC |      23sec     | SELECT COUNT(1) FROM www_access    |
| 781   | success | 2011-07-31 20:18:30 UTC | 1h  28m  30sec | SELECT COUNT(time) FROM wikistats3 |
| 780   | success | 2011-07-31 19:56:26 UTC | 1h  24m  9sec  | SELECT COUNT(time) FROM wikistats3 |
| 779   | success | 2011-07-31 19:12:20 UTC | 1h  6m  4sec   | SELECT COUNT(time) FROM wikistats3 |
| 778   | success | 2011-07-31 18:48:58 UTC | 1h  7m  21sec  | SELECT COUNT(time) FROM wikistats3 |
| 777   | success | 2011-07-31 18:06:19 UTC | 1h  5m  55sec  | SELECT COUNT(time) FROM wikistats3 |
| 776   | success | 2011-07-31 17:43:42 UTC | 1h  5m  7sec   | SELECT COUNT(time) FROM wikistats3 |
| 775   | success | 2011-07-31 16:59:39 UTC | 1h  6m  35sec  | SELECT COUNT(time) FROM wikistats3 |
| 774   | success | 2011-07-31 16:37:07 UTC | 1h  6m  28sec  | SELECT COUNT(time) FROM wikistats3 |
| 773   | success | 2011-07-31 15:53:44 UTC | 1h  5m  48sec  | SELECT COUNT(time) FROM wikistats3 |
| 772   | success | 2011-07-31 15:32:45 UTC | 1h  4m  12sec  | SELECT COUNT(time) FROM wikistats3 |
| 771   | success | 2011-07-31 14:46:50 UTC | 1h  6m  46sec  | SELECT COUNT(time) FROM wikistats3 |
| 770   | success | 2011-07-31 14:27:44 UTC | 1h  4m  54sec  | SELECT COUNT(time) FROM wikistats3 |
| 769   | success | 2011-07-31 13:40:47 UTC | 1h  5m  54sec  | SELECT COUNT(time) FROM wikistats3 |
| 768   | success | 2011-07-31 13:18:33 UTC | 1h  9m  7sec   | SELECT COUNT(time) FROM wikistats3 |
| 767   | success | 2011-07-31 12:35:07 UTC | 1h  5m  34sec  | SELECT COUNT(time) FROM wikistats3 |
| 766   | success | 2011-07-31 12:13:32 UTC | 1h  4m  54sec  | SELECT COUNT(time) FROM wikistats3 |
| 765   | success | 2011-07-31 11:28:02 UTC | 1h  7m  1sec   | SELECT COUNT(time) FROM wikistats3 |
+-------+---------+-------------------------+----------------+------------------------------------+
20 rows in set
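
The table can also be filtered in a script, for example to find jobs that ended in error. This is a rough sketch that assumes the column layout shown above (JobID in the first column, Status in the second). The function name job_ids_with_status is hypothetical.

```shell
# Hypothetical helper: read the `td jobs` table on stdin and print the JobID
# of every row whose Status column equals the first argument.
# Border lines contain no '|' and are skipped by the NF check.
job_ids_with_status() {
  awk -F'|' -v want="$1" '
    NF >= 6 {
      id = $2; status = $3
      gsub(/[ \t]/, "", id)       # strip padding around the JobID
      gsub(/[ \t]/, "", status)   # strip padding around the Status
      if (status == want) print id
    }'
}
```

With the listing above, the following would print 784:

$ td jobs | job_ids_with_status error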

To see older jobs, specify the page number with the -p option. Pages are numbered from 0.

$ td jobs -p 0 # The most recent 20 jobs
$ td jobs -p 1 # Jobs 21-40
$ td jobs -p 2 # Jobs 41-60
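
The page arithmetic behind these examples can be sketched as follows, assuming a fixed page size of 20 as shown above. The names PAGE_SIZE, page_range, and page_for_position are hypothetical helpers, not toolbelt features.

```shell
# Assumed page size, taken from the default of 20 jobs per listing.
PAGE_SIZE=20

# Print the range of job positions (1 = most recent) shown on a given page.
page_range() {
  first=$(( $1 * PAGE_SIZE + 1 ))
  last=$(( ($1 + 1) * PAGE_SIZE ))
  echo "$first-$last"
}

# Print the page number on which the n-th most recent job appears.
page_for_position() {
  echo $(( ($1 - 1) / PAGE_SIZE ))
}
```

For instance, the 45th most recent job falls on page 2, so it would be listed by 'td jobs -p 2'.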

Show Job Details and Job Results

The ‘td job’ subcommand shows detailed information about the job specified by the job_id argument, including its status, query, and result.

$ td job 783
JobID      : 783
Status     : success
Query      : SELECT COUNT(1) FROM www_access
Result     :
+-----+
| 0   |
+-----+
| 300 |
+-----+
1 row in set
Use '-v' option to show detailed messages.

The ‘-v’ option will show additional information, including execution logs.
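
A script can read the status out of this output. The sketch below assumes the "Status     : <value>" line format shown above. The function name job_status is hypothetical.

```shell
# Hypothetical helper: read `td job <id>` output on stdin and print the value
# of the first "Status" line, with surrounding whitespace removed.
job_status() {
  awk -F':' '/^Status[ \t]*:/ {
    gsub(/^[ \t]+|[ \t]+$/, "", $2)
    print $2
    exit
  }'
}
```

For the job above, the following would print success:

$ td job 783 | job_status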

Job Priority

Scheduled jobs must sometimes be given a high or low priority. For example, ad hoc queries should not prevent mission-critical daily jobs from running. The td query -P <priority> <query> command lets you specify the job priority. The priority ranges from -2 (VERY LOW) to 2 (VERY HIGH), and 0 is the default.

# VERY HIGH
$ td query -P  2 <query>
# HIGH
$ td query -P  1 <query>
# NORMAL (Default)
$ td query -P  0 <query>
# LOW
$ td query -P -1 <query>
# VERY LOW
$ td query -P -2 <query>
The 'VERY HIGH' and 'HIGH' options submit jobs to the Hadoop cluster at a higher priority than jobs with normal, 'LOW', or 'VERY LOW' priority. These settings can also help you prevent normal-priority jobs from running in parallel with high-priority jobs. The Hadoop cluster assigns resources to the higher-priority jobs.
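
In scripts, the numeric levels are easy to mix up, so a small lookup can make the intent clearer. This is a minimal sketch. The function name priority_value is hypothetical, and the mapping simply mirrors the five levels listed above.

```shell
# Hypothetical helper: translate a priority name into the number that -P expects.
priority_value() {
  case "$1" in
    "VERY HIGH") printf '%s\n' 2 ;;
    "HIGH")      printf '%s\n' 1 ;;
    "NORMAL")    printf '%s\n' 0 ;;
    "LOW")       printf '%s\n' -1 ;;
    "VERY LOW")  printf '%s\n' -2 ;;
    *) echo "unknown priority: $1" >&2; return 1 ;;
  esac
}
```

An ad hoc query could then be submitted at low priority like this:

$ td query -P "$(priority_value LOW)" -d testdb "SELECT COUNT(1) FROM www_access"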

Last modified: Feb 24 2017 09:27:52 UTC
