You can control Treasure Data using the public REST API.

Description

API Call (or REST Call)

return a list of databases

database/list

return a list of jobs

job/list

show a table

table/show/:database/:table

return a list of tables

table/list/:database

swap the content of two tables

swap/:database/:table1/:table2

issue a query

job/issue/:type/:database

show the status of a specific job

job/status/:job_id

show status and logs of a specific job

job/show/:job_id

kill a running job

job/kill/:job_id

returns the result of a specific job

job/result/:job_id?format=msgpack.gz


Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.

  • A table with some data.

  • Basic knowledge of TD query language.

Endpoint

All API requests should be sent to the appropriate endpoint for your region. 

Authorization

Every request to Treasure Data must contain authentication information, provided by the special ‘AUTHORIZATION’ HTTP headers. The header format is as follows:

"AUTHORIZATION: TD1 " + YourApiKeyHere

Retrieve your API key for authentication.

The request looks like the following example when using the cURL command. (The -H option adds the AUTHORIZATION HTTP header to the request)

$ curl -H "AUTHORIZATION: TD1 <YourApiKeyHere>" \
  "https://api.treasuredata.com/v3/job/result/2162"

Limitation

Treasure Data does not throttle access but might limit access to our API, if deemed necessary.

database/list

This command returns a list of your databases.

Verb

URI

GET

/v3/database/list

URI Parameters

No parameters are required.

Output

Option

Description

databases

an array of your databases

Example Result

{
  "databases": [
    {
      "name": "database1",
      "count": 5000,
      "created_at": "2013-11-01 16:48:41 -0700",
      "updated_at": "2013-11-01 16:48:41 -0700",
      "organization": null
    },
    {
      "name": "database2",
      "count": 5000,
      "created_at": "2013-11-08 17:47:22 -0800",
      "updated_at": "2013-11-08 17:47:22 -0800",
      "organization": null
    }
  ]
}


job/list

This command returns a list of your jobs.

Verb

URI

GET

/v3/job/list

URI Parameters

The following parameters are optional (Ex. GET /v3/job/list?from_id=:from_id&to_id=:to_id&status=:job_status).

Option

Description

from

Gets the Job from the nth index in the list.

to

Gets the Job up to the nth index in the list.

status

return only jobs that are in status job_status.

Supported statuses:

  • running

  • queued

  • success

  • error

Output

  • a JSON structure containing the number of jobs, to, from, and jobs an array of the jobs.

Example Result

{
  "count": 2,
  "from": null,
  "to": null,
  "jobs": [
    {
      "status": "success",
      "job_id": "12345",
      "created_at": "2013-11-13 19:39:19 UTC",
      "updated_at": "2013-11-13 19:39:20 UTC",
      "start_at": "2013-11-13 19:39:19 UTC",
      "end_at": "2013-11-13 19:39:20 UTC",
      "query": null,
      "type": "hive",
      "priority": 0,
      "retry_limit": 0,
      "hive_result_schema": null,
      "result": "",
      "url": "https://console.treasuredata.com/jobs/215782",
      "user_name": "owner",
      "organization": null,
      "database": "database1"
    },
    {
      "status": "success",
      "job_id": "56789",
      "created_at": "2013-11-13 19:32:45 UTC",
      "updated_at": "2013-11-13 19:32:46 UTC",
      "start_at": "2013-11-13 19:32:45 UTC",
      "end_at": "2013-11-13 19:32:46 UTC",
      "query": null,
      "type": "bulk_import_perform",
      "priority": 0,
      "retry_limit": 0,
      "hive_result_schema": null,
      "result": "",
      "url": "https://console.treasuredata.com/jobs/215781",
      "user_name": "owner",
      "organization": null,
      "database": "database2"
    }
  ]
}

table/show/:database/:table

This command shows your table.

Verb

URI

GET

/v3/table/show/:database/:table

Parameters

No parameters are required.

Output

Option

Description

id

table id

name

table name

estimated_storage_size

estimated storage size for this table

counter_updated_at

timestamp of last record addition

last_log_timestamp

timestamp of last log

created_at

timestamp of table creation

updated_at

timestamp of table update

type

always return log

count

number of records

schema

table schema in JSON string

expire_days

expiration days if enabled

Example Result

{
  "id": 12345,
  "name": "tbl1",
  "estimated_storage_size": 5684827493,
  "counter_updated_at": "2017-05-10T11:40:59Z",
  "last_log_timestamp": "2017-05-10T19:54:46Z",
  "created_at": "2013-10-22 05:24:15 UTC",
  "updated_at": "2017-05-10 19:51:20 UTC",
  "type": "log",
  "count": 10,
  "schema": "[[\"col1\",\"string\"]]",
  "expire_days": null
}

table/list/:database

This command returns a list of your tables.

Verb

URI

GET

/v3/table/list/:database

Parameters

Option

Description

database

the name of the database


Output

Option

Description

database

the name of the database

tables

an array of your tables within the database

Example Result

{
  "database": "db0",
  "tables": [
    {
      "name": "access_log",
      "count": 13123233
    },
    {
      "name": "payment_log",
      "count": 331232
    }
  ]
}


table/swap/:database/:table1/:table2

This command swaps the contents of two tables.

Verb

URI

POST

/v3/table/swap/:database/:table1/:table2

Parameters

Option

Description

database

database name

table1

table name (before)

table2

table name (after)

Output

Option

Description

database

database name

table1

table name (before)

table2

table name (after)

Example Result

{
  "database": "db1",
  "table1": "tbl1",
  "table2": "tbl2"
}


job/issue/:type/:database

This command issues queries.

Verb

URI

POST

/v3/job/issue/:type/:database

Parameters

Option

Description

database

name of database

type

the job type ('hive' or ‘presto’)

query

query string

priority

priority of the job (optional)

  • 2 (Very High)

  • 1 (High)

  • 0 (Default)

  • -1 (Low)

  • -2 (Very Low)

domain_key

idempotency domain key (optional). Review Job Request Idempotency


Output

Option

Description

database

the name of the database

job_id

the id of the job

Example Result

{
  "job": "12345",
  "database": "www_access",
  "job_id": "12345",
}


Job Request Idempotency

This API is not idempotent. that is executing the same exact API call twice will cause two distinct queries to be executed on Treasure Data.

This not only causes double the consumption of processing resources in your account but can have other side effects: if the query writes the result to a table or a 3rd party system (see the ‘Data Delivery and Activation’ section), the duplicate queries could cause duplicate results which could cause additional side effect in your downstream pipeline or reporting. This is not normally an issue but becomes important in the event of API trouble, either caused by network impairments or by Treasure Data performing a Scheduled Maintenance.

To help to deal with this issue, the /v3/job/issue/:type/:database API supports domain keys to provide idempotency. Adding a domain key when submitting a query is essentially like assigning a unique ID to the query submission and enables you to completely safely retry the API request as many times as necessary without the risk of ever causing query duplication on Treasure Data.

On Treasure Data, when we receive a job submission API request carrying a domain key used before and corresponding to a job previously successfully dispatched (as input into execution, regardless of whether the query successfully executed or not), the request will be rejected. In case the job submission API request failed in the middle of its execution, leaving the client unsure on whether it was successfully received by Treasure Data and handled, resubmitting the same request with the same domain key, immediately or after a while and for as long as necessary, will ensure that the query eventually gets to Treasure Data.

To modify your data application to leverage request idempotency:

  • generate and add a unique request identifier as the domain_key parameter in the body of the request; the value can be a string of up to 255 characters.

  • error handling:

    • retry the request if the API returns a 500 HTTP status code or greater or no response is received at all (e.g. timeout, netsplit, etc) until the response code is 200 or 409.

    • when the API returns a ‘409 Conflict’:

      • with a ‘Record Not Unique’ error message (see Note 2 below), retry the request until a ‘409 Conflict’ with an error message ‘Domain key has already been taken’ is received. Alternatively, query the GET /v3/job/status_by_domain_key/:domain_key API: see Note 3 below for more details about this API.

      • with a ‘Domain key has already been taken’ error message (see Note 1 below), retrieve the original job ID from the response.

Example 1

This is an example of the response in the case of a typical ‘409 Conflict’:

{
  "error": "[\"Domain key has already been taken\"]",
  "text": "[\"Domain key has already been taken\"]",
  "severity": "error",
  "details": {
    "conflicts_with": 3272709
  }
}

The details.conflicts_with field shows the ID of the job that was first received, accepted, and processed and carrying the same domain key used for this request.

Example 2

In rare cases, when two jobs are submitted simultaneously with the same domain key, the application could receive slightly different ‘409 Conflict’ error message:

{
  "error": "[\"Record Not Unique\"]",
  "text": "[\"Record Not Unique\"]",
  "severity": "error"
}

For all intents and purposes, the two types of errors have the same meaning, although the latter does not carry information about the conflicting job ID.

Example 3

Additionally, when an idempotent request is submitted but no request is received due to API issues, your application can use the GET /v3/job/status_by_domain_key/:domain_key API to retrieve the ID corresponding to the job whose request was successfully accepted (job_id parameter). An example response is:

{
  "status": "queued",
  "cpu_time": null,
  "result_size": null,
  "duration": null,
  "job_id": "3273158",
  "created_at": "2017-10-15 07:25:38 UTC",
  "updated_at": "2017-10-15 07:25:38 UTC",
  "start_at": "",
  "end_at": "",
  "num_records": null
}

Typically, extracting this information from the ‘409 Conflict’ API response is best practice.

Leverage domain keys with the CLI.

job/status/:job_id

This command shows the status of a specific job. It is faster and more robust than the /v3/job/show/:job_id command.

Verb

URI

GET

/v3/job/status/:job_id

Parameters

Option

Description

job_id

the specified job_id

Output

Option

Description

job_id

the specified job_id

status

the job status. The status can be ‘queued’, ‘booting’, ‘running’, ‘killed’, ‘success’, or ‘error’

created_at

the job creation time

start_at

the job starting time

end_at

the job ending time


Examples

{
  "job_id":"860329",
  "status":"success",
  "created_at":"2012-09-17 21:00:00 UTC",
  "start_at":"2012-09-17 21:00:01 UTC",
  "end_at":"2012-09-17 21:00:52 UTC"
}

job/show/:job_id

This command shows the status and logs of a specific job.

The resulting logs can be large. Because of the large result logs using a large timeout value (i.e. several minutes) is recommended.

Verb

URI

GET

/v3/job/show/:job_id

Parameters

Option

Description

job_id

the specified job_id


Output

Option

Description

job_id

the specified job_id

type

the job type (‘hive’ or ‘presto’)

query

the query

database

the name of the database

status

the job status. The status can be ‘queued’, ‘booting’, ‘running’, ‘success’, or ‘error’

created_at

the job creation time

updated_at

the latest job update time

debug

  • stderr

  • cmdout


stderr logs, including MapReduce job logs

stdout logs


Examples

{
  "type": "hive",
  "query": "SELECT * FROM ACCESS",
  "job_id": "12345",
  "status": "success",
  "url": "https://console.treasuredata.com/jobs/12345",
  "created_at":"Sun Jun 26 17:39:18 -0400 2011",
  "updated_at":"Sun Jun 26 17:39:54 -0400 2011",
  "debug": {
    "stderr": "...",
    "cmdout": "..."
  }
}

job/kill/:job_id

This command kills the running job. The kill operation is performed asynchronously.

Verb

URI

POST

/v3/job/kill/:job_id

Parameters

Option

Definition

job_id

job id


Output

Option

Definition

  • former_status


job_id

the specified job_id

  • former_status: current status of the given job

  • job_id: the specified job id

Example Result

{
  "former_status": "running",
  "job_id": "12345"
}

job/result/:job_id?format=msgpack.gz

This command returns the result of a specific job. Before issuing this command, confirm that the job has completed successfully via the /v3/job/show/:job_id command.

The resulting logs can be large. Because of the large result logs using a large timeout value (i.e. several minutes) is recommended.

Verb

URI

GET

/v3/job/result/:job_id?format=msgpack.gz

Parameters

Option

Description

job_id

the specified job_id

format

the result format: ‘tsv’, ‘csv’, ‘json’, ‘msgpack’ or ‘msgpack.gz’

Output

The result in the specified format.

Example Result

# URL: http://api.treasuredata.com/v3/job/result/2162?format=tsv
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc


# URL: http://api.treasuredata.com/v3/job/result/2162?format=csv
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc


# URL: http://api.treasuredata.com/v3/job/result/2162?format=json
[aaaa,bbbb,cccc]
[aaaa,bbbb,cccc]
[aaaa,bbbb,cccc]
[aaaa,bbbb,cccc]
[aaaa,bbbb,cccc]

The format returned by this endpoint is JSONL (JSON Lines) http://jsonlines.org/. JSONL does not follow the standard JSON syntax notation, such as opening and closing brackets, commas separating records. The difference in notation can cause syntax errors with systems that expect standard JSON.


# URL: http://api.treasuredata.com/v3/job/result/2162?format=msgpack
MessagePack format of results

Libraries

There are several wrapper libraries for the REST API. The following libraries are developed by Treasure Data:

3rd party developers are also actively developing language bindings:

  • No labels