Visit our new documentation site! This documentation page is no longer updated.

REST API

You can control Treasure Data using the public REST API. This article will explain how to use the public REST API.

Prerequisites

Endpoint

Send all API requests to api.treasuredata.com.

Authorization

Every request to Treasure Data must contain authentication information, provided in the ‘AUTHORIZATION’ HTTP header. The header format is as follows:

"AUTHORIZATION: TD1 " + YourApiKeyHere

The API key is your authentication key. Refer to the API key article to retrieve your API key information.
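
The header format can also be assembled from a script. The following is a minimal Python sketch using only the standard library. The helper name build_request is hypothetical (it is not part of any Treasure Data library), and the API key is a placeholder. The sketch only constructs the request object; it does not send it.

```python
import urllib.request

API_BASE = "https://api.treasuredata.com"  # endpoint named in the Endpoint section


def build_request(path, api_key):
    """Return a urllib Request carrying the TD1 authorization header.

    build_request is an illustrative helper, not an official API.
    """
    req = urllib.request.Request(API_BASE + path)
    # Header format from the text: "AUTHORIZATION: TD1 " + YourApiKeyHere
    req.add_header("AUTHORIZATION", "TD1 " + api_key)
    return req


req = build_request("/v3/job/result/2162", "YourApiKeyHere")
# To actually send the request: urllib.request.urlopen(req, timeout=60)
```

HTTP header names are case-insensitive, so it does not matter that urllib stores the name as "Authorization".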

With the curl command, the request looks like the following example (the -H option adds the AUTHORIZATION HTTP header to the request):

$ curl -H "AUTHORIZATION: TD1 YourApiKeyHere" \
  "https://api.treasuredata.com/v3/job/result/2162"

Limitation

Currently, Treasure Data does not throttle API access, but may limit it if deemed necessary.

GET /v3/database/list

This command returns a list of your databases.

Parameters

No parameters are required.

Output

  • databases: an array of your databases

Example Result

{
  "databases": [
    {
      "name": "database1",
      "count": 5000,
      "created_at": "2013-11-01 16:48:41 -0700",
      "updated_at": "2013-11-01 16:48:41 -0700",
      "organization": null
    },
    {
      "name": "database2",
      "count": 5000,
      "created_at": "2013-11-08 17:47:22 -0800",
      "updated_at": "2013-11-08 17:47:22 -0800",
      "organization": null
    }
  ]
}

GET /v3/job/list

This command returns a list of your jobs.

Parameters

No parameters are required.

Output

  • a JSON structure containing count (the number of jobs), from, to, and jobs (an array of the jobs).

Example Result

{
  "count": 2,
  "from": null,
  "to": null,
  "jobs": [
    {
      "status": "success",
      "job_id": "12345",
      "created_at": "2013-11-13 19:39:19 UTC",
      "updated_at": "2013-11-13 19:39:20 UTC",
      "start_at": "2013-11-13 19:39:19 UTC",
      "end_at": "2013-11-13 19:39:20 UTC",
      "query": null,
      "type": "hive",
      "priority": 0,
      "retry_limit": 0,
      "hive_result_schema": null,
      "result": "",
      "url": "https://console.treasuredata.com/jobs/215782",
      "user_name": "owner",
      "organization": null,
      "database": "database1"
    },
    {
      "status": "success",
      "job_id": "56789",
      "created_at": "2013-11-13 19:32:45 UTC",
      "updated_at": "2013-11-13 19:32:46 UTC",
      "start_at": "2013-11-13 19:32:45 UTC",
      "end_at": "2013-11-13 19:32:46 UTC",
      "query": null,
      "type": "bulk_import_perform",
      "priority": 0,
      "retry_limit": 0,
      "hive_result_schema": null,
      "result": "",
      "url": "https://console.treasuredata.com/jobs/215781",
      "user_name": "owner",
      "organization": null,
      "database": "database2"
    }
  ]
}

GET /v3/table/show/:database/:table

This command shows the details of a specific table.

Parameters

  • database: the name of the database
  • table: the name of the table

Output

  • id: table id
  • name: table name
  • estimated_storage_size: estimated storage size for this table
  • counter_updated_at: timestamp of last record addition
  • last_log_timestamp: timestamp of last log
  • created_at: timestamp of table creation
  • updated_at: timestamp of table update
  • type: always returns “log”
  • count: number of records
  • schema: table schema in JSON string
  • expire_days: expiration days if enabled

Example Result

{
  "id": 12345,
  "name": "tbl1",
  "estimated_storage_size": 5684827493,
  "counter_updated_at": "2017-05-10T11:40:59Z",
  "last_log_timestamp": "2017-05-10T19:54:46Z",
  "created_at": "2013-10-22 05:24:15 UTC",
  "updated_at": "2017-05-10 19:51:20 UTC",
  "type": "log",
  "count": 10,
  "schema": "[[\"col1\",\"string\"]]",
  "expire_days": null
}
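
Note that the schema field is a JSON string embedded in the JSON response, so it has to be decoded a second time before use. A minimal Python sketch, using a shortened copy of the example result above as input:

```python
import json

# Shortened copy of the example result; a raw string keeps the \" escapes intact.
body = r'''{
  "id": 12345,
  "name": "tbl1",
  "count": 10,
  "schema": "[[\"col1\",\"string\"]]",
  "expire_days": null
}'''

table = json.loads(body)
# First decode yields the schema as a plain string; decode it again
# to get a list of [column_name, column_type] pairs.
columns = json.loads(table["schema"])
for name, col_type in columns:
    print(name, col_type)
```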

GET /v3/table/list/:database

This command returns a list of your tables.

Parameters

  • database: the name of the database

Output

  • database: the name of the database
  • tables: an array of your tables within the database

Example Result

{
  "database": "db0",
  "tables": [
    {
      "name": "access_log",
      "count": 13123233
    },
    {
      "name": "payment_log",
      "count": 331232
    }
  ]
}

POST /v3/table/swap/:database/:table1/:table2

This command swaps the contents of two tables.

Parameters

  • database: database name
  • table1: table name (before)
  • table2: table name (after)

Output

  • database: database name
  • table1: table name (before)
  • table2: table name (after)

Example Result

{
  "database": "db1",
  "table1": "tbl1",
  "table2": "tbl2"
}

POST /v3/job/issue/:type/:database

This command issues queries.

Parameters

  • database: name of the database
  • type: the job type (‘hive’ or ‘presto’)
  • query: query string
  • priority (optional): priority of the job. -2 (VERY LOW) to 2 (VERY HIGH). The default is 0 (NORMAL)
  • domain_key (optional): idempotency domain key. See the Job Request Idempotency section below.

Output

  • database: the name of the database
  • job_id: the id of the job

Example Result

{
  "job": "12345",
  "database": "www_access",
  "job_id": "12345"
}
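
As an illustration, a job submission request could be prepared as in the Python sketch below. The helper name build_issue_request is hypothetical, and sending the parameters form-encoded in the POST body is an assumption; this page does not state the body encoding. The sketch builds the request without sending it.

```python
import urllib.parse
import urllib.request
import uuid

API_BASE = "https://api.treasuredata.com"


def build_issue_request(api_key, job_type, database, query,
                        priority=0, domain_key=None):
    """Build (but do not send) a POST /v3/job/issue/:type/:database request.

    Illustrative helper only; not part of any Treasure Data library.
    """
    path = "/v3/job/issue/{}/{}".format(
        urllib.parse.quote(job_type, safe=""),
        urllib.parse.quote(database, safe=""),
    )
    params = {"query": query, "priority": str(priority)}
    if domain_key is not None:
        params["domain_key"] = domain_key
    # Assumption: parameters travel form-encoded in the request body.
    data = urllib.parse.urlencode(params).encode("utf-8")
    req = urllib.request.Request(API_BASE + path, data=data, method="POST")
    req.add_header("AUTHORIZATION", "TD1 " + api_key)
    return req


# A random UUID is one way to get a unique key well under 255 characters.
example = build_issue_request(
    "YourApiKeyHere", "hive", "www_access",
    "SELECT COUNT(1) FROM www_access",
    domain_key=str(uuid.uuid4()),
)
```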

Job Request Idempotency

This API is not idempotent; that is, executing the exact same API call twice causes two distinct queries to be executed on Treasure Data.

This not only doubles the consumption of processing resources in your account but can also have other side effects: if the query writes its result to a table or a 3rd party system (see the ‘Data Delivery and Activation’ section), the duplicate queries could produce duplicate results, which could in turn cause problems in your downstream pipeline or reporting. This is not normally an issue, but it becomes important in the event of API trouble caused either by network impairments or by Treasure Data performing scheduled maintenance.

To help deal with this issue, the /v3/job/issue/:type/:database API supports domain keys to provide idempotency. Adding a domain key when submitting a query is essentially like assigning a unique ID to the query submission. It lets you safely retry the API request as many times as necessary without risk of causing query duplication on Treasure Data.

When Treasure Data receives a job submission API request carrying a domain key that was used before and that corresponds to a previously dispatched job (that is, a job put into execution, regardless of whether the query ultimately succeeded), the request is rejected. On the other hand, if a job submission API request fails partway through, leaving the client unsure whether Treasure Data received and handled it, resubmitting the same request with the same domain key, immediately or after a while and for as long as necessary, ensures that the query eventually reaches Treasure Data.

To modify your data application to leverage request idempotency:

  • generate and add a unique request identifier as the domain_key parameter in the body of the request; the value can be a string of up to 255 characters.
  • error handling:
    • retry the request if the API returns an HTTP status code of 500 or greater, or if no response is received at all (e.g. timeout, netsplit), until the response code is 200 or 409.
    • when the API returns a ‘409 Conflict’:
      • with a ‘Record Not Unique’ error message (see Note 2 below), retry the request until a ‘409 Conflict’ with error message ‘Domain key has already been taken’ is received. Alternatively, query the GET /v3/job/status_by_domain_key/:domain_key API: see Note 3 below for more details about this API.
      • with a ‘Domain key has already been taken’ error message (see Note 1 below), retrieve the original job ID from the response.
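
The error-handling steps above can be sketched as a small retry loop. This is a minimal Python sketch under stated assumptions, not a definitive client: send is a hypothetical callable that performs the POST with a fixed domain_key and returns a (status_code, parsed_body) pair, raising OSError when no response arrives. Raising on other status codes below 500 is also an assumption, since this page does not cover them.

```python
import time


def submit_with_domain_key(send, max_attempts=5, delay_seconds=0.0):
    """Retry a job submission until a job ID is known.

    `send` is a hypothetical callable: it issues the request with the same
    domain_key every time and returns (status_code, body_dict), or raises
    OSError if no response is received (timeout, netsplit, ...).
    """
    for _ in range(max_attempts):
        try:
            status, body = send()
        except OSError:
            status, body = None, {}  # no response at all: retry

        if status == 200:
            return body["job_id"]
        if status == 409:
            conflicts_with = body.get("details", {}).get("conflicts_with")
            if conflicts_with is not None:
                # 'Domain key has already been taken': original job ID found.
                return str(conflicts_with)
            # 'Record Not Unique': no job ID in the response, so retry.
        elif status is not None and status < 500:
            # Assumption: other client errors are not worth retrying.
            raise RuntimeError("request rejected with status {}".format(status))
        time.sleep(delay_seconds)
    raise RuntimeError("gave up after {} attempts".format(max_attempts))
```

Injecting send keeps the retry policy separate from the HTTP transport, so the loop can be exercised without network access.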

Note 1

This is an example of the response in the case of a typical ‘409 Conflict’:

{
  "error": "[\"Domain key has already been taken\"]",
  "text": "[\"Domain key has already been taken\"]",
  "severity": "error",
  "details": {
    "conflicts_with": 3272709
  }
}

The details.conflicts_with field shows the ID of the job that was first received, accepted, and processed with the same domain key used for this request.

Note 2

In rare cases, when two jobs are submitted simultaneously with the same domain key, the application could receive a slightly different ‘409 Conflict’ error message:

{
  "error": "[\"Record Not Unique\"]",
  "text": "[\"Record Not Unique\"]",
  "severity": "error"
}

For all intents and purposes, the two types of errors have the same meaning, although the latter does not carry information about the conflicting job ID.

Note 3

Additionally, when an idempotent request is submitted but no response is received due to API issues, your application can use the GET /v3/job/status_by_domain_key/:domain_key API to retrieve the ID of the job whose request was successfully accepted (the job_id field). An example response is:

{
  "status": "queued",
  "cpu_time": null,
  "result_size": null,
  "duration": null,
  "job_id": "3273158",
  "created_at": "2017-10-15 07:25:38 UTC",
  "updated_at": "2017-10-15 07:25:38 UTC",
  "start_at": "",
  "end_at": "",
  "num_records": null
}

However, it is generally better to extract this information from the ‘409 Conflict’ API response as explained above.

To leverage domain keys with the CLI, please refer to this article.

GET /v3/job/status/:job_id

This command shows the status of a specific job. It is faster and more robust than the /v3/job/show/:job_id command.

Parameters

  • job_id: the specified job_id

Output

  • job_id: the specified job_id
  • status: the job status. The status can be ‘queued’, ‘booting’, ‘running’, ‘killed’, ‘success’, or ‘error’
  • created_at: the job creation time
  • start_at: the job starting time
  • end_at: the job end time

Examples

{
  "job_id":"860329",
  "status":"success",
  "created_at":"2012-09-17 21:00:00 UTC",
  "start_at":"2012-09-17 21:00:01 UTC",
  "end_at":"2012-09-17 21:00:52 UTC"
}
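
A common use of this command is to poll until a job finishes. The following Python sketch shows one way to do that. fetch_status is a hypothetical callable that calls GET /v3/job/status/:job_id and returns the parsed response, and treating ‘success’, ‘error’, and ‘killed’ as the final statuses is an assumption based on the status list above.

```python
import time

# Assumption: these statuses mean the job will not change state again.
TERMINAL_STATUSES = {"success", "error", "killed"}


def wait_for_job(fetch_status, job_id, poll_seconds=5.0, max_polls=120):
    """Poll a job until it reaches a terminal status.

    `fetch_status` is a hypothetical callable taking a job ID and returning
    the decoded JSON response of GET /v3/job/status/:job_id as a dict.
    """
    for _ in range(max_polls):
        response = fetch_status(job_id)
        if response["status"] in TERMINAL_STATUSES:
            return response
        time.sleep(poll_seconds)
    raise TimeoutError("job {} did not finish in time".format(job_id))
```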

GET /v3/job/show/:job_id

This command shows the status and logs of a specific job.

The resulting logs can be large, therefore using a large timeout value (i.e. several minutes) is recommended when using this command.

Parameters

  • job_id: the specified job_id

Output

  • job_id: the specified job_id
  • type: the job type (‘hive’ or ‘presto’)
  • query: the query
  • database: the name of the database
  • status: the job status. The status can be ‘queued’, ‘booting’, ‘running’, ‘success’, or ‘error’
  • created_at: the job creation time
  • updated_at: the latest job update time
  • debug
    • stderr: stderr logs, including MapReduce job logs
    • cmdout: stdout logs

Examples

{
  "type": "hive",
  "query": "SELECT * FROM ACCESS",
  "job_id": "12345",
  "status": "success",
  "url": "https://console.treasuredata.com/jobs/12345",
  "created_at":"Sun Jun 26 17:39:18 -0400 2011",
  "updated_at":"Sun Jun 26 17:39:54 -0400 2011",
  "debug": {
    "stderr": "...",
    "cmdout": "..."
  }
}

POST /v3/job/kill/:job_id

This command kills the currently running job. The kill operation is performed asynchronously.

Parameters

  • job_id: job id

Output

  • former_status: the status of the given job before the kill request
  • job_id: the specified job id

Example Result

{
  "former_status": "running",
  "job_id": "12345"
}

GET /v3/job/result/:job_id?format=msgpack.gz

This command returns the result of a specific job. Before issuing this command, confirm that the job has completed successfully via the /v3/job/show/:job_id command.

The result can be large, therefore using a large timeout value (i.e. several minutes) is recommended when using this command.

Parameters

  • job_id: the specified job_id
  • format: the result format: ‘tsv’, ‘csv’, ‘json’, ‘msgpack’ or ‘msgpack.gz’

Output

  • the result in the specified format.

Example Result

# URL: http://api.treasuredata.com/v3/job/result/2162?format=tsv
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc

 

# URL: http://api.treasuredata.com/v3/job/result/2162?format=csv
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc

 

# URL: http://api.treasuredata.com/v3/job/result/2162?format=json
["aaaa","bbbb","cccc"]
["aaaa","bbbb","cccc"]
["aaaa","bbbb","cccc"]
["aaaa","bbbb","cccc"]
["aaaa","bbbb","cccc"]

 

# URL: http://api.treasuredata.com/v3/job/result/2162?format=msgpack
MessagePack format of above result
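
Once downloaded, the ‘tsv’ and ‘csv’ results can be split into rows with a standard CSV reader. The Python sketch below is a minimal illustration. The helper name parse_result is hypothetical, and the sketch assumes the result uses ordinary CSV quoting rules, which this page does not specify.

```python
import csv
import io


def parse_result(text, fmt):
    """Split a 'tsv' or 'csv' job result into a list of rows.

    Illustrative helper; assumes standard CSV quoting conventions.
    """
    delimiter = {"tsv": "\t", "csv": ","}[fmt]
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = parse_result("aaaa\tbbbb\tcccc\naaaa\tbbbb\tcccc\n", "tsv")
for row in rows:
    print(row)
```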

Libraries

There are several wrapper libraries for the REST API. The following libraries are developed by Treasure Data, Inc:

3rd party developers are also actively developing language bindings:


Last modified: Jan 29 2018 21:23:45 UTC
