REST API

The user can control Treasure Data using the public REST API. This article will explain how to use the public REST API.

Table of Contents

Prerequisites

Endpoint

All API requests should be sent to api.treasuredata.com.

Authorization

Every request to Treasure Data must contain authentication information, provided by the special ‘AUTHORIZATION’ HTTP headers. The header format is as follows:

"AUTHORIZATION: TD1 " + YourApiKeyHere

The API key is your authentication key. Please refer here to retrieve your API key.

The request looks like this when using the curl command. (The -H option adds the AUTHORIZATION HTTP header to the request)

$ curl -H "AUTHORIZATION: TD1 YourApiKeyHere" \
  "http://api.treasuredata.com/v3/job/result/2162"

Limitation

We do not throttle access at the moment. However, we may limit access to our API as deemed necessary.

GET /v3/database/list

This command returns a list of your databases.

Parameters

No parameters are required.

Output

  • databases: an array of your databases

Example Result

{
  "databases": [
    {
      "name": "database1",
      "count": 5000,
      "created_at": "2013-11-01 16:48:41 -0700",
      "updated_at": "2013-11-01 16:48:41 -0700",
      "organization": null
    },
    {
      "name": "database2",
      "count": 5000,
      "created_at": "2013-11-08 17:47:22 -0800",
      "updated_at": "2013-11-08 17:47:22 -0800",
      "organization": null
    },
  ]
}

GET /v3/job/list

This command returns a list of your jobs.

Parameters

No parameters are required.

Output

  • a json structure containing the number of jobs, to, from, and jobs an array of the jobs.

Example Result

{
  "count": 2,
  "from": null,
  "to": null,
  "jobs": [
    {
      "status": "success",
      "job_id": "12345",
      "created_at": "2013-11-13 19:39:19 UTC",
      "updated_at": "2013-11-13 19:39:20 UTC",
      "start_at": "2013-11-13 19:39:19 UTC",
      "end_at": "2013-11-13 19:39:20 UTC",
      "query": null,
      "type": "hive",
      "priority": 0,
      "retry_limit": 0,
      "hive_result_schema": null,
      "result": "",
      "url": "http://console.treasuredata.com/jobs/215782",
      "user_name": "owner",
      "organization": null,
      "database": "database1"
    },
    {
      "status": "success",
      "job_id": "56789",
      "created_at": "2013-11-13 19:32:45 UTC",
      "updated_at": "2013-11-13 19:32:46 UTC",
      "start_at": "2013-11-13 19:32:45 UTC",
      "end_at": "2013-11-13 19:32:46 UTC",
      "query": null,
      "type": "bulk_import_perform",
      "priority": 0,
      "retry_limit": 0,
      "hive_result_schema": null,
      "result": "",
      "url": "http://console.treasuredata.com/jobs/215781",
      "user_name": "owner",
      "organization": null,
      "database": "database2"
    },
  ]
}

GET /v3/table/list/:database

This command returns a list of your tables.

Parameters

  • database: the name of the database

Output

  • database: the name of the database
  • tables: an array of your tables within the database

Example Result

{
  "database": "db0",
  "tables": [
    {
      "name": "access_log",
      "count": 13123233
    },
    {
      "name": "payment_log",
      "count": 331232
    }
  ]
}

POST /v3/table/swap/:database/:table1/:table2

This command swaps the contents of two tables.

Parameters

  • database: database name
  • table1: table name (before)
  • table2: table name (after)

Output

  • database: database name
  • table1: table name (before)
  • table2: table name (after)

Example Result

{
  "database": "db1",
  "table1": "tbl1",
  "table2": "tbl2",
}

POST /v3/job/issue/hive/:database

This command issues queries.

Parameters

  • database: name of the database
  • query: query string
  • priority (optional): priority of the job. -2 (VERY LOW) to 2 (VERY HIGH). The default is 0 (NORMAL).

Output

  • database: the name of the database
  • job_id: the id of the job

Example Result

{
  "job_id": "12345",
  "type": "hive",
  "database": "www_access",
  "url": "http://console.treasuredata.com/will-be-ready"
}

GET /v3/job/status/:job_id

This command shows the status of a specific job. It is faster and more robust than the /v3/job/show/:job_id command.

Parameters

  • job_id: the specified job_id

Output

  • job_id: the specified job_id
  • status: the job status. The status can be ‘queued’, ‘booting’, ‘running’, ‘success’, or ‘error’
  • created_at: the job creation time
  • start_at: the job starting time
  • end_at: the job end time

Examples

{
  "job_id":"860329",
  "status":"success",
  "created_at":"2012-09-17 21:00:00 UTC",
  "start_at":"2012-09-17 21:00:01 UTC",
  "end_at":"2012-09-17 21:00:52 UTC"
}

GET /v3/job/show/:job_id

This command shows the status and logs of a specific job.

The resulting logs can be large, so using a large timeout value (i.e. several minutes) is recommended when using this command.

Parameters

  • job_id: the specified job_id

Output

  • job_id: the specified job_id
  • type: the job type (only ‘hive’ at the moment)
  • query: the query
  • database: the name of the database
  • status: the job status. The status can be ‘queued’, ‘booting’, ‘running’, ‘success’, or ‘error’
  • created_at: the job creation time
  • updated_at: the latest job update time
  • debug
    • stderr: stderr logs, including MapReduce job logs
    • cmdout: stdout logs

Examples

{
  "type": "hive",
  "query": "SELECT * FROM ACCESS",
  "job_id": "12345",
  "status": "success",
  "url": "http://console.treasuredata.com/jobs/12345",
  "created_at":"Sun Jun 26 17:39:18 -0400 2011",
  "updated_at":"Sun Jun 26 17:39:54 -0400 2011",
  "debug": {
    "stderr": "...",
    "cmdout": "..."
  }
}

POST /v3/job/kill/:job_id

This command kills the currently running job. The kill operation is performed asynchronously.

Parameters

  • job_id: job id

Output

  • former_status: current status of the given job
  • job_id: the specified job id

Example Result

{
  "former_status": "running",
  "job_id": "12345"
}

GET /v3/job/result/:job_id?format=msgpack.gz

This command returns the result of a specific job. Before issuing this command, please confirm that the job has been completed successfully via the /v3/job/show/:job_id command.

The resulting logs can be large, so using a large timeout value (i.e. several minutes) is recommended when using this command.

Parameters

  • job_id: the specified job_id
  • format: the result format: ‘tsv’, ‘csv’, ‘json’, ‘msgpack’ or ‘msgpack.gz’

Output

  • the result in the specified format.

Example Result

# URL: http://api.treasuredata.com/v3/job/result/2162?format=tsv
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc
aaaa    bbbb    cccc

 

# URL: http://api.treasuredata.com/v3/job/result/2162?format=csv
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc
aaaa,bbbb,cccc

 

# URL: http://api.treasuredata.com/v3/job/result/2162?format=json
[aaaa,bbbb,cccc]
[aaaa,bbbb,cccc]
[aaaa,bbbb,cccc]
[aaaa,bbbb,cccc]
[aaaa,bbbb,cccc]

 

# URL: http://api.treasuredata.com/v3/job/result/2162?format=msgpack
MessagePack format of above result

Libraries

There are several wrapper libraries for the REST API. The following libraries are developed by Treasure Data, Inc:

3rd party developers are also actively developing language bindings:


If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels. Live chat with our staffs also work well.