Legacy Bulk Import From Json File

This article explains how to import data from JSON files to Treasure Data.

Install Bulk Loader

Install the Toolbelt, which includes our bulk loader program, on your computer.
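
One common route, if a working Ruby environment is already available, is to install the CLI from RubyGems; the platform installers from Treasure Data are an alternative. This is a sketch of that option, not the only supported method:

$ gem install td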

After the installation, the td command will be available on your computer. Open a terminal and type td to run it.

Make sure you have Java installed.

Execute td import:jar_update to download the latest version of our bulk loader:

$ td
usage: td [options] COMMAND [args]
$ java
Usage: java [-options] class [args...]
$ td import:jar_update
Installed td-import.jar 0.x.xx into /path/to/.td/java
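
If the java command shown above is not found, install a Java runtime first. Printing the version is a quick way to confirm that it is on your PATH (the exact output varies by Java vendor and version):

$ java -version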

Authenticate

Log into your Treasure Data account.

$ td account -f
Enter your Treasure Data credentials.
Email: xxxxx
Password (typing will be hidden): 
Authenticated successfully.
Use 'td db:create db_name' to create a database.

Importing Data from a JSON File

Suppose you have a file called data.json whose content looks like the following example:

$ head -n 1 data.json
{"host":"224.225.147.72","user":"-","method":"GET","path":"/category/electronics","code":200,"referer":"-","size":43,"agent":"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)","date_time":"2004-03-07 16:05:49"}

Execute the following commands to create the destination database and table and upload the JSON file:

$ td db:create my_db
$ td table:create my_db my_tbl
$ td import:auto \
  --format json \
  --time-column date_time \
  --time-format "%Y-%m-%d %H:%M:%S" \
  --auto-create my_db.my_tbl \
  ./data.json
Because td import:auto runs MapReduce jobs to check for invalid rows, it takes at least 1-2 minutes to complete.


In the preceding import:auto command, we assumed that:

  • The data file is called data.json and is located in the current directory (hence ./data.json)
  • The column names are taken from the keys in each JSON record. If they are missing, you must specify them with the --columns option (and optionally the column types with the --column-types option), as shown in the sketch after this list.
  • The time field is called “date_time” and it’s specified with the --time-column option
  • The time format is %Y-%m-%d %H:%M:%S and it’s specified with the --time-format option
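
As a minimal sketch of the column-name bullet above, the command below supplies the column names and types explicitly instead of relying on the JSON keys. The column list, the type names, and their order are illustrative assumptions rather than part of the example data; run td help import:auto to confirm the exact options supported by your CLI version.

# Hypothetical column names and types; adjust them to match your file.
$ td import:auto \
  --format json \
  --columns host,path,code,size,date_time \
  --column-types string,string,int,int,string \
  --time-column date_time \
  --time-format "%Y-%m-%d %H:%M:%S" \
  --auto-create my_db.my_tbl \
  ./data.json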

Handling Nested JSON Records

Nested JSON records can be parsed with Hive’s native get_json_object UDF or Presto’s native JSON functions. However, we recommend that you maintain a flat JSON structure to avoid the additional CPU overhead.
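
For reference, here is a hedged sketch of reading one nested value with Hive’s get_json_object through the td CLI. The column record (a string column holding a nested JSON object) and the key payload.user_id are illustrative assumptions, as is the use of the --type flag to select the Hive engine:

# Hypothetical column and key names; adjust to your schema.
$ td query -w --type hive -d my_db \
  "SELECT get_json_object(record, '$.payload.user_id') FROM my_tbl"

On the Presto engine, json_extract_scalar(record, '$.payload.user_id') plays the same role.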