Bulk Import Overview
This article explains how to bulk-import data using the td import command, available in Treasure Data Toolbelt version 0.10.84 and above.
Prerequisites
- Basic knowledge of Treasure Data, including the toolbelt.
- Java runtime (version 6 or later).
Why Bulk Import?
Because Treasure Data is a cloud service, your data must be transferred over an Internet connection. This becomes tricky once the data grows large (over 100MB). Consider the following cases:
- If the network becomes unstable, the import could fail partway through the transfer. There is no easy way to resume from where you left off, so you must restart the upload from scratch.
- Your company may impose bandwidth limits on transferring large volumes of data over a single stream. In addition, limitations of the TCP/IP protocol make it difficult for an application to saturate a network connection with a single stream.
We designed our bulk import feature to overcome these problems. You can now break your larger data sets into smaller chunks and upload them in parallel. If the upload of a particular chunk fails, you can restart the upload for that chunk only. This parallelism will improve your overall upload speed.
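The chunk-and-retry idea described above can be illustrated with a short Python sketch. It is a hedged, conceptual example only: it does not call the td command or any Treasure Data API, and the names `split_into_chunks`, `upload_with_retry`, `parallel_upload`, and the simulated `flaky_upload` are hypothetical. It shows data being split into chunks, the chunks being uploaded concurrently, and a failed chunk being retried on its own without restarting the others.

```python
# Hypothetical sketch of chunked, parallel upload with per-chunk retry.
# It does NOT call `td import` or any Treasure Data API; `upload` is any
# caller-supplied function that sends one chunk and raises OSError on failure.
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

UploadFn = Callable[[int, bytes], None]


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_with_retry(upload: UploadFn, index: int, chunk: bytes,
                      max_attempts: int = 3) -> int:
    """Upload one chunk, retrying only that chunk; return the attempt count."""
    attempt = 0
    while True:
        attempt += 1
        try:
            upload(index, chunk)
            return attempt
        except OSError:
            # Network errors such as ConnectionError are OSError subclasses.
            if attempt >= max_attempts:
                raise


def parallel_upload(data: bytes, chunk_size: int, upload: UploadFn,
                    workers: int = 4, max_attempts: int = 3) -> Dict[int, int]:
    """Upload all chunks concurrently; map chunk index -> attempts used."""
    chunks = split_into_chunks(data, chunk_size)
    attempts: Dict[int, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            pool.submit(upload_with_retry, upload, i, c, max_attempts): i
            for i, c in enumerate(chunks)
        }
        for future in as_completed(futures):
            # result() re-raises if a chunk exhausted its retries.
            attempts[futures[future]] = future.result()
    return attempts


# --- Usage: a simulated uploader whose chunk 1 fails once ---
received: Dict[int, bytes] = {}
failed_once = set()
lock = threading.Lock()


def flaky_upload(index: int, chunk: bytes) -> None:
    """Hypothetical stand-in for a network upload; stores chunks in memory."""
    with lock:
        if index == 1 and index not in failed_once:
            failed_once.add(index)
            raise ConnectionError("simulated network drop")
        received[index] = chunk


data = b"abcdefghij"
attempts = parallel_upload(data, chunk_size=4, upload=flaky_upload)
reassembled = b"".join(received[i] for i in sorted(received))
print(dict(sorted(attempts.items())))  # {0: 1, 1: 2, 2: 1}
print(reassembled)                     # b'abcdefghij'
```

Because retries happen inside each worker, a failure re-sends only the affected chunk while the other chunks proceed, which mirrors the behavior described above. In practice the chunk size and number of workers would need to be tuned to your network and data.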
Because bulk import is a relatively complex way to achieve performance and reliability, you can use the following shortcuts for common data sources:
- Bulk Import from CSV file
- Bulk Import from TSV file
- Bulk Import from JSON file
- Bulk Import from Amazon S3
- Bulk Import from MySQL
- Bulk Import from PostgreSQL
- Bulk Import from MongoDB
To learn about bulk import internals, or for tips and tricks, please refer to the documents below.
Last modified: Jun 20 2015 01:03:44 UTC