Treasure Data recommends using Embulk instead of the td import command. Legacy bulk import, which is executed by the td import command, is still maintained but is no longer updated or enhanced.

You can bulk-import data using the td import command with td version 0.10.84 or later.


Prerequisites

  • Basic knowledge of Treasure Data, including the Toolbelt.

  • Java runtime (version 6 or later).


Why Bulk Import?



Because Treasure Data is a cloud service, your data must be transferred over the Internet. This becomes difficult once the data grows large (more than 100 MB). Consider two cases:

  • If the network becomes unstable, the import could fail halfway through the data transfer. There’s no easy way to pick up from where you left off, and you will need to restart the upload from scratch.

  • Companies sometimes impose bandwidth limits on transferring large amounts of data over a single stream. In addition, the limitations of TCP/IP make it difficult for an application to saturate a network connection with one stream.

We designed our bulk import feature to overcome these problems. You can now break your larger data sets into smaller chunks and upload them in parallel. If the upload of a particular chunk fails, you can restart the upload for that chunk only. This parallelism will improve your overall upload speed.
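The chunk-and-retry idea described above can be illustrated with a minimal Python sketch. This is not how the td import command is implemented. The function names (split_into_chunks, upload_with_retry, parallel_upload) and the upload_chunk callback are hypothetical and exist only for illustration. The sketch assumes the data fits in memory as a list of records and that a failed upload raises OSError.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def upload_with_retry(upload_chunk, chunk, max_attempts=3):
    """Upload one chunk, retrying only that chunk if the upload fails.

    upload_chunk is a hypothetical callback that sends one chunk and
    raises OSError on a network failure. Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            upload_chunk(chunk)
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise  # give up on this chunk after the last attempt


def parallel_upload(records, upload_chunk, chunk_size=1000, workers=4):
    """Upload all chunks in parallel and return the attempts used per chunk."""
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(upload_with_retry, upload_chunk, c) for c in chunks]
        # Results are collected in chunk order, not completion order.
        return [f.result() for f in futures]
```

A failure affects only the chunk that was in flight, so the chunks that already succeeded are never sent again.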


Examples

Because bulk import is a complex way to achieve performance and reliability, you can use these shortcuts to reach your goal.

To understand the bulk import internals or tips and tricks, refer to the documents below.


