Bulk Import Overview

This article explains how to bulk-import data using the td import command for version 0.10.84 and above.

Prerequisites

  • Basic knowledge of Treasure Data, including the Toolbelt.
  • A Java runtime (version 6 or later).
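
To confirm both prerequisites from a terminal, you can check the installed versions. This is a quick sketch; it assumes the Toolbelt's td command is already on your PATH.

  $ td --version     # Treasure Data Toolbelt; should report 0.10.84 or later
  $ java -version    # Java runtime; should report version 1.6 (Java 6) or later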

Why Bulk Import?

Because Treasure Data is a cloud service, your data must be transferred over the Internet. This can get tricky once the data size grows large (> 100 MB). Consider a couple of cases:

  • If the network becomes unstable, the import can fail halfway through the transfer. There's no easy way to resume from where it left off, so you must restart the upload from scratch.
  • Your company's network may limit the bandwidth available when transferring large amounts of data over a single stream. In addition, the characteristics of the TCP/IP protocol make it difficult for a single connection to saturate the network.

We designed our bulk import feature to overcome these problems. You can break a large data set into smaller chunks and upload them in parallel. If the upload of a particular chunk fails, you only need to restart the upload for that chunk. The parallelism also improves your overall upload speed.
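
As a sketch of the step-by-step workflow this enables: the session, database, table, column, and file names below are placeholders, and the exact options (such as --parallel, which is assumed here to control the number of concurrent upload threads) may vary by Toolbelt version.

  # 1. Create a bulk import session bound to an existing database and table
  $ td import:create my_session my_db my_table

  # 2. Upload the chunks; if some fail, re-run the upload for those files only
  $ td import:upload \
      --format csv --column-header \
      --time-column created_at \
      --parallel 4 \
      my_session ./chunks/data_*.csv

  # 3. Convert the uploaded parts into a queryable form and wait for completion
  $ td import:perform my_session --wait

  # 4. Commit the results to the table, then clean up the session
  $ td import:commit my_session
  $ td import:delete my_session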

Examples

Because bulk import involves several steps in order to be both fast and reliable, you can use shortcut commands, such as the one shown below, to accomplish common import tasks in a single step.
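
For example, the import:auto shortcut wraps session creation, upload, perform, and commit into a single command. This is a minimal sketch; the database, table, time column, and file paths are placeholders.

  # Create the database/table if needed, upload the CSV files, then perform and commit
  $ td import:auto \
      --format csv --column-header \
      --time-column created_at \
      --auto-create my_db.my_table \
      ./chunks/data_*.csv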

To understand the bulk import internals, or for tips and tricks, please refer to the documents below.

