Treasure Data’s bulk-export feature enables you to dump data into your Amazon S3 bucket.


Prerequisites

  • Basic knowledge of Treasure Data, including TD Toolbelt.

  • An AWS account and an Amazon S3 bucket.

    • The AWS credentials you use for the export need the following Amazon S3 permissions:

      • s3:PutObject

      • s3:GetBucketLocation
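The two permissions above can be granted with an IAM policy. The following is a minimal sketch, assuming you pass an IAM user's access keys to td table:export. The bucket name mybucket, the file name td-export-policy.json, and the policy name td-bulk-export are hypothetical placeholders. Note that s3:GetBucketLocation applies to the bucket ARN, while s3:PutObject applies to the objects inside it (the /* resource).

```shell
#!/usr/bin/env bash
# Sketch: write a minimal IAM policy with the two S3 permissions listed above.
# BUCKET and the output file name are hypothetical placeholders.
set -euo pipefail

BUCKET="${BUCKET:-mybucket}"

cat > td-export-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetBucketLocation",
      "Resource": "arn:aws:s3:::${BUCKET}"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF

# Attach it to the IAM user whose keys you pass to td table:export, e.g.:
#   aws iam put-user-policy --user-name <IAM_USER> \
#     --policy-name td-bulk-export \
#     --policy-document file://td-export-policy.json
```

Scoping each action to its own resource keeps the credentials limited to writing into this one bucket.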

Limitations

  • You cannot export data from one region to an S3 bucket in a different region.

  • The bulk export command no longer supports partitioning of exported data. Partitioning was removed to speed up exports, which had been too slow to meet requirements. If you require partitioning, we recommend exporting 1-hour segments at a time and automating the process with a script.

  • Exporting float type columns is not supported. If you run a table export job on a table whose schema contains float type columns, you might see the error message "invalid schema: unexpected type: float". As a workaround, manually change the type of those columns in the table schema to double.

  • Bulk export is available only in the following regions:

    Users of    Code              Region Name
    US Region   us-east-1         US East (N. Virginia) S3 bucket
    Tokyo       ap-northeast-1    Asia Pacific (Tokyo) region S3 bucket
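The float-to-double workaround in the limitations above can be scripted. The following is a minimal sketch, assuming td schema:set takes the table's full column list as NAME:TYPE pairs and replaces the existing schema, so every column must be passed, not only the float ones. The function name float_to_double and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rewrite float columns as double in a table schema.
# Assumption: `td schema:set` replaces the whole schema, so pass every
# column (list them first with `td schema:show <db> <table>`).
set -euo pipefail

# Usage: float_to_double DB TABLE NAME:TYPE...
# With DRY_RUN=1 it prints the td command instead of running it.
float_to_double() {
  local db=$1 table=$2
  shift 2
  local col
  local -a cols=()
  for col in "$@"; do
    if [ "${col##*:}" = float ]; then
      cols+=("${col%:*}:double")  # NAME:float -> NAME:double
    else
      cols+=("$col")              # keep every other column unchanged
    fi
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "td schema:set $db $table ${cols[*]}"
  else
    td schema:set "$db" "$table" "${cols[@]}"
  fi
}
```

For example, DRY_RUN=1 float_to_double example_db table1 id:long price:float name:string prints the schema:set command with price changed to double; drop DRY_RUN=1 to apply it.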


Exporting Your Data to an Amazon S3 Bucket

We highly recommend using the jsonl.gz or tsv.gz format, because export performance is specifically optimized for these formats.

The dump is performed by MapReduce jobs. The destination bucket is expressed as an S3 path with your AWS access key ID and secret access key embedded in it.

The td table:export command dumps all the data in a Treasure Data table into your Amazon S3 bucket.

  1. From a machine where your TD Toolbelt is installed, open a command line terminal.

  2. Optionally, run the following command to view the latest usage information for the td table:export command.

    td table:export --help


  3. Run the td table:export command to start the bulk export, specifying the database and table whose data you want to dump.

    td table:export <db> <table>


  4. Optionally, specify values for any options you want to use. For the full list of options, see the TD Table Export Command Reference.

    Amazon Resource Names (ARNs) uniquely identify AWS resources.

Examples

A simple bulk export command might look like this:

td table:export example_db table1 \
 --s3-bucket mybucket \
 -k KEY_ID \
 -s SECRET_KEY


A typical bulk export command includes the following options:


td table:export <database_name> <table_name> \
   --s3-bucket <S3_BUCKET_NAME> \
   --prefix <S3_FILE_PREFIX> \
   --aws-key-id <AWS_KEY> \
   --aws-secret-key <AWS_SECRET_KEY> \
   --file-format jsonl.gz
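The limitations above recommend exporting 1-hour segments with a script when you need partitioned output. The following is a minimal sketch of such a script, assuming your Toolbelt's td table:export supports --from and --to options that accept Unix epoch seconds (confirm with td table:export --help). The names hourly_windows, export_hourly, DRY_RUN, AWS_KEY, and AWS_SECRET_KEY, and the per-hour --prefix layout, are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run one td table:export job per hour of data.
# Assumptions: --from/--to accept Unix epoch seconds; the per-hour
# --prefix layout is only an illustration.
set -euo pipefail

# Print "FROM TO" pairs (epoch seconds) covering [start, end) in 1-hour steps.
hourly_windows() {
  local start=$1 end=$2 step=3600
  local from=$start to
  while [ "$from" -lt "$end" ]; do
    to=$(( from + step ))
    if [ "$to" -gt "$end" ]; then
      to=$end  # the last window may be shorter than an hour
    fi
    echo "$from $to"
    from=$to
  done
}

# Usage: export_hourly DB TABLE BUCKET START END
# Reads AWS_KEY and AWS_SECRET_KEY from the environment so the keys are not
# typed on the command line. With DRY_RUN=1 it only prints the commands.
export_hourly() {
  local db=$1 table=$2 bucket=$3 start=$4 end=$5
  local from to
  local -a cmd
  hourly_windows "$start" "$end" | while read -r from to; do
    cmd=(td table:export "$db" "$table"
      --s3-bucket "$bucket"
      --prefix "$table/$from/"
      --aws-key-id "$AWS_KEY"
      --aws-secret-key "$AWS_SECRET_KEY"
      --file-format jsonl.gz
      --from "$from" --to "$to")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}
```

After exporting AWS_KEY and AWS_SECRET_KEY, the following prints the 24 commands covering 1 January 2024 (UTC); drop DRY_RUN=1 to submit the jobs:

    DRY_RUN=1 export_hourly example_db table1 mybucket 1704067200 1704153600

Writing each hour under its own prefix keeps the segments from overwriting one another, which gives you a simple time-based partitioning in S3.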

