You can import files from your AWS S3 bucket into Treasure Data using the embulk-input-s3 input plugin.
Prerequisites:
- Basic knowledge of Treasure Data.
- Basic knowledge of Embulk.
- Bulk Data Import set up by following the instructions in Installing Bulk Data Import.
- Embulk and the embulk-output-td plugin installed on your machine.
To install the embulk-input-s3 plugin, run the following command:
embulk gem install embulk-input-s3
Using your favorite text editor, create an Embulk config file (for example, seed.yml) that defines the input (S3) and output (TD) parameters. Example:
in:
  type: s3
  bucket: s3bucket
  path_prefix: path/to/sample_file # path of the *.csv or *.tsv file in your S3 bucket
  access_key_id: xxxxxxxxxx
  secret_access_key: xxxxxxxxxxx
out:
  type: td
  apikey: xxxxxxxxxxxx
  endpoint: api.treasuredata.com
  database: dbname
  table: tblname
  time_column: datecolumn
  mode: replace
  # If mode is not defined, the default, mode: append, is used;
  # imported records are appended to the target table.
  # mode: replace replaces the existing target table.
  default_timestamp_format: '%d/%m/%Y'
For further details about additional parameters available for embulk-input-s3, see Embulk Input S3.
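If you prefer not to type credentials into the file by hand, one option is to generate seed.yml from environment variables. The following is a minimal sketch rather than part of the documented procedure: every variable name in it (S3_BUCKET, S3_PATH_PREFIX, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, TD_API_KEY, TD_DATABASE, TD_TABLE) is an assumption made for this example, and the defaults are the placeholder values from the example config.

```shell
#!/bin/sh
# Sketch: write seed.yml from environment variables.
# All variable names are assumptions for this example; the defaults are
# the placeholder values used in the example config.
set -e

S3_BUCKET="${S3_BUCKET:-s3bucket}"
S3_PATH_PREFIX="${S3_PATH_PREFIX:-path/to/sample_file}"
AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-xxxxxxxxxx}"
AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-xxxxxxxxxxx}"
TD_API_KEY="${TD_API_KEY:-xxxxxxxxxxxx}"
TD_DATABASE="${TD_DATABASE:-dbname}"
TD_TABLE="${TD_TABLE:-tblname}"

# The unquoted heredoc delimiter lets the shell expand the variables.
cat > seed.yml <<EOF
in:
  type: s3
  bucket: ${S3_BUCKET}
  path_prefix: ${S3_PATH_PREFIX}
  access_key_id: ${AWS_ACCESS_KEY_ID}
  secret_access_key: ${AWS_SECRET_ACCESS_KEY}
out:
  type: td
  apikey: ${TD_API_KEY}
  endpoint: api.treasuredata.com
  database: ${TD_DATABASE}
  table: ${TD_TABLE}
  time_column: datecolumn
  mode: replace
EOF

# The file contains secrets, so restrict it to the current user.
chmod 600 seed.yml
```

Values that contain characters with special meaning in YAML (for example ':' or '#') would need quoting inside the heredoc; this sketch does not handle that case.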
The Embulk guess option uses seed.yml to read the target file, automatically guesses the column types and settings, and writes this information to a new file, load.yml.
embulk guess seed.yml -o load.yml
Add the "auto_create_table: true" parameter to load.yml so that tables that do not exist are created automatically.
This is a sample of the auto_create_table parameter in a .yml file.
out:
  type: td
  apikey: your apikey
  endpoint: api.treasuredata.com
  database: dbname
  table: tblname
  time_column: created_at
  auto_create_table: true
  mode: append
You must create the database and table in TD prior to executing the load job.
If you need to add a database, or if you did not add the auto_create_table parameter to the .yml file and need to add a table, run the following TD commands:
td database:create dbname
td table:create dbname tblname
You can also create the database and table using TD Console.
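Whether the table has to be created by hand depends on whether the config enables auto_create_table. As a rough illustration, the check could be scripted as below. has_auto_create_table is a hypothetical helper written for this example, not an Embulk or TD command, and it only does a simple text match on the file rather than parsing the YAML.

```shell
#!/bin/sh
# Sketch: report whether an Embulk config enables auto_create_table.
# has_auto_create_table is a hypothetical helper, not an Embulk or TD command.
has_auto_create_table() {
  # Match an uncommented "auto_create_table: true" line at any indentation.
  grep -Eq '^[[:space:]]*auto_create_table:[[:space:]]*true([[:space:]]|$)' "$1"
}

# Check the file given as the first argument, or load.yml by default.
config="${1:-load.yml}"
if [ -f "$config" ]; then
  if has_auto_create_table "$config"; then
    echo "$config: a missing table will be created automatically"
  else
    echo "$config: create the table first with: td table:create dbname tblname"
  fi
fi
```

Because the match is textual, a commented-out line such as "#auto_create_table: true" is treated as absent, which is the intended behavior here.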
You can preview the data using the embulk preview load.yml command. If any of the column types or data look incorrect, edit load.yml directly and preview again to verify. If the guess option doesn't yield satisfactory results, change the parameters in load.yml manually using the CSV/TSV parser plugin options.
Create the database and table in TD, using the TD Console or from the command line:
td database:create dbname
td table:create dbname tblname
Run the import job using the following command:
embulk run load.yml
The job may take anywhere from a few minutes to several hours to complete, depending on the size of the data.
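Since an import can run for a long time, it may be worth making the preview a gate in front of the run. The sketch below chains the two commands and skips the import when the preview fails. preview_then_run and the EMBULK variable are hypothetical names introduced for this example, and the sketch assumes that embulk exits with a non-zero status when a command fails.

```shell
#!/bin/sh
# Sketch: preview the data first and import only if the preview succeeds.
# preview_then_run and EMBULK are hypothetical names for this example.
# Assumption: embulk exits with a non-zero status when a command fails.
EMBULK="${EMBULK:-embulk}"

preview_then_run() {
  load="${1:-load.yml}"
  # Show a sample of the parsed records; stop here if the config is broken.
  "$EMBULK" preview "$load" || {
    echo "preview of $load failed; import not started" >&2
    return 1
  }
  # Perform the actual import.
  "$EMBULK" run "$load"
}
```

After loading the function into your shell, call it as preview_then_run load.yml. For long jobs you may want to redirect its output to a log file.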