# Embulk Bulk Import From AWS S3

You can import files from your AWS S3 bucket to Treasure Data using the embulk-input-s3 input plugin.

## Prerequisites

- Basic knowledge of Treasure Data.
- Basic knowledge of [Embulk](http://www.embulk.org/docs/).
- Bulk data import set up by following the instructions in [Installing Bulk Data Import](https://docs.treasuredata.com/smart/project-product-documentation/installing-bulk-data-import).
- [Embulk](http://www.embulk.org/docs/) and the embulk-output-td plugin installed on your machine.

## Install embulk-input-s3 Plugin

To install the embulk-input-s3 plugin, run the following command:

```bash
embulk gem install embulk-input-s3
```

## Create a Seed Configuration File

Using a text editor, create an Embulk configuration file (for example, `seed.yml`) that defines the input (S3) and output (TD) parameters.

Example:

```yaml
in:
  type: s3
  bucket: s3bucket
  path_prefix: path/to/sample_file # path of the *.csv or *.tsv file in your S3 bucket
  access_key_id: xxxxxxxxxx
  secret_access_key: xxxxxxxxxxx
out:
  type: td
  apikey: xxxxxxxxxxxx
  endpoint: api.treasuredata.com
  database: dbname
  table: tblname
  time_column: datecolumn
  mode: replace
  # If mode is not defined, append is used by default:
  # imported records are appended to the target table.
  # mode: replace replaces the existing target table.
  default_timestamp_format: '%d/%m/%Y'
```

For details about the additional parameters available for embulk-input-s3, see [Embulk Input S3](https://github.com/embulk/embulk-input-s3).

## Guess Fields (Generate load.yml)

The `embulk guess` command uses `seed.yml` to read the target file, guesses the column types and settings automatically, and writes this information to a new file, `load.yml`.

```bash
embulk guess seed.yml -o load.yml
```

Add the `auto_create_table: true` parameter to `load.yml` so that tables that do not exist are created automatically. The following is a sample of the `auto_create_table` parameter in a .yml file:
```yaml
out:
  type: td
  apikey: your apikey
  endpoint: api.treasuredata.com
  database: dbname
  table: tblname
  time_column: created_at
  auto_create_table: true
  mode: append
```

The database must exist in TD before you execute the load job, and so must the table unless you set the `auto_create_table` parameter in the .yml file. If you need to add a database, or need to add a table because `auto_create_table` is not set, run the following TD commands:

```bash
td database:create dbname
td table:create dbname tblname
```

You can also create the database and table using TD Console.

You can preview the data using the `embulk preview load.yml` command. If any of the column types or data seem incorrect, edit `load.yml` directly and preview again to verify.

If the `guess` option does not yield satisfactory results, change the parameters in `load.yml` manually according to your requirements, using the [CSV/TSV parser plugin options](http://www.embulk.org/docs/built-in.md#csv-parser-plugin).

## Execute Load Job

Run the import job using the following command:

```bash
embulk run load.yml
```

The job can take from a few minutes to several hours to complete, depending on the size of the data.
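
The guess, preview, and run steps above can be chained in a small wrapper script. The following is a minimal sketch, not an official tool: the names `run`, `import_from_s3`, `SEED`, `LOAD`, and `DRY_RUN` are hypothetical and introduced here for illustration. It assumes `embulk` is on your `PATH`, that `seed.yml` already exists, and that the target database (and the table, if `auto_create_table` is not set) has been created.

```shell
#!/usr/bin/env bash
# Sketch of the guess -> preview -> run workflow described in this page.
# All variable and function names below are illustrative assumptions.
set -euo pipefail

SEED="${SEED:-seed.yml}"     # seed configuration written by hand
LOAD="${LOAD:-load.yml}"     # configuration generated by `embulk guess`
DRY_RUN="${DRY_RUN:-0}"      # set to 1 to print the commands without running them

# Print a command, then execute it unless DRY_RUN is 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Run the three Embulk steps in order; stops at the first failure (set -e).
import_from_s3() {
  run embulk guess "$SEED" -o "$LOAD"   # generate load.yml from seed.yml
  run embulk preview "$LOAD"            # inspect guessed columns before loading
  run embulk run "$LOAD"                # execute the load job
}
```

To use it, save the block to a file, add a final line that calls `import_from_s3`, and run the file with `bash`. Setting `DRY_RUN=1` first prints the three commands without executing them, which is a cheap way to check file names. Note that this sketch does not pause between the preview and run steps, so if you need to edit `load.yml` after previewing, run the commands individually instead.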