# Embulk Bulk Import From Aws S3

You can import files from your AWS S3 bucket to Treasure Data using the embulk-input-s3 input plugin.

## Prerequisites

- Basic knowledge of Treasure Data.
- Basic knowledge of [Embulk](http://www.embulk.org/docs/).
- Follow the instructions in [Installing Bulk Data Import](https://docs.treasuredata.com/smart/project-product-documentation/installing-bulk-data-import).
- [Embulk and embulk-output-td](http://www.embulk.org/docs/) plugin installed on your machine.


## Install embulk-input-s3 Plugin

To install the embulk-input-s3 plugin, run the following command:


```bash
embulk gem install embulk-input-s3
```

## Create a Seed Configuration File

Using your favorite text editor, create an Embulk configuration file (for example, `seed.yml`) that defines the input (S3) and output (TD) parameters. For example:


```yaml
in:
  type: s3
  bucket: s3bucket
  path_prefix: path/to/sample_file    # path of *.csv or *.tsv file on your s3 bucket
  access_key_id: xxxxxxxxxx
  secret_access_key: xxxxxxxxxxx
out:
  type: td
  apikey: xxxxxxxxxxxx
  endpoint: api.treasuredata.com
  database: dbname
  table: tblname
  time_column: datecolumn
  # mode: append (the default, used if mode is not defined) appends
  # imported records to the target table.
  # mode: replace replaces the existing target table.
  mode: replace
  default_timestamp_format: '%d/%m/%Y'
```

For further details about additional parameters available for embulk-input-s3, see [Embulk Input S3](https://github.com/embulk/embulk-input-s3).
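For instance, the plugin supports alternatives to embedding static keys in the file. The following sketch shows a few commonly used parameters; treat the parameter names and values here as assumptions and verify them against the embulk-input-s3 README for your plugin version:

```yaml
in:
  type: s3
  bucket: s3bucket
  path_prefix: path/to/sample_file
  # assumed parameters; confirm in the embulk-input-s3 README for your version
  auth_method: env       # read credentials from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
  region: us-east-1      # region of the bucket
  incremental: true      # remember the last loaded file for subsequent runs
```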

## Guess Fields (Generate load.yml)

The Embulk `guess` option uses `seed.yml` to read the target file, automatically guesses the column types and settings, and writes the result to a new file, `load.yml`.


```bash
embulk guess seed.yml -o load.yml
```

Add the `auto_create_table: true` parameter to `load.yml` so that tables that do not exist are created automatically.

The following sample shows the `auto_create_table` parameter in a .yml file.


```yaml
out:
  type: td
  apikey: your apikey
  endpoint: api.treasuredata.com
  database: dbname
  table: tblname
  time_column: created_at
  auto_create_table: true
  mode: append
```

You must create the database (and, unless you set `auto_create_table: true`, the table) in TD before executing the load job.

If you need to create a database, or you did not add the `auto_create_table` parameter and need to create a table, run the following TD commands:


```bash
td database:create dbname
td table:create dbname tblname
```

You can also create the database and table using TD Console.

You can preview the data using the `embulk preview load.yml` command. If any of the column types or data look incorrect, edit the `load.yml` file directly and preview again to verify. If the `guess` option doesn't yield satisfactory results, change the parameters in `load.yml` manually using the [CSV/TSV parser plugin options](http://www.embulk.org/docs/built-in.html#csv-parser-plugin).
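For example, if `guess` infers a date column as a plain string, you might change that entry in the parser's `columns` list so it is parsed as a timestamp (the column names here are illustrative):

```yaml
    columns:
    - {name: id, type: long}
    # before: {name: datecolumn, type: string}
    - {name: datecolumn, type: timestamp, format: '%d/%m/%Y'}
```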


## Execute Load Job

Run the import job using the following command:


```bash
embulk run load.yml
```

The job may take a few minutes to several hours to complete, depending on the size of the data.