# Embulk Bulk Import From Csv Files

You can import data from CSV files into Treasure Data using Embulk, an open-source bulk data loader. Embulk enables you to transfer data between various databases, storage locations, file formats, and cloud services.

![](/assets/image-20191021-194315.1d20dc43b99774785560764470f9f3522e77b6b3188edfe480ebf89b39071dba.75c1e439.png)

You can also import data from CSV files using using the Bulk Import program (td-import). Be advised that the td-import is not actively maintained and is a candidate for deprecation. Therefore, we strongly recommend using Embulk.

## Prerequisites

- Basic knowledge of Treasure Data.
- Basic Knowledge of [Embulk](http://www.embulk.org/docs/).
- Embulk is a Java application. Make sure that Java is installed.
- Follow the instructions in [Installing Bulk Data Import](https://docs.treasuredata.com/smart/project-product-documentation/installing-bulk-data-import).


## Create a Seed Configuration File

Using your favorite text editor, create an Embulk config file (for eg:seed.yml) defining the input file and output Treasure Data parameters.

**Example**


```yaml
in:
  type: file
  path_prefix: /path/to/files/sample_
out:
  type: td
  apikey: xxxxxxxxxxxx
  endpoint: api.treasuredata.com
  database: dbname
  table: tblname
  time_column: time
  mode: replace
  default_timestamp_format: '%Y-%m-%d %H:%M:%S'
```

This is Sample Data.


```
id,account,time,purchase,comment
1,32864,2015-01-27 19:23:49,20150127,embulk
2,14824,2015-01-27 19:01:23,20150127,embulk jruby
3,27559,2015-01-28 02:20:02,20150128,"Embulk ""csv"" parser plugin"
4,11270,2015-01-29 11:54:36,20150129,NULL
```

For further details about additional parameters available for embulk-local-file-input, refer to [Embulk Local file input](http://www.embulk.org/docs/built-in.md#local-file-input-plugin) Also details about embulk-output-td, refer to the [TD output plugin for Embulk](https://github.com/treasure-data/embulk-output-td#td-output-plugin-for-embulk).

## Guess Fields (Generate load.yml)

Embulk guess option uses `seed.yml` to read the target file and automatically guesses the column types and settings, and creates a new file `load.yml` with this information.


```bash
embulk guess seed.yml -o load.yml
```

Generated load.yml file.


```yaml
in:
  type: file
  path_prefix: /path/to/files/sample_
  'last_path:': /path/to/files/sample_02.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    null_string: 'NULL'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: id, type: long}
    - {name: account, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: purchase, type: timestamp, format: '%Y%m%d'}
    - {name: comment, type: string}
out: {type: td, apikey: xxxxx, endpoint: api.treasuredata.com, database: dbname, table: tblname, time_column: time, mode: replace, default_timestamp_format: '%Y-%m-%d %H:%M:%S'}
```

Best Practice: Add the "auto_create_table: true" parameter to the load.yml, so that tables that do not exist are automatically.

This is a sample of the auto_create_table parameter in a .yml file.


```yaml
out:
  type: td
  apikey: your apikey
  endpoint: api.treasuredata.com
  database: dbname
  table: tblname
  time_column: created_at
  auto_create_table: true
  mode: append
```

You must create the database and table in TD, prior to executing the load job.

- Alternative: If you:
  - must add a database
  - do not add the auto_create_table parameter in a .yml file and must add a table


run the following TD commands:


```bash
td database:create dbname
td table:create dbname tblname
```

You can also create the database and table using [TD Console](https://console.treasuredata.com/app/databases).

You can preview the data using `embulk preview load.yml` command. If any of the column types or data seems incorrect you can edit `load.yml` file directly and preview again to verify. If the `guess` option doesn’t yield satisfactory results, you can change parameters in `load.yml` according to your requirements manually by using [CSV/TSV parser plugin options](http://www.embulk.org/docs/built-in.md#csv-parser-plugin).


```bash
embulk preview load.yml
```

# Execute Load Job

Issue the import job by running the following command:


```bash
embulk run load.yml
```

It may take a few minutes to several hours for the job to complete, depending on the size of the data.