# Legacy Bulk Import for AWS S3

This article explains how to import data directly from Amazon S3 to Treasure Data.

# Install Bulk Import

First, install the [Toolbelt](https://support.treasuredata.com/hc/en-us/articles/command-line), which includes the bulk loader program, on your computer.

## Downloads

- [Toolbelt Installer for Windows](https://toolbelt.treasuredata.com/win)
- [Toolbelt Installer for Mac OS X](https://toolbelt.treasuredata.com/mac)
- [Toolbelt Installer for Linux](https://toolbelt.treasuredata.com/linux)

After the installation, the `td` command will be available on your computer. Open a terminal and type `td` to execute the command. Also make sure you have `java` installed. Then run `td import:jar_update` to download the up-to-date version of our bulk loader:

```
$ td
usage: td [options] COMMAND [args]

$ java
Usage: java [-options] class [args...]

$ td import:jar_update
Installed td-import.jar 0.x.xx into /path/to/.td/java
```

# Authenticate

Log in to your Treasure Data account:

```
$ td account -f
Enter your Treasure Data credentials.
Email: xxxxx
Password (typing will be hidden):
Authenticated successfully.
Use 'td db:create db_name' to create a database.
```

# Importing data from Amazon S3

The bulk loader can read data from files stored in Amazon S3 in all three supported file formats:

- CSV (default)
- JSON
- TSV

Suppose you have a file called `data.csv` on Amazon S3 with these contents:

```
"host","log_name","date_time","method","url","res_code","bytes","referer","user_agent"
"64.242.88.10","-","2004-03-07 16:05:49","GET","/twiki/bin/edit/Main/Double_bounce_sender?topicparent=Main.ConfigurationVariables",401,12846,"",""
"64.242.88.10","-","2004-03-07 16:06:51","GET","/twiki/bin/rdiff/TWiki/NewUserTemplate?rev1=1.3&rev2=1.2",200,4523,"",""
"64.242.88.10","-","2004-03-07 16:10:02","GET","/mailman/listinfo/hsdivision",200,6291,"",""
"64.242.88.10","-","2004-03-07 16:11:58","GET","/twiki/bin/view/TWiki/WikiSyntax",200,7352,"",""
```

Execute the following commands to upload the CSV file:

```
$ td db:create my_db
$ td table:create my_db my_tbl
$ td import:auto \
  --format csv --column-header \
  --time-column date_time \
  --time-format "%Y-%m-%d %H:%M:%S" \
  --auto-create my_db.my_tbl \
  "s3://s3_access_key:s3_secret_key@/my_bucket/path/to/data.csv"
```

The location of the file is expressed as an S3 path with the AWS access key ID and secret access key embedded in it.

Because `td import:auto` executes MapReduce jobs to check for invalid rows, the command takes at least **1-2 minutes** to complete.

If the column chosen for `--time-column` contains epoch (unix) timestamps, you don't need the `--time-format` flag.

In the above command, we assumed that:

- The CSV file is located on Amazon S3, within a bucket called `my_bucket`, under the path/key `/path/to/`.
- The first line of the file contains the column names, hence the `--column-header` option. If the file does not have the column names in the first row, you must specify them with the `--columns` option and, optionally, a type for each column with the `--column-types` option (see the sketch after this list).
- The time field is called `date_time` and is specified with the `--time-column` option.
- The time format is `%Y-%m-%d %H:%M:%S` and is specified with the `--time-format` option.
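For example, here is a minimal sketch of the same import for a headerless variant of the file. The file name `data_noheader.csv`, the `time` column name, and the type list are illustrative assumptions; since the time column is assumed to already hold unix (epoch) timestamps, `--time-format` is dropped:

```
# data_noheader.csv is a hypothetical file with no header row;
# its "time" column is assumed to hold unix (epoch) timestamps,
# so --time-format is not needed.
$ td import:auto \
  --format csv \
  --columns host,log_name,time,method,url,res_code,bytes,referer,user_agent \
  --column-types string,string,long,string,string,long,long,string,string \
  --time-column time \
  --auto-create my_db.my_tbl \
  "s3://s3_access_key:s3_secret_key@/my_bucket/path/to/data_noheader.csv"
```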
### Wildcards

The source files to be imported by the bulk loader can be specified as full Amazon S3 paths or by using wildcards. Here are some examples:

- `s3://my_bucket/path/to/data*`: all files under `my_bucket/path/to/` whose names start with `data`
- `s3://my_bucket/path/to/data*.csv`: all files under `my_bucket/path/to/` whose names start with `data` and have the extension `.csv`
- `s3://my_bucket/path/to/*.csv`: all files under `my_bucket/path/to/` with the extension `.csv`
- `s3://my_bucket/path/to/*`: all files under `my_bucket/path/to/`
- `s3://my_bucket/path/to/*/*.csv`: all `.csv` files in the direct subfolders of `my_bucket/path/to/`
- `s3://my_bucket/**/*.csv`: all `.csv` files in all subfolders of `my_bucket`

A short sketch using one of these patterns appears after the links below. For further details, check the following pages:

- [Bulk Import Internals](/int/legacy-bulk-import-internals)
- [Bulk Import Tips and Tricks](/int/legacy-bulk-import-tips-and-tricks)
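As a quick illustration of the wildcard syntax, here is a hedged sketch reusing the placeholder credentials and bucket from earlier; the single file path in the `td import:auto` command is simply replaced with a pattern matching every `.csv` file under the prefix:

```
# Same placeholder credentials and bucket as the earlier example;
# the wildcard matches every .csv file under my_bucket/path/to/.
$ td import:auto \
  --format csv --column-header \
  --time-column date_time \
  --time-format "%Y-%m-%d %H:%M:%S" \
  --auto-create my_db.my_tbl \
  "s3://s3_access_key:s3_secret_key@/my_bucket/path/to/*.csv"
```

Quoting the S3 path keeps the shell from expanding the `*` locally, so the pattern is passed through to the bulk loader unchanged.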