# Legacy Bulk Import Tips And Tricks

This article describes some tips and tricks for bulk import.

## Solving Error: There Was a Problem Accessing the Remote XML Resource

This error can occur in TD Toolbelt v0.16.7 and earlier versions. If you encounter the following error when running `td import:jar_update`, use one of the following solutions to resolve it.

### Error message:

```
Error: There was a problem accessing the remote XML resource 'http://central.maven.org/maven2/com/treasuredata/td-import/maven-metadata.xml'
       (TreasureData::Command::UpdateError: An error occurred when fetching from 'http://central.maven.org/maven2/com/treasuredata/td-import/maven-metadata.xml'.)
```

### Solution 1: Update TD Toolbelt Version

The error is fixed in v0.16.8 and [later versions](http://docs.treasuredata.com/display/PD/Installing+and+Updating+the+TD+Toolbelt+and+Treasure+Agent).

### Solution 2: Set Variable

Setting the following environment variable avoids the error.

```
$ export TD_TOOLBELT_JARUPDATE_ROOT=https://repo1.maven.org
```

## Using a Proxy Server

If you cannot upload your data, check whether your network requires a proxy. You can configure the proxy by setting the `HTTP_PROXY` environment variable:

| **Operating System** | **Option 1: Without Authentication** | **Option 2: With Authentication** |
| --- | --- | --- |
| Windows | `set HTTP_PROXY=http://proxy_host:8080` | `set HTTP_PROXY=http://user:password@proxy_host:8080` |
| Other | `export HTTP_PROXY="proxy_host:8080"` | `export HTTP_PROXY="user:password@proxy_host:8080"` |

## Increasing Performance through Parallelism

`td import:auto` supports two options to tune parallelism: `--parallel` and `--prepare-parallel`. See the [TD Toolbelt Command Reference](https://docs.treasuredata.com/articles/project-product-documentation/td-toolbelt+Reference) for the full syntax reference.

```
$ td import:auto <session name> --parallel NUM --prepare-parallel NUM
```

- `--parallel` specifies how many threads are used to upload the data.
If you observe that the bulk import tool is not saturating your network, you can increase the value of the `--parallel` option. The default is 2 and the maximum is 8.
- `--prepare-parallel` specifies the number of threads used to compress the data locally. Normally, this number should match the number of CPU cores on your machine. The default is 2 and the maximum is 96.

## Specifying a Time Column for Maximum Query Performance

If your data does not have a time column, do not specify `0` as the time value. Treasure Data partitions data by time by default. See [Data Partitioning](https://docs.treasuredata.com/smart/project-product-documentation/data-partitioning-in-treasure-data). Always specify a time column or, if there is none, specify the current time.

## Selecting Enable or Disable Auto Jar_Update

In td v0.11.2 and later versions, you can enable or disable JAR auto-update with the `TD_TOOLBELT_JAR_UPDATE` environment variable.

### JAR auto-update is enabled by default or enabled if the variable is 1:

```
$ td import:prepare
$ TD_TOOLBELT_JAR_UPDATE=1 td import:prepare
```

### JAR auto-update is disabled when the variable is set to 0:

```
$ TD_TOOLBELT_JAR_UPDATE=0 td import:prepare
```

This setting does not affect `td import:jar_update`, which always updates the JAR file.

## Confirming Time Zone

The bulk import tool uses the `TZ` environment variable. If you think your bulk import time zone is wrong, check the value of `TZ`.

## Encoding Shift_JIS

When your data is encoded in Shift_JIS, set the encoding option to `-e Windows-31J`.

## Using Time-Format

To tell bulk import which time format your data source uses, pass `--time-format` a pattern built from the letters in the following correspondence table.

| **Letter** | **Date or Time Component** | **Presentation** | **Examples** |
| --- | --- | --- | --- |
| Y,G | Year with century | Year | 1996; 2006 |
| y,g | The last 2 digits of the year | Year | 96; 06 |
| m | Month in year | Month | 01..12 |
| B,b | The full/abbreviated month name | Month | January; Jan |
| d,e | Day in month, zero/blank padded | Number | 01..31; 1..31 |
| V | Week number of the week-based year | Number | 01..53 |
| j | Day in year | Number | 0-365 |
| A,a | The full/abbreviated day name in the week | Text | Tuesday; Tue |
| H,k | Hour in day (24-hour clock) | Number | 00-23; 0-23 |
| I,l | Hour in am/pm (12-hour clock) | Number | 00-11; 0-11 |
| M | Minute in hour | Number | 00-59 |
| S | Second in minute | Number | 00-59 |
| L | Millisecond | Number | 000-999 |
| P,p | AM/PM; am/pm marker | Text | AM; PM; am; pm |
| Z,z | Time zone | General time zone | GMT-08:00; -0800 |
| c | Year to second | Text | Fri Jan 1 14:00:00 2016 |
| D,x | Year to date | Text | 01/01/16 |
| F | Year to date | Text | 2016-01-01 |
| T,X | Hour to second | Text | 14:00:00 |
| r | Hour to second am/pm | Text | 02:00:00 pm |
| R | Hour to minute | Text | 14:00 |
| n | Newline character | LF | \n |
| t | Tab character | Tab | \t |
| % | Literal % character | % | % |
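
A pattern built from the table can be previewed locally before it is passed to `--time-format`. The following is a minimal sketch, not part of the TD Toolbelt. It assumes GNU `date`, and it assumes that each letter is written with a leading `%`, as in strftime. `preview_time_format` is a hypothetical helper, and it only works for letters that GNU `date` happens to share with the table, so some letters (for example `L`) may not render the same way.

```shell
# Hypothetical helper (not part of TD Toolbelt): render one fixed instant,
# 2016-01-01 14:00:00 UTC, with a strftime-style pattern.
# Assumes GNU date, where -d "@SECONDS" reads a Unix timestamp.
preview_time_format() {
  date -u -d "@1451656800" +"$1"
}

preview_time_format '%Y-%m-%d %H:%M:%S'   # prints 2016-01-01 14:00:00
preview_time_format '%F %R'               # prints 2016-01-01 14:00
```

If the preview looks like the timestamps in your source file, the same pattern is a reasonable candidate for `--time-format`. The import tool's own parser remains the authority on how each letter is interpreted.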