Legacy Bulk Import Tips And Tricks

This article describes some tips and tricks for bulk import.

Solving Error: There Was a Problem Accessing the Remote XML Resource

This error can occur in TD Toolbelt v0.16.7 and earlier versions.

If td import:jar_update fails with the following error, use one of the solutions below to resolve it.

Error message:

Error: There was a problem accessing the remote XML resource
'http://central.maven.org/maven2/com/treasuredata/td-import/maven-metadata.xml'
(TreasureData::Command::UpdateError: An error occurred when fetching
from 'http://central.maven.org/maven2/com/treasuredata/td-import/maven-metadata.xml'.)

Solution 1: Update TD Toolbelt Version

The error is fixed in TD Toolbelt v0.16.8 and later versions.

Solution 2: Set Variable

Setting the TD_TOOLBELT_JARUPDATE_ROOT environment variable avoids the error.

$ export TD_TOOLBELT_JARUPDATE_ROOT=https://repo1.maven.org

Using a Proxy Server

If you cannot upload your data, check whether your network uses a proxy. If it does, set the proxy through the HTTP_PROXY environment variable:

Operating System  Option 1                                 Option 2
Windows           $ set HTTP_PROXY=http://proxy_host:8080  $ set HTTP_PROXY=http://user:password@proxy_host:8080
Other             $ export HTTP_PROXY="proxy_host:8080"    $ export HTTP_PROXY="user:password@proxy_host:8080"
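
When the proxy host, port, and optional credentials come from elsewhere, the "Other" row of the table can be wrapped in a small helper. This is a minimal sketch: the function name proxy_value is hypothetical, and proxy_host and 8080 are the placeholder values from the table, not real settings.

```shell
# Hypothetical helper: builds an HTTP_PROXY value in the "Other" row format
# from the table above (host:port, optionally prefixed with user:password@).
# Arguments: host, port, then optionally user and password.
proxy_value() {
  if [ -n "${3:-}" ]; then
    printf '%s:%s@%s:%s\n' "$3" "${4:-}" "$1" "$2"
  else
    printf '%s:%s\n' "$1" "$2"
  fi
}

# proxy_host and 8080 are the placeholders from the table.
HTTP_PROXY=$(proxy_value proxy_host 8080)
export HTTP_PROXY
echo "HTTP_PROXY=$HTTP_PROXY"
```

Calling proxy_value with a user and password as the third and fourth arguments produces the Option 2 form.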

Increasing Performance through Parallelism

td import:auto supports two options to tune parallelism: --parallel and --prepare-parallel.

See the TD Toolbelt Command Reference for the full syntax reference.

$ td import:auto session name <files...>
--parallel NUM
--prepare-parallel NUM
  • --parallel specifies how many threads are used to upload the data. If you observe that the bulk import tool is not saturating your network, increase this value. Default is 2, maximum is 8.
  • --prepare-parallel specifies how many threads are used to compress the data locally. Normally, this number should match the number of CPU cores on your machine. Default is 2, maximum is 96.
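
The advice to match --prepare-parallel to the number of CPU cores can be sketched as follows. This is only an illustration: the helper name prepare_parallel_for is hypothetical, the lower bound of 1 is an assumption (the text states only the maximum), and the script prints the options rather than running an import.

```shell
# Hypothetical helper: clamp a thread count to the range used here for
# --prepare-parallel. The maximum of 96 comes from the text above; the
# minimum of 1 is an assumption.
prepare_parallel_for() {
  n=$1
  if [ "$n" -lt 1 ]; then
    n=1
  elif [ "$n" -gt 96 ]; then
    n=96
  fi
  echo "$n"
}

# getconf reports the number of online CPUs on Linux and macOS.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || true)
# Fall back to the documented default of 2 if no number was returned.
case "$cores" in
  ''|*[!0-9]*) cores=2 ;;
esac
prepare_parallel=$(prepare_parallel_for "$cores")

# Print the options instead of running an import.
echo "--parallel 2 --prepare-parallel $prepare_parallel"
```

The --parallel value is left at its default here because it depends on network saturation, which has to be observed during an actual upload.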

Specifying a Time Column for Maximum Query Performance

If your data has no time column, do not specify ‘0’ as the time. Treasure Data partitions the data by time by default (see Data Partitioning), so always specify the time column or, if there is none, the current time.

Enabling or Disabling Auto Jar_Update

The option to enable or disable automatic jar_update is available in td v0.11.2 and later versions. It is controlled through an environment variable hook: TD_TOOLBELT_JAR_UPDATE.

JAR auto-update is enabled by default, and also when the variable is set to 1:

$ td import:prepare
$ TD_TOOLBELT_JAR_UPDATE=1 td import:prepare

JAR auto-update is disabled when the variable is set to 0:

$ TD_TOOLBELT_JAR_UPDATE=0 td import:prepare

This setting does not affect td import:jar_update, which always updates the JAR file.
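
To check which mode the current shell would select, the rules above can be written as a small helper. This is a sketch only: the function name jar_auto_update_state is hypothetical, treating an empty value like an unset one is an assumption, and values other than 0 and 1 are reported as unknown because the text does not cover them.

```shell
# Hypothetical helper: reports the state described in the text above.
# Unset (or empty, an assumption) and 1 mean enabled, 0 means disabled.
# Any other value is not covered by the text and is reported as unknown.
jar_auto_update_state() {
  case "${TD_TOOLBELT_JAR_UPDATE-}" in
    ''|1) echo enabled ;;
    0)    echo disabled ;;
    *)    echo unknown ;;
  esac
}

echo "JAR auto-update: $(jar_auto_update_state)"
```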

Confirming Time Zone

The bulk import tool uses the TZ environment variable. If the time zone of your bulk import looks wrong, check the value of TZ in the shell where you run the import.
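
A quick way to inspect the setting is to print TZ and compare the UTC offset it produces with a known zone. This is a minimal sketch using the standard date command, not the import tool itself; the helper name utc_offset_for is hypothetical.

```shell
# Hypothetical helper: prints the UTC offset that date reports for a zone.
utc_offset_for() {
  TZ=$1 date +%z
}

# Show the value that commands started from this shell inherit.
echo "TZ=${TZ:-(unset)}"

# Compare the offset under the inherited setting with the offset under UTC.
echo "inherited offset: $(date +%z)"
echo "UTC offset:       $(utc_offset_for UTC)"
```

If the inherited offset is not the one your data expects, setting TZ before running the import changes what the tool sees.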

Encoding Shift_JIS

When your data is encoded in Shift_JIS, set the encoding option to Windows-31J: '-e Windows-31J'.

Using Time-Format

To tell bulk import which time format your data source uses, pass a pattern to --time-format. Build the pattern from the letters in the following table.

Letter  Date or Time Component                     Presentation       Examples
Y,G     Year with Century                          Year               1996; 2006
y,g     The last 2 digits of Year                  Year               96; 06
m       Month in year                              Month              01..12
B,b     The full/abbreviated month name            Month              January; Jan
d,e     Day in a month, zero/blank padded          Number             01..31; 1..31
V       Week number of the week-based Year         Number             01..53
j       Day in year                                Number             0-365
A,a     The full/abbreviated day name in the week  Text               Tuesday; Tue
H,k     Hour in day                                Number             00-23; 0-23
I,l     Hour in day                                Number             00-11; 0-11
M       Minute in hour                             Number             00-59
S       Second in minute                           Number             00-59
L       Millisecond                                Number             000-999
P,p     AM/PM; am/pm marker                        Text               AM; PM; am; pm
Z,z     Time zone                                  General time zone  GMT-08:00; -0800
c       Year to second                             Text               Tue Jan 1 14:00:00 2016
D,x     Year to date                               Text               01/01/16
F       Year to date                               Text               2016-01-01
T,X     Hour to second                             Text               14:00:00
r       Hour to second am/pm                       Text               02:00:00 pm
R       Hour to minute                             Text               14:00
n       Newline character                          LF                 \n
t       Tab character                              Tab                \t
%       Literal % character                        %                  %
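
As an illustration, a pattern for timestamps such as 2016-01-01 14:00:00 can be assembled from the Y, m, d, H, M, and S rows. The sketch below previews the pattern with the standard date command, on the assumption that these six letters mean the same thing there as in the table. That makes the preview an approximation of what the import tool does, and the script prints the option rather than running an import.

```shell
# Pattern for timestamps such as "2016-01-01 14:00:00", built from the
# Y, m, d, H, M and S rows of the table.
time_format='%Y-%m-%d %H:%M:%S'

# Preview the pattern with date; this only approximates the import tool.
sample=$(date -u +"$time_format")
echo "sample: $sample"

# Print the option as it would be passed, without running an import.
echo "--time-format \"$time_format\""
```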