Installing Bulk Data Import

You can import data using Treasure Data’s open-source bulk data loader Embulk. Embulk helps transfer data between various databases, storage locations, file formats, and cloud services. For more information, see the Embulk documentation.

Prerequisites

  • Basic knowledge of Treasure Data
  • Basic knowledge of Embulk
  • Java installed (Embulk is a Java application)
  • JRuby installed and configured (Embulk v0.10.50 and v0.11.0 do not include JRuby; see the "JRuby" section of the article "Embulk v0.11 is coming soon")
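
Before installing, it can help to confirm that the Java runtime is actually on your PATH. The following is a minimal sketch, not part of Embulk itself; `require_cmd` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: report whether a command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Embulk needs Java; print the installed version if the command is present.
if require_cmd java; then
  java -version || echo "java is on PATH but failed to run"
fi
```

The same helper can be pointed at any other command you rely on, for example `require_cmd curl` before downloading Embulk.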

Installing Embulk from the Command Line

Linux / macOS / BSD (UNIX):

curl --create-dirs -o ~/.embulk/bin/embulk -L "http://dl.embulk.org/embulk-latest.jar"
chmod +x ~/.embulk/bin/embulk
echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Windows (PowerShell):

Invoke-WebRequest http://dl.embulk.org/embulk-latest.jar -OutFile embulk.bat

Installing the Embulk Treasure Data Plugin

Embulk plugins load data to or from various systems and file formats. See the list of Embulk plugins.

Install the embulk-output-td plugin (imports records to Treasure Data):

embulk gem install embulk-output-td
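
Once the plugin is installed, Embulk reads its settings from a YAML file such as the `config.yml` used in the commands on this page. The sketch below writes a minimal example of such a file. The input path, column names, database, table, and API key are placeholders chosen for illustration, so treat them as assumptions and adjust them to your own data and account.

```shell
# Write a minimal Embulk configuration (all values below are placeholders).
cat > config.yml <<'EOF'
in:
  type: file
  path_prefix: ./data/sample_
  parser:
    type: csv
    skip_header_lines: 1
    columns:
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: id, type: long}
out:
  type: td
  apikey: YOUR_API_KEY
  endpoint: api.treasuredata.com
  database: my_database
  table: my_table
  time_column: time
  mode: append
EOF
```

With a file like this in place, `embulk preview config.yml` shows how the input will be parsed, and `embulk run config.yml` performs the import.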

Using a Proxy Server

If uploads fail, check whether your network requires a proxy. If it does, pass the proxy settings to Embulk as command-line options:

Linux:

embulk -J-Dhttp.proxyHost=HOST -J-Dhttp.proxyPort=PORT -J-Dhttp.proxyUser=USER -J-Dhttp.proxyPassword=PASS run config.yml

Windows:

embulk.bat "-J-Dhttps.proxyHost=HOST" "-J-Dhttps.proxyPort=PORT" "-J-Dhttp.proxyUser=USER" "-J-Dhttp.proxyPassword=PASS" run config.yml

Alternatively, run Embulk through Java directly (embulk.bat is the JAR file downloaded during installation):

java -Dhttps.proxyHost=HOST -Dhttps.proxyPort=PORT -jar embulk.bat run config.yml
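
On Linux or macOS, typing the proxy flags by hand is error-prone, so a small shell function can assemble them. This is only a sketch: `proxy_flags` is a hypothetical helper, and setting both the `http.*` and `https.*` JVM properties to the same proxy is an assumption about your network, not a requirement stated by Embulk.

```shell
# Hypothetical helper: print Embulk -J flags for a proxy host ($1) and port ($2).
# Assumption: plain and TLS connections go through the same proxy, so both
# the http.* and https.* JVM properties are set.
proxy_flags() {
  echo "-J-Dhttp.proxyHost=$1 -J-Dhttp.proxyPort=$2 -J-Dhttps.proxyHost=$1 -J-Dhttps.proxyPort=$2"
}

# Preview the full command line before running it.
echo "embulk $(proxy_flags proxy.example.com 8080) run config.yml"
```

To run the import rather than preview it, call `embulk $(proxy_flags HOST PORT) run config.yml` with the substitution left unquoted so the shell splits the flags into separate arguments.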