Pentaho Data Integration with JDBC Driver

This article will explain how to use Treasure Data with Pentaho Data Integration by using our JDBC driver. By combining Treasure Data with Pentaho, users can scale their existing Pentaho Data Integration environment to handle huge volumes of data.

Table of Contents

Prerequisites

  • Basic knowledge of Treasure Data. The Quickstart Guide is a good place to start.
Untitled-3
Don't have time to setup Pentaho + Treasure Data? Leverage our Setup Consultation Service.

Download Pentaho Data Integration (Kettle)

You can download Pentaho Data Integration (Kettle) from the link below. Version 4.3.0 was tested for this article

Download JDBC Driver

You can download the driver from the link below. The driver is still in beta, so any feedback would be appreciated.

Untitled-3
This driver works only with Treasure Data. It does not work with other environments such as your local Hadoop/Hive cluster.

Copy JDBC Driver Jar to Pentaho Data Integration

Before starting Pentaho Data Integration, please copy the Treasure Data JDBC driver to the libext/JDBC directory specified by Pentaho Data Integration.

$ cp td-jdbc-VERSION.jar \
  <pentaho-data-integration-install>/data-integration/libext/JDBC/

Create Treasure Data Database Connection

Let’s add Treasure Data as Pentaho Data Integration’s database connection and make a new transformation. Please follow the procedure below.

Step 1: Create a New Transform

Select Transforms as shown in the figure below. To create a new transform, right-click New in the context menu.

Step 2: Create New Database Connection

Navigating to Tools > Wizard > Create database connection... will bring up a pop-up window to create a new database connection. Select the database name and type as shown below.



Step 3: Specify the Driver Class Name

Specify the connection’s URL and name of the driver class as shown below.



Step 4: Set the Username and Password

Specify your username and password as shown below. Please use your Treasure Data credentials for these fields (Your User Name is the email address used to register on Treasure Data). Press the Finish button to create the database connection on your Pentaho Data Integration environment.



Use Treasure Data Database as Table input

Step 1: Specify Table Input

Select Table input from the transform’s Input context menu. Drag and drop onto the workspace as shown below.

Step 2: Edit the Table Input

Right-click the Table input icon on the workspace. Select Edit step from the context menu. Configure your Table input as shown below.

We cover the query above in the JasperSoft iReport with JDBC Driver article.

Step 3: Confirm the Table Input

To confirm the behavior of the Table input, write out your data on Treasure Data to JSON output as shown below.

Tips: How can I use Pentaho through a proxy?

Check ‘start-pentaho.bat’ or ‘start-pentaho.sh’, and add the following parameters to CATALINA_OPTS option:

CATALINA_OPTS="-Dhttp.proxyHost=<proxy address> -Dhttp.proxyPort=<proxy port>"


Last modified: Jul 07 2016 19:00:56 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.