Workflows Introductory Tutorial

In this tutorial, we will run a workflow that imports data into Treasure Data, runs a Presto query, and exports the query results out of Treasure Data.

Data Connector

For importing data into Treasure Data, we will be utilizing Data Connectors. If you don’t know much about this functionality, you can read our overview. If you’d like to know more about which sources we can currently pull data from, see the list in the “Data Collection (Bulk)” section.

For this tutorial, we will be using our SFTP Data Connector.

Result Export

For exporting data out of Treasure Data, we will be using our Result Output functionality (please see the “Data Delivery and Activation” section). This allows you to run a query and submit its results to another service.

We will be using Result Output to SFTP.

TD Workflows Introductory Tutorial

If you haven’t already, be sure to go through the Introductory Tutorial to install TD Workflows and submit a sample workflow project to Treasure Data.


Download Project & Review the Workflow

Download Project

# Download the sample project
$ cd ~/Downloads
$ curl -o ftp_pull_push.tar.gz -L

# Extract the downloaded project
$ tar xzvf ftp_pull_push.tar.gz

# Enter the workflow project directory
$ cd ftp_pull_push

The workflow you download is as follows:

    database: workflow_temp

  td_load>: imports/sftp_load.yml
  table: pull_push_tutorial

  td>: queries/count_rows.sql
  result_url: ftp://wf_test:***
Note: this Data Connector yml configuration is not operational. You will need to replace it with your own configuration to run this project yourself.
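Pieced together, the complete workflow file might look like the sketch below. The `+load` and `+count` task names and the `_export` block are assumptions based on standard digdag conventions; the downloaded project may use different names:

```yaml
# ftp_pull_push.dig — a sketch; task names here are illustrative
_export:
  database: workflow_temp           # all td tasks run against this database

+load:
  td_load>: imports/sftp_load.yml   # run a Data Connector job from a yml config
  table: pull_push_tutorial         # destination table for the import

+count:
  td>: queries/count_rows.sql       # Presto query to run
  result_url: ftp://wf_test:***     # export the query result over SFTP
```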

Data Connector Operator

The first difference you may notice is the use of the td_load> operator. This runs a Data Connector job with a pre-created yml config.

The td_load> operator needs both the database: & table: defined somewhere in the workflow file.

  td_load>: imports/sftp_load.yml
  table: pull_push_tutorial
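The referenced `imports/sftp_load.yml` is not shown in this tutorial. A minimal sketch of what it might contain is below, assuming the standard SFTP input connector fields; the host, user, and path values are placeholders you would replace with your own:

```yaml
# imports/sftp_load.yml — a sketch; all connection values are placeholders
in:
  type: sftp
  host: sftp.example.com      # your SFTP server
  port: 22
  user: wf_test
  password: "***"             # or key-based auth, per your server setup
  path_prefix: /data/pull_push/
```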

Create required tables

In order to run any Data Connector job in Treasure Data, the database and table you’re loading data into must already exist. Use this command to create them:

$ td table:create workflow_temp pull_push_tutorial

TD Query Operator with Result Output

For exporting data out of Treasure Data, we run a normal Presto query using the td> operator and include the result_url: parameter to define which third-party destination receives the query result.

  td>: queries/count_rows.sql
  result_url: ftp://wf_test:***
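The query file itself is not shown in this tutorial; a minimal `queries/count_rows.sql` consistent with its name could be as simple as the following sketch, counting the rows loaded by the import step:

```sql
-- count the rows imported into the tutorial table
SELECT COUNT(1) AS row_count
FROM pull_push_tutorial;
```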
You can also use a saved result output (a "favorite") that you created in the console, instead of the URL notation shown in this example. See how below.

If you use a saved output, the configuration would instead look like this:


  td>: queries/count_rows.sql
  result_url: my_saved_output_name

To create a saved output:

1. Create a new query in the console.
2. Select “Add” in the result export section.
3. Choose the service you want to send your data to under “Export to:”.
4. Enter your credentials and a name for the output.
5. Click “Add to Favorites”.

Now you can use the text string of your new favorite output within your workflow!

Submit Workflow to Treasure Data

$ td wf push ftp_pull_push

Execute the Workflow on Treasure Data

When learning to use TD Workflows, it can be helpful to also follow along on the jobs page in Treasure Data while running your workflows.

Run the following command to execute the workflow already submitted to Treasure Data:

$ td wf start ftp_pull_push pull_push --session now

And that’s it! Now try creating your own.


If you have any feedback we welcome hearing your thoughts on our TD Workflows ideas forum.

Also, if you have any ideas or feedback on the tutorial itself, we’d welcome them here!

Last modified: May 08 2017 04:52:45 UTC
