In this tutorial, you use the command line interface to run your first workflow: two Treasure Data Presto jobs that run one right after the other.

Update TD Toolbelt and Install TD Workflow

Use the TD Toolbelt to interact with Treasure Data’s many services. If it is not already installed and configured, complete the following step.

  1. Complete the instructions in Installing and Updating the TD Toolbelt and Treasure Agent.

Create the workflow_temp Database

Create the following database in your Treasure Data account.

  1. Run this command using TD Toolbelt.

    $ td db:create workflow_temp

The workflow_temp database is created.
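
If you repeat this tutorial, the database may already exist. The following is a minimal sketch of a repeatable version of the step above. It assumes that td db:show exits with a nonzero status when the database is missing, which you should confirm against your installed Toolbelt version. The function name ensure_db is hypothetical and not part of the TD Toolbelt.

```shell
# Hypothetical helper: create a database only if it does not already exist.
# Assumption: `td db:show <name>` exits nonzero when the database is missing.
ensure_db() {
  db_name="$1"
  if td db:show "$db_name" >/dev/null 2>&1; then
    echo "database $db_name already exists"
  else
    td db:create "$db_name"
  fi
}

# Usage: ensure_db workflow_temp
```

Wrapping the check in a function keeps the create step safe to rerun in setup scripts.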

Download the Sample Workflow Project

Download your first workflow project directory. The download includes a sample workflow and Presto SQL commands.

  1. Navigate to the download directory.

    $ cd ~/Downloads

  2. Copy and paste the following code to download the sample workflow project.

    $ curl -o -L

  3. The sample workflow project is compressed into a zip file. Use the following code to extract the project.

    $ unzip 

  4. Navigate to the workflow project directory.

    $ cd nasdaq_analysis

Review the Contents of Your Workflow File

  1. Review the contents of the workflow file.

    $ cat nasdaq_analysis.dig

  2. Verify that the workflow contains three sections: timezone, export, and tasks. 

Section One: Timezone

Review the timezone and the schedule on which the workflow runs.

    timezone: UTC

    schedule:
      daily>: 07:00:00

Section Two: Export

Review the code that defines the database that the workflow’s queries and output tables use.

    _export:
      td:
        database: workflow_temp

Section Three: Tasks

The workflow has two tasks:

  td>: queries/daily_open.sql
  create_table: daily_open

  td>: queries/monthly_open.sql
  create_table: monthly_open

In the workflow file, each task begins with a line of the form +name:. The + signifies a new task, and the text between the + and the : is the name you give the task.

The td> operator signifies that the query file that follows runs against Treasure Data. By default, it runs as a Presto query. The > signifies that this is where the “action” part of the task is defined, that is, the specific processing to run.

The create_table: parameter drops the named table if it exists, then creates a new table from the output of the task’s query.
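
To see how the three sections fit together, the following sketch writes an illustrative workflow definition to a scratch directory and prints it. The schedule:, _export:, and td: keys are standard Digdag syntax. The task names +task1 and +task2 and the file name sketch.dig are assumptions made for illustration, so compare them with the downloaded nasdaq_analysis.dig.

```shell
# Write an illustrative workflow definition to a scratch directory so the
# downloaded nasdaq_analysis.dig is left untouched.
sketch_dir=$(mktemp -d)

cat > "$sketch_dir/sketch.dig" <<'EOF'
timezone: UTC

schedule:
  daily>: 07:00:00

_export:
  td:
    database: workflow_temp

# Task names below (+task1, +task2) are assumed for illustration.
+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
EOF

# Print the sketch for review.
cat "$sketch_dir/sketch.dig"
```

Tasks at the same level run in the order in which they are listed, which is why the second query starts only after the first one finishes.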

Run the Workflow

Typically, when developing a workflow, you start by editing the workflow definition on your local machine. You can run and iterate on the workflow locally while the steps themselves execute within the TD environment.

Running this workflow creates two tables in the workflow_temp database.

  1. Optionally, before running your first workflow, open your TD Console Job Activities page so you can see the execution when it happens.

  2. Run the sample workflow once from your local machine.

    $ td wf run nasdaq_analysis

  3. Review the TD Console Job Activities page for nasdaq_analysis.

  4. Use the command line to verify that the daily_open table was created as expected:

    $ td table:show workflow_temp daily_open

  5. Use the command line to verify that the monthly_open table was created as expected:

    $ td table:show workflow_temp monthly_open
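
The two verification steps above can be combined into one check. This is a minimal sketch that assumes td table:show exits with a nonzero status when a table is missing. The function name verify_tables is hypothetical.

```shell
# Hypothetical helper: check that each expected table exists in a database.
# Assumption: `td table:show <db> <table>` exits nonzero when the table is missing.
verify_tables() {
  db_name="$1"
  shift
  missing=0
  for table in "$@"; do
    if td table:show "$db_name" "$table" >/dev/null 2>&1; then
      echo "ok: $db_name.$table"
    else
      echo "missing: $db_name.$table"
      missing=1
    fi
  done
  # Return nonzero if any table was not found.
  return "$missing"
}

# Usage: verify_tables workflow_temp daily_open monthly_open
```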

Register and Schedule the Workflow

Scheduling workflows to run on a regular basis is a common task. Your workflow already contains the schedule definition.

  1. Review the scheduling syntax in your workflow:

    timezone: UTC

    schedule:
      daily>: 07:00:00

  2. Register the workflow with Treasure Data.

    $ td wf push nasdaq_analysis

The workflow will run every day at 7 am UTC.
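
Because a pushed workflow runs on its schedule without further prompting, you may want to validate the definition before registering it. The following sketch assumes that td wf check is available (td wf passes its arguments through to Digdag, which provides a check command) and that it exits with a nonzero status on an invalid definition. The function name check_and_push is hypothetical.

```shell
# Hypothetical helper: validate the workflow locally, then register it.
# Assumption: `td wf check` passes through to Digdag's `check` command and
# exits nonzero when the definition in the current directory is invalid.
check_and_push() {
  project_name="$1"
  if td wf check >/dev/null; then
    td wf push "$project_name"
  else
    echo "workflow check failed; not pushing $project_name" >&2
    return 1
  fi
}

# Usage (from inside the project directory): check_and_push nasdaq_analysis
```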

List the Workflows Registered with Treasure Data

From the command line you can list all the workflows defined in your Treasure Data environment.

  1. To retrieve a list of projects and workflows, type the following:

    $ td wf workflows

  2. Use the following syntax to see the definition of your submitted workflow:

    $ td wf workflows <project-name> <name>

For example:

    $ td wf workflows nasdaq_analysis nasdaq_analysis