Treasure Workflow Quick Start Tutorial for the CLI

Introduction

In this tutorial, you will use the command line interface to run your first workflow: two Treasure Data Presto jobs, one running right after the other.

Let’s get started!

Install TD Toolbelt

Use the TD Toolbelt to interact with Treasure Data’s many services. If it is not already installed and configured, run the following commands in your terminal.

First, visit Installing and Updating the Treasure Data CLI.

# Set up the toolbelt to access your account
$ td account

# Follow prompts for inputting your Treasure Data username & password

If you already have TD Toolbelt installed, update it to the latest version.

$ td update

Now install the TD Toolbelt Workflow module by running the workflow command. Answer Y when prompted.

$ td workflow

The td workflow command can also be abbreviated to td wf. We will be using this shorter form throughout the rest of this tutorial.

Prepare a database on Treasure Data for this tutorial

To run this tutorial, you’ll need to create the following database in your Treasure Data account. Run this command with TD Toolbelt.

$ td db:create workflow_temp

Create your first workflow project

Download sample workflow project

For this first example, we’ll help you along by having you download a ready-made workflow project directory. It includes a sample workflow and the Presto SQL queries you’ll be running.

# Download the sample project
$ cd ~/Downloads
$ curl -o nasdaq_analysis.zip -L https://gist.github.com/danielnorberg/f839e5f2fd0d1a27d63001f4fd19b947/raw/d2d6dd0e3d419ea5d18b1c1e7ded9ec106c775d4/nasdaq_analysis.zip

# Extract the downloaded project
$ unzip nasdaq_analysis.zip

# Enter the workflow project directory
$ cd nasdaq_analysis
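
Once extracted, the project should look roughly like the layout below (the file names match the workflow definition and the two Presto queries it references, shown in the next section):

```
nasdaq_analysis/
├── nasdaq_analysis.dig
└── queries/
    ├── daily_open.sql
    └── monthly_open.sql
```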

Check out the contents of your workflow file

# Print the contents of the workflow file.
$ cat nasdaq_analysis.dig

The printed workflow is made up of three sections, described as follows:

In section 1, you see the definition of the interval on which the workflow will run:

timezone: UTC

schedule:
  daily>: 07:00:00

In section 2, you see how to choose which Treasure Data database the workflow will run against:

_export:
  td:
    database: workflow_temp

In section 3, you see the workflow definition itself, comprising two tasks:

+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open

The + signifies a new task. The text between the + and the : is the name you give the task.

The td> signifies that the query that follows will run against Treasure Data. This is automatically set to run a Presto query. The > signifies that this is where the “action” part of the task is defined – the specific processing to run.

The create_table parameter performs a “drop table if exists, then create table as” operation, creating the new table from the output of the task’s query.
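
Putting the three sections together, the complete nasdaq_analysis.dig reads:

```
timezone: UTC

schedule:
  daily>: 07:00:00

_export:
  td:
    database: workflow_temp

+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
```

For example, create_table: daily_open behaves like running DROP TABLE IF EXISTS daily_open followed by CREATE TABLE daily_open AS <query> against the workflow_temp database.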

Run the workflow

Typically, when developing a workflow, you start by editing it on your local machine. You can run and iterate on a workflow whose steps all execute in the TD cloud environment, while authoring the workflow definition and execution pattern locally.

Before running your first workflow, we recommend opening up your jobs page so you can see the execution happen live.

This command lets you run the sample workflow once, from your local machine. It will not be scheduled until you push the workflow to Treasure Data.

$ td wf run nasdaq_analysis
Running workflow "nasdaq_analysis"...

You’ve run your first workflow!

Optional: See that your workflow executed.

This workflow created two tables, named daily_open & monthly_open in the database workflow_temp.

You can use TD Toolbelt to see some basic information about the created tables as follows:

$ td table:show workflow_temp daily_open
$ td table:show workflow_temp monthly_open
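
You can also run an ad-hoc Presto query with TD Toolbelt to confirm the tables contain data. The table names come from the workflow above; the row counts will depend on the sample data, so none are shown here.

```
# Count the rows in each table the workflow created.
# -w waits for the job to finish; -T presto runs the query with Presto.
$ td query -w -T presto -d workflow_temp "SELECT COUNT(1) FROM daily_open"
$ td query -w -T presto -d workflow_temp "SELECT COUNT(1) FROM monthly_open"
```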

Submit the workflow to Treasure Data

Now that you’ve created a workflow, you will often want it to run on a schedule. Remember that we defined the schedule as follows:

timezone: UTC

schedule:
  daily>: 07:00:00

Run this command to submit the workflow to Treasure Data:

$ td wf push nasdaq_analysis
Submitting workflow "nasdaq_analysis"...

That’s it! Now your workflow will run every day at 07:00 UTC!

List the workflows registered on Treasure Data

$ td wf workflows

You can also see the definition of your submitted workflow, as pulled from Treasure Data:

# This command takes the form of:
#   `td wf workflows <project_name> <workflow_name>`
$ td wf workflows nasdaq_analysis nasdaq_analysis

Find out what workflows are scheduled to run next on Treasure Data

$ td wf schedules

Next steps

Learn more about using Treasure Workflows with the following tutorials:

Feedback

If you have any ideas or feedback on this tutorial, we’d welcome them here!


Last modified: Jan 05 2018 01:23:15 UTC

If this article is incorrect or outdated, or omits critical information, let us know. For all other issues, access our support channels.