In this tutorial, you use the command line interface to run your first workflow: two Treasure Data Presto jobs that run one right after the other.
Update TD Toolbelt and Install TD Workflow
Use the TD Toolbelt to interact with Treasure Data’s many services. If it is not already installed and configured, follow the installation instructions referenced below.
Complete the instructions in Installing and Updating the TD Toolbelt and Treasure Agent.
Create the workflow_temp Database
Create the following database in your Treasure Data account.
Run this command using TD Toolbelt.
$ td db:create workflow_temp
The workflow_temp database is created.
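If you repeat the tutorial, the database may already exist. The following is a minimal sketch of a guard around the create step. The function name ensure_database is hypothetical, and the sketch assumes that td db:show exits with a non-zero status when the database does not exist.

```shell
# ensure_database NAME  (hypothetical helper, not part of TD Toolbelt)
# Creates the database only when `td db:show` cannot find it.
# Assumption: `td db:show` exits non-zero for a missing database.
ensure_database() {
  db_name="$1"
  if td db:show "$db_name" >/dev/null 2>&1; then
    echo "database $db_name already exists"
  else
    td db:create "$db_name"
  fi
}
```

For this tutorial you would call it as: ensure_database workflow_temp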
Download the Sample Workflow Project
Download your first workflow project directory. The download includes a sample workflow and Presto SQL commands.
Navigate to the download directory.
$ cd ~/Downloads
Download the sample workflow project.
$ curl -o nasdaq_analysis.zip -L https://gist.github.com/danielnorberg/f839e5f2fd0d1a27d63001f4fd19b947/raw/d2d6dd0e3d419ea5d18b1c1e7ded9ec106c775d4/nasdaq_analysis.zip
Extract the project.
$ unzip nasdaq_analysis.zip
Navigate to the workflow project directory:
$ cd nasdaq_analysis
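Before continuing, you may want to confirm that the archive extracted as expected. The sketch below checks for the workflow file and the two query files that the workflow definition in this tutorial refers to. The function name check_project is hypothetical, and the archive may contain other files that this check ignores.

```shell
# check_project DIR  (hypothetical helper)
# Prints each expected file that is missing from DIR and
# returns non-zero if any of them is absent.
check_project() {
  project_dir="$1"
  missing=0
  for f in nasdaq_analysis.dig queries/daily_open.sql queries/monthly_open.sql; do
    if [ ! -f "$project_dir/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

From inside the project directory, you would call it as: check_project .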
Review the Contents of Your Workflow File
Print the contents of the workflow file.
$ cat nasdaq_analysis.dig
Verify that the workflow file that prints is made up of three sections: timezone, export, and tasks. For example:
In section 1, you see the interval on which the workflow runs.
timezone: UTC

schedule:
  daily>: 07:00:00
In section 2, you see how to specify the Treasure Data database that the workflow runs against.
_export:
  td:
    database: workflow_temp
In section 3, you see that the workflow definition has two tasks.
+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
The + signifies a new task. The text that follows, before the :, is the name you give the task.
The td> signifies that the query that follows will run against Treasure Data. This is automatically set to run a Presto query. The > signifies that this is where the “action” part of the task is defined: the specific processing to run.
The create_table:___ parameter drops the table if it exists, then creates a new table from the output of the task’s query.
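To make the task naming convention concrete, the following sketch writes a copy of the workflow definition to a scratch directory and lists its task names. The file layout is reconstructed from the fragments shown above, so the downloaded file may differ in spacing or comments. The grep pattern is a plain text match on top-level keys that start with +, not a YAML parser.

```shell
# Write a reconstructed copy of the workflow definition to a scratch directory.
workdir="$(mktemp -d)"
cat > "$workdir/nasdaq_analysis.dig" <<'EOF'
timezone: UTC

schedule:
  daily>: 07:00:00

_export:
  td:
    database: workflow_temp

+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
EOF

# Task names are the top-level keys that start with "+".
# Strip everything from the first ":" to leave just the name.
task_names="$(grep -E '^\+[A-Za-z0-9_]+:' "$workdir/nasdaq_analysis.dig" | sed 's/:.*$//')"
echo "$task_names"
```

The tasks are listed in the order they appear in the file, which is also the order in which this workflow runs them.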
Run the Workflow
Typically, when developing your workflow, you start by editing a workflow on your local machine. The workflow steps themselves all run within the TD environment, while you create and iterate on the workflow definition and execution pattern locally.
This workflow creates two tables in the workflow_temp database:
daily_open
monthly_open
Optionally, before running your first workflow, open your TD Console Job Activities page so you can see the execution when it happens.
Run the sample workflow once from your local machine.
$ td wf run nasdaq_analysis
Review the TD Console Job Activities page for nasdaq_analysis.
Use the command line to verify that the daily_open table was created as expected:
$ td table:show workflow_temp daily_open
Use the command line to verify that the monthly_open table was created as expected:
$ td table:show workflow_temp monthly_open
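The two checks above can be combined into one step. This is a minimal sketch: the function name verify_tables is hypothetical, and it assumes that td table:show exits with a non-zero status when the table does not exist.

```shell
# verify_tables DB TABLE...  (hypothetical helper)
# Runs `td table:show` for each table and reports which ones were found.
# Assumption: `td table:show` exits non-zero for a missing table.
verify_tables() {
  db_name="$1"
  shift
  failed=0
  for table in "$@"; do
    if td table:show "$db_name" "$table" >/dev/null 2>&1; then
      echo "ok: $db_name.$table"
    else
      echo "missing: $db_name.$table"
      failed=1
    fi
  done
  return "$failed"
}
```

For this tutorial you would call it as: verify_tables workflow_temp daily_open monthly_open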
Register and Schedule the Workflow
Scheduling workflows to run on a regular basis is a common task. Your workflow already contains the schedule definition.
Review the scheduling syntax in your workflow:
timezone: UTC

schedule:
  daily>: 07:00:00
Register the workflow with Treasure Data.
$ td wf push nasdaq_analysis
The workflow will run every day at 7 am UTC.
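If you edit the workflow file before registering it, a quick check that the schedule block is still present can save a push that never runs. The sketch below is a plain text match, not a YAML parser, and the function name has_daily_schedule is hypothetical. It assumes the daily>: entry is indented under a top-level schedule: key.

```shell
# has_daily_schedule FILE  (hypothetical helper)
# Succeeds when FILE has a top-level "schedule:" key
# and an indented "daily>:" entry.
has_daily_schedule() {
  dig_file="$1"
  grep -q '^schedule:' "$dig_file" && grep -Eq '^[[:space:]]+daily>:' "$dig_file"
}
```

One way to use it is to push only when the check passes: has_daily_schedule nasdaq_analysis.dig && td wf push nasdaq_analysis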
List the Workflows Registered with Treasure Data
From the command line you can list all the workflows defined in your Treasure Data environment.
To retrieve a list of projects and workflows, type the following:
$ td wf workflows
Use the following syntax to see the definition of your submitted workflow:
td workflow workflows <project-name> <name>
For example:
$ td wf workflows nasdaq_analysis nasdaq_analysis