In this tutorial, you use the command line interface to run your first workflow: two Treasure Data Presto jobs that execute in sequence.
You can use TD Toolbelt to interact with Treasure Data’s many services. If TD Toolbelt is not already installed and configured, complete the following step before you continue.
- View and complete the instructions in Installing and Updating TD Toolbelt and Treasure Agent.
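Before you follow the installation instructions, it can help to check whether the td command is already available in your terminal. The following is a minimal sketch that relies on the POSIX `command -v` builtin; the helper name `have_cmd` is hypothetical and is not part of TD Toolbelt.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command in the current shell.
# (Hypothetical helper for illustration; not part of TD Toolbelt.)
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd td; then
  echo "td is installed"
else
  echo "td not found; follow the installation instructions first"
fi
```

This check only confirms that a td executable can be found. It does not confirm that the toolbelt is configured with your account.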
Create the following database in your Treasure Data account.
- Run this command using TD Toolbelt.
$ td db:create workflow_temp
The workflow_temp database is created.
Download your first workflow project directory. The download includes a sample workflow and Presto SQL commands.
- Navigate to the download directory.
cd /Downloads
- Copy and paste the following code to download the sample workflow project.
curl -o nasdaq_analysis.zip -L \
https://gist.github.com/danielnorberg/f839e5f2fd0d1a27d63001f4fd19b947/raw/d2d6dd0e3d419ea5d18b1c1e7ded9ec106c775d4/nasdaq_analysis.zip
- The sample workflow project is compressed into a zip file. Use the following code to extract the project.
unzip nasdaq_analysis.zip
- Navigate to the workflow project directory.
cd nasdaq_analysis
- Review the contents of the workflow file.
cat nasdaq_analysis.dig
- Verify that the workflow contains three sections: timezone, _export, and tasks.
- See the timezone and the schedule in which the workflow will run.
timezone: UTC
schedule:
  daily>: 07:00:00
Review the code that defines the database to which the workflow exports data.
_export:
  td:
    database: workflow_temp
The workflow has two tasks:
- Perform a daily query that creates a table named daily_open.
- Perform a monthly query that creates a table named monthly_open.
+task1:
  td>: queries/daily_open.sql
  create_table: daily_open
+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open
The + signifies a new task. The text that follows before the : is the name you give the task.
The td> signifies that the query file that follows runs against Treasure Data; by default, it runs as a Presto query. The > marks where the “action” part of the task is defined, that is, the specific processing to run.
The create_table: parameter drops a table if it exists and creates a new table based on the output of the task’s query.
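As a quick sanity check, you can confirm from the command line that a workflow file contains the top-level keys described above. The following is a minimal sketch: the function name `check_dig_sections` is hypothetical, and it only pattern-matches lines with grep rather than parsing the file as YAML.

```shell
#!/bin/sh
# check_dig_sections FILE: report whether the expected top-level keys appear.
# (Hypothetical helper; it matches text only and does not validate YAML.)
check_dig_sections() {
  dig_file=$1
  rc=0
  # Each pattern matches a key that starts in the first column of a line.
  for pattern in '^timezone:' '^schedule:' '^_export:' '^[+]task'; do
    if grep -q -- "$pattern" "$dig_file"; then
      echo "found:   $pattern"
    else
      echo "missing: $pattern"
      rc=1
    fi
  done
  return "$rc"
}
```

From the project directory, you would call it as `check_dig_sections nasdaq_analysis.dig`. The function returns a non-zero status if any of the keys is missing.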
Typically, when developing your workflow, you start by editing a workflow from your local machine. You can run and iterate on workflow steps that all occur within the TD environment while creating the workflow definition and execution pattern locally.
This workflow creates two tables in the workflow_temp database:
daily_open
monthly_open
Optionally, before running your first workflow, open your TD Console Job Activities page so you can see the execution when it happens.
Run the sample workflow once from your local machine.
td wf run nasdaq_analysis
Review the TD Console Job Activities page for nasdaq_analysis.
Use the command line to verify that the daily_open table was created as expected:
td table:show workflow_temp daily_open
- Use the command line to verify that the monthly_open table was created as expected:
td table:show workflow_temp monthly_open
Scheduling workflows to run on a regular basis is a common task. Your workflow already contains the schedule definition.
- Review the scheduling syntax in your workflow:
timezone: UTC
schedule:
  daily>: 07:00:00
- Register the workflow with Treasure Data.
td wf push nasdaq_analysis
The workflow will run every day at 7 am UTC.
From the command line, you can list all the workflows defined in your Treasure Data environment. You can use workflow or wf interchangeably.
- To retrieve a list of projects and workflows, type the following:
td wf workflows
- Use the following syntax to see the definition of your submitted workflow:
td workflow workflows <project-name> <name>
For example:
td wf workflows nasdaq_analysis nasdaq_analysis
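The run, verify, and register steps from this tutorial can be chained into a single script. The following is a minimal sketch that calls only the td commands used in this tutorial; the function name `run_and_register` is hypothetical, and the sketch assumes that the current directory is the nasdaq_analysis project directory and that the workflow_temp database already exists.

```shell
#!/bin/sh
# run_and_register: run the sample workflow once, verify its two output
# tables, then register the workflow so that it runs on its schedule.
# (Hypothetical wrapper; it only calls td commands used in this tutorial.)
run_and_register() {
  td wf run nasdaq_analysis || return 1

  # Confirm that each table the workflow should create now exists.
  for table in daily_open monthly_open; do
    td table:show workflow_temp "$table" || return 1
  done

  # Register the workflow, then print its submitted definition.
  td wf push nasdaq_analysis || return 1
  td wf workflows nasdaq_analysis nasdaq_analysis
}
```

Call `run_and_register` after defining it. The function stops at the first command that fails, so the workflow is registered only if the local run and both table checks succeed.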