
In this tutorial, you use the command line interface to run your first workflow: two Treasure Data Presto jobs that run one right after the other.


Update TD Toolbelt and Install TD Workflow

Use the TD Toolbelt to interact with Treasure Data’s many services. If the toolbelt is not already installed and configured, complete the following step.

  1. Complete the instructions in Installing and Updating.

Create the workflow_temp Database

Create the following database in your Treasure Data account.

  1. Run this command using TD Toolbelt.

    $ td db:create workflow_temp

The workflow_temp database is created.
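
If you repeat this tutorial, the database might already exist. The following is a minimal sketch of a guard around the create step. It assumes that `td db:show` exits with a nonzero status when the database is missing, which this tutorial does not confirm, and `ensure_db` is a hypothetical helper name.

```shell
# Hypothetical helper: create the database only if it is not already there.
# Assumption: `td db:show <name>` exits nonzero for a missing database.
ensure_db() {
  db_name="$1"
  if td db:show "$db_name" >/dev/null 2>&1; then
    echo "database $db_name already exists"
  else
    td db:create "$db_name"
  fi
}
```

To use it for this tutorial, you would call `ensure_db workflow_temp`.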

Download the Sample Workflow Project

Download your first workflow project directory. The download includes a sample workflow and Presto SQL commands.

  1. Navigate to the download directory.

    $ cd ~/Downloads
  2. Download the sample workflow project.

    $ curl -o nasdaq_analysis.zip -L https://gist.github.com/danielnorberg/f839e5f2fd0d1a27d63001f4fd19b947/raw/d2d6dd0e3d419ea5d18b1c1e7ded9ec106c775d4/nasdaq_analysis.zip
  3. Extract the project.

    $ unzip nasdaq_analysis.zip 
  4. Navigate to the workflow project directory:

    $ cd nasdaq_analysis
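
Before you continue, you might want to confirm that the archive extracted as expected. The following sketch checks for the workflow file and the two query files that the workflow in this tutorial refers to. `check_project` is a hypothetical helper name, and the archive may contain other files that this check ignores.

```shell
# Hypothetical helper: confirm the project directory contains the files
# that the tutorial's workflow definition refers to.
check_project() {
  project_dir="${1:-.}"
  status=0
  for f in nasdaq_analysis.dig queries/daily_open.sql queries/monthly_open.sql; do
    if [ ! -f "$project_dir/$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

From inside the project directory, you would call `check_project .` and look for any "missing" lines.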

Review the Contents of Your Workflow File

  1. Print the contents of the workflow file.

    $ cat nasdaq_analysis.dig
  2. Verify that the printed workflow is made up of three sections: the schedule (with its timezone), the _export settings, and the tasks. For example:

In section 1, you see the schedule, which defines the interval at which the workflow runs.

timezone: UTC

schedule:
  daily>: 07:00:00

In section 2, you see how to specify the Treasure Data database against which the workflow runs.

_export:
  td:
    database: workflow_temp

In section 3, you see that the workflow definition has two tasks.

+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open

The + signifies a new task. The text that follows before the : is the name you give the task.

The td> signifies that the query file that follows runs against Treasure Data. It is automatically set to run a Presto query. The > marks where the “action” part of the task is defined, that is, the specific processing to run.

The create_table:___ parameter drops the table if it already exists and then creates a new table from the output of the task’s query.
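
Because every task name starts with + and ends with :, you can pull the task names out of a workflow file with a small text filter. This is a minimal sketch that only matches unindented task lines; `list_tasks` is a hypothetical helper name and is not part of TD Toolbelt.

```shell
# Hypothetical helper: print the name of every top-level task in a .dig file.
# A task line starts with "+" and the name runs up to the first ":".
list_tasks() {
  sed -n 's/^+\([^:]*\):.*$/\1/p' "$1"
}
```

For the workflow shown above, `list_tasks nasdaq_analysis.dig` would print task1 and task2, one per line.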

Run the Workflow

Typically, you develop a workflow by editing it on your local machine. The workflow steps all run within the TD environment, so you can run and iterate on them while you keep the workflow definition and execution pattern local.

This workflow creates two tables in the workflow_temp database:

  • daily_open

  • monthly_open

  1. Optionally, before running your first workflow, open your TD Console Job Activities page so you can see the execution when it happens.

  2. Run the sample workflow once from your local machine.

    $ td wf run nasdaq_analysis
  3. Review the TD Console Job Activities page for nasdaq_analysis.

  4. Use the command line to verify that the daily_open table was created as expected:

    $ td table:show workflow_temp daily_open
  5. Use the command line to verify that the monthly_open table was created as expected:

    $ td table:show workflow_temp monthly_open
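
The two verification steps above can also be combined into one loop. This is a sketch only: it assumes that `td table:show` exits with a nonzero status when a table is missing, and `verify_tables` is a hypothetical helper name.

```shell
# Hypothetical helper: check that each expected table exists in a database.
# Assumption: `td table:show <db> <table>` exits nonzero for a missing table.
verify_tables() {
  db_name="$1"
  shift
  status=0
  for table in "$@"; do
    if td table:show "$db_name" "$table" >/dev/null 2>&1; then
      echo "ok: $db_name.$table"
    else
      echo "missing: $db_name.$table"
      status=1
    fi
  done
  return "$status"
}
```

For this tutorial, you would call `verify_tables workflow_temp daily_open monthly_open`.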

Register and Schedule the Workflow

Scheduling workflows to run on a regular basis is a common task. Your workflow already contains the schedule definition.

  1. Review the scheduling syntax in your workflow:

    timezone: UTC
    
    schedule:
      daily>: 07:00:00
  2. Register the workflow with Treasure Data.

    $ td wf push nasdaq_analysis

The workflow will run every day at 7 am UTC.

List the Workflows Registered with Treasure Data

From the command line you can list all the workflows defined in your Treasure Data environment.

  1. To retrieve a list of projects and workflows, type the following:

    $ td wf workflows
  2. Use the following syntax to see the definition of your submitted workflow:

    $ td wf workflows <project-name> <name>


For example:

    $ td wf workflows nasdaq_analysis nasdaq_analysis