
You use Treasure Workflow syntax in both the CLI and the GUI. You use the syntax in definition files to create workflows that combine, for example, multiple TD saved queries. This topic describes the core syntax for defining dependencies, scheduling workflows, and setting custom alerts.

Treasure Workflow syntax is based on the open source project Digdag. The extension for TD workflow definition files is .dig. The filename is the name of the workflow. The definition file format is similar to YAML. In the workflow definition file, you specify the tasks and operations to occur. You can also specify a schedule, parallel execution, and other parameters.

Workflow Code Indentation

When creating workflows from the CLI, you might have to configure your text editor to treat .dig files as YAML (.yml) files. Otherwise, your text editor might not indent your workflow file correctly.

The examples in this topic are indented by 2 spaces per level, while the tab key in text editors is typically set to 4 or more spaces.

A Basic Workflow

Workflows can define simple sequential dependencies.

An example of Treasure Workflow syntax is as follows:

_export:
  td:
    database: workflow_temp

+data_prep:
  td_run>: <replace_with_1st_saved_query_name>

+analysis_and_export:
  td_run>: <replace_with_2nd_saved_query_name>

In the example, the second task “analysis_and_export” is dependent on the first task “data_prep”. As such, the second task won’t run until the first task is completed.

The example includes an _export parameter that specifies the Treasure Data database the workflow runs against. Within _export, the td entry sets the database that td operators such as td_run> use.

For example:

_export:
  td:
    database: workflow_temp

The workflow-specific syntax begins with the +.

The + signifies a new task. The text that follows the + and precedes the : is the task name. You can give a task any name you like.

Operators are signified by > and should be indented two spaces relative to the task name. The td_run> operator, used in this first example, runs a named saved query from Treasure Data.

You can think of an operator as the “action” part of a workflow task, representing the specific processing to occur.

When using the td_run> operator, you might want to run a saved query whose name contains a space. In that case, surround the name with double quotation marks.
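
As a sketch of the quoting rule, the task below runs a saved query whose name contains spaces. The query name "daily sales summary" is a hypothetical placeholder, not a name from an actual account.

```yaml
+data_prep:
  # Hypothetical saved query name; quoted because it contains spaces
  td_run>: "daily sales summary"
```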

Workflow with More Complex Dependencies

Workflows can be complex, managing hundreds of parallel tasks at once across multiple compute services.

A slightly more complex dependency pattern is represented in Treasure Workflow code syntax as follows:

_export:
  td:
    database: workflow_temp

+data_prep:
  _parallel: true

  +prep_1:
    td_run>: <replace_with_1st_prep_saved_query_name>

  +prep_2:
    td_run>: <replace_with_2nd_prep_saved_query_name>

+analysis_and_export:
  td_run>: <replace_with_2nd_saved_query_name>

In this example, tasks “data_prep.prep_1” and “data_prep.prep_2” run at the same time. When both tasks have completed, the task “analysis_and_export” runs.

The _parallel parameter enables parallel execution.

The example does not set a time zone. When a timezone parameter is placed at the beginning of a workflow file, it configures the time zone of the workflow and affects session timestamp variables and scheduling.
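
As a minimal sketch, assuming the standard Digdag timezone and schedule parameters, a workflow that sets a time zone and a daily schedule might begin as follows. The time zone value and the run time are illustrative only.

```yaml
# Top-level parameter; affects session timestamp variables and schedule times
timezone: America/Los_Angeles

# Illustrative schedule: run once a day at 07:00 in the time zone above
schedule:
  daily>: 07:00:00

_export:
  td:
    database: workflow_temp

+data_prep:
  td_run>: <replace_with_1st_saved_query_name>
```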

More Syntax

Because Treasure Workflow syntax is based on the open source project Digdag, much of the syntax is the same.

Variables

See the Digdag document Using ${variables}. In particular, Treasure Workflow uses the following syntax:

  • Calculating Variables

  • Defining Variables
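
As a brief sketch of both forms, assuming Digdag's built-in session_date and session_time variables, its bundled moment.js library, and its echo> operator, the workflow below defines a variable under _export and calculates a date inside ${...}. The variable name target_table and its value are hypothetical.

```yaml
_export:
  # Defining a variable (hypothetical name and value)
  target_table: daily_summary

+show_defined:
  # Prints the defined variable together with the built-in session date
  echo>: "Loading ${target_table} for ${session_date}"

+show_calculated:
  # Calculating a variable: a JavaScript expression evaluated inside ${...}
  echo>: ${moment(session_time).add(1, 'days').format("YYYY-MM-DD")}
```

The second task is expected to print the day after the session date, formatted as a date string.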

Operators and Plugins

See Operators and Plugins in the Digdag Concepts document. Also, for more detail, see https://docs.digdag.io/operators.html.

Export and Store Parameters

See Export and Store in the Digdag Concepts document.

