Workflow Syntax

You use Treasure Workflow syntax in both the CLI and the GUI. You use the syntax in definition files to create a workflow of, for example, multiple TD saved queries. This topic describes the core syntax for defining dependencies, scheduling workflows, and setting custom alerts.

Introduction

Treasure Workflow syntax is based on the open source project Digdag. The extension for TD workflow definition files is .dig, and the filename is the name of the workflow. The definition file format is YAML-like. In the workflow definition file, you specify the tasks and operations to occur. You can also specify a schedule, parallel execution, and other parameters.

Tip: When creating workflows from the CLI, you might have to configure your text editor to read and write .dig files as .yml files. Otherwise, your text editor might not indent your workflow file correctly. YAML indentation is 2 spaces, while the Tab key in text editors is typically set to 4 or more spaces.

A Basic Workflow

Workflows can be quite simple, defining straightforward sequential dependencies.

Single dependency Workflow

An example of Treasure Workflow syntax is as follows:

_export:
  td:
    database: workflow_temp

+data_prep:
  td_run>: <replace_with_1st_saved_query_name>

+analysis_and_export:
  td_run>: <replace_with_2nd_saved_query_name>

In the example, the second task “analysis_and_export” is dependent on the first task “data_prep”. As such, the second task won’t run until the first task is completed.

The example includes an _export parameter that specifies the Treasure Data database the workflow runs against. The td block under _export makes the database setting available to every task in the workflow, so each td operator runs against that database.

Let’s take a look at each statement:

_export:
  td:
    database: workflow_temp

The workflow-specific syntax begins with the +.

The + signifies a new task. The text that follows the + and before the : is the task name. You can name a task with any name you’d like.

Operators are signified by >, and should be indented two spaces to the right of the task name. td_run> is an operator that, in this first example, runs a named saved query from Treasure Data.

You can think of an operator as the “action” part of a workflow task, representing the specific processing to occur.
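For example, in addition to td_run>, the td> operator runs a query from a SQL file in your workflow project against Treasure Data (the file path below is illustrative):

+daily_summary:
  td>: queries/daily_summary.sql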

When using the td_run operator, you may want to run saved queries whose names contain a space. In such cases, surround the name with double quotation marks.
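For example (the saved query name here is hypothetical):

+run_report:
  td_run>: "Daily Revenue Report"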

Workflow with More Complex Dependencies

Workflows can be complex, managing hundreds of parallel tasks at once across multiple compute services.

A slightly more complex dependency pattern is represented in Treasure Workflow code syntax as follows:

2-to-1 dependency Workflow

_export:
  td:
    database: workflow_temp

+data_prep:
  _parallel: true

  +prep_1:
    td_run>: <replace_with_1st_prep_saved_query_name>

  +prep_2:
    td_run>: <replace_with_2nd_prep_saved_query_name> 

+analysis_and_export:
  td_run>: <replace_with_analysis_saved_query_name>

In this example, tasks “data_prep.prep_1” and “data_prep.prep_2” run at the same time. When both tasks have completed, the task “analysis_and_export” runs.

The _parallel parameter enables parallel execution of the tasks within the group.

A timezone parameter can also be set at the beginning of a workflow to configure the time zone of the workflow; it affects session timestamp variables and scheduling, as described in the next section.

Scheduling Workflows

Add schedule information to your workflow

To add a schedule to your workflow, add the following text to the top of your workflow file.

timezone: UTC

schedule:
  daily>: 07:00:00

Setting Timezone

The default time zone is UTC. Specify the time zone using tz database time zone names. Some examples of valid time zones are America/Los_Angeles, Europe/Berlin, and Asia/Tokyo.
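For example, to evaluate a daily schedule in Pacific time:

timezone: America/Los_Angeles

schedule:
  daily>: 07:00:00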

Schedule Syntax

You can choose one of the following options:

minutes_interval>: M
  Run this job every M minutes.
  Example: minutes_interval>: 30
  This example specifies that the job runs every 30 minutes. For example, if the job started at 6:10 a.m., then the job runs again at 6:40, 7:10, 7:40, and so on.

hourly>: MM:SS
  Run this job every hour, at MM minutes and SS seconds past the hour (Hourly, +MM mins SS secs).
  Example: hourly>: 25:00 or hourly>: 25 (Hourly, +25 minutes)
  This example specifies that the job runs every hour, 25 minutes into the hour. For example, 8:25, 9:25, 10:25, and so on.

daily>: HH:MM:SS
  Run this job every day at HH:MM:SS (Daily, @HH:MM:SS AM/PM).
  Example: daily>: 13:30:00 or daily>: 13:30
  This example specifies that the job runs every day at 1:30 p.m.

  Tip: If you want to run your job at midnight each day, specify 00:00. If you want to specify 30 minutes past midnight, enter 00:30. If you want to specify 30 minutes past noon, enter 12:30.

weekly>: DDD,HH:MM:SS
  Run this job every week on DDD at HH:MM:SS (Every DDD, @HH:MM:SS AM/PM).
  Example: weekly>: Sun,09:00:00, weekly>: Sun,09:00, or weekly>: Sun,09
  This example specifies that the job runs every Sunday at 9:00 a.m.

monthly>: D,HH:MM:SS
  Run this job every month on day D at HH:MM:SS (Every D of month, @HH:MM:SS AM/PM).
  Example: monthly>: 1,09:15:00 or monthly>: 1,09:15
  This example specifies that the job runs on the first day of each month at 9:15 a.m. To specify 9:15 p.m. instead, type monthly>: 1,21:15.

cron>: CRON
  Use cron format for complex scheduling.
  Example: cron>: 42 4 1 * *
  This example specifies that the job runs at 04:42 on the first day of each month.

Tip: You are not required to specify hours, minutes, or seconds (HH, MM, or SS), and you might even save some processing time by omitting them. For example, if you specify daily, the job runs once per day: it runs, then runs again 24 hours later. If you specify weekly, the job runs once per week: it runs, then runs again 7 days later at the same time of day that the job ran initially.
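Putting the pieces together, a complete scheduled workflow file might look like the following (the saved query names are placeholders):

timezone: UTC

schedule:
  daily>: 07:00:00

_export:
  td:
    database: workflow_temp

+data_prep:
  td_run>: <replace_with_1st_saved_query_name>

+analysis_and_export:
  td_run>: <replace_with_2nd_saved_query_name>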

Notifications for Workflows

By default, all workflows will send an email notification upon failure. These failure notifications are sent to the last person who edited the workflow.

There are additional custom notifications you can define in your workflow. They allow you to get notified:

  • At a custom email address when your workflow fails
  • When your workflow succeeds
  • When your workflow takes longer than expected

Workflow Failure Custom Notification

To get an email notification when any part of your workflow fails while running on Treasure Data, add the following at the top of your workflow file.

_error:
  mail>: body.txt
  subject: this workflow failed
  to: [me@example.com]

Workflow Success Notification

To get an email every time your workflow succeeds, add a final task that sends an email upon success.

+success_notification:
  mail>: body.txt
  subject: workflow succeeded!
  to: [me@example.com]
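For example, appended to the basic workflow from earlier, the notification task runs only after all preceding tasks succeed:

+data_prep:
  td_run>: <replace_with_1st_saved_query_name>

+analysis_and_export:
  td_run>: <replace_with_2nd_saved_query_name>

+success_notification:
  mail>: body.txt
  subject: workflow succeeded!
  to: [me@example.com]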

Long Running Workflow Notification

To get notified if your workflow does not complete by a specified time of day, add the following at the top of your workflow file.

sla:
  # triggers this task at 09:00
  time: 09:00
  +notice:
    mail>: ...

Typically, this is used in coordination with a scheduled start time. For example, if your workflow starts at 7:00 a.m., you may want to get notified if it hasn't completed by 9:00 a.m.
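For example, pairing the 7:00 a.m. daily schedule from earlier with a 9:00 a.m. SLA (the subject and recipient are illustrative):

timezone: UTC

schedule:
  daily>: 07:00:00

sla:
  time: 09:00
  +notice:
    mail>: body.txt
    subject: workflow still running
    to: [me@example.com]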

By default, the time zone used here is the same one used for scheduling workflows.

In the future, we will add functionality to allow the SLA time to be set as a number of minutes or hours after the scheduled start time. If you agree with this approach, or would prefer another, please let us know on the related idea.

More Syntax

Because Treasure Data workflow syntax is based on the open source project Digdag, much of the syntax is the same.

Variables

See the Digdag document Using ${variables}. In particular, Treasure Workflow uses the following syntax (a short example follows this list):

  • Calculating Variables
  • Defining Variables
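For instance, a task can reference the built-in session_date variable; this minimal sketch uses Digdag's echo> operator to print it:

+show_session_date:
  echo>: "Processing data for ${session_date}"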

Operators and Plugins

See Operators and Plugins in the Digdag Concepts document. For more detail, see https://docs.digdag.io/operators.html.

Export and store parameters

See Export and Store in the Digdag Concepts document.

