
Grouping Tasks in Workflow and Enabling Parallel Execution

Grouping tasks serves two purposes. It organizes the business logic of a workflow into steps that each represent one part of your data flow, which makes your intent easier for teammates to understand, and it lets you run selected tasks in parallel.

For example, you might organize many of your workflows into the following groups:

  • Ingestion

  • Data Preparation

  • Analysis

  • Export
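
As a rough sketch, those four groups could map onto a workflow file like the one below. The group names mirror the list above and the database name is borrowed from the example later in this article. Every task name, table name, and query file path is a hypothetical placeholder, not something that ships with TD.

```yaml
_export:
  td:
    database: workflow_temp

+ingestion:
  +load_raw:
    td>: queries/load_raw.sql            # hypothetical query file
    create_table: raw_events

+data_preparation:
  +clean:
    td>: queries/clean_events.sql        # hypothetical query file
    create_table: clean_events

+analysis:
  +daily_summary:
    td>: queries/daily_summary.sql       # hypothetical query file
    create_table: daily_summary

+export_results:
  +publish:
    td_run>: REPLACE_WITH_YOUR_QUERY_NAME
```

Each top-level group runs after the one above it, so the file reads in the same order as the data flow.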

In TD Workflow, you can have tasks run in parallel. By default, tasks are run sequentially. To have TD Workflow tasks run in parallel, you must specify the parallel parameter as _parallel: True. It is also recommended that you use the +group syntax to group the tasks that you want to have run in parallel.

You can define as many parallel tasks as you want; however, TD runs at most 10 separate processing threads at a given time. A task can be in one of four states, only one of which is Running. As long as no more than ten tasks are in the Running state at once, each of those tasks is executed in parallel.
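
If you want fewer tasks than that running at once within a single group, open-source Digdag (the engine TD Workflow is built on) documents a limit form of the _parallel parameter. The sketch below assumes that TD Workflow accepts the same form, which you should verify before relying on it. The group name, task names, and query files are hypothetical placeholders.

```yaml
+load_sources:
  _parallel:
    limit: 2                        # at most 2 child tasks run at the same time
  +source_a:
    td>: queries/source_a.sql       # hypothetical query file
  +source_b:
    td>: queries/source_b.sql       # hypothetical query file
  +source_c:
    td>: queries/source_c.sql       # hypothetical query file
```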

Read or complete the TD Workflows example to understand the context of the following information. As a reminder, here is the workflow from that example. In this workflow, task1 executes first followed by task2.

We use the .dig suffix for our YAML-like workflow configuration files. Your text editor might not color and indent a .dig file correctly on its own: workflow files are indented with 2 spaces, while many editors default to 4. Most text editors let you associate the .dig extension with YAML so that the file is read and written like a YAML file. We recommend that you make that change.

_export:
  td:
    database: workflow_temp
    
+task1:
  td>: queries/daily_open.sql
  create_table: daily_open
    
+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open

This workflow executes the tasks in a top-to-bottom sequential order.

Grouping Tasks in a Workflow

Let's create a group task, a task that consists of other sub-tasks. Grouping of tasks is done by indenting the subtasks under a label in your workflow.

  1. Open your workflow dig file for editing or use the TD Console Workflow editor.

  2. Locate the task or tasks you would like to have grouped.

  3. Add the following syntax to your workflow:
    +groupname:

Where groupname can be any name that you want to use for the grouping.

  4. Add or indent existing task syntax under +groupname:. For example, my_group_task has task1 and task2 indented 2 spaces to indicate that the tasks are within this new group task.
_export:
  td:
    database: workflow_temp
    
+my_group_task:
  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open
    
  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open
    
+output:
  td_run>: <place the name of a saved query here>

We’ve also added a final task, output, which is not part of the my_group_task.

Enabling Parallel Task Execution for a Workflow

In TD Workflow, you can have tasks run in parallel. By default, tasks are run sequentially. To have TD Workflow tasks run in parallel, you must specify the parallel parameter as _parallel: True.

  1. Open your workflow dig file for editing or use the TD Console Workflow editor.

  2. Locate the task or tasks you would like to have grouped.

  3. Add the following syntax to your workflow:
    +groupname:
    Where groupname can be any name that you want to use for the grouping.

  4. Directly under +groupname:, add the following parallel parameter syntax, indented 2 spaces:
    _parallel: True

  5. Add or indent existing task syntax. For example, my_group_task has task1 and task2 indented 2 spaces to indicate that the tasks are within this parallel group task.
_export:
  td:
    database: workflow_temp
+my_group_task:
  _parallel: true
  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open
  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open
+output:
  td_run>: REPLACE_WITH_YOUR_QUERY_NAME

All the tasks in my_group_task run in parallel, followed by the output task.
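
To see how grouping and _parallel combine, here is a minimal sketch that places a parallel group ahead of a sequential one. It relies only on the behavior described above: tasks run top to bottom unless a group sets the _parallel parameter. The group names prepare and analysis, the compare and summarize tasks, and their query files and table names are hypothetical placeholders.

```yaml
_export:
  td:
    database: workflow_temp

+prepare:
  _parallel: true                        # task1 and task2 may run at the same time
  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open
  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

+analysis:                               # no _parallel, so the subtasks run in order
  +compare:
    td>: queries/compare_open.sql        # hypothetical query file
    create_table: compare_open
  +summarize:
    td>: queries/summarize_open.sql      # hypothetical query file
    create_table: open_summary
```

Because analysis sits below prepare at the top level, it starts only after both parallel tasks have finished, in the same way that the output task follows my_group_task.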