Grouping tasks helps in two ways. It organizes the business logic of a workflow into similar steps that each represent a part of your data flow, so it’s easier for teammates to understand your intention. It also enables parallel execution of certain tasks in a workflow.
For example, you might organize many of your workflows into the following groups:
Ingestion
Data Preparation
Analysis
Export
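As a rough sketch of how such a layout might look in a .dig file, the skeleton below uses one group per stage. The group and task names are hypothetical, and the echo> operator is only a placeholder for the real operators (for example td>) that each stage would use.

```yaml
# Hypothetical skeleton: group names mirror the list above.
# echo> is a stand-in for the real operators of each stage.
+ingestion:
  +load_source:
    echo>: "ingest raw data"

+data_preparation:
  +clean:
    echo>: "prepare data"

+analysis:
  +aggregate:
    echo>: "run analysis queries"

+export:
  +send_results:
    echo>: "export results"
```

Because no parallel parameter is set, the four groups, and the tasks inside them, would run from top to bottom.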
In TD Workflow, tasks run sequentially by default. To have tasks run in parallel, you must specify the parallel parameter as _parallel: true. It is also recommended that you use the +group syntax to group the tasks that you want to run in parallel.
You can define as many tasks as you want to run in parallel; however, TD can run only up to 10 separate processing threads at a given time. A task can be in one of four states, only one of which is Running. As long as no more than ten tasks are in the Running state at the same time, each of those tasks is executed in parallel.
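If you want to cap concurrency yourself, below the platform limit, the open-source Digdag documentation describes a limit form of the parallel parameter. The following is a minimal sketch under the assumption that your TD Workflow environment accepts this form; verify it before relying on it. The task names are hypothetical, and echo> is only a placeholder operator.

```yaml
# Sketch only: assumes the _parallel limit form from open-source Digdag is available.
# Task names are hypothetical; echo> stands in for real operators.
+my_group_task:
  _parallel:
    limit: 2             # at most two child tasks run at the same time

  +task1:
    echo>: "task 1"
  +task2:
    echo>: "task 2"
  +task3:
    echo>: "task 3"
```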
Read or complete the TD Workflows example to understand the context of the following information. As a reminder, here is the workflow from that example. In this workflow, task1 executes first followed by task2.
This example uses the .dig suffix for YAML-like workflow configuration files. Your text editor might not automatically color and indent a .dig file correctly: workflow files use 2-space YAML indentation, while many editors default to 4 spaces. Most text editors allow you to set .dig files to be written and read like YAML files. We recommend that you make that modification.
_export:
  td:
    database: workflow_temp

+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open

This workflow executes the tasks in a top-to-bottom sequential order.
Let's create a group task, a task that consists of other sub-tasks. Grouping of tasks is done by indenting the subtasks under a label in your workflow.
Open your workflow dig file for editing or use the TD Console Workflow editor.
Locate the task or tasks you would like to have grouped.
Add the following syntax to your workflow:
+groupname:
Where groupname can be any name that you want to use for the grouping.
Add or indent existing task syntax under the +groupname: label. For example, my_group_task has task1 and task2 indented 2 spaces to indicate that the tasks are within this new group task.
_export:
  td:
    database: workflow_temp

+my_group_task:
  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

+output:
  td_run>: <place the name of a saved query here>

We’ve also added a final task, output, which is not part of my_group_task.
The tasks in a group run sequentially by default. To have the tasks in a group run in parallel, specify the parallel parameter as _parallel: true directly under the group name.
Open your workflow dig file for editing or use the TD Console Workflow editor.
Locate the task or tasks you would like to have grouped.
Add the following syntax to your workflow:
+groupname:
Where groupname can be any name that you want to use for the grouping.
Directly under the groupname, add the following parallel parameter syntax:
_parallel: true
Add or indent existing task syntax. For example, my_group_task has task1 and task2 indented 2 spaces to indicate that the tasks are within this parallel group task.
_export:
  td:
    database: workflow_temp

+my_group_task:
  _parallel: true

  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

+output:
  td_run>: REPLACE_WITH_YOUR_QUERY_NAME

All the tasks in my_group_task run in parallel, followed by the output task.
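Groups can also be nested, which lets you mix parallel and sequential steps. The sketch below extends the my_group_task idea with two hypothetical branches. It relies on the behavior described in the open-source Digdag documentation, where _parallel applies only to the direct children of the group that sets it. The branch and task names are illustrative, and echo> stands in for real td> queries.

```yaml
# Sketch only: branch and task names are hypothetical; echo> stands in for td> queries.
+my_group_task:
  _parallel: true        # +branch_a and +branch_b run in parallel

  +branch_a:             # no _parallel here, so its steps run one after another
    +a_step1:
      echo>: "branch A, step 1"
    +a_step2:
      echo>: "branch A, step 2"

  +branch_b:
    +b_step1:
      echo>: "branch B, step 1"

+output:
  echo>: "runs after my_group_task finishes"
```

Under those assumptions, a_step2 waits for a_step1, while branch_b proceeds independently of branch_a.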