This documentation page is no longer updated.

Grouping Tasks And Enabling Parallel Execution

Introduction

In this tutorial, we will group workflow tasks and enable parallel execution.

Grouping tasks serves two purposes: it organizes the business logic of a workflow so that teammates can more easily understand your intention, and it lets you run selected tasks in parallel.

Prerequisites

Introductory Tutorial

If you haven’t already, please start by reading the TD Workflows introductory tutorial. It gives you the context needed to follow the lesson below.

Tutorial

Review Workflow

As a reminder, here is the workflow from the introductory tutorial. In this workflow, task1 executes first, followed by task2.

_export:
  td:
    database: workflow_temp

+task1:
  td>: queries/daily_open.sql
  create_table: daily_open

+task2:
  td>: queries/monthly_open.sql
  create_table: monthly_open

This workflow executes its tasks sequentially, from top to bottom. But what if we wanted to run these tasks in parallel?

Step 1: Grouping Tasks

Here, we will create a “group task”: a task that consists of other sub-tasks. In the example below it is named my_group_task, and the original tasks, task1 and task2, are indented 2 spaces to indicate that they belong to this new group task.

We’ve also added a final task, output, which is not part of my_group_task.

_export:
  td:
    database: workflow_temp

+my_group_task:
  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

+output:
  td_run>: <place the name of a saved query here>

Grouping tasks is useful both for enabling parallel execution and for organizing a workflow into steps that each represent one part of your data flow.

For example, you might organize many of your workflows into the following groups:

  • Ingestion
  • Data Preparation
  • Analysis
  • Export
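
As a rough sketch of that layout, the workflow below uses one group task per stage. The group names, task names, and echo> placeholder messages are illustrative assumptions and are not part of the tutorial's workflow; replace each placeholder with the operators your data flow actually needs.

```yaml
# Hypothetical skeleton: one group task per stage of the data flow.
# All names and the echo> placeholders are illustrative assumptions.
+ingestion:
  +load_source:
    echo>: load raw data here

+data_preparation:
  +clean:
    echo>: clean and join tables here

+analysis:
  +aggregate:
    echo>: run analysis queries here

+export_results:
  +send_results:
    echo>: export results here
```

Because _parallel is not set anywhere in this sketch, the four groups run one after another, from top to bottom.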

Step 2: Enable Parallel Execution

Every “group task” accepts a Digdag parameter called _parallel. It is set to False by default, so the sub-tasks run sequentially, but it can be set to True as shown below.

_export:
  td:
    database: workflow_temp

+my_group_task:
  _parallel: True

  +task1:
    td>: queries/daily_open.sql
    create_table: daily_open

  +task2:
    td>: queries/monthly_open.sql
    create_table: monthly_open

+output:
  td_run>: <place the name of a saved query here>

And that’s it! If you change your initial workflow as shown above, all the tasks in my_group_task will run in parallel, and the output task will start once they have all finished.
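
Groups can also be nested. The sketch below assumes, following Digdag's documented behavior, that _parallel applies only to the direct children of the group on which it is set. The sub-group names (+daily, +monthly), the +daily_summary task, and its query file are hypothetical, and the sketch relies on the same _export block as the workflow above.

```yaml
+my_group_task:
  _parallel: True

  # +daily and +monthly are direct children, so they start together.
  +daily:
    # Tasks inside a sub-group still run top to bottom,
    # because _parallel is not set on the sub-group itself.
    +task1:
      td>: queries/daily_open.sql
      create_table: daily_open

    +daily_summary:
      td>: queries/daily_summary.sql   # hypothetical query file
      create_table: daily_summary

  +monthly:
    +task2:
      td>: queries/monthly_open.sql
      create_table: monthly_open
```

This layout is one way to keep dependent steps in order within a branch while still letting independent branches overlap.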

Because we use the '.dig' suffix for our YAML-like workflow configuration files, **your text editor may not automatically color and indent your workflow file correctly**. Workflow files use 2-space indentation, while many editors default to 4 spaces. Most text editors let you associate '.dig' with YAML so that the file is highlighted and indented like a '.yml' file. We recommend making that change.

Feedback

We would love to hear your feedback! Please share your thoughts on our TD Workflows ideas forum.

Also, if you have any ideas or feedback on the tutorial itself, we’d welcome them here!


Last modified: Feb 18 2017 04:40:27 UTC
