Visit our new documentation site! This documentation page is no longer updated.

Treasure Workflow

Use Treasure Data Workflow to build repeatable data processing pipelines. You can schedule and manage complex tasks, and automatically run and monitor your job flows. Treasure Workflow extends and enhances the capabilities of Digdag, the highly reputable open source workflow program.

On March 27th, we are releasing our new Workflow User Interface to General Availability, making it available for all accounts. On March 1st, we are turning on the new UI for all current Workflow users.

Read more about what is changing and what to expect during General Availability of Workflow UI.


Introduction

Workflow is a key aspect of Treasure Data’s CDP. You create workflows to run efficient queries against your customer data and schedule tasks that feed into audience identification, profiling, and tracking.

Integrate with and organize your organization’s data, run SQL analysis across that data regardless of scale, and then create repeatable insights by saving queries that disseminate data.

Features and Benefits

Treasure Workflow allows you to do the following:

  • Create a workflow, which defines the order in which processing tasks run
  • Design with scheduled processing flows in mind
  • Parameterize for easy cloning, sharing, and re-use
  • Develop locally, then push to Treasure Data to run on a schedule
  • Manage error handling more easily
  • Configure tasks that can operate nearly every part of the TD system, including:
    • Importing data in batch jobs using Data Connector
    • Running Presto and Hive queries
    • Creating or appending to tables
    • Exporting results to other systems
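As a rough illustration of the task ordering and table handling listed above, a workflow definition might look like the following sketch. The query file paths and table names are hypothetical; the `td>` operator with its `engine`, `create_table`, and `insert_into` options comes from Digdag.

```yaml
_export:
    td:
        database: workflow_temp           # database name used in the example on this page

# Tasks run from top to bottom, so +summarize starts only after +prepare succeeds.
+prepare:
    # Presto is the default engine; create_table replaces the table with the query result.
    td>: queries/prepare.sql              # hypothetical query file
    create_table: prepared_events         # hypothetical table name

+summarize:
    # Run this step on Hive; insert_into appends the result to the table.
    td>: queries/summarize.sql            # hypothetical query file
    engine: hive
    insert_into: daily_summary            # hypothetical table name
```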

With Treasure Data, you can improve your ability to create internal Data Applications and gain the following benefits:

Organizing your team’s work

As the number of scheduled queries or cron jobs increases, it becomes harder for organizations to keep track of “what is this one doing?”. Defining tasks within organized workflows and projects lets you immediately know the context in which a given query operates.

Managing error handling and establishing automated notifications

We often see very large queries and scripts operating in our customers’ systems. These can be hundreds, or even thousands, of lines long. When errors occur in these SQL queries, they can be incredibly difficult to debug. By breaking your large queries into workflows of smaller dependent queries, it becomes much easier to figure out which part of your logic has broken.

You can receive notifications when any part of your workflow fails, so that you can fix it quickly. You can also choose to be notified of successful workflow runs, or of workflow runs that do not complete within specified time boundaries.
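The failure and time-boundary notifications described above could be configured along these lines. This is a sketch, not a confirmed Treasure Workflow recipe: it assumes the Digdag `mail>` operator is available and configured for your account, and the mail body files, query files, and address are hypothetical.

```yaml
timezone: UTC

schedule:
    daily>: 07:00:00

_export:
    td:
        database: workflow_temp

# Runs if the workflow is still running one hour after it started.
sla:
    duration: 01:00:00
    +notify_slow_run:
        mail>: mail/slow_run.txt          # hypothetical mail body file
        subject: Workflow is still running after one hour
        to: [analyst@example.com]         # hypothetical recipient

# Runs when any task in the workflow fails.
_error:
    mail>: mail/failure.txt               # hypothetical mail body file
    subject: Workflow failed
    to: [analyst@example.com]

+step_one:
    td>: queries/step_one.sql             # hypothetical query file

+step_two:
    td>: queries/step_two.sql             # hypothetical query file
```

Because each step is a separate task, a failure report points at the task that broke rather than at one long query.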

Reducing end-to-end latency

By ensuring that dependencies between steps are properly maintained, you can create processing pipelines for live data use cases, such as increasing the frequency of KPI updates from daily to hourly or even more often.
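One way to express such dependencies is sketched below, assuming an hourly schedule and hypothetical query files and table names. Independent steps are grouped under a parent task with `_parallel: true`, and the dependent step is placed after the group.

```yaml
timezone: UTC

schedule:
    hourly>: 00:00                        # start at minute 0 of every hour

_export:
    td:
        database: workflow_temp

# The two staging queries do not depend on each other, so they run in parallel.
+stage:
    _parallel: true

    +stage_orders:
        td>: queries/stage_orders.sql     # hypothetical query file
        create_table: staged_orders       # hypothetical table name

    +stage_visits:
        td>: queries/stage_visits.sql     # hypothetical query file
        create_table: staged_visits       # hypothetical table name

# Starts only after every task under +stage has succeeded.
+update_kpis:
    td>: queries/update_kpis.sql          # hypothetical query file
    insert_into: hourly_kpis              # hypothetical table name
```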

Improving Collaboration and Re-Usability

Parameterization is deeply embedded within Treasure Workflow, so that, as an analyst, you can create a reusable workflow template. You can use your template workflow for future, additional analysis. Stop re-creating SQL statements for similar requests, and start templating your work for easier re-use.

Also, by organizing your queries, it becomes much easier to onboard new employees into your organization or into an ongoing project. You can use Treasure Workflow to group tasks together. New collaborators can more quickly understand the general “why” of a query before digging into its specific logic.
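A minimal sketch of the parameterization described above: values are declared once under `_export` and referenced with the `${...}` syntax. The parameter names, query file, and table name are hypothetical.

```yaml
_export:
    td:
        database: workflow_temp
    # Hypothetical user-defined parameters; change these values to reuse the template.
    target_table: campaign_report
    lookback_days: 7

+build_report:
    # The query file can embed ${lookback_days} and built-in variables
    # such as ${session_date} in the same way.
    td>: queries/build_report.sql         # hypothetical query file
    create_table: ${target_table}
```

To reuse the workflow for a similar request, copy it and edit only the values under `_export`, leaving the SQL untouched.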

Command line interface and User interface

Work in your preferred environment. You can access Treasure Workflow from a command line interface or from within the Treasure Workflow UI.

CLI

  • Create a local directory in which you can create your workflow. When you are ready to push your workflow to Treasure Data, you can also create a project folder.

    $ mkdir wf_of_saved_queries

  • Create a workflow definition file. In Treasure Data, the workflow is a .dig file.

    $ cat > saved_queries.dig

  • Enter the content of the workflow file:

    _export:
        td:
            database: workflow_temp

    +data_load_task:
        td_load>:

    +query_task:
        td_run>:

  • Optional: you can create a project folder for your workflows. Use the command: `td wf workflows`

  • Run the workflow:

    $ td wf run saved_queries

UI

  • Specify the Workflow Name first.

  • Accept the default or specify a Project name, as a place to store all files associated with your workflow. Several workflows can be saved in a project.

  • Select a Workflow Template. A blank template is an empty workspace in which you can enter SQL or another type of script. Treasure Data also provides ‘starter’ templates with placeholder text.

  • Enter SQL or another type of script in the edit box for the workflow definition file:

    _export:
        td:
            database: workflow_temp

    +data_load_task:
        td_load>:

    +query_task:
        td_run>:

  • Click Save & Commit.

  • Click Run.

  • Specify the Session. A session specifies the date of the data. The workflow is run against the session.

  • Click Run.

Implement both locally and in the cloud

You have a variety of development approaches that you can take as you develop workflows.

Develop locally with TD CLI > Push into Treasure Data > Manage in Treasure Data GUI

It’s not unusual to create workflows in your local environment and run the same workflows in Treasure Data’s environment. You can store your data in Treasure Data’s cloud-based database and query the data either locally or from within the Treasure Data platform. You might want to create queries and workflows in the cloud but perform analysis locally, using in-house tools. Treasure Data makes it easy to move between the two interfaces.

Develop locally with TD CLI > Manage on Github > Autodeploy to Treasure Data GUI to view and monitor

Refer to Continuous Deployment of Workflow definitions from GitHub to Treasure Data

Develop, view and manage in Treasure Data Console and Workflow UI

Refer to the Quick Start using the Workflow UI and Quick Start using the Workflow CLI

What you must know and do

Run through the QuickStart guide to set up and complete your first workflow. Review workflow syntax in order to understand how to configure tasks and build repeatable workflows.

QuickStart

You can quickly set up your workflow environment and create a workflow either from the command line interface or the workflow user interface (GUI).

Syntax

You use a common set of syntax elements repeatedly when building and managing workflows. Refer to Syntax.

How to use Treasure Workflow

You can learn how to use Treasure Workflow using the documentation links below:

Building Workflows of TD Processing Steps

Workflow Functionality

Managing Submitted Workflows to Treasure Data

Experimental Functionality

Digdag vs Treasure Workflow

Treasure Workflow currently supports most of the functionality of Digdag, the underlying open source project, with a few exceptions. The following Digdag operators and functionality are not yet enabled when you submit workflows to the Treasure Workflow cloud environment:

First, you cannot currently run arbitrary scripts. These include:

  • sh> for running shell scripts
  • py> for running Python scripts
  • rb> for running Ruby scripts

Additionally, the following options are not allowed because workflows run on shared processing resources and shared local disk:

  • embulk> for running arbitrary Embulk jobs (but you can use td_load> for importing bulk data into Treasure Data)
  • download_file: parameter with the td> and other operators for downloading files locally. Instead, you can use the normal Treasure Data result export functionality
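The two alternatives mentioned above might be used as in the following sketch. The configuration file, query file, table, and connection name are hypothetical, and some export destinations may need additional settings that are not shown here.

```yaml
_export:
    td:
        database: workflow_temp

# Instead of embulk>: import bulk data with td_load>.
+import:
    td_load>: config/import.yml               # hypothetical Data Connector configuration file
    table: imported_events                    # hypothetical destination table

# Instead of download_file: send the query result to a result export destination.
+export:
    td>: queries/export.sql                   # hypothetical query file
    result_connection: my_export_connection   # hypothetical connection name
```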

We are considering adding these functions to Treasure Workflow, our hosted version of Digdag. If you are interested in them, let us know!

Feedback and Feature Requests

We look forward to hearing your ideas on how to improve Treasure Workflows. You can submit your ideas by reaching out to us on the private beta Slack channel, or by submitting them on the Treasure Workflows Idea Forum.

Support

If you have questions, contact our support team.


Last modified: Feb 22 2018 04:45:18 UTC
