Visit our new documentation site! This documentation page is no longer updated.

Treasure Workflow Quick Start Tutorial for the GUI


Introduction

In this tutorial, you use the graphical user interface to run a workflow that consists of two jobs provided by Treasure Data, the second of which runs right after the first.

Prerequisites:

  • The test database ‘workflow_temp’, a sample database provided by Treasure Data
  • Optional: Saved queries that can run against the test database

Access the Workflow space in the Treasure Data console

Open the console and click Workflow in the navigation pane on the left side of the console.

Create your first workflow and project

Click Create & Edit



Enter a workflow name, for example, “mywftest”.

In the Project field, you can use the generated project name or change the project name. A project is a container for workflows and files used by a set of workflows.

Tip: Include “prj” in the project name to quickly identify it in the console and workflow interface.

Select Simple Workflow as the template workflow.



For this tutorial, you do not need to edit the template. The workflow_temp database is provided by Treasure Data and serves as a place to experiment with data, queries, and workflows. The workflow template also provides sample queries that you can use for your first workflow. The provided queries are not saved queries; they are query files in the workflow project.

Optionally, you can edit the template to run any applicable saved queries that you create.

Saved queries are not stored in a workflow project. Saved queries are stored in the console and are viewable on the Queries pane.
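For example, a task that runs a saved query instead of a query file from the project might look like the following minimal sketch. It assumes the digdag td_run> operator, which runs a query saved in Treasure Data, and a hypothetical saved query named my_saved_query; replace that name with one of the queries listed on your Queries pane.

```yaml
_export:
  td:
    database: workflow_temp

# Hypothetical task: runs the saved query named "my_saved_query"
# (an assumed name; use the name of one of your own saved queries)
+run_saved_query:
  td_run>: my_saved_query
```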

Check out the contents of your workflow definition file

The template that you selected in this tutorial accomplishes the following actions:

Specifies the Treasure Data database that the workflow runs against:

_export:
  td:
    database: workflow_temp

Runs a query that is provided by Treasure Data. The daily_open query calculates the average opening and closing prices of Nasdaq stocks per day, as captured in the sample_datasets.nasdaq table provided by Treasure Data:

+daily:
  td>: queries/daily_open.sql

Executes a “DROP TABLE IF EXISTS” followed by a “CREATE TABLE AS” operation, creating a new table from the output of the task’s query. The resulting table contains the aggregated prices:

create_table: daily_open

Similarly, the second task runs a query that is provided by Treasure Data. The monthly_open query calculates the average opening and closing prices of Nasdaq stocks per month:

+monthly:
  td>: queries/monthly_open.sql

Creates a table that contains the aggregated prices:

create_table: monthly_open

The + prefix signifies a new task. The text between the + and the : is the name you give the task.

The td> operator signifies that the query that follows runs in the Treasure Data environment; by default, it runs as a Presto query. The > marks where the “action” part of the task is defined, that is, the specific processing to run.
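Put together, the workflow definition file produced by the Simple Workflow template should look roughly like the following. This is a sketch assembled from the fragments above; the file name mywftest.dig assumes you used the example workflow name, and the queries/*.sql paths refer to the query files in the project.

```yaml
# mywftest.dig (assumed file name, based on the example workflow name)
_export:
  td:
    database: workflow_temp

# Runs first: daily averages, written to the table daily_open
+daily:
  td>: queries/daily_open.sql
  create_table: daily_open

# Runs after +daily completes: monthly averages, written to the table monthly_open
+monthly:
  td>: queries/monthly_open.sql
  create_table: monthly_open
```

Because +daily and +monthly are sibling tasks listed in order, they run sequentially, which is why the second job runs right after the first.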

Edit the workflow template

Click Create & Edit. In the next pane, you can make additional edits to the workflow.

  • You can enter syntax, for example, to add notifications and specify schedules.
  • You can add parameters, for example, to specify databases and tables to include as part of the workflow.
  • You can add resource files to the workflow project.
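As an illustration of adding a parameter, the following sketch defines a hypothetical variable, target_table, under _export and references it with ${...} syntax. The variable name is an assumption for this example; variables defined in _export can also be referenced inside the .sql files that td> runs.

```yaml
_export:
  td:
    database: workflow_temp
  # Hypothetical parameter: the name of the table that +daily creates
  target_table: daily_open

+daily:
  td>: queries/daily_open.sql
  # ${target_table} is replaced with "daily_open" when the workflow runs
  create_table: ${target_table}
```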

For examples of syntax that you can include in your workflow, see Workflow samples.

However, in this tutorial, you are not required to edit the workflow definition.

Click Save & Commit to save the workflow definition and commit all content to the project.



In the next pane, you can see that your workflow is saved.



Run the workflow

You can run the workflow that you just created by clicking ‘New Run’, or you can customize the run by specifying a session. The default session is the current day and time.

If you click ‘Customize Details’, you see your current workflow instance listed, along with its revision number. Each workflow instance (created each time you make a change in a project and click Save & Commit) is called a ‘revision’. You can specify a session, which is the time against which the workflow runs. Changing the session does not change the workflow revision; you are running your current workflow revision against a different session time.

More information about sessions: A session time is required. The session ensures that all queries within a workflow act against the same data. Sessions do not override timestamps that are specified in a query. In this quickstart tutorial, the provided SQL uses the TD_TIME_RANGE function, so the timestamps in the query are applied.

Combined with the workflow definition, a session uniquely defines a workflow run by identifying the data set that the workflow acts upon. Treasure Data ensures integrity by not permitting a workflow to have more than one session with the same session time.
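To see which session a run uses, you could add a task like the following sketch to the workflow definition. It assumes the standard digdag echo> operator and the built-in session_time and session_date variables; the task name +show_session is hypothetical.

```yaml
# Hypothetical task that prints the session of the current run to the run log
+show_session:
  echo>: "session_time=${session_time}, session_date=${session_date}"
```

If you customize the session before clicking Run, the printed values change accordingly, while the workflow revision stays the same.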

If you choose to customize, enter session information and then click Run.



You ran your first workflow!

Optional: See the output of the tutorial workflow

This workflow created two tables, named daily_open and monthly_open, in the workflow_temp database.

You can go to the database view in the console and view the created tables.

Review your workflow

You can view the following workflow details:

  • run log: available while the workflow is running and after each run
  • session: the date range of the data used in the workflow
  • workflow definition: queries and tasks
  • project: files associated with the workflow or shared within the project

The following tasks describe how to view information about the workflow you created:

  • Access the workflow-level run log: Use search or scroll to your saved workflow and project, and click to open it. Click Run History. View the color-coded dot in the first column for a quick status check. Optionally, click a run row to open a detailed view.
  • View the specified session: Use search or scroll to your saved workflow and project, and click to open it. Click Run History. The session information is in the second column. You can also click the job run to see more information about the session in the next window.
  • View the workflow definition file and project files: Use search or scroll to your saved workflow and project, and click to open it. Click Workflow Definition. Optionally, click the edit (pen) icon to view the project files.

Reruns, Deletes, Scheduling, and other Tips

Typically, you develop a workflow iteratively, perhaps running each task separately to verify its outcome before adding the next task.

Workflows run as jobs in Treasure Data. As you run workflows, you can view the status of jobs as they run on the workflow Run History page. Alternatively, open the Treasure Data console in a separate browser tab and go to the Jobs page to watch the execution live.

Rerun a workflow

The steps to rerun a workflow are as follows:

  • Use search or scroll to your saved workflow and project. Click to open.
  • Click Run History
  • Hover over one of the run rows to see the “Run” icon next to runs that can be run again
  • Click Run

You can also rerun just part of a workflow

You might have multiple queries as part of a workflow. You can select the query that you want to rerun without having to run the entire workflow.

Delete a workflow

The steps to delete a workflow are as follows:

  • Use search or scroll to your saved workflow and project. Click to open.
  • Click Workflow Definition and click the … (More) icon.
  • Click the trash icon.

Editing a workflow that previously ran

When you edit a workflow, you are creating a new revision of the workflow. The steps to edit a workflow definition file and project files are as follows:

  • Use search or scroll to your saved workflow and project. Click to open.
  • Click Workflow Definition
  • Edit the definition file to add tasks or queries, and add project files
  • Click Save & Commit
  • Click New Run

You have the option to change the session before running the edited workflow.

Tip: To keep the original workflow unchanged, you can create a new workflow and manually copy the content of the existing workflow definition file into it. Keep the new workflow in the same project if you want to use the same project files.

Reviewing all workflows in a project

Currently, you cannot sort workflows by project. Instead, select a workflow to view the project associated with it.

Schedule a workflow

You will often want to run a workflow on a schedule. The following example shows the syntax to add to your workflow definition.

Specify the timezone in which the workflow will run:

timezone: UTC

Specify the frequency of the workflow run:

schedule:
  daily>: 07:00:00

Click Save & Commit. Now your workflow will run every day at 7am UTC.
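A workflow has one schedule type at a time. The following sketch shows the daily schedule from above next to commented-out alternatives; the hourly>, weekly>, and cron> forms are standard digdag schedule options and are included here as assumptions to verify against the Treasure Workflow reference for your account.

```yaml
timezone: UTC

schedule:
  # Active schedule: every day at 07:00 in the timezone above
  daily>: 07:00:00
  # Alternatives (enable only one schedule type at a time):
  # hourly>: 30:00          # every hour at minute 30
  # weekly>: Mon,07:00:00   # every Monday at 07:00
  # cron>: 0 7 * * 1-5      # 07:00 on weekdays (Monday to Friday)
```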

When you add a schedule to your workflow, you might receive an error if you click New Run to run the workflow immediately. The workflow must run at the specified scheduled time.

List the workflows registered on Treasure Data

In the console, go to the Workflow page to see the list of workflows.

Find out what workflows are scheduled to run next on Treasure Data

In the console, go to the Workflow page and click the Next Run column. The list of workflows is reordered according to each workflow's next scheduled run.

Next steps

To learn more about using Treasure Workflows, you can complete tutorials:



Last modified: Mar 15 2018 01:59:55 UTC

If this article is incorrect or outdated, or omits critical information, let us know. For all other issues, access our support channels.