Review these Treasure Workflow prerequisites and limitations to understand what you can and cannot accomplish with workflows.
Before you start creating a Treasure Workflow, you must have a database and table in Treasure Data.
Treasure Workflow is based on Digdag, but the following features are not supported:
The td> operator's download_file parameter for downloading query results locally. Use the Treasure Data result export functionality instead.
sh> for running shell scripts
rb> for running ruby scripts
embulk> for running arbitrary Embulk jobs (but you can use td_load> for importing bulk data into Treasure Data)
emr> for running Amazon EMR jobs
param_get> for getting persistent data from ParamServer and setting it as the value of store parameters
param_set> for setting a value in ParamServer as persistent data
py> is the only supported operator for custom scripts.
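The list above can be turned into a quick check to run before uploading a project. The following is a minimal sketch, not an official Treasure Data tool: the function name find_unsupported, the line-based matching, and the file names in the sample definition are assumptions made for illustration, and the code does not fully parse Digdag syntax.

```python
import re

# Operators listed above as unsupported in Treasure Workflow.
UNSUPPORTED_OPERATORS = ("sh", "rb", "embulk", "emr", "param_get", "param_set")

# Matches an operator key such as "sh>:" at the start of a (possibly indented) line.
_OPERATOR_RE = re.compile(r"^\s*(?P<name>[A-Za-z_][A-Za-z0-9_]*)>\s*:")
# Matches the download_file parameter key.
_DOWNLOAD_RE = re.compile(r"^\s*download_file\s*:")


def find_unsupported(workflow_text):
    """Return (line_number, reason) pairs for unsupported features found."""
    findings = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        match = _OPERATOR_RE.match(line)
        if match and match.group("name") in UNSUPPORTED_OPERATORS:
            findings.append((number, match.group("name") + "> is not supported"))
        elif _DOWNLOAD_RE.match(line):
            findings.append((number, "download_file is not supported"))
    return findings


if __name__ == "__main__":
    # Hypothetical workflow definition used only to exercise the check.
    example = """\
+query:
  td>: queries/daily.sql
  download_file: result.csv
+cleanup:
  sh>: rm -f result.csv
+transform:
  py>: tasks.transform
"""
    for number, reason in find_unsupported(example):
        print(f"line {number}: {reason}")
```

Because the check only looks at the start of each line, it can miss operators written in unusual layouts. Treat it as a first pass rather than a guarantee.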
TD Workflow is designed to provide a scalable and flexible solution for managing your data pipelines in a cloud-hosted environment.
To ensure optimal and fair processing, the following limits exist:
A maximum of 30 tasks can run concurrently per account. All other tasks are queued and issued on a first-come, first-served basis.
The maximum size of a project archive is 10 MB.
A maximum of 12,000 saved workflows. Beyond 12,000, not all workflows load in the Workflow UI, but they remain accessible through the CLI.
The maximum total number of tasks in an attempt is 1,000.
The maximum length of a task's full name is 640 bytes.
Maximum attempts per account are:
US - 200
Tokyo - 200
EU - 300
All other regions - 100
A running task is killed after 1 day (24 hours).
A running attempt is killed after 7 days.
The maximum response size for a td> task is 4 MB.
The maximum response size for an http> task is 1 MB.
The maximum response size for an http_call> task is 2 MB.
The maximum output size for a py> task (Custom Scripts), including exported variables and any generated tasks, is 36 MB.
The total response size for a td_for_each> task is 16 MB.
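Some of the limits above can be checked locally before a project is pushed. The sketch below is illustrative only: the function check_project_limits and its arguments are hypothetical, and it assumes that 10 MB means 10 * 1024 * 1024 bytes and that task names are measured as UTF-8 bytes. The documented limits do not confirm either assumption.

```python
# Limits taken from the list above.
MAX_ARCHIVE_BYTES = 10 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_TASKS_PER_ATTEMPT = 1000
MAX_TASK_FULL_NAME_BYTES = 640


def check_project_limits(archive_size_bytes, task_full_names):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []

    if archive_size_bytes > MAX_ARCHIVE_BYTES:
        problems.append(
            f"project archive is {archive_size_bytes} bytes; "
            f"limit is {MAX_ARCHIVE_BYTES}"
        )

    if len(task_full_names) > MAX_TASKS_PER_ATTEMPT:
        problems.append(
            f"attempt has {len(task_full_names)} tasks; "
            f"limit is {MAX_TASKS_PER_ATTEMPT}"
        )

    for name in task_full_names:
        # Assumption: the 640-byte limit applies to the UTF-8 encoding.
        size = len(name.encode("utf-8"))
        if size > MAX_TASK_FULL_NAME_BYTES:
            problems.append(
                f"task name is {size} bytes; "
                f"limit is {MAX_TASK_FULL_NAME_BYTES}: {name[:40]}..."
            )

    return problems


if __name__ == "__main__":
    # Hypothetical archive size and task names used only for demonstration.
    names = ["+daily_load+extract", "+daily_load+" + "x" * 700]
    for problem in check_project_limits(12 * 1024 * 1024, names):
        print(problem)
```

The archive size could come from os.path.getsize on the packaged project file. The task names would have to be collected from the workflow definition by other means.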
For better performance of the TD Console Workflow, try to stay below the following limits:
200 Saved Workflows
400 Tasks in a Workflow
These limits are subject to change if the configuration of Treasure Data capacity changes.