Custom Scripts let you run containerized Python scripts from within a Treasure Data Workflow, giving you greater flexibility to implement custom logic. Typical uses include:

  • Extend the capabilities of data connectors and other integrations.

  • Create efficient data manipulation and processing logic in Python and invoke it from workflows.

  • Productionize your data science work by running Python models as part of regularly scheduled Treasure Workflows.

  • Consolidate your data management into one environment. Use Treasure Workflow to connect multiple data environments.

Supported Docker Images

  • digdag/digdag-python:3.9 (current stable)
    • Python 3.9
    • pytd 1.4.0
    • td-pyspark 20.12.0

Example Treasure Workflow Custom Script Syntax

The following snippet is an example from a workflow:

    py>: tasks.printMessage
    docker:
      image: "digdag/digdag-python:3.9"
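
The py> operator above refers to tasks.printMessage, which is resolved as a function named printMessage in a Python file named tasks.py in the workflow project. The source does not show that file, so the following is only a minimal sketch of what it might contain; the helper build_message and the message text are hypothetical.

```python
# tasks.py (assumed file name, matching the "tasks." prefix in the workflow)


def build_message(name="Treasure Workflow"):
    # Hypothetical helper: builds the text that the task prints.
    return f"Hello from {name}"


def printMessage():
    # Function referenced by `py>: tasks.printMessage`.
    # Anything printed here appears in the task's log output.
    print(build_message())
```

Keeping the message construction in a separate function makes the logic easy to test locally before running it inside the workflow.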

Installing Your Own Python Libraries

The Python scripts in TD Workflows are managed and run by Treasure Data in isolated Docker containers. Treasure Data provides a number of base Docker images to run in the container.

In addition to the libraries provided by the Docker image, you can install third-party libraries by running pip install from within the Python script.

You can pick the appropriate Docker image to run your Python script in, based on the Python version and libraries supported by the image.

To install a library, call pip from within your Python script before importing the library:

    import os, sys
    os.system(f"{sys.executable} -m pip install asn1")
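
Note that os.system returns the exit status of the command rather than raising an error, so a failed installation can go unnoticed until the later import fails. As an alternative sketch (not the method shown in this article), the standard library's subprocess.check_call raises an exception when pip exits with a non-zero status. The function names pip_install_command and install are hypothetical.

```python
import subprocess
import sys


def pip_install_command(package):
    # Build the pip command for the interpreter that is running this script,
    # so the package is installed into the same environment.
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    # check_call raises subprocess.CalledProcessError if pip fails,
    # which makes the workflow task fail at the install step.
    subprocess.check_call(pip_install_command(package))


# Example usage inside a task function (not executed here):
#   install("asn1")
#   import asn1
```

Passing the command as a list avoids shell interpolation of the package name.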

Links to Other Articles



Passing parameters to Custom Scripts

You can pass parameters and credentials to a Custom Script as environment variables by using _env.
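
On the Python side, values passed this way can be read with os.environ. The following is a minimal sketch; the function read_setting and the variable name EXAMPLE_PARAM are hypothetical and stand in for whatever names your workflow defines under _env.

```python
import os


def read_setting(name, default=None):
    # Look up an environment variable; fall back to `default` if given.
    value = os.environ.get(name, default)
    if value is None:
        # Fail early with a clear message instead of passing None downstream.
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example usage inside a task function (EXAMPLE_PARAM is a hypothetical name):
#   param = read_setting("EXAMPLE_PARAM")
```

Failing early on a missing variable makes configuration mistakes visible in the task log at the start of the script.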

Executing custom script tasks in parallel within a workflow

Multiple Python scripts can run in parallel within a workflow by using _parallel.

Python Custom Scripting

Walk through a complete Custom Scripts tutorial.

Treasure Workflow Service Limits

A custom script that is still running after 1 day is killed.