Custom Scripts let you run containerized Python scripts from within a Treasure Data Workflow, giving you greater flexibility for custom logic. Typical uses include:
- Extend the capabilities of data connectors and other integrations.
- Create efficient data manipulation and processing logic in Python and invoke it from workflows.
- Productionize your data science work by running Python models as part of regularly scheduled Treasure Workflows.
- Consolidate your data management into one environment by using Treasure Workflow to connect multiple data environments.
Supported Docker Images
- digdag-python:3.9 (see migration instructions)
  - Python 3.9
  - pytd 1.4.0
  - td-pyspark 20.12.0
- digdag-python:3.7 (upgrade recommended)
  - pytd 0.5.0
  - pytd 0.3.0
- digdag-python:3.6.8-stretch (deprecated)
  - Python 3.6.8
Example Treasure Workflow Custom Script Syntax
The following snippet is an example from a workflow:
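A minimal sketch of a workflow task that runs a Python script in one of the supported Docker images (the task name and script path below are illustrative, not required names):

```yaml
# workflow.dig -- task name and script path are examples
+process_data:
  py>: scripts.process.main          # runs main() in scripts/process.py
  docker:
    image: "digdag/digdag-python:3.9"
```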
Installing Your own Python Libraries
The Python scripts in TD Workflows are managed and run by Treasure Data in isolated Docker containers. Treasure Data provides a number of base Docker images to run in the container.
In addition to the libraries provided by the Docker image, you can install additional third-party libraries using the pip install command within the Python script.
You can pick the appropriate Docker image to run your Python script in, based on the Python version and libraries supported by the image.
Add the following to the top of your Python script to install libraries at runtime:
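One common pattern is to invoke pip through the interpreter that is running the script, before importing the library. A minimal sketch (`pandas` is only an example package, and the helper names are our own, not part of any TD API):

```python
# Sketch: install a third-party library at runtime inside the container.
import subprocess
import sys

def pip_install_args(package: str) -> list:
    # Invoke pip via the same interpreter that runs the script, so the
    # library lands in the container's active Python environment.
    return [sys.executable, "-m", "pip", "install", package]

def install(package: str) -> None:
    subprocess.run(pip_install_args(package), check=True)

# Example usage (package name is illustrative):
# install("pandas")
# import pandas as pd
```

Installing at the top of the script, before any import of the library, ensures the dependency is present every time the container starts fresh.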
Deprecation of Old Images
- digdag/digdag-python:3.7 (current stable)
- digdag/digdag-python:3.6.8-stretch (deprecated)
- digdag/digdag-python:3.7.3-stretch (deprecated)
digdag-python:3.7 is the existing stable version and the most frequently used image. Because it is so widely used, we have set a migration period of at least 6 months and are tracking customers' migration progress. We strongly recommend, however, that customers migrate to the new rootless Docker images to reduce potential security risks.
The old stretch images (the second and third in the list above) are approaching End of Life (EoL), so if you are using either one, we strongly recommend migrating to the new rootless Docker image.
Links to Other Articles
- You can pass parameters and credentials to a Custom Script as environment variables using _env.
- Multiple Python scripts can run in parallel within a workflow using the _parallel operator.
- For a hands-on introduction, walk through the complete Custom Scripts tutorial.
- A custom script is killed after it has been running for 1 day.
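Combining the first two points, a sketch of passing a credential via _env and fanning out two tasks with _parallel (the secret name, task names, and script paths are illustrative):

```yaml
# Illustrative sketch only: secret key, task names, and script paths are examples.
_export:
  docker:
    image: "digdag/digdag-python:3.9"

+load:
  _parallel: true        # run the child tasks below concurrently

  +task_a:
    py>: scripts.load.part_a
    _env:
      TD_API_KEY: ${secret:td.apikey}   # read via os.environ in the script

  +task_b:
    py>: scripts.load.part_b
    _env:
      TD_API_KEY: ${secret:td.apikey}
```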