
Custom Scripts let you run containerized Python scripts from within a Treasure Data Workflow, giving you more flexibility to implement custom logic. Typical uses include:

  • Extend the capabilities of data connectors and other integrations.

  • Create efficient data manipulation and processing logic in Python and invoke it from workflows.

  • Productionize your data science work by running Python models as part of regularly scheduled Treasure Workflows.

  • Consolidate your data management into one environment. Use Treasure Workflow to connect multiple data environments.

Supported Docker Images

  • digdag/digdag-python:3.9 (See migration instructions)
    • Python 3.9
    • pytd 1.4.0
    • td-pyspark 20.12.0
  • digdag/digdag-python:3.7 (Upgrade recommended)
    • Python 3.7.4
    • pytd 0.5.0
    • td-pyspark 19.07
  • digdag/digdag-python:3.7.3-stretch (Deprecated)
    • Python 3.7.3
    • pytd 0.3.0
  • digdag/digdag-python:3.6.8-stretch (Deprecated)
    • Python 3.6.8

Example Treasure Workflow Custom Script Syntax

The following snippet is an example from a workflow:

+py_custom_code:
    py>: tasks.printMessage
    docker:
      image: "digdag/digdag-python:3.9"
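
The `py>: tasks.printMessage` line refers to a function named `printMessage` in a Python module named `tasks`. As a minimal sketch, assuming the module lives in a file called `tasks.py` alongside the workflow definition (the file location, the default message, and the return value are illustrative assumptions, not part of the original example), it could look like this:

```python
# tasks.py (assumed file name, matching the "tasks" module in the workflow snippet)


def printMessage(message="Hello from a Custom Script"):
    """Print a message to the task log.

    The default text and the return value are illustrative only;
    returning the message simply makes the function easy to check.
    """
    print(message)
    return message
```

Calling `printMessage()` locally before uploading the workflow is a quick way to confirm that the function imports and runs.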
		

Installing Your Own Python Libraries

The Python scripts in TD Workflows are managed and run by Treasure Data in isolated Docker containers. Treasure Data provides a number of base Docker images to run in the container.

In addition to the libraries provided by the Docker image, you can install third-party libraries by running pip install from within the Python script.

You can pick the appropriate Docker image to run your Python script in, based on the Python version and libraries supported by the image.

To install a library, add a call like the following to your Python script:

import os, sys  # both modules are required by the call below
os.system(f"{sys.executable} -m pip install asn1")
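
The `os.system` call above does not stop the script when the installation fails. As a hedged alternative sketch (not the documented Treasure Data method), the same pip invocation can be run through `subprocess.check_call`, which raises an exception on a non-zero exit code. The helper names `build_pip_command` and `install_and_import` are hypothetical and introduced only for this illustration:

```python
import importlib
import subprocess
import sys


def build_pip_command(package):
    """Return the pip command as an argument list, so no shell is involved."""
    return [sys.executable, "-m", "pip", "install", package]


def install_and_import(package, module_name=None):
    """Install `package` with pip, then import and return the module.

    `module_name` covers packages whose import name differs from the
    package name; it defaults to the package name.
    Raises subprocess.CalledProcessError if pip exits with an error.
    """
    subprocess.check_call(build_pip_command(package))
    # Make sure the import system notices the newly installed files.
    importlib.invalidate_caches()
    return importlib.import_module(module_name or package)
```

Inside a task function, `asn1 = install_and_import("asn1")` would then install the library and make it available under the usual name.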

Deprecation of Old Images

We support the following three images in production:

  1. digdag/digdag-python:3.7 (current stable)
  2. digdag/digdag-python:3.6.8-stretch (deprecated)
  3. digdag/digdag-python:3.7.3-stretch (deprecated)


digdag-python:3.7 is the current stable image and the most frequently used. Because so many users depend on it, we have set a migration period of at least 6 months and are tracking customers' migration progress. We nevertheless strongly recommend migrating to the new rootless Docker images to reduce potential security risks.

The old stretch images (2 and 3) are approaching End of Life (EoL). If you are using either one, we strongly recommend migrating to the new rootless Docker image.

Links to Other Articles

  • Passing parameters to Custom Scripts: Use environment variables, set with _env, to pass parameters and credentials to a Custom Script.

  • Executing custom script tasks in parallel within a workflow: Run multiple Python scripts in parallel within a workflow by using the _parallel option.

  • Python Custom Scripting: Walks through a complete Custom Scripts tutorial.

  • Treasure Workflow Service Limits: A running custom script is killed after 1 day.
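
As a small illustration of the environment-variable approach mentioned under "Passing parameters to Custom Scripts", values set through _env can be read inside the script with the standard `os.environ` mapping. The helper name `read_setting` and the variable name `MY_PARAM` are hypothetical and chosen only for this sketch:

```python
import os


def read_setting(name, default=None):
    """Return the environment variable `name`, falling back to `default`.

    Raises KeyError when the variable is unset and no default is given,
    so a missing credential fails early instead of propagating None.
    """
    value = os.environ.get(name, default)
    if value is None:
        raise KeyError(f"environment variable {name} is not set")
    return value
```

A task would call, for example, `read_setting("MY_PARAM")` at the start of its function, which keeps credentials out of the script source.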








