Python scripts can be run from TD Workflow or Digdag, using the Python operator py>. You can create your workflows for TD using the TD Console or from the command line.

For the workflow to run the Python script, you must specify a Docker image. When the workflow task starts, a new Docker container is created based on the specified Docker image. Docker allows the Python script to execute in the container in an isolated environment.
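
Because the script runs inside whichever image the workflow specifies, it can help to log the interpreter details at the start of a task. The following is a minimal sketch that uses only the standard library; the function name describe_runtime is hypothetical and not part of any TD API.

```python
import platform
import sys


def describe_runtime():
    """Return basic facts about the interpreter running this script."""
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    # Print one line per fact so it shows up in the task log.
    for key, value in describe_runtime().items():
        print(f"{key}: {value}")
```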

This tutorial takes about 30 minutes and requires no prior experience with Python or Docker images.


Prerequisites

  • Make sure this feature is enabled for your TD account.

  • Download and install the TD Toolbelt and the TD Toolbelt Workflow module. For more information, see TD Workflow Quickstart.

  • Python 3.6.8 or Python 3.9. Your Python code must be compatible with the version provided by the Docker image you specify.

  • Basic knowledge of Treasure Workflow syntax.

3rd-Party Python Libraries

The Python scripts in TD Workflows are managed and run by Treasure Data in isolated Docker containers. Treasure Data provides a number of base Docker images from which these containers are created.

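3rd-party Python libraries can be installed from within your Python script using the pip install command. For example, to install and import TensorFlow:

    import os
    import sys

    # Install the package with the same interpreter that runs this script.
    os.system(f"{sys.executable} -m pip install tensorflow")
    import tensorflow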

For Docker images compatible with Python 3.6.8:

  1. To add more libraries from within your Python script use:

    pip install --user <package>
  2. Add the directory where the packages were installed to the module search path using sys.path.append so that the installed packages can be found.

For Docker images compatible with Python 3.9:

  1. To add more libraries from within your Python script use:

    pip install <package>
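
The two variants above can be wrapped in one helper. This is a sketch, not an official Treasure Data method: the function names and the user flag are illustrative, and it assumes that pip install --user places packages in the directory reported by site.getusersitepackages().

```python
import site
import subprocess
import sys


def build_pip_command(package, user=False):
    """Build the pip command line for the current interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        # Python 3.6.8 images: install into the user site directory.
        command.append("--user")
    command.append(package)
    return command


def install(package, user=False):
    """Install a package at run time and make it importable."""
    subprocess.check_call(build_pip_command(package, user=user))
    if user:
        user_site = site.getusersitepackages()
        if user_site not in sys.path:
            # Make the freshly installed packages visible to import.
            sys.path.append(user_site)
```

Calling install("tensorflow", user=True) before import tensorflow would mirror the Python 3.6.8 steps; on Python 3.9 images the user argument can be left at its default.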

Python Examples

See the examples for basics such as:

  • How to call functions

  • How to pass parameters to functions

  • How to use environment variables

  • How to import functions
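
As a rough illustration of those basics, a script used by a py> task might look like the sketch below. The function names are hypothetical, and it assumes that the workflow passes task parameters as keyword arguments and exposes configured values (here a made-up GREETING_TARGET variable) as environment variables.

```python
import os


def format_message(message, times=1):
    """Plain helper; other modules could import it with a normal import."""
    return " ".join([message] * times)


def print_message(message="hello", times=1):
    """Task entry point: parameters arrive as keyword arguments."""
    text = format_message(message, times)
    print(text)
    return text


def greet_from_env():
    """Task entry point: read a value from an environment variable."""
    # GREETING_TARGET is a hypothetical variable name used for illustration.
    target = os.environ.get("GREETING_TARGET", "world")
    return print_message(f"hello {target}")
```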

Reading and Writing Data from Treasure Data

The examples show how to read data from Treasure Data into a DataFrame, manipulate the data, and write it back to Treasure Data as a table.
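
A minimal sketch of that read, manipulate, and write cycle follows. It assumes that the pytd client library and pandas are installed in the image and that TD_API_KEY and TD_API_SERVER are set in the environment. The function names, the added row_count column, and any database or table names you pass in are placeholders, not taken from the repository.

```python
import os


def qualified_table(database, table):
    """Return the 'database.table' name used as the write destination."""
    return f"{database}.{table}"


def read_transform_write(source_db, query, dest_db, dest_table):
    """Query TD into a DataFrame, add a column, and write a new table."""
    # Imported lazily so this module loads even where pytd is not installed.
    import pandas as pd
    import pytd

    client = pytd.Client(
        apikey=os.environ["TD_API_KEY"],
        endpoint=os.environ["TD_API_SERVER"],
        database=source_db,
    )
    result = client.query(query)  # dict with "columns" and "data"
    df = pd.DataFrame(result["data"], columns=result["columns"])

    # Example manipulation: record how many rows the query returned.
    df["row_count"] = len(df)

    client.load_table_from_dataframe(
        df,
        qualified_table(dest_db, dest_table),
        writer="bulk_import",
        if_exists="overwrite",
    )
    return df
```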

  1. Copy or clone the entire treasure-boxes repository.

  2. Navigate to the directory that contains simple.dig: treasure-boxes/integration-box/python

  3. From the command line, type ls to verify that you are in the correct directory. You should see the following:
    README.md other_scripts scripts simple.dig

  4. Push the simple example to your TD environment by typing the following:

    td wf push simple-example

    This uploads the contents of the current directory, including simple.dig, to TD as a project named simple-example. It does not run the workflow.

To verify that the sample was added to TD:

  1. Open TD Console.

  2. Navigate to Workflows.

  3. Search for simple.

  4. Double-click the simple workflow to open the editor.


To run the workflow:

  1. Select New Run.



Or, to run the sample from the command line:

  1. Type:
    td wf start simple-example simple --session now


To validate the workflow job run:

  1. From the TD Console, navigate to the workflow editor.

  2. Select Run History.


  3. If there are multiple instances of the job, select one to open its job history. From here you can see when the job ran, review audit logs, and find other diagnostic information about the job.












