Treasure Data recommends that you keep secret information in environment variables rather than in code. This practice keeps you from accidentally pushing secrets to a location that others can access.


Create a Python Virtual Environment on a Local Environment

Python provides venv to create lightweight virtual environments with their own site directories. Use pip inside the virtual environment to install the packages you need.

Install Dependencies

1. Navigate to gist
2. Download
    • requirements.txt
    • constraints.txt

For example, from the command line:

$ python -m venv .venv
$ source .venv/bin/activate
(.venv)$ pip install -r requirements.txt -c constraints.txt
3. Using the venv virtual environment, you can develop using the same packages in the local environment.

This approach does not resolve operating system discrepancies. For example, if the production environment runs on Debian and the development environment runs on Windows or macOS, OS-dependent commands like apt-get can fail.



Test a Treasure Data Workflow with Python

If you want to run an entire workflow in the local environment, you can use Digdag.

As of March 5, 2020, Treasure Data uses the v0_10 Digdag branch.



Pass Parameters to Python Operators

You can pass parameters into the py> operator in the following ways:

Digdag Argument Example

Assume you have a Python script named py_scripts/examples.py as follows:

def print_arg(msg):
    print(f"Message is {msg}")

You can pass a message argument from a simple_with_arg task as the following example shows:

+simple_with_arg:
  py>: py_scripts.examples.print_arg
  msg: "Hello World"
  docker:
    image: "digdag/digdag-python:3.9"

To pass multiple arguments, add arguments in your function, and then add them into Digdag arguments as well.

You can pass Digdag arguments to Python seamlessly. However, if your function accepts keyword arguments through **kwargs, it might receive variables you did not intend to pass. For example, the docker variable can be passed as a dictionary {"image": "digdag/digdag-python:3.9"}. Treasure Data recommends using explicit arguments on Python functions.
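The difference between the two styles can be sketched in plain Python. This is only a simulation: the task_params dictionary is a hypothetical stand-in modeled on the simple_with_arg task, and the sketch does not reproduce how the py> operator itself selects parameters.

```python
def print_arg_kwargs(**kwargs):
    """Collect every parameter handed over, wanted or not."""
    return sorted(kwargs)


def print_arg_explicit(msg):
    """Accept only the parameter the task is meant to use."""
    return f"Message is {msg}"


# Hypothetical parameters modeled on the simple_with_arg task.
task_params = {
    "msg": "Hello World",
    "docker": {"image": "digdag/digdag-python:3.9"},
}

# The **kwargs version also sees the docker dictionary.
print(print_arg_kwargs(**task_params))         # ['docker', 'msg']
# The explicit version names exactly what it needs.
print(print_arg_explicit(task_params["msg"]))  # Message is Hello World
```

With an explicit signature, the function documents what it expects, and an unexpected variable cannot silently change its behavior.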

Digdag variables can also conflict unintentionally with py> operator arguments. Assume you set some Digdag variables like the following:

_export:
  td:
    database: my_db

+simple_with_arg2:
  py>: py_scripts.examples.print_arg_td
  msg: "Hello World"
  docker:
    image: "digdag/digdag-python:3.9"

Suppose the Python function print_arg_td takes a td argument, as follows:

def print_arg_td(msg, td=None):
    print(f"'msg' is {msg} and 'td' is {td}")

The td argument is never None here, because the exported td variable is always passed. In this example, the function receives {"database": "my_db"}. Such conflicts can cause type mismatches, for example, a dictionary where you expected a string. Treasure Data recommends that you avoid argument names that Digdag reserves, such as the td variables. For example:

  • td.endpoint

  • td.apikey

  • td.use_ssl

  • td.proxy.enabled

  • td.proxy.host

  • td.proxy.port

  • td.proxy.password

  • td.proxy.user

Digdag might convert a value to an unintended type, for example, a string to an integer. Treasure Data recommends checking or explicitly converting the type in the Python function.
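A minimal sketch of explicit conversion, assuming a hypothetical function repeat_message that expects a string and an integer. The function and parameter names are illustrative only and do not come from Treasure Data or Digdag.

```python
def repeat_message(msg, count):
    """Repeat msg count times, whatever types the parameters arrive as."""
    # Normalize both parameters instead of trusting the incoming types.
    msg = str(msg)
    count = int(count)
    return " ".join([msg] * count)


print(repeat_message("hi", "3"))  # hi hi hi
print(repeat_message(123, 2))     # 123 123
```

Converting at the top of the function keeps the rest of the code independent of whether the value arrived as a string or as a number.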

See also, http://docs.Digdag.io/workflow_definition.html#using-variables.

Environment Variable

Environment variables are another way to pass parameters to the py> operator. They are a reasonable choice for passing secure information or secrets.

Secrets and environment variables have a limit of 8192 characters.


For example, assume you have a simple_with_env task:

+simple_with_env:
  py>: py_scripts.examples.print_env
  _env:
    MY_ENV_VAR: "hello"
  docker:
    image: "digdag/digdag-python:3.9"

Access MY_ENV_VAR through os.environ, for example:

import os

def print_env():
    print(f'Env var is {os.environ["MY_ENV_VAR"]}')
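If the variable might be missing, for example when you run the script outside the workflow, a lookup with a clear error message can make the failure easier to diagnose than a bare KeyError. The helper name read_required_env below is hypothetical and is not part of Digdag or Treasure Data.

```python
import os


def read_required_env(name):
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def print_env():
    print(f'Env var is {read_required_env("MY_ENV_VAR")}')
```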

Using an environment variable is important when you need to handle secret information, for example, a Treasure Data API key or an AWS secret key.

Digdag has a feature to store secrets. Secrets are stored in the Digdag or Treasure Data Workflow database when you run the td workflow secrets subcommands.


Assume you have set a secret named td.apikey. You can pass this secret to the py> operator. For example:

+simple_with_env2:
  py>: py_scripts.examples.access_td
  _env:
    TD_API_KEY: ${secret:td.apikey}
  docker:
    image: "digdag/digdag-python:3.9"

Read the secret in py_scripts/examples.py as follows:

import os

def access_td():
    apikey = os.environ["TD_API_KEY"]
    # Do awesome execution

If you try to pass secrets through ordinary Digdag arguments, the secrets are never fetched from the secrets database. For example, assume you have a task like the following:

+simple_with_env_ng:
  py>: py_scripts.examples.access_td_ng
  apikey: ${secret:td.apikey}
  docker:
    image: "digdag/digdag-python:3.9"

With the following script, the secret is not resolved:

def access_td_ng(apikey):
    print(apikey)
    # Always shows "${secret:td.apikey}" instead of the actual API key like "1234/XXXX"
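One way to catch this mistake early is to check whether a value still looks like an unexpanded secret reference. The helper looks_unresolved below is a hypothetical sketch, not a Digdag feature, and it assumes the reference arrives exactly in the ${secret:...} form shown above.

```python
def looks_unresolved(value):
    """Return True if value still looks like a literal ${secret:...} reference."""
    return (
        isinstance(value, str)
        and value.startswith("${secret:")
        and value.endswith("}")
    )


def access_td_checked(apikey):
    # Fail fast instead of sending the literal placeholder to an API.
    if looks_unresolved(apikey):
        raise ValueError("Secret was not resolved; pass it through _env instead")
    return apikey
```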

Digdag Variable

If you want to read a Digdag variable in a Python script, you can use digdag.env.params as in the following example:

def read_workflow_env(msg):
    import digdag
    print(digdag.env.params["my_msg"])

The import digdag statement works only when the script runs as a Digdag py> operator task. To avoid an import error elsewhere, wrap the import in a try-except block as follows:

try:
    import digdag
    digdag.env.store({"feature_query": feature_query})
except ImportError:
    pass
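The same guard can be applied once at module level so that a script also runs in a local virtual environment. This is a sketch under assumptions: the helper name store_params is hypothetical, and it relies only on digdag.env.store as used in the example above.

```python
try:
    import digdag  # Importable only inside a Digdag py> operator task
except ImportError:
    digdag = None  # Running locally, outside a workflow


def store_params(params):
    """Store params in the workflow when possible and always return them."""
    if digdag is not None:
        digdag.env.store(params)
    return params
```

Returning the parameters lets you inspect or test the values locally even when nothing is stored in a workflow.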