Including connection details such as hostnames, passwords, or API keys directly in your Embulk configuration files is not always ideal. When you need to hide or mask such details, you can reference environment variables in your configuration file instead.

Use of environment variables in Embulk is an experimental feature. The feature might change or be removed in future releases.


Prerequisites

Understanding Environment Variable Naming Conventions

Replace each value that you want to hide with a reference to an environment variable, using the naming convention {{ env.replaced_detail }}, where replaced_detail is the name of the environment variable. For example, if you set an environment variable named DB_PASSWORD to hold your database password, the value in your configuration file would be:

{{ env.DB_PASSWORD }}

The convention is env. followed by the name of your environment variable, all enclosed in double curly braces {{ }}.

Setting Environment Variables

An environment variable is a named value that a running process can read to complete its task. For example, a running process can query the value of the DB_HOST environment variable to discover the IP address of the MySQL database, or the API_KEY variable to find the API key used to authenticate with Treasure Data. The procedure to set or change environment variables varies from platform to platform. For example, see Environment variables on Mac OS X.
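As a minimal sketch, assuming a POSIX-style shell on Linux or macOS, variables can be set for the current session with export. The values below are placeholders, not real credentials. A variable set this way lasts only for the current shell session and is inherited by any process started from it.

```shell
# Placeholder values; substitute your own connection details.
export DB_HOST="127.0.0.1"
export DB_PASSWORD="example-password"

# A child process started from this shell inherits exported variables.
sh -c 'echo "host seen by child process: $DB_HOST"'
```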

To use variables in your configuration file

  1. Rename the .yml configuration file so that its name ends with .yml.liquid. For example, if your configuration file was originally named config.yml, rename it to config.yml.liquid.

  2. Insert the environment variable into the configuration file by replacing the connection details, using the variable naming convention.

  3. Run Embulk in preview mode to validate your changes. For example:

    embulk preview config.yml.liquid

  4. Run Embulk with the new configuration file. For example:

    embulk run config.yml.liquid
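The rename in step 1 can be scripted. The following is a sketch only: to_liquid is a hypothetical helper name, not part of Embulk, and the usage comment assumes that embulk is on your PATH and that config.yml is in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: rename a .yml configuration file to .yml.liquid
# and print the new file name.
to_liquid() {
  src="$1"
  case "$src" in
    *.yml.liquid)
      # Already renamed; nothing to do.
      printf '%s\n' "$src"
      ;;
    *.yml)
      mv -- "$src" "$src.liquid" && printf '%s\n' "$src.liquid"
      ;;
    *)
      echo "expected a .yml file: $src" >&2
      return 1
      ;;
  esac
}

# Usage (steps 1, 3, and 4 together):
#   cfg=$(to_liquid config.yml)
#   embulk preview "$cfg" && embulk run "$cfg"
```

Chaining preview and run with && means the run starts only if the preview succeeds.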

Example config.yml.liquid File

For example, suppose the original config.yml file is the following:

in:
    type: mysql
    host: localhost
    port: 3306
    user: username
    password: password
    database: mysql_db
    select: "col1, col2, datecolumn"
    where: "col4 != 'a'"
out:
    type: td
    apikey: xxxxxxxxxxxx
    endpoint: api.treasuredata.com
    database: dbname
    table: tblname
    time_column: datecolumn
    mode: replace
    # If mode is not defined, append is used by default:
    # imported records are appended to the target table.
    # mode: replace replaces the existing target table.
    default_timestamp_format: '%d/%m/%Y'

Suppose you want to hide the MySQL host, port, username, password, and database. In the output section, you might also want to hide your API key, endpoint, database, and table. Using the naming convention {{ env.replaced_detail }}, the file becomes the following:

in:
    type: mysql
    host: {{ env.db_host }}
    port: {{ env.db_port }}
    user: {{ env.db_username }}
    password: {{ env.db_password }}
    database: {{ env.db_name }}
    select: "col1, col2, datecolumn"
    where: "col4 != 'a'"
out:
    type: td
    apikey: {{ env.td_apikey }}
    endpoint: {{ env.api_endpoint }}
    database: {{ env.td_db_name }}
    table: {{ env.td_table_name }}
    time_column: datecolumn
    mode: replace
    # If mode is not defined, append is used by default:
    # imported records are appended to the target table.
    # mode: replace replaces the existing target table.
    default_timestamp_format: '%d/%m/%Y'
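Before running Embulk with this file, it can help to confirm that every variable the template references is actually set. The following is a rough sketch under stated assumptions: check_env is a hypothetical helper, not an Embulk command, and it relies on the printenv utility being available. The variable names are the ones used in the example above.

```shell
#!/bin/sh
# Variable names referenced by the example config.yml.liquid.
REQUIRED_VARS="db_host db_port db_username db_password db_name td_apikey api_endpoint td_db_name td_table_name"

# Hypothetical helper: report each variable that is unset, empty, or not
# exported, and return non-zero if any is missing.
check_env() {
  missing=0
  for name in "$@"; do
    # printenv sees only exported variables, which are the ones a child
    # process such as embulk would inherit.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_env $REQUIRED_VARS && embulk preview config.yml.liquid
```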


