PostgreSQL Export Integration

You can export job results from Treasure Data to your existing PostgreSQL instance. For PostgreSQL data import, see PostgreSQL Import Integration.

Prerequisites

  • Basic knowledge of Treasure Data, including the TD Toolbelt.
  • PostgreSQL instance.

Static IP Address of Treasure Data Integration

If your security policy requires IP whitelisting, you must add Treasure Data's IP addresses to your allowlist to ensure a successful connection.

Please find the complete list of static IP addresses, organized by region, at the following link:
https://api-docs.treasuredata.com/en/overview/ip-addresses-integrations-result-workers/

Use the TD Console to Create Your Connection

Create a New Connection

  1. Configure the following field values to create a new connection.
  • Host: The host information of the source database, such as an IP address.
  • User: Username to connect to the source database.
  • Password: Password to connect to the source database.
  • Use SSL: Check this box to connect using SSL.
    • Require a valid SSL certificate?: Require the server to present a valid SSL certificate when connecting.

Configure Results Export to Your PostgreSQL Instance

Export from Treasure Data uses queries. You can create or reuse a query. In the query, you configure the data connection.

  1. Complete the instructions in Creating a Destination Integration.
  2. Navigate to Data Workbench > Queries.
  3. Select a query for which you would like to export data.
  4. Run the query to validate the result set.
  5. Select Export Results.
  6. Select an existing integration authentication.
  7. Define any additional Export Results details and review the integration parameters for your export. Your Export Results screen might look different, or you might not have additional details to fill out.
  8. Select Done.
  9. Run your query.
  10. Validate that your data moved to the destination you specified.

Set the Export Result Parameters

  • Database name: The name of the database you are transferring data to. (Example: your_database_name)
  • Table: The table to which you would like to export the data.
  • Output mode: The method used to upload the data.
    • Append (default): The append mode is the default mode that is used when no mode option is provided in the URL. In this mode, the query results are appended to the table. If the table does not exist, it is created. This mode is atomic.
    • Replace: The replace mode consists of replacing an existing table's entire content with the query's resulting output. If the table does not exist yet, a new table is created. The replace mode achieves atomicity (so that a consumer of the table always has consistent data) by performing the following three steps in a single transaction:
      1. Create a temporary table.
      2. Write to the temporary table.
      3. Replace the existing table with the temporary table using ALTER TABLE RENAME.
    • Truncate: The system first truncates the existing table, then inserts the query results. If the table does not exist yet, a new table is created. This mode is atomic.
    • Update: A row is inserted unless it would cause a duplicate value in the columns specified in the “unique” parameter; in that case, an update is performed instead. The “unique” parameter is required when using the update mode. This mode is atomic.
  • Insert Method: This option controls how the data is written into the Postgres table. The default method is copy; it is also recommended for most situations.
    • Copy (default): Data is first stored in a temporary file on the server, then written to Postgres using a COPY transaction. This method is faster than INSERT, so it is useful when handling large data.
    • Insert: Data is written to Postgres using ‘INSERT’ statements. This is the most reliable and compatible method, and it is recommended for most situations.
  • Schema: Defines the schema where the target table is located. If not specified, the default schema is used. The default schema depends on the user’s “search_path” setting, but it is usually “public”.
  • Foreign Data Wrapper: This option controls whether or not a data wrapper is used to store the data. The default is none and should work in most instances.
    • None (default): No foreign-data wrapper.
    • Cstore: Used when columnar storage is required/enabled on the destination table.

(Optional) Schedule Query Export Jobs

You can use Scheduled Jobs with Result Export to periodically write the output result to a target destination that you specify.

Treasure Data's scheduler feature supports periodic query execution to achieve high availability.

When two fields of a cron schedule conflict, the field that requests more frequent execution is followed and the other field is ignored.

For example, if the cron schedule is '0 0 1 * 1', the 'day of month' and 'day of week' fields conflict: the former requires the job to run on the first day of each month at midnight (00:00), while the latter requires it to run every Monday at midnight (00:00). The latter specification is followed.

Scheduling your Job Using TD Console

  1. Navigate to Data Workbench > Queries.

  2. Create a new query or select an existing query.

  3. Next to Schedule, select None.

  4. In the drop-down, select one of the following schedule options:

    Drop-down Value      Description
    Custom cron...       Review Custom cron... details.
    @daily (midnight)    Run once a day at midnight (00:00) in the specified time zone.
    @hourly (:00)        Run every hour at 00 minutes.
    None                 No schedule.

Custom cron... Details

Cron Value    Description
0 * * * *     Run once an hour.
0 0 * * *     Run once a day at midnight.
0 0 1 * *     Run once a month at midnight on the morning of the first day of the month.
""            Create a job that has no scheduled run time.

 *    *    *    *    *
 -    -    -    -    -
 |    |    |    |    |
 |    |    |    |    +----- day of week (0 - 6) (Sunday=0)
 |    |    |    +---------- month (1 - 12)
 |    |    +--------------- day of month (1 - 31)
 |    +-------------------- hour (0 - 23)
 +------------------------- min (0 - 59)

The following named entries can be used:

  • Day of Week: sun, mon, tue, wed, thu, fri, sat.
  • Month: jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec.
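
The field layout and named entries above can be sketched in a few lines of Python. This is only an illustration of the format described here, not Treasure Data's own parser; the names FIELD_LIMITS, split_cron, and to_number are hypothetical, and wildcards, ranges, lists, and periodicity values are deliberately left unhandled.

```python
# Field order and numeric limits, copied from the diagram above.
FIELD_LIMITS = {
    "min": (0, 59),
    "hour": (0, 23),
    "day of month": (1, 31),
    "month": (1, 12),
    "day of week": (0, 6),
}
DAY_NAMES = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]
MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]


def split_cron(expr):
    """Split a cron string into its five fields (single-space separated)."""
    parts = expr.split(" ")
    if len(parts) != 5 or "" in parts:
        raise ValueError("expected five fields separated by single spaces")
    return dict(zip(FIELD_LIMITS, parts))


def to_number(field, token):
    """Convert a single value or named entry to its number and range-check it."""
    token = token.lower()
    if field == "day of week" and token in DAY_NAMES:
        return DAY_NAMES.index(token)          # Sunday=0
    if field == "month" and token in MONTH_NAMES:
        return MONTH_NAMES.index(token) + 1    # jan=1
    value = int(token)
    low, high = FIELD_LIMITS[field]
    if not low <= value <= high:
        raise ValueError(f"{field} must be between {low} and {high}")
    return value


fields = split_cron("0 0 1 * 1")
monday = to_number("day of week", "mon")
december = to_number("month", "dec")
```

Here fields maps each field name to its raw text, so the 'day of month' and 'day of week' entries of '0 0 1 * 1' can be inspected separately.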

A single space is required between each field. The values for each field can be composed of:

  • A single value, within the limits displayed above for each field.
  • A wildcard '*' to indicate no restriction based on the field.
    • Example: '0 0 1 * *' configures the schedule to run at midnight (00:00) on the first day of each month.
  • A range '2-5', indicating the range of accepted values for the field.
    • Example: '0 0 1-10 * *' configures the schedule to run at midnight (00:00) on the first 10 days of each month.
  • A list of comma-separated values '2,3,4,5', indicating the list of accepted values for the field.
    • Example: '0 0 1,11,21 * *' configures the schedule to run at midnight (00:00) every 1st, 11th, and 21st day of each month.
  • A periodicity indicator '*/5' to express how often, based on the field's valid range of values, a schedule is allowed to run.
    • Example: '30 */2 1 * *' configures the schedule to run on the 1st of every month, every 2 hours starting at 00:30.
    • Example: '0 0 */5 * *' configures the schedule to run at midnight (00:00) every 5 days starting on the 5th of each month.
  • A comma-separated list of any of the above except the '*' wildcard, such as '2,*/5,8-10'.
    • Example: '0 0 5,*/10,25 * *' configures the schedule to run at midnight (00:00) every 5th, 10th, 20th, and 25th day of each month.

  5. (Optional) You can delay the start time of a query by enabling the Delay execution.

Execute the Query

Save the query with a name and run, or just run the query. Upon successful completion of the query, the query result is automatically exported to the specified destination.

Scheduled jobs that continuously fail due to configuration errors may be disabled on the system side after several notifications.

Activate a Segment in Audience Studio

You can also send segment data to the target platform by creating an activation in the Audience Studio.

  1. Navigate to Audience Studio.
  2. Select a parent segment.
  3. Open the target segment, right-click, and then select Create Activation.
  4. In the Details panel, enter an Activation name and configure the activation according to the previous section on Configuration Parameters.
  5. Customize the activation output in the Output Mapping panel.
    • Attribute Columns
      • Select Export All Columns to export all columns without making any changes.
      • Select + Add Columns to add specific columns for the export. The Output Column Name pre-populates with the same Source column name. You can update the Output Column Name. Continue to select + Add Columns to add new columns for your activation output.
    • String Builder
      • Select + Add string to create strings for export. Select from the following values:
        • String: Choose any value; use text to create a custom value.
        • Timestamp: The date and time of the export.
        • Segment Id: The segment ID number.
        • Segment Name: The segment name.
        • Audience Id: The parent segment number.
  6. Set a Schedule.
    • Select the values to define your schedule and optionally include email notifications.
  7. Select Create.

If you need to create an activation for a batch journey, review Creating a Batch Journey Activation.

(Optional) Export Integration Using the CLI

If the TD Console is not available or does not meet your needs, you can use the CLI to issue queries and output results. The following instructions show you how to format the query output results using the CLI.

td query Command Usage

To output the result of a single query to a Postgres server, add the --result option to the td query command. After the job is finished, the results are written into your database:

td query -w -d testdb \
--result 'postgresql://user:password@host/database/table' \
"SELECT code, COUNT(1) FROM www_access GROUP BY code"

To create a scheduled query whose output is written to Postgres on every run, add the --result option when creating the schedule with the td sched:create command:

td sched:create hourly_count_example "0 * * * *" \
-d testdb \
--result 'postgresql://user:password@host/database/table' \
"SELECT COUNT(*) FROM www_access"

Result Output URL Format

The result output target is represented by a URL with the following format:

postgresql://username:password@hostname:port/database/table

where:

  • postgresql is the scheme that identifies result output to Postgres;
  • username and password are the credentials to the Postgres server;
  • hostname is the hostname of the Postgres server;
  • port is the port number through which the Postgres server is accessible. “:port” is optional; when omitted, the port is assumed to be 5432;
  • database is the name of the destination database;
  • table is the name of a table within the above-mentioned database. It may not exist at the moment the query output is executed, in which case a table with the specified name is created for the user.
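
As a rough illustration of how the parts listed above fit together, the following Python sketch splits a result output URL into its components using the standard library. The function name parse_result_url is hypothetical, and Treasure Data's own handling of the URL may differ (for example, with special characters in passwords); the only rule taken from this page is the default port of 5432.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 5432  # default stated in the list above


def parse_result_url(url):
    """Split postgresql://username:password@hostname:port/database/table."""
    parts = urlsplit(url)
    if parts.scheme != "postgresql":
        raise ValueError("result URL must start with postgresql://")
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2:
        raise ValueError("path must be /database/table")
    database, table = segments
    return {
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port or DEFAULT_PORT,  # ":port" is optional
        "database": database,
        "table": table,
    }


target = parse_result_url("postgresql://user:password@host/database/table")
```

With no “:port” in the URL, target["port"] falls back to 5432.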

Options

Result output to Postgres supports various options that can be specified as optional URL parameters. The options are compatible with each other and can be combined. Where applicable, the default behavior is indicated.
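
Because the options are plain URL parameters that can be combined, a result URL can be assembled programmatically. The sketch below is a hypothetical helper (build_result_url is not part of the TD Toolbelt) that joins the options described in the following sections with “&”; it performs no escaping, so it assumes credentials and option values contain no reserved URL characters.

```python
def build_result_url(user, password, host, database, table, port=None, **options):
    """Assemble a postgresql:// result URL with optional URL parameters."""
    authority = f"{user}:{password}@{host}"
    if port is not None:
        authority += f":{port}"
    url = f"postgresql://{authority}/{database}/{table}"
    if options:
        # Options are appended in the order given, e.g. ssl=true&mode=append
        url += "?" + "&".join(f"{name}={value}" for name, value in options.items())
    return url


url = build_result_url(
    "user", "password", "host", "database", "table",
    ssl="true", schema="target_schema", mode="append",
)
```

The resulting string can be passed to the --result option of td query or td sched:create.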

SSL Option

The ssl option determines whether to use SSL when connecting to the Postgres server.

To use SSL for the connection from Treasure Data to the Postgres server (the server must be configured to accept SSL connections):

postgresql://user:password@host/database/table?ssl=true

To connect from Treasure Data to the Postgres server without SSL:

postgresql://user:password@host/database/table?ssl=false

Schema Option

Controls the schema in which the target table is located. If not specified, the default schema is used. The default schema depends on the user’s “search_path” setting, but it is usually “public”.

postgresql://user:password@host/database/table?schema=target_schema

Update Mode Option

Controls how the data in the destination table is modified. All four supported modes are atomic because they use a temporary table to store the incoming data before attempting to modify the destination table:

  • Append
  • Replace
  • Truncate
  • Update

mode=append (default)

The append mode is the default, used when no mode option is provided in the URL. In this mode, the query results are appended to the table. If the table does not exist, a table is created.

Because mode=append is the default behavior, these two URLs are equivalent:

  • postgresql://user:password@host/database/table
  • postgresql://user:password@host/database/table?mode=append

mode=replace

The replace mode consists of replacing the entire content of an existing table with the result output of the query. If the table does not exist yet, a new table is created. The replace mode achieves atomicity (so that a consumer of the table always has consistent data) by performing the following three steps in a single transaction:

  1. Create a temporary table.
  2. Write to the temporary table.
  3. Replace the existing table with the temporary table using ALTER TABLE RENAME.

Example:

  • postgresql://user:password@host/database/table?mode=replace
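
The three replace steps can be tried locally with a small Python sketch. It uses an in-memory SQLite database as a stand-in for Postgres, so it only illustrates the create/write/rename pattern inside one transaction; it is not the statements Treasure Data actually issues. The staging table name, the access_counts table, and its columns are hypothetical, and dropping the old table before the rename is an assumption of this sketch.

```python
import sqlite3


def replace_table(conn, table, rows):
    """Replace the contents of `table` with `rows` in a single transaction."""
    staging = f"{table}_staging"  # hypothetical temporary table name
    conn.execute("BEGIN")
    try:
        # Step 1: create a temporary (staging) table.
        conn.execute(f"DROP TABLE IF EXISTS {staging}")
        conn.execute(f"CREATE TABLE {staging} (code INTEGER, cnt INTEGER)")
        # Step 2: write the query results to the staging table.
        conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
        # Step 3: swap the staging table in with ALTER TABLE RENAME.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # the old table stays untouched on failure
        raise


# isolation_level=None lets the sketch control BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE access_counts (code INTEGER, cnt INTEGER)")
conn.execute("INSERT INTO access_counts VALUES (200, 1)")

replace_table(conn, "access_counts", [(200, 10), (404, 2)])
result = conn.execute("SELECT code, cnt FROM access_counts ORDER BY code").fetchall()
```

Because the swap happens inside the transaction, a reader of the table sees either the old contents or the new contents, never a partially written table.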

mode=truncate

With the truncate mode, the system first truncates the existing table, then inserts the query results. If the table does not exist yet, a new table is created.

Example:

postgresql://user:password@host/database/table?mode=truncate

Unlike replace, the truncate mode retains the indexes of the table.

mode=update

In the update mode, a row is inserted unless the inserted row causes a duplicate value in the columns specified in the “unique” parameter. In such cases, an update to the row is performed instead of an insert. A “unique” parameter is required when using the update mode.

Example:

  1. postgresql://...?mode=update&unique=col1        # single unique column
  2. postgresql://...?mode=update&unique=[col1,col2] # multiple unique columns
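
The insert-or-update behavior can be modeled in plain Python to make the role of the “unique” parameter concrete. This is a simplified in-memory model, not the SQL that runs on the Postgres server; the function name apply_update_mode is hypothetical, and it assumes that an update overwrites the columns present in the incoming row.

```python
def apply_update_mode(table_rows, incoming_rows, unique):
    """Insert each incoming row, or update the row sharing its unique-column values."""
    # Index existing rows by the values of the unique columns.
    index = {tuple(row[col] for col in unique): pos
             for pos, row in enumerate(table_rows)}
    for row in incoming_rows:
        key = tuple(row[col] for col in unique)
        if key in index:
            table_rows[index[key]].update(row)   # duplicate key: update in place
        else:
            index[key] = len(table_rows)
            table_rows.append(dict(row))         # new key: insert
    return table_rows


existing = [{"code": 200, "cnt": 1}, {"code": 404, "cnt": 2}]
incoming = [{"code": 200, "cnt": 10}, {"code": 500, "cnt": 3}]
merged = apply_update_mode(existing, incoming, unique=["code"])
```

In this model the row with code 200 is updated, the row with code 404 is left alone, and the row with code 500 is inserted. Passing unique=["col1", "col2"] corresponds to the multiple-column form shown above.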

Write method Option

The method option controls how the data is written into the Postgres table. You can use:

  • method=insert
  • method=copy

The default method is insert, which is the recommended method for most situations.

method=insert (default)

With the insert method, data is written to Postgres using ‘INSERT’ statements. It is the most reliable and compatible method.

Because method=insert is the default behavior, these two URLs are equivalent:

  1. postgresql://user:password@host/database/table
  2. postgresql://user:password@host/database/table?method=insert

method=copy

When the copy method is used, the data is first stored in a temporary file on the server and then written to Postgres using a COPY transaction. This method is faster than INSERT and is therefore useful when handling a large amount of data.

Example:

postgresql://user:password@host/database/table?method=copy

(Optional) Configure Export Results in Workflow

Within Treasure Workflow, you can specify the use of this data connector to output data.

timezone: UTC

_export:
  td:
    database: sample_datasets

+td-result-output-postgresql:
  td>: queries/sample.sql
  result_connection: your_connections_name
  result_settings:
    database: database_name
    table: table_name
    mode: append
    set_role: new_role

Read about using data connectors in a workflow to export data. See an example workflow.