Writing Job Results into PostgreSQL

This article explains how to write job results to your existing PostgreSQL instance.

Table of Contents

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.
  • A PostgreSQL instance.
  • Treasure Data must have SELECT, INSERT, DELETE, CREATE, DROP and ALTER privileges.

Architecture

A front-end application streams data to be collected in Treasure Data via Treasure Agent. Treasure Data periodically runs jobs on the data, then writes the job results to your Postgres tables.



Shown above is a fairly common architecture. Here are a few examples.

Example 1: Ranking: What are the “Top N of X?”

Every social/mobile application calculates the “top N of X” (ex: top 5 movies watched today). Treasure Data already handles the raw data warehousing; the “write-to-PostgreSQL” feature enables Treasure Data to find the “top N” data as well.

Example 2: Dashboard Application

If you’re a data scientist, you need to keep track of a range of metrics every hour/day/month and make them accessible via visualizations. Using this “write-to-PostgreSQL” feature, you can streamline the process and focus on your queries and your visualizations of the query results.

Untitled-3
Our Premium plan includes advanced security features, which includes a list of static IPs Treasure Data is using. You can limit the access to your database by using these IPs. Please contact support@treasuredata.com if you need it too.

Result Output URL Format

Format

The result output target is represented by URL with the following format:

postgresql://<username>:<password>@<hostname>:<port>/<database>/<table>

where:

  • postgresql is identified for result output to Postgres;
  • username and password are the credential to the Postgres server;
  • hostname is the host name of the Postgres server;
  • port is the port number through which the Postgres server is accessible. “:” is optional and assumed to be 5432 by default;
  • database is the name of the destination database;
  • table is the name of a table within the above mentioned database. It may not exist at the moment the query output is execute, in which case a table with the specified name will be created for the user.

Options

Result output to Postgres supports various options that can be specified as optional URL parameters. The options are compatible with each other and can be combined. Where applicable, the default behavior is indicated.

ssl option

ssl option determines whether to use SSL or not for connecting to the Postgres server.

ssl=true

Use SSL from Treasure Data to the Postgres server connection. The Postgres server must be configured to accept SSL connection.

postgresql://user:password@host/database/table?ssl=true
ssl=false (default)

Do not use SSL from Treasure Data to the Postgres server connection.

postgresql://user:password@host/database/table?ssl=false

schema option

Controls the schema the target table is located. If not specified default schema is to be used. The default schema depends on user’s “search_path” setting but it is usually “public”.

postgresql://user:password@host/database/table?schema=target_schema

Update mode option

Controls the various ways of modifying the database data. All 4 supported modes are atomic because they use a temporary table to store the incoming data before attempting to modify the destination table:

  • Append
  • Replace
  • Truncate
  • Update
mode=append (default)

The append mode is the default which is used when no mode option is provided in the URL. In this mode the query results are appended to the table. If the table does not exist, it will be created.

Because mode=append is the default behavior, these two URLs are equivalent:

postgresql://user:password@host/database/table
postgresql://user:password@host/database/table?mode=append
mode=replace

The replace mode consists of replacing the entire content of an existing table with the result output of the query. If the table does not exist yet, a new table will be created. The replace mode achieves atomicity (so that a consumer of the table always has consistent data) by performing the following three steps in a single transaction:

  1. Create a temporary table;
  2. Write to the temporary table;
  3. Replace the existing table with the temporary table using ALTER TABLE RENAME.

Example:

postgresql://user:password@host/database/table?mode=replace
mode=truncate

With the truncate mode the system first truncates the existing table, then inserts the query results. If the table does not exist yet, a new table will be created.

Example:

postgresql://user:password@host/database/table?mode=truncate
Untitled-3
Unlike replace, the truncate mode retains the indexes of the table.
mode=update

In the update mode a row is inserted unless it would cause a duplicate value in the columns specified in the “unique” parameter: in such case an update is performed instead. Please note that the “unique” parameter is required when using the update mode.

Example:

postgresql://...?mode=update&unique=col1        # single unique column
postgresql://...?mode=update&unique=[col1,col2] # multiple unique columns

Write method option

The method option controls how the data is written into the Postgres table.

The default method is insert; it is also the recommended method for most situations.

method=insert (default)

With the insert method, data is written to Postgres using ‘INSERT’ statements. This is the most reliable and compatible method and it is recommended for most situations.

Because method=insert is the default behavior, these two URLs are equivalent:

postgresql://user:password@host/database/table
postgresql://user:password@host/database/table?method=insert
method=copy

When the copy method is used the data is first stored in a temporary file on the server, then written to Postgres using a COPY transaction. This method is faster than INSERT, so it is useful when handling a large amount of data.

Example:

postgresql://user:password@host/database/table?method=copy

Usage

CLI

To output the result of a single query to a Postgres server add the --result option to the td query command. After the job is finished, the results will be written into your database:

$ td query -w -d testdb \
  --result 'postgresql://user:password@host/database/table' \
  "SELECT code, COUNT(1) FROM www_access GROUP BY code"

To create a scheduled query whose output is systematically written to Postgres add the --result option when creating the schedule through td sched:create command:

$ td sched:create hourly_count_example "0 * * * *" -d testdb \
  --result 'postgresql://user:password@host/database/table' \
  "SELECT COUNT(*) FROM www_access"

Last modified: Feb 10 2016 08:37:37 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.