Writing Job Results into Your MongoDB Collections

This article explains how to write job results to your existing MongoDB instance.

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.
  • A MongoDB instance.
  • A MongoDB user that Treasure Data can connect as, with privileges to write to the destination database.

Reference Architecture

A front-end application streams data to Treasure Data via Treasure Agent. Treasure Data periodically runs jobs on the data, then writes the job results to your MongoDB collections.

Example 1: Ranking: What are the “Top N of X?”

Every social/mobile application calculates the “top N of X” (e.g., the top 5 movies watched today). Treasure Data already handles the raw data warehousing; the “write-to-mongodb” feature enables Treasure Data to compute the “top N” data as well and write it to your collection.

Example 2: Dashboard Application

If you’re a data scientist, you need to keep track of a range of metrics every hour/day/month and make them accessible via visualizations. Using this “write-to-mongodb” feature, you can streamline the process and focus on your queries and your visualizations of the query results.

Our Premium plan includes advanced security features, including a list of the static IP addresses that Treasure Data uses. You can restrict access to your MongoDB instance to these IPs. Please contact support@treasuredata.com if you need this list.

Basic Usage

For On-demand Jobs

For on-demand jobs, just add the --result / -r option to the td query command. After the job is finished, the results are written into your collection.

$ td query --result 'mongodb://user:password@host:1234/database/collection' \
     -w -d testdb "SELECT code, COUNT(1) FROM www_access GROUP BY code"

For Scheduled Jobs

For scheduled jobs, create a named result with td result:create, then add the --result / -r option when scheduling the job. Every time the job runs, the results are written into mycollection.

$ td result:create mydb 'mongodb://user:password@host:1234/database'
$ td sched:create hourly_count_example "0 * * * *" \
     -d testdb "select count(*) from www_access" --result mydb:mycollection

Format

The result output target is represented by a URL with the following format:

mongodb://<username>:<password>@<host>/<database>/<collection>
mongodb://<username>:<password>@<host>:<port>/<database>/<collection>

where:

  • mongodb is the scheme that identifies MongoDB as the result output target;
  • username and password are the credentials for the MongoDB instance;
  • host is the host name of the MongoDB instance;
  • port is the port number through which the MongoDB instance is accessible. This is optional;
  • database is the name of the destination database;
  • collection is the name of the destination collection.
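As a quick way to check a result URL before handing it to td, the format above can be taken apart with Python's standard library. This is only a sketch: parse_result_url is a hypothetical helper name, not part of the toolbelt, and it handles only the full form that ends in /<database>/<collection>.

```python
from urllib.parse import urlsplit


def parse_result_url(url):
    """Split a mongodb:// result URL into the parts listed above.

    Hypothetical helper for illustration; not part of the td toolbelt.
    """
    parts = urlsplit(url)
    if parts.scheme != "mongodb":
        raise ValueError("expected a mongodb:// URL")
    # The path carries exactly two segments: /<database>/<collection>.
    segments = parts.path.strip("/").split("/")
    if len(segments) != 2 or not all(segments):
        raise ValueError("expected /<database>/<collection> in the path")
    database, collection = segments
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL omits the optional port
        "database": database,
        "collection": collection,
    }


print(parse_result_url("mongodb://user:password@host:1234/database/collection"))
```

A URL without a port parses the same way, with port reported as None.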

Options

Modes

You can add or delete data in four modes:

mongodb://user:password@host:1234/database/collection                          # append
mongodb://user:password@host:1234/database/collection?mode=append              # append
mongodb://user:password@host:1234/database/collection?mode=replace             # replace
mongodb://user:password@host:1234/database/collection?mode=truncate            # truncate
mongodb://user:password@host:1234/database/collection?mode=update&unique=key1  # update

mode=append (default)

This is the default mode. The query results are appended to a collection. If the collection does not exist yet, a new collection will be created.

This method is atomic.

mode=replace

If the collection already exists, the rows of the existing collection are replaced with the query results. If the collection does not exist yet, a new collection will be created.

We achieve atomicity (so that a consumer of the collection always has consistent data) by writing to a temporary collection and swapping it in with a single rename:

  1. Create a temporary collection.
  2. Write to the temporary collection.
  3. Replace the existing collection with the temporary collection using MongoDB’s rename command.

This method is atomic.
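The three steps above can be modeled in a few lines of Python. This is an in-memory sketch of the idea only, not Treasure Data's implementation: the database is a plain dict that maps collection names to lists of documents, and replace_collection and the "_tmp" suffix are hypothetical names.

```python
def replace_collection(database, name, results):
    """Model mode=replace on a dict of {collection name: list of documents}."""
    temp_name = name + "_tmp"  # hypothetical name for the temporary collection
    # Steps 1 and 2: create the temporary collection and fill it.
    # Readers of database[name] still see the old documents at this point.
    database[temp_name] = [dict(doc) for doc in results]
    # Step 3: swap the temporary collection in under the real name.
    # In this model, one assignment stands in for the rename.
    database[name] = database.pop(temp_name)


db = {"ranking": [{"code": 200, "cnt": 1}]}
replace_collection(db, "ranking", [{"code": 404, "cnt": 7}])
print(db)  # {'ranking': [{'code': 404, 'cnt': 7}]}
```

Because the new documents are complete before the swap, a reader sees either the old collection or the new one, never a mix.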

mode=truncate

The system first truncates the existing collection, then inserts the query results. If the collection does not exist yet, a new collection will be created.

Unlike REPLACE, TRUNCATE retains the indexes of your collection.

This method is atomic.

mode=update

This mode uses MongoDB’s find and “upsert” methods (see MongoDB’s documentation). In short, a document is inserted unless it would cause a duplicate value in the unique index or primary key, in which case the existing document is updated. Make sure you have already created a unique index on the fields you pass in the unique option. The unique option is required in this mode; see below.

Since MongoDB doesn’t support transactions, this mode cannot guarantee atomicity.
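To make the insert-or-update behavior concrete, here is an in-memory Python sketch. It models a collection as a list of dicts and is not the actual MongoDB call; upsert is a hypothetical function name, and merging the new fields into the matched document (rather than replacing it wholesale) is an assumption.

```python
def upsert(collection, results, unique):
    """Model mode=update: match on the unique key fields, then update or insert."""
    for doc in results:
        key = tuple(doc[field] for field in unique)
        for existing in collection:
            if tuple(existing.get(field) for field in unique) == key:
                # A document with the same key exists: update it in place.
                # (Assumption: new fields are merged into the old document.)
                existing.update(doc)
                break
        else:
            # No document matched the key: insert a new one.
            collection.append(dict(doc))


coll = [{"key1": "a", "cnt": 1}]
upsert(coll, [{"key1": "a", "cnt": 5}, {"key1": "b", "cnt": 2}], unique=["key1"])
print(coll)  # [{'key1': 'a', 'cnt': 5}, {'key1': 'b', 'cnt': 2}]
```

The loop applies one document at a time, so a reader who looks at the collection midway sees some results applied and others not. That is the sense in which this mode is not atomic.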

Unique

This option is relevant only with the update mode, where it is required. It takes the name of the unique key column, or a comma-separated list of column names, to use for updating the MongoDB collection.

For example:

mongodb://user:password@host:1234/database/collection?mode=update&unique=key1
mongodb://user:password@host:1234/database/collection?mode=update&unique=key1,key2,key3
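If you generate result URLs from a script, a small helper can enforce the rules described in this section. The sketch below is illustrative only: with_mode is a hypothetical function, and the checks simply restate the text above (four modes, unique required with update and not used elsewhere).

```python
MODES = ("append", "replace", "truncate", "update")


def with_mode(base_url, mode="append", unique=()):
    """Append ?mode=... (and &unique=... for update) to a result URL.

    Hypothetical helper for illustration; not part of the td toolbelt.
    """
    if mode not in MODES:
        raise ValueError("unknown mode: " + mode)
    if mode == "update" and not unique:
        raise ValueError("mode=update requires at least one unique key")
    if mode != "update" and unique:
        raise ValueError("unique applies only to mode=update")
    url = base_url + "?mode=" + mode
    if unique:
        # Several keys are passed as a comma-separated list.
        url += "&unique=" + ",".join(unique)
    return url


base = "mongodb://user:password@host:1234/database/collection"
print(with_mode(base, "update", ["key1", "key2", "key3"]))
```

When passing the resulting URL to td on the command line, keep it in single quotes as in the examples above, since an unquoted & would be interpreted by the shell.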

Last modified: Aug 03 2015 00:01:48 UTC
