Writing Job Results into your Mongodb Collections.
This article explains how to write job results to your existing Mongodb instance.
Table of Contents
- Basic knowledge of Treasure Data, including the toolbelt.
- A Mongodb instance.
- Treasure Data must have proper privileges.
A front-end application streamingly collects data to Treasure Data via Treasure Agent. Treasure Data periodically runs jobs on the data, then writes the job results to your Mongodb collections.
Example 1: Ranking: What are the “Top N of X?”
Every social/mobile application calculates the “top N of X” (ex: top 5 movies watched today). Treasure Data already handles the raw data warehousing; the “write-to-mongodb” feature enables Treasure Data to find the “top N” data as well.
Example 2: Dashboard Application
If you’re a data scientist, you need to keep track of a range of metrics every hour/day/month and make them accessible via visualizations. Using this “write-to-mongodb” feature, you can streamline the process and focus on your queries and your visualizations of the query results.
|Our Premium plan includes advanced security features, which includes a list of static IPs Treasure Data is using. You can limit the access to your Mongodb instance by using these IPs. Please contact email@example.com if you need it too.|
For On-demand Jobs
For on-demand jobs, just add the
--result / -r option to the
td query command. After the job is finished, the results are written into your collection.
$ td query --result 'mongodb://user:password@host:1234/database/collection' \ -w -d testdb "SELECT code, COUNT(1) FROM www_access GROUP BY code"
For Scheduled Jobs
For scheduled jobs, just add the
--result / -r option when scheduling a job. Every time the job runs, the results are written into
$ td result:create mydb 'mongodb://user:password@host:1234/database' $ td sched:create hourly_count_example "0 * * * *" \ -d testdb "select count(*) from www_access" --result mydb:mycollection
The result output target is represented by URL with the following format:
- mongodb is identified for result output to Mongodb;
- username and password are the credential to the Mongodb instance;
- hostname is the host name of the Mongodb instance;
- port is the port number through which the Mongodb instance is accessible. This is optional;
- database is the name of the destination database;
- collection is the name of the destination collection.
You can add or delete data in four modes:
mongodb://user:password@host:1234/database/collection # append mongodb://user:password@host:1234/database/collection?mode=append # append mongodb://user:password@host:1234/database/collection?mode=replace # replace mongodb://user:password@host:1234/database/collection?mode=truncate # truncate mongodb://user:password@host:1234/database/collection?mode=update&unique=key1 # update
This is the default mode. The query results are appended to a collection. If the collection does not exist yet, a new collection will be created.
This method is atomic.
If the collection already exists, the rows of the existing collection are replaced with the query results. If the collection does not exist yet, a new collection will be created.
We achieve atomicity (so that a consumer of the collection always has consistent data) by performing the following three steps in a single transaction.
- Create a temporary collection.
- Write to the temporary collection.
- Replace the existing collection with the temporary collection using RENAME command.
This method is atomic.
The system first truncates the existing collection, then inserts the query results. If the collection does not exist yet, a new collection will be created.
|Unlike REPLACE, TRUNCATE retains the indexes of your collection.|
This method is atomic.
This mode uses Mongodb’s find and “upsert” method (see Mongodb’s documentation). In short, a row is inserted unless it would cause a duplicate value in the unique index or primary key, in which case an update is performed. Please make sure you’ve already created unique index on the fields you specified at the arguments.
When this mode is used, the
unique option is required, see below.
Since Mongodb doesn’t support transactions, this mode cannot guarantee transaction atomicity.
This option is only relevant and required with the
update mode. It takes the name of the unique key or keys (commad separated) column name to use for updating the Mongodb collection.
Last modified: Aug 03 2015 00:01:48 UTC