Data Import from Ruby Apps on Heroku

Table of Contents

Prerequisites

  • Basic knowledge of Ruby, Gems, and Bundler.
  • Basic knowledge of Heroku, including the Heroku toolbelt.
  • Ruby 1.8 or higher (for local testing).

Provisioning the add-on

The Treasure Data addon can be attached to a Heroku application via the CLI:

$ heroku addons:add treasure-data:nano
Adding treasure-data:nano on <your_app_name>... done, v3 (free)

Treasure Data Toolbelt setup

First, please download and install the Treasure Data Toolbelt for your development environment.

If you have... Please refer to...
Mac OS Download Mac Installer
Windows Download Win Installer
Linux Redhat/CentOS, Debian/Ubuntu
Ruby gem `gem install td`

Heroku CLI setup

The heroku-td CLI plugin is also required as a bridge between the heroku CLI and the td CLI. Once you install the CLI plugin, you will be able to execute the heroku td family of commands.

$ heroku plugins:install https://github.com/treasure-data/heroku-td.git
$ heroku td
usage: heroku td [options] COMMAND [args]
Untitled-3
Please make sure to use the Heroku toolbelt. On December 1, 2012, Heroku stopped releasing new updates to the heroku gem. We have also stopped supporting the heroku gem.

Data Import

You can import data from your Heroku Ruby App to Treasure Data in Four Steps: 1) Print to STDOUT, 2) Access the app, 3) Check the data, and 4) Query it.

Step 1: Import Data By Printing to STDOUT (Yes, that simple!)

You can import data to Treasure Data by simply writing to STDOUT in a specific format. The format is:

@[database.table] JSON-IN-ONE-LINE

Thus, in Ruby,

puts "@[development.login] #{{'uid'=>123}.to_json}"
puts "@[development.follow] #{{'uid'=>123, 'from'=>'@TreasureData', 'to'=>'@Heroku'}.to_json}"
puts "@[development.pay] #{{'uid'=>123, 'item_name'=>'Stone of Jordan', 'category'=>'ring', 'price'=>100, 'count'=>1}.to_json}"
Untitled-3
The action name only supports alphanumeric characters and underscores. Hyphens cannot be used (an exception will be thrown).
Untitled-3
If you can't see your logs using the "heroku logs" command, please check if your stdout is being flushed correctly in your code.

Once you have inserted your event-logging code, push the modifications to Heroku!

$ git commit -a -m "Added Treasure Data Plugin"
$ git push heroku master
Untitled-3
In addition to writing to STDOUT, you can also use the logger library to post events to Treasure Data.

Step 2: Access Your Application

First, open your application on Heroku. The recorded events are first buffered locally, then periodically uploaded into the cloud. In the current implementation, the buffered data is uploaded every 5 minutes.

$ heroku open

Step 3: Check Your Uploaded Data

Treasure Data Hadoop’s data structure is like RDBMS: tables inside databases. In order to see the list of available databases, use the command: heroku td dbs.

$ heroku td dbs
+-------------+-------+
| Name        | Count |
+-------------+-------+
| development | 6     |
+-------------+-------+
1 row in set

In order to see the tables inside the available databases, use the command: heroku td tables.

$ heroku td tables
+-------------+--------+------+-------+--------+---------------------------+--------+
| Database    | Table  | Type | Count | Size   | Last import               | Schema |
+-------------+--------+------+-------+--------+---------------------------+--------+
| development | follow | log  | 1     | 0.0 GB | 2013-01-22 20:27:37 -0800 |        |
| development | login  | log  | 1     | 0.0 GB | 2013-01-22 20:27:37 -0800 |        |
| development | pay    | log  | 1     | 0.0 GB | 2013-01-22 20:27:37 -0800 |        |
+-------------+--------+------+-------+--------+---------------------------+--------+
3 rows in set
Untitled-3
If you don't see your data uploaded to TD, then your stdout may be buffered. See here for details.

To confirm that your application data has been uploaded properly, please check the ‘Count’ column. If any of the ‘Count’ entries are non-zero, your event logs been transferred successfully.

Once your data has been uploaded properly, use the ‘heroku td table:tail’ command to see the recent entries of a specific table.

$ heroku td table:tail development follow
{"uid":123,"time":1358915021,"to":"@Heroku","from":"@TreasureData"}

Step 4: Analyze Your Data

Our service lets you analyze your data using a SQL-style language. When your data is sent to us, your logs are imported into a Hadoop/Hive cluster. Hive lets its users query big data using a SQL-like interface.

Please use the heroku td query command to issue queries to TD Hadoop; TD Hadoop will accept and execute queries within the cloud.

The example query below counts the number of ‘follow’ actions that were generated by userid 12345:

$ heroku td query -w -d development \
  "SELECT COUNT(1) FROM follow WHERE uid = 123"
...
(some MapReduce messages)
...
+-----+
| _c0 |
+-----+
| 1   |
+-----+
1 row in set    

The next example query counts the total number of logged actions for each day.

$ heroku td query -w -d development \
  "SELECT to_date(from_unixtime(time)) AS day, count(1) FROM follow GROUP BY to_date(from_unixtime(time)) ORDER BY day"
...
(some MapReduce messages)
...
+------------+-----+
| day        | _c1 |
+------------+-----+
| 2013-01-23 | 2   |
+------------+-----+
1 row in set

The heroku td query –format csv command will return the results in csv format.

Issue Queries from a Ruby Program

Below is an example of issuing a query from a Ruby program. The query API is asynchronous, and you can wait for the query to complete by polling the job periodically (e.g. by issuing job.finished? calls).

require 'td'
require 'td-client'
cln = TreasureData::Client.new(ENV['TREASURE_DATA_API_KEY'])
job = cln.query('testdb', 'SELECT COUNT(1) FROM www_access')
until job.finished?
  sleep 2
  job.update_progress!
end
if job.success?
  job.update_status!  # get latest info
  job.result_each { |row| p row }
end

Does Uploading Data Impact App Performance?

When using the td gem, the posted records are buffered locally at first, and the data is uploaded every 5 minutes. Because a dedicated thread uploads the data into the cloud, it doesn’t affect your application’s response time.

The local buffer also has a size limit. If the local data exceeds this limit, the records will be uploaded immediately.

Next Steps

The Heroku Addon Notes document explains the limitations of Heroku addons. We recommend that you review this information before moving on to other articles.

We offer a schema mechanism that is more flexible than that of traditional RDBMSs. For queries, we leverage the Hive Query Language.

Appendix: Importing Data To Treasure Data Using The Logger Library

First, add the td (‘T’reasure ‘D’ata) gem to your Gemfile.

gem 'td', "~> 0.10.22"

Next, install the gem locally via bundler.

$ bundle install

The ‘td’ gem comes with a built-in library for recording in-app events. Insert code as shown below to record events from your app:

# Initialization
require 'td'
TreasureData::Logger.open('sample_database',
                          :apikey=>ENV['TREASURE_DATA_API_KEY'],
                          :auto_create_table=>true)

# Example1: login event
TD.event.post('login', {:uid=>123})

# Example2: follow event
TD.event.post('follow', {:uid=>123, :from=>'TD', :to=>'Heroku'})

# Example3: pay event
TD.event.post('pay',
              {:uid=>123, :item_name=>'Stone of Jordan',
               :category=>'ring', :price=>100, :count=>1})

Your event-logging code should be placed near its corresponding event-generating code.

Further details regarding the event logger API can be found here.

Development Environment Setup (for TD Logger Library ONLY)

Please set the TREASURE_DATA_API_KEY env variable to enable your development environment.

$ export TREASURE_DATA_API_KEY=`heroku td apikey:show`

Once your API key has been set, please start your application as usual.


Last modified: Feb 24 2017 09:27:52 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.