Data Analytics Solutions for Heroku Applications

Table of Contents

Overview

This article shows how to analyze and understand your user base on Heroku applications.

Tracking Web Behaviors

Page View Tracking

For basic page view metrics, Google Analytics is sufficient. It can track page views, unique users or search keywords on a daily basis. Integration is easy: just place the JS fragment on every page you render.

Web Event Tracking

For metrics beyond page views, MixPanel or KISSmetrics can be used to track events and people, rather than page views. To correctly measure the engagement of your userbase, page views are not always enough. The number of comments is one such key metric. Both MixPanel and KISSmetrics provide JavaScript libraries to track events and people.

Tracking Everything on Your Business

Today’s mobile/web applications are complex. User interactions with the application aren’t restricted to the web. The mobile app often hits the server-side REST APIs.

As you grow, a custom & sophisticated analytics infrastructure will be needed for custom reporting, ad hoc analysis, alerting, etc. The analytics system will need to adapt to varying data sources as well.

Hadoop + RDBMS + BI

To analyze the growing data in a flexible way, we recommend the following architecture on Heroku to analyze user behavior.



Hadoop

Hadoop is designed to capture raw data in a cost effective manner and process it in parallel. All the internet giants now capture as much data possible to send to Hadoop. Hadoop excels at handling large amounts of data, but it is originally designed for batch processing.

RDBMS

RDBMS store data as structured data and return query results in an interactive manner. However, scaling RDBMS is difficult. But by combining Hadoop with RDBMS, we can achieve scalability together with interactive analytics.

Treasure Data Hadoop + Heroku Postgres + Chartio

Implementing the architecture above is easy on Heroku. First, Heroku’s Treasure Data addon enables a cloud-hosted Hadoop solution. Second, Heroku Postgres provides PostgreSQL as-a-service. Finally, Chartio, a cloud-hosted business dashboard, is able to talk directly with your RDBMS.



The following steps will show you how to analyze user behavior within typical social / mobile applications.

Step 0) Heroku + Chartio Account

You will need working accounts for both Heroku and Chartio.

Step 1) Data Injestion to Treasure Data

First, enable the treasure-data addon and install the heroku-td CLI plugin.

$ heroku addons:add treasure-data:nano
$ heroku plugins:install https://github.com/treasure-data/heroku-td.git
$ heroku td
usage: heroku td [options] COMMAND [args]

You can now inject data from Heroku’s logplex by outputting your data to STDOUT. To keep this document general, the example app will only track the following actions: register, login, pay, and access.

# Ruby
puts "@[production.register] #{{'uid'=>123}.to_json}"
puts "@[production.login] #{{'uid'=>123}.to_json}"
puts "@[production.pay] #{{'uid'=>123, 'item_id'=>7326, 'item_category'=>'categoryA', 'price'=>321321}.to_json}"
puts "@[production.access] #{{'uid'=>123, 'url'=>'http://..../'}.to_json}"

Step 2) Scheduled Aggregation to Heroku PostgreSQL

Next, Scheduled Jobs will be used to execute jobs periodically. The job query result can easily be inserted into Heroku PostgreSQL. First, please check your Heroku Postgres connection URL.

$ heroku config | grep POSTGRE
HEROKU_POSTGRESQL_NAVY_URL: postgres://user:pass@host:port/db

Then, create result output and scheduled jobs using the heroku td command. You can limit the processed data by using TD_TIME_RANGE() udf (e.g. to limit the data to the last hour).

$ heroku td result:create heroku_db postgresql://user:pass@host:port/db
$ heroku td sched:create pay_hourly "0 * * * *" \
  -d testdb \
  -D 600 \
  --result heroku_db:payment_history \
  "SELECT \
     TD_SCHEDULED_TIME() AS time, \
     item_category AS category, \
     SUM(CAST(price)) AS sum \
   FROM \
     www_access \
   WHERE \
     TD_TIME_RANGE(time,TD_TIME_ADD(TD_SCHEDULED_TIME(), '-1h'), TD_SCHEDULED_TIME()) \
   GROUP BY \
     TD_SCHEDULED_TIME(), item_category"

Payment data will now be inserted in the payment_history table on an hourly basis as shown below.

 2012-03-01 00, categoryA, 32132
 2012-03-01 00, categoryB, 3133
 2012-03-01 01, categoryA, 310310
 2012-03-01 02, categoryB, 1332

Step 3) Set Up Chartio

Please set up your connection from Chartio to Heroku PostgreSQL as shown in this document.



You can now set up and create your dashboard showing payments per category.




Last modified: Aug 03 2015 00:01:48 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.