Ad-Network Reporting on the Cloud

Leading Ad-Network companies have used Treasure Data to store all of their impression/click/bidding logs to implement their reporting and optimization features.

This article will explain how to record logs from your applications and calculate some basic metrics using Treasure Data. Our platform lets your team focus on building profitable ad networks and ad exchanges instead of spending valuable human/IT resources maintaining a data analytics infrastructure.

Table of Contents

Prerequisites

  • Basic knowledge of Treasure Data, including the toolbelt.
  • Basic knowledge of td-agent.
  • Ruby 1.8 or higher (for local testing).

Architecture



td-agent is a versatile daemon program that can process various kinds of streaming log data. td-agent is developed and maintained by Treasure Data, Inc..

When an application posts logs to td-agent, td-agent continuously 1.) receives the data, 2.) buffers it, 3.) and uploads it into the cloud. The data is automatically compressed before it is uploaded into the cloud.

Installing td-agent

You must first set up td-agent on your application servers. Please refer to the following articles. For Linux systems, we provide deb/rpm packages.

If you have... Please refer to...
MacOS X Installing td-agent on MacOS X
Ubuntu System Installing td-agent for Debian and Ubuntu
RHEL / CentOS System Installing td-agent for Redhat and CentOS
AWS Elastic Beanstalk Installing td-agent on AWS Elastic Beanstalk

Modifying /etc/td-agent/td-agent.conf

You need to specify your authentication key at /etc/td-agent/td-agent.conf. You can check your apikey from the console. Next, set the apikey option in your td-agent.conf file.

Untitled-3
*YOUR_API_KEY* should be your actual API key string.
# Treasure Data Input and Output
<match td.*.*>
  type tdlog
  endpoint api.treasuredata.com
  apikey YOUR_API_KEY
  auto_create_table
  buffer_type file
  buffer_path /var/log/td-agent/buffer/td
  use_ssl true
</match>

Now, restart td-agent to reflect the changes you made above.

$ sudo /etc/init.d/td-agent restart

td-agent is now ready to accept logs from your application.

Logging from Applications

In this example, we will assume that the application is written is Ruby. Let’s first set up the logging code for important events. For other languages, please refer to these documents.

Supported Languages
Ruby or Rails Java Perl
Python PHP Scala
Node.js

Here’s the actual code to log important events. The time is automatically added by the logging library.

# impression
TD.event.post('impressions', {
  :device=>'iPhone',
  :useragent=>'Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mobile',
  :url=>'https://www.treasuredata.com',
  :publisher_id=>12345,
  :advertiser_id=>7691,
  :campaign_id=>456,
  :creative_id=>9134,
  :user_hash=>'76f868ca5e75d7410d1d4931293a16f1',
  :cpm=>1.34,
}

# click
TD.event.post('clicks', {
  :device=>'iPhone',
  :useragent=>'Mozilla/5.0 (iPhone; CPU iPhone OS 6_1_3 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mobile',
  :url=>'https://www.treasuredata.com',
  :publisher_id=>12345,
  :advertiser_id=>7691,
  :campaign_id=>456,
  :creative_id=>9134,
  :user_hash=>'76f868ca5e75d7410d1d4931293a16f1',
  :cpc=>0.02,
}

Here are a few sample queries that calculate some KPIs.

Daily Impressions

$ td query -w -d production \
  "SELECT
     TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day,
     COUNT(1) AS cnt
   FROM impressions
   WHERE TD_TIME_RANGE(time, '2013-01-01 PDT', NULL, 'PDT')
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT')"

Hourly Impressions

$ td query -w -d production \
  "SELECT
     TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:00:00', 'PDT') AS hour,
     COUNT(1) AS cnt
   FROM impressions
   WHERE TD_TIME_RANGE(time, '2013-01-01 PDT', NULL, 'PDT')
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd HH:00:00', 'PDT')"

Daily Clicks

$ td query -w -d production \
  "SELECT
     TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day,
     COUNT(1) AS cnt
   FROM clicks
   WHERE TD_TIME_RANGE(time, '2013-01-01 PDT', NULL, 'PDT')
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT')"

Daily Earnings by CPM per Advertiser

$ td query -w -d production \
  "SELECT
     TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day,
     advertiser_id,
     SUM(cpm / 1000.0)
   FROM impressions
   WHERE TD_TIME_RANGE(time, '2013-01-01 PDT', NULL, 'PDT')
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT'), advertiser_id"

Daily Earnings by CPC per Advertiser

$ td query -w -d production \
  "SELECT
     TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT') AS day,
     advertiser_id,
     SUM(cpc)
   FROM clicks
   WHERE TD_TIME_RANGE(time, '2013-01-01 PDT', NULL, 'PDT')
   GROUP BY TD_TIME_FORMAT(time, 'yyyy-MM-dd', 'PDT'), advertiser_id"

Last modified: Dec 24 2016 00:46:56 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. For all other issues, please see our support channels.