Cost Per Acquisition (CPA) is a key metric that matters to marketers. To reduce costs of online advertising, it is needed to improve Click-Through-Rate (CTR).

Our machine learning solution enables you to predict CTR of each ad session by learning a prediction model from past big data, from millions of attributes and billions of training examples.

Treasure Workflow provides an easy way to predict CTR and Conversion Rate (CVR). For this example, you need to prepare a training table.

Other Learning

Input

For instance, this workflow takes a table of the following form:

rowid

long

label

int

i1

int

i2

int

c1 (e.g., address)

string

c2 (e.g., browser)

string

1

0

23

1

“Ohio”

“Firefox”

2

1

18

4

“New York”

“Google Chrome”

3

0

35

44

“California”

“Safari”

Here, each row represents user’s single impression for an ad. Impressions can be written by a set of int(quantitative) and string (categorical) variables such as users' demographics. A column label shows whether a user clicked an ad.

This template supports tables which have 13 quantitative (i1-i13) and 26 categorical (c1-c26) variables by default.

If you have more or less quantitative and categorical features in a table, you need to modify the following queries appropriately:

Workflow

A basic workflow for CTR prediction:

  • Prepare your data:

    $ ./data.sh
  • Push your workflow to TD:

    $ td wf push ctr-prediction
  • Run the workflow:

    $ td wf start ctr-prediction predict_logress --session now 
      -p apikey=${YOUR_TD_API_KEY}

Where:

Output

The output of the workflow is a table that contains the predicted CTRs for possible future impressions:

rowid

long

predicted_ctr

double

80038

0.487177

80043

0.9583734

80046

0.9104515

Using the Prediction Model in Production System

After the prediction workflow completes successfully, export the prediction model to your own MySQL database:

  1. Configure your MySQL database information in config/secrets.yml.

  2. Load the information to TD workflow:

    $ td wf secrets --project ctr-prediction --set @config/secrets.yml
  3. Export a logress_model table to your MySQL DB:

    td wf start ctr-prediction mysql --session now -p apikey=${YOUR_TD_API_KEY}
  4. Make sure a table logress_model exists on your MySQL DB:

create table logress_model (
  feature varchar(255),
  weight double
);
-- setting index would be better
create unique index logress_model_feature_index on logress_model (feature);

You are ready to predict CTRs for unforeseen impressions.

mysql> mysql_udfs.sql
  • Prediction for single impression can be done by:

select
  sigmoid(sum(m.weight * t2.value)) as prob
from
  logress_model m
  left outer join (
    select
      extract_feature(f) as feature,
      extract_value(f) as value
    from (
      select 'i1:23' as f
      union all
      select 'i2:1' as f
      union all
      ...
      union all
      select 'c1#Ohio' as f
      union all
      select 'c2#Firefox' as f
      union all
      ...
    ) t1
  ) t2 on (m.feature = t2.feature)
;

In particular, when a user visits a site, what your ad server needs to do is:

  • Convert all possible ads to sets of quantitative and categorical features.

  • Construct queries as shown above and compute predicted CTRs.

  • Display highest-scored ads to the target user.

Another option is to predict CTRs programmatically by reading the prediction model from MySQL database, as demonstrated in the following code snippet:

$model = read_model_from_mysql()

def scoring(i, a)
  # list of [feature, value] pairs
  features = [
    # quantitative variables
    ['i0', i.user_generation],
    ['i1', i.user_age],
    ...,
    # categorical features
    ["c1##{i.user_address}", 1.0],
    ["c2##{i.user_browser}", 1.0],
    ["c3##{a.ad_id}", 1.0],
    ["c4##{i.publisher_id}", 1.0],
    ["c5##{a.advertiser_id}", 1.0],
    ["c6##{a.campaign_id}", 1.0],
    ["c7##{a.creative_id}", 1.0],
    ...
  ]

  # compute weighted sum
  features.inject(0) { |sum, f| sum += ($model[f.first] || 0) * f.last }
end

impression = ... # target impression
ads = [ ... ] # list of possible ads
best_performing_ad = ads.map{|ad| [scoring(impression, ad), ad]}.sort.last[1]

  • No labels