This article supports Audience Studio - Legacy.

The goal of this tutorial is to perform a churn prediction on public data from a US telecom. This tutorial uses a data set that consists of the churn history of phone numbers from the book “Discovering Knowledge in Data: An Introduction to Data Mining.”

...

This is a tutorial overview and does not include steps or screen captures for a complete scenario. Assume the data is already imported to Treasure Data as a table:

The table has 1,000,000 records (customers, identified by phone number; 1 record = 1 phone number). Each profile has 20 attributes, such as day calls, account length, and international plan, plus one label column, “Churn” (True. or False.).
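To make the table layout concrete, the following minimal sketch models one record in Python. It is not Treasure Data's schema: the column names (`phone_number`, `account_length`, `day_calls`, `international_plan`) are hypothetical stand-ins for a few of the 20 attributes, and it assumes the Churn label arrives as the strings `True.` or `False.`, including the trailing period.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """One record of the churn table (1 record = 1 phone number).

    Hypothetical column names; only a few of the 20 attributes are shown.
    """
    phone_number: str
    account_length: int
    day_calls: int
    international_plan: str
    churn: bool


def parse_churn_label(raw: str) -> bool:
    """Convert the raw label ("True." or "False.") into a bool."""
    value = raw.strip().rstrip(".")
    if value == "True":
        return True
    if value == "False":
        return False
    raise ValueError(f"unexpected Churn label: {raw!r}")


def profile_from_row(row: dict) -> Profile:
    """Build a Profile from a row of string values (assumed input format)."""
    return Profile(
        phone_number=row["phone_number"],
        account_length=int(row["account_length"]),
        day_calls=int(row["day_calls"]),
        international_plan=row["international_plan"],
        churn=parse_churn_label(row["churn"]),
    )


# Illustrative, made-up row; not taken from the real data set.
example = profile_from_row({
    "phone_number": "555-0100",
    "account_length": "128",
    "day_calls": "110",
    "international_plan": "no",
    "churn": "False.",
})
print(example)
```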

...

  1. Create a master segment based on the data. Because the data is quite simple, you can create a master segment by directly using the table as a master table:

  2. Click Run to generate the master segment data.


...

Define the batch segments for a churn prediction. In this example, the goal is to assign predictive scores to customers who have not churned yet. See Predicting Customer Behavior. The separation between the population, positive samples, and scoring target can be illustrated as follows:

Create segments. The separate segments are defined as follows:

...

  • Positive samples


  • Scoring target
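Assuming each segment is defined purely by the Churn label (the exact segment rules are not shown here), the separation can be sketched as three filters over the same profiles: positive samples are customers who have already churned, and the scoring target is customers who have not churned yet and should receive predictive scores. The dictionary keys are hypothetical.

```python
def split_segments(profiles):
    """Split profiles into population, positive samples, and scoring target.

    Assumes every profile is a dict with a boolean "churn" key.
    """
    population = list(profiles)                                 # every profile
    positive_samples = [p for p in population if p["churn"]]    # already churned
    scoring_target = [p for p in population if not p["churn"]]  # still active
    return population, positive_samples, scoring_target


# Hypothetical profiles for illustration.
profiles = [
    {"id": "a", "churn": True},
    {"id": "b", "churn": False},
    {"id": "c", "churn": False},
]
population, positives, targets = split_segments(profiles)
print(len(population), [p["id"] for p in positives], [p["id"] for p in targets])
```

Under this assumption, the positive samples and the scoring target do not overlap, and together they cover the whole population.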

Configure Predictive Scoring

...

Specify the dependent segments and attributes used for prediction in Predictive Scoring:

Choosing a subset of attributes is part of feature engineering in the context of machine learning. Data scientists generally spend a significant amount of time finding an appropriate feature set.

...

Selected columns are categorized into the following types:

Categorical Features

  • Attributes that are not meaningful as numeric values, such as gender, day of week, and group.

Categorical Array Features

  • Array columns in TD that can be treated as a single piece of categorical information, such as td_affinity_categories generated by the content affinity engine, or a list of games played before.

Quantitative Features

  • Numeric values, such as age, price, and frequency.

You can add and remove columns.
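The three feature types can be illustrated with a simple type-based heuristic. This is an assumption for illustration, not how Predictive Scoring classifies columns: a numeric column that is really a code (a postal code, for example) would be misclassified as quantitative by this rule, so a type-based guess is only a starting point.

```python
from typing import Any, Dict, List


def feature_type(value: Any) -> str:
    """Guess a feature type from one sample value (illustrative heuristic)."""
    if isinstance(value, (list, tuple)):
        return "categorical_array"   # e.g. td_affinity_categories
    # Check bool before int/float: in Python, bool is a subclass of int.
    if isinstance(value, (bool, str)):
        return "categorical"         # e.g. gender, day of week
    if isinstance(value, (int, float)):
        return "quantitative"        # e.g. age, price, frequency
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def group_columns(sample_row: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group the columns of one sample row by their guessed feature type."""
    groups: Dict[str, List[str]] = {
        "categorical": [],
        "categorical_array": [],
        "quantitative": [],
    }
    for column, value in sample_row.items():
        groups[feature_type(value)].append(column)
    return groups


# Hypothetical sample row.
sample_row = {
    "gender": "f",
    "td_affinity_categories": ["sports", "music"],
    "age": 34,
    "price": 9.99,
    "international_plan": "no",
}
print(group_columns(sample_row))
```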

...

After the master segment is successfully re-generated, review your dashboard:

The histogram shows the distribution of predictive scores. The horizontal axis represents the predictive score, from 0 to 100, and the vertical axis indicates the number of scored profiles (customers). Different colors distinguish the categories of scored customers. Customers are categorized into two groups:

...

Based on thresholds that you adjust with the seek bar under the histogram, each customer profile is assigned to one of four grades:

  • Likely
  • Possibly
  • Marginally
  • Unlikely



For example, no active customers are in the Likely grade, and 29 active customers are in the Possibly grade. If you want to reach more “likely” customers, move the right-most threshold on the seek bar to a smaller value so that the percentage in the Likely circle increases.
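The effect of the thresholds can be sketched as follows. This assumes three ascending thresholds on the 0 to 100 score scale and that a score equal to a threshold falls into the higher grade; the boundary handling in Predictive Scoring may differ, and the scores are made up. Lowering the right-most threshold moves customers from Possibly into Likely.

```python
from bisect import bisect_right
from collections import Counter

# Grades ordered from the lowest to the highest score range.
GRADES = ("Unlikely", "Marginally", "Possibly", "Likely")


def assign_grade(score, thresholds):
    """Map a 0-100 score to a grade using three ascending thresholds."""
    low, mid, high = thresholds
    if not 0 <= low <= mid <= high <= 100:
        raise ValueError("thresholds must be ascending within 0-100")
    # bisect_right counts how many thresholds are <= score (0 to 3).
    return GRADES[bisect_right(thresholds, score)]


def grade_counts(scores, thresholds):
    """Count how many scored profiles fall into each grade."""
    counts = Counter(assign_grade(s, thresholds) for s in scores)
    return {grade: counts[grade] for grade in GRADES}


scores = [5, 20, 45, 62, 71, 78]           # made-up predictive scores
print(grade_counts(scores, (25, 50, 80)))  # no profile reaches Likely
print(grade_counts(scores, (25, 50, 70)))  # lower right-most threshold
```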

...

After you adjust the thresholds to the desired positions, select Create New Segment:


Create a new batch segment based on the predictive scores. In this example, the segment targets customers in the Possibly and Likely grades.

...

A new segment based on the predictive scoring is created as follows:


Because the Likely and Possibly grades have 0 and 29 customers, respectively, according to the dashboard, this batch segment contains 29 “promising” customers in total.
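Selecting grades for the new segment amounts to keeping the profiles whose grade is in the chosen set, so the segment size is the sum of the selected grade counts (0 + 29 = 29 here). The sketch below repeats an assumed grading rule (ascending thresholds, boundary scores go to the higher grade) so that it runs on its own; the customer IDs, scores, and thresholds are hypothetical.

```python
from bisect import bisect_right

GRADES = ("Unlikely", "Marginally", "Possibly", "Likely")


def assign_grade(score, thresholds):
    """Map a 0-100 score to a grade using three ascending thresholds."""
    return GRADES[bisect_right(thresholds, score)]


def build_segment(scored, thresholds, selected_grades):
    """Return the IDs of customers whose grade is in selected_grades."""
    return [
        customer_id
        for customer_id, score in scored.items()
        if assign_grade(score, thresholds) in selected_grades
    ]


scored = {"a": 10, "b": 55, "c": 65, "d": 90}   # hypothetical scores
segment = build_segment(scored, (25, 50, 80), {"Possibly", "Likely"})
print(segment)
```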

...

Machine learning on real-world data is not simple, and predictions are sometimes inaccurate or produce undesired results. Use the auxiliary information at the bottom of the Predictive Scoring view to understand and improve your predictive scoring:


Statistics about your audience are shown along with the estimated accuracy of the prediction:


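Which metric backs the estimated accuracy is not stated here. As an assumption for illustration only, the sketch below computes ROC AUC, a common way to evaluate a score-based ranking: the probability that a randomly chosen churned customer scores higher than a randomly chosen active one. It uses the plain pairwise definition, which is quadratic in the number of profiles and suited only to small samples.

```python
def roc_auc(scores, labels):
    """Pairwise ROC AUC: share of (positive, negative) pairs ranked correctly.

    labels[i] is True for a positive sample (churned); ties count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Made-up held-out scores and true churn labels.
print(roc_auc([90, 30, 80, 20], [True, True, False, False]))
```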
You can also visually confirm which attributes contribute most strongly to the prediction and what kinds of values exist in each attribute:


Reviewing this information, you can see that the number of customer service calls contributes positively to customer churn, and that having no international plan is associated with a lower churn rate.
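A rough way to check observations like these on your own data is to compare churn rates across attribute values. The sketch below is a simple descriptive check, not the contribution measure Predictive Scoring uses, and it shows association rather than causation. The column names and rows are made up for illustration.

```python
from collections import defaultdict


def churn_rate_by(profiles, attribute):
    """Fraction of churned profiles for each distinct value of attribute."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for p in profiles:
        key = p[attribute]
        totals[key] += 1
        churned[key] += int(p["churn"])
    return {key: churned[key] / totals[key] for key in totals}


def mean_by_churn(profiles, attribute):
    """Mean of a numeric attribute for churned (True) and active (False) profiles."""
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for p in profiles:
        label = bool(p["churn"])
        sums[label] += p[attribute]
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in (True, False) if counts[label]}


# Illustrative, made-up rows; hypothetical column names.
rows = [
    {"international_plan": "no", "customer_service_calls": 1, "churn": False},
    {"international_plan": "no", "customer_service_calls": 0, "churn": False},
    {"international_plan": "no", "customer_service_calls": 5, "churn": True},
    {"international_plan": "no", "customer_service_calls": 2, "churn": False},
    {"international_plan": "yes", "customer_service_calls": 4, "churn": True},
    {"international_plan": "yes", "customer_service_calls": 1, "churn": False},
]
print(churn_rate_by(rows, "international_plan"))
print(mean_by_churn(rows, "customer_service_calls"))
```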

...