This article supports Audience Studio - Legacy.
The goal of this tutorial is to perform churn prediction on public data from a US telecom. This tutorial uses a data set that consists of the churn history of phone numbers, taken from the book “Discovering Knowledge in Data: An Introduction to Data Mining.”
This is a tutorial overview and does not include steps or screen captures for a complete scenario. Assume the data is already imported to Treasure Data as a table:
The table has 1,000,000 records (one record per phone number, each representing a customer profile), and each profile has 20 attributes (such as day calls, account length, and international plan) and 1 label column “Churn” (
Create a master segment based on the data. Because the data is quite simple, you can create a master segment by directly using the table as a master table:
Click Run to generate the master segment data.
Define batch segments representing a churn prediction. In this example, the goal is to assign predictive scores to customers who have not churned yet. See Predicting Customer Behavior. The separation between population, positive samples, and scoring target can be illustrated as follows:
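A minimal Python sketch of this separation may help. It assumes that the population is every profile, that positive samples are the profiles that already churned, and that the scoring target is the profiles that have not churned yet. The field names `id` and `churn` and the toy rows are invented for illustration and are not taken from the data set.

```python
# Toy profiles; "id" and "churn" are hypothetical field names.
profiles = [
    {"id": 1, "churn": True},
    {"id": 2, "churn": False},
    {"id": 3, "churn": False},
    {"id": 4, "churn": True},
    {"id": 5, "churn": False},
]


def split_for_scoring(rows, label="churn"):
    """Return (population, positive_samples, scoring_target).

    Assumption: positive samples are churned profiles, and the scoring
    target is everyone who has not churned yet.
    """
    population = list(rows)
    positive_samples = [r for r in population if r[label]]
    scoring_target = [r for r in population if not r[label]]
    return population, positive_samples, scoring_target


population, positive_samples, scoring_target = split_for_scoring(profiles)
print(len(population), len(positive_samples), len(scoring_target))
```

In this reading, positive samples and the scoring target do not overlap, and together they cover the whole population.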
Create segments. Separate segments mean that:
Configure Predictive Scoring
Specify the dependent segments and attributes used for prediction in Predictive Scoring:
Choosing a subset of attributes is part of feature engineering in the context of machine learning. Data scientists generally spend a significant amount of time finding an appropriate feature set.
Selected columns are categorized into the following types:
Categorical Array Features
You can add and remove columns.
After the master segment is successfully re-generated, review your dashboard:
The histogram shows the distribution of predictive scores. The horizontal axis corresponds to the predictive score, which ranges from 0 to 100. The vertical axis indicates the number of scored profiles (customers). The colors indicate how customers are categorized according to their scored behavior. Customers are categorized into two groups:
Based on the thresholds, adjusted by a seek bar located under the histogram, each of the customer profiles is assigned to one of four grades:
For example, no active customers are categorized into the Likely grade, and 29 active customers are in the Possibly grade. If you want to reach more “likely” customers, move the right-most threshold on the seek bar to a smaller value so that the percentage in the Likely circle increases.
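The effect of moving a threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions: three ascending thresholds on the 0 to 100 scale split scores into four grades, a score equal to a threshold falls into the higher grade, and the two lower grade names (`Unlikely`, `Marginally`) are placeholders, because only Possibly and Likely are named here. The scores are made up.

```python
import bisect
from collections import Counter

# "Possibly" and "Likely" appear in the tutorial; the two lower grade
# names are placeholders chosen for this sketch.
GRADES = ("Unlikely", "Marginally", "Possibly", "Likely")


def assign_grade(score, thresholds):
    """Map a 0-100 score to a grade using three ascending thresholds.

    Assumption: a score equal to a threshold gets the higher grade.
    """
    return GRADES[bisect.bisect_right(thresholds, score)]


def grade_counts(scores, thresholds):
    """Count how many scores fall into each grade."""
    counts = Counter(assign_grade(s, thresholds) for s in scores)
    return {grade: counts.get(grade, 0) for grade in GRADES}


scores = [12, 35, 48, 61, 74, 88]  # made-up predictive scores

before = grade_counts(scores, (25, 50, 90))  # right-most threshold at 90
after = grade_counts(scores, (25, 50, 70))   # right-most threshold lowered to 70
print(before)
print(after)
```

Lowering only the right-most threshold moves profiles from Possibly to Likely and leaves the lower grades unchanged.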
After thresholds are adjusted to desired positions, select Create New Segment:
Create a new batch segment based on the predictive scores. For example, suppose you are interested in Possibly and Likely customers.
A new segment based on the predictive scoring is created as follows:
Because the Possibly and Likely grades have 29 and 0 customers respectively according to the dashboard, this batch segment contains 29 “promising” customers in total.
Machine learning on real-world data is not simple, and predictions are sometimes inaccurate or undesired. Use the auxiliary information provided at the bottom of the Predictive Scoring view to understand and improve your predictive scoring:
Statistics of your audience are shown with an estimated accuracy of prediction:
You can also visually confirm which attributes strongly contribute to the prediction and what kind of values exist in each attribute:
Reviewing the information, you can see that customer service calls contribute positively to customer churn, and that having no international plan leads to a lower churn rate.
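A simple way to sanity-check an observation like this outside the dashboard is to compare churn rates per attribute value. The sketch below is a descriptive check only and is not how Predictive Scoring computes attribute contributions. The field names `intl_plan` and `churn` and the toy rows are invented for illustration.

```python
from collections import defaultdict


def churn_rate_by_value(rows, attribute, label="churn"):
    """Return {attribute value: share of rows with that value that churned}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[attribute]] += 1
        churned[row[attribute]] += int(row[label])
    return {value: churned[value] / totals[value] for value in totals}


# Invented toy rows, not taken from the real data set.
rows = [
    {"intl_plan": "no", "churn": False},
    {"intl_plan": "no", "churn": False},
    {"intl_plan": "no", "churn": True},
    {"intl_plan": "no", "churn": False},
    {"intl_plan": "yes", "churn": True},
    {"intl_plan": "yes", "churn": False},
]

rates = churn_rate_by_value(rows, "intl_plan")
print(rates)
```

If the rate for one value is clearly lower than for another on real data, that is consistent with the attribute contributing to the prediction, although a per-value rate alone does not account for interactions between attributes.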