Skip to content
Last updated

RFM Analysis

Recency, frequency, monetary value (RFM) is a marketing analysis model that segments a company's customer base by purchasing patterns. Specifically, it evaluates a customer's recency (how long ago they made a purchase), frequency (how often they make purchases), and monetary value (how much money they spend). Ranking by quantile is performed for each recency, frequency, and monetary value. Customers are clustered by RFM scores using the k-means algorithm, into categories such as Loyalists, Champions, and Lost Customers.

Expected Input

This notebook runs RFM analysis on a given transaction table specified by input_table option. The expected input table schema depends on the parameter aggregated_input.

When aggregated_input is set to False , the input_table schema is expected as follows:

usertstampamount
31052859682011-04-05115
18509857342011-11-231037
2743828082011-04-2517
3582731442011-04-0260
.........

For each user,

  • Recency is defined by days since the latest tstamp
  • Frequency is defined by number of transactions
  • Monetary is defined by total amount

The max tstamp value in input_table is used for the reference date to define recency.

When aggregated_input is set to True , the input_table schema is expected as follows:

userrecencyfrequencymonetary_value
3105285968103200.12
1850985734205500.3
27438280830150.4
.........

If aggregated_input is False, missing value handling is done. If aggregated_input is True, missing value handling is not done and you would have to do the preprocessing on your own beforehand.

Expected Outcome

Ranking by quantile is performed for each recency , frequency , and monetary_value. Quantile ranks in r_quartile/f_quartile/m_quartile ranges from 1 to 4 and the higher the value, the better it is. RFM score is defined by (r_quartile + f_quartile + m_quartile) / 3.

Sample _output_table_:

61612318144115314R3F1M42.6667Promising0R3F2M2
9549322781456232R2F3M22.3333Need attention1R1F1M1
21495332662770123R1F2M32.0High Value Sleeping0R3F2M2
39408329094893244R2F4M43.3333Cannot lose them0R3F2M2
8120321651056211R2F1M11.3333Hibernating1R1F1M1
4511038182133111R1F1M11.0Lost customers1R1F1M1
....................................
userrecencyfrequencymonetary_valuer_quartilef_quartilem_quartilerfm_quartilerfm_scorerfm_segmentrfm_cluster_idrfm_cluster_rank

Users are clustered by RFM scores using the k-means algorithm. rfm_cluster_rank explains cluster traits. When cluster size is 3, the cluster R3F3M3 represents that recency , frequency , and monetary_value are high and it is a promising cluster. Whereas, R1F1M1 means that recency, frequency, and monetary_values are low.

An optimal number of clusters is automatically calculated from min_clusters to max_clusters. You can override this calculation by setting the num_clusters option.

When audience_name is specified, recency, frequency, monetary_values,__r_quartile, f_quartile, m_quartile, rfm_score, rfm_segment, rfm_cluster_rank in output_table are attached as attribute columns of the Master Segment and Segments using rfm_segment and rfm_cluster_rank are being defined.

rfm_segment names follow an industy-standard naming used by multiple players. See this treemap and pseudo code for rfm_segment naming rules.

See the following table for the description of 10 RFM segments.

Segment nameDescriptionRFM Quartile values
ChampionsIdeal customers. Bought recently, buys often and spends the most.R4F4M4
Loyal CustomersSpends good money. Responsive to promotions. These are very active and very valuable customers.R4F4M3, R4F3M4, R4F3M3, R3F4M4, R3F4M3, R3F3M4, R3F3M3
Potential LoyalistsRecent customers, spent good amount, bought more than once.R4F4M2, R4F3M2, R4F2M4, R4F2M3, R4F2M2, R3F4M2, R3F3M2, R3F2M4, R3F2M3, R3F2M2
PromisingRecent shoppers bought relatively recently for high value, or frequently. So, a promising customer segment.R4F4M1, R4F3M1, R4F2M1, R4F1M4, R4F1M3, R4F1M2, R3F4M1, R3F3M1, R3F2M1, R3F1M4, R3F1M3, R3F1M2
New CustomersBought recently, but not often.R4F1M1, R3F1M1
Cannot lose themMade big purchases and often, but long time ago. Large revenue comes for this segment and better not lose their attention.R2F4M4, R2F4M3, R2F3M4, R2F3M3, R2F2M4, R2F2M3, R2F1M4, R2F1M3
Need attentionPotential loyalist is losing attention. So, needs attention.R2F4M2, R2F3M2, R2F2M2
HibernatingLow spenders, low frequency, purchased long time ago. Not worth give attention to them.R2F4M1, R2F3M1, R2F2M1, R2F1M2, R2F1M1
High Value SleepingPast potential loyalist sleeping. Worth awaking their losing interests again.R1F4M4, R1F4M3, R1F4M2, R1F3M4, R1F3M3, R1F3M2, R1F2M4, R1F2M3, R1F2M2, R1F1M4, R1F1M3
Lost customersLowest recency, frequency, and monetary scores. This segment has the lowest priority.R1F4M1, R1F3M1, R1F2M1, R1F1M2, R1F1M1

Example Graph Outputs:

image

image

Screen Shot 2023-06-29 at 17 35 08

Workflow Example

Find a sample workflow in Treasure Boxes.

+rfm:
  ipynb>:
    notebook: RFM
    input_table: ml_datasets.cosmetics_store
    user_column: user_id
    tstamp_column: event_time
    amount_column: price  

Parameters

parameter namedescriptionrequireddata typedefault valueexample value
input_tablespecify a TD table used for RFM analysis like dbname.table_nameyesstring (dbname.table_name)
ml_dataset.td_rfm
user_columnspecify a column name for usernostringuseruser
tstamp_columnspecify a column name for timestampnostringtstamptime
amount_columnspecify a column name for transaction amount such as purchase amountnostringamountpurchase_amount
output_tablespecify a TD table to export RFM result as dbname.table_namenostring (dbname.table_name)
ml_output.rfm
min_clustersspecify a minimum number of clustersnointeger25
max_clustersspecify a maximum number of clustersnointeger825
num_clustersspecify a fixed number of clustersnointegerNone3
hide_table_contentssuppress showing table contentsnobooleanfalsefalse
audience_nameAudience name to merge an attribute tablenostringNonemaster_segment_name
foreign_keyforeign key column name of a master segment used for Audience integration. user_column value is used if not set.nostringNonetd_canonical_id
disable_clusteringdisables clusteringnobooleanfalsefalse
aggregated_inputAllows to input aggregated tablenobooleanfalsefalse