CLTV Prediction

The CLTV Prediction solution notebook forecasts customer lifetime value (CLTV) using Lifetimes, an open source Python library that implements the Buy Till You Die (BTYD) model.

BTYD is implemented using:

  • The BG/NBD (Beta-Geometric/Negative Binomial Distribution) model is fitted to the transaction history. It describes the distribution of purchase frequencies and the drop-off in engagement following a purchase.
  • The Gamma-Gamma model predicts the average spend per transaction.
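
As a rough illustration of what the two models yield once fitted, the sketch below evaluates the standard BG/NBD "probability alive" formula and the Gamma-Gamma conditional expected spend in plain Python. The function names and all parameter values (r, alpha, a, b, p, q, v) are hypothetical; in the notebook the parameters would be estimated by Lifetimes rather than supplied by hand.

```python
import math


def bgnbd_alive_probability(frequency, recency, tenure, r, alpha, a, b):
    """P(customer is still alive) under BG/NBD, given fitted r, alpha, a, b."""
    if frequency == 0:
        # Under BG/NBD a customer can only drop out right after a repeat
        # purchase, so with none observed the probability is 1.
        return 1.0
    log_odds_dead = (
        (r + frequency) * math.log((alpha + tenure) / (alpha + recency))
        + math.log(a / (b + frequency - 1))
    )
    return 1.0 / (1.0 + math.exp(log_odds_dead))


def gamma_gamma_expected_spend(frequency, monetary_value, p, q, v):
    """Expected average spend per transaction under Gamma-Gamma (needs q > 1)."""
    population_mean = v * p / (q - 1)
    # Customers with more repeat purchases lean more on their own average.
    weight = p * frequency / (p * frequency + q - 1)
    return (1 - weight) * population_mean + weight * monetary_value


# Hypothetical fitted parameters, for illustration only.
alive = bgnbd_alive_probability(frequency=2, recency=30, tenure=40,
                                r=0.25, alpha=4.0, a=0.8, b=2.5)
spend = gamma_gamma_expected_spend(frequency=3, monetary_value=50.0,
                                   p=6.0, q=4.0, v=15.0)
print(round(alive, 3), round(spend, 2))
```

The expected spend is a weighted average of the population mean and the customer's own observed mean, so customers with few repeat purchases are pulled toward the population mean.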

Expected Input

This notebook takes the same input format as the RFM analysis notebook: a transaction table, specified by the input_table option, with user_column, tstamp_column, and amount_column columns, as follows:

| user | tstamp | amount |
|---|---|---|
| 3105285968 | 2011-04-05 | 115 |
| 1850985734 | 2011-11-23 | 1037 |
| 274382808 | 2011-04-25 | 17 |
| 358273144 | 2011-04-02 | 60 |
| ... | ... | ... |

For the tstamp column, Treasure Data accepts the ISO-8601 datetime formats supported by dateutil as well as unix timestamps.

The CLTV notebook uses date-based frequency and recency for CLTV prediction. Timestamps are processed at a resolution of 24-hour intervals; hours, minutes, and seconds are not used.
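
A minimal sketch of what day-resolution processing implies, using only the standard library. It assumes the Lifetimes convention (frequency counts distinct purchase days after the first, recency is the number of days between first and last purchase, tenure is the number of days from first purchase to the end of the observation period) and interprets unix timestamps as UTC. The helper names to_day and summarize are hypothetical, and the notebook's actual implementation may differ.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


def to_day(tstamp):
    """Normalize an ISO-8601 string or a unix timestamp to a calendar date."""
    if isinstance(tstamp, (int, float)):
        # Assumption: unix timestamps are interpreted as UTC.
        return datetime.fromtimestamp(tstamp, tz=timezone.utc).date()
    return datetime.fromisoformat(tstamp).date()


def summarize(transactions, observation_end):
    """Build per-user frequency, recency, and tenure, all measured in days."""
    days_by_user = defaultdict(set)
    for user, tstamp, _amount in transactions:
        # A set keeps one entry per day, so same-day purchases collapse.
        days_by_user[user].add(to_day(tstamp))

    summary = {}
    for user, days in days_by_user.items():
        first, last = min(days), max(days)
        summary[user] = {
            "frequency": len(days) - 1,  # repeat purchase days
            "recency": (last - first).days,
            "tenure": (observation_end - first).days,
        }
    return summary


transactions = [
    ("a", "2011-04-05T09:30:00", 115),
    ("a", "2011-04-05T18:45:00", 20),  # same day as the row above
    ("a", "2011-04-25", 17),
    ("b", 1301961600, 60),             # 2011-04-05 00:00:00 UTC
]
summary = summarize(transactions, observation_end=date(2011, 5, 5))
print(summary)
```

The two purchases by user "a" on 2011-04-05 count as a single purchase day, which is why the time of day has no effect on the result.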

Expected Outcome

In output_table:

  • Frequency, recency, and tenure are measured in days.
  • Tenure is the number of days since the customer's first transaction.
  • CLTVs (monetary values) for the next 1, 3, 6, 12, and 24 months are exported.
  • Percentile ranks take values between 0 and 100.
  • automl_alive_prob is the probability that the customer is still alive now.
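| customerid | frequency | recency | tenure | monetary_value | automl_cltv_1m | automl_cltv_1m_pctile | automl_cltv_3m | automl_cltv_3m_pctile | automl_cltv_6m | automl_cltv_6m_pctile | automl_cltv_12m | automl_cltv_12m_pctile | automl_cltv_24m | automl_cltv_24m_pctile | automl_alive_prob |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12347 | 6 | 365 | 367 | 599.701667 | 9.753 | 82.832 | 28.97 | 82.832 | 57.086 | 82.832 | 110.856 | 82.832 | 209.203 | 82.832 | 0.998192 |
| 12348 | 3 | 283 | 358 | 301.48 | 3.277 | 36.989 | 9.734 | 36.989 | 19.181 | 36.989 | 37.246 | 36.989 | 70.286 | 36.989 | 0.990166 |
| 12352 | 6 | 260 | 296 | 368.256667 | 7.203 | 72.581 | 21.394 | 72.581 | 42.157 | 72.581 | 81.863 | 72.581 | 154.486 | 72.616 | 0.996345 |
| 12356 | 2 | 303 | 325 | 269.905 | 2.628 | 26.487 | 7.805 | 26.487 | 15.38 | 26.487 | 29.864 | 26.452 | 56.353 | 26.487 | 0.990548 |
| 12358 | 1 | 149 | 150 | 683.2 | 6.632 | 69.176 | 19.698 | 69.176 | 38.809 | 69.176 | 75.345 | 69.14 | 142.125 | 69.14 | 0.947 |
| 12359 | 3 | 274 | 331 | 1941.693333 | 19.746 | 95.09 | 58.652 | 95.09 | 115.574 | 95.09 | 224.424 | 95.09 | 423.502 | 95.054 | 0.991769 |
| 12360 | 2 | 148 | 200 | 789.24 | 9.192 | 80.86 | 27.301 | 80.86 | 53.795 | 80.86 | 104.452 | 80.86 | 197.079 | 80.86 | 0.984178 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |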
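
One plausible reading of the percentile rank columns is sketched below: the percentage of customers whose CLTV is less than or equal to a given customer's. This is an assumption for illustration; the exact ranking method and tie handling used by the notebook are not stated here, and percentile_ranks is a hypothetical helper.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Percentage of values less than or equal to each value (0 < rank <= 100)."""
    ordered = sorted(values)
    n = len(ordered)
    return [round(100.0 * bisect_right(ordered, v) / n, 3) for v in values]


cltv_12m = [110.856, 37.246, 81.863, 29.864]  # sample 12-month CLTV values
ranks = percentile_ranks(cltv_12m)
print(ranks)
```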

When audience_name is specified as an input parameter, the following values are created as new attributes of the specified parent segment, where X in Xm is the number of months ahead (1, 3, 6, 12, or 24).

  • automl_cltv_Xm
  • automl_cltv_Xm_pctile
  • automl_cltv_segment
  • automl_alive_prob

The automl_cltv_segment attribute splits automl_cltv_12m (the default horizon, configurable with segment_time_horizon) into five groups (very low/low/medium/high/very high) using quantiles, and five CDP Segments are then generated from these groups.
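
The quantile split can be sketched with the standard library as follows. The interpolation method, the handling of values that fall exactly on a cut point, and the helper name cltv_segments are assumptions; the notebook may compute its quantiles differently.

```python
from bisect import bisect_left
from statistics import quantiles

LABELS = ["very low", "low", "medium", "high", "very high"]


def cltv_segments(values):
    """Assign each value to one of five quantile-based groups."""
    # n=5 yields four cut points at the 20th, 40th, 60th, and 80th percentiles.
    cuts = quantiles(values, n=5, method="inclusive")
    # bisect_left places a value equal to a cut point in the lower group.
    return [LABELS[bisect_left(cuts, v)] for v in values]


cltv_12m = [float(x) for x in range(1, 11)]  # hypothetical CLTV values
segments = cltv_segments(cltv_12m)
print(list(zip(cltv_12m, segments)))
```

Because the cut points come from the data itself, each group holds roughly one fifth of the customers regardless of the absolute CLTV scale.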

Workflow Example

Find a sample workflow in Treasure Boxes.

+run_cltv:
  ipynb>:
    notebook: CLTV
    input_table: ml_datasets.online_retail_txn
    output_table: ml_results.online_retail_cltv_result
    user_column: customerid
    tstamp_column: invoicedate
    amount_column: purchaseamount

Parameters

| Parameter Name | Description | Required | Data Type | Default Value | Example value |
|---|---|---|---|---|---|
| input_table | Specify a TD table used for CLTV prediction like dbname.table_name | yes | string (dbname.table_name) | | ml_dataset.td_txn |
| user_column | Specify a column name for user | no | string | user | user |
| tstamp_column | Specify a column name for timestamp | no | string | tstamp | time |
| amount_column | Specify a column name for transaction amount such as purchase amount. Numerical values expected for data in this column. | no | string | amount | purchase_amount |
| output_table | Specify a TD table to export CLTV prediction result as dbname.table_name | no | string (dbname.table_name) | | ml_output.rfm |
| discount_rate | The monthly adjusted discount rate in a range between 0.0 and 1.0. 0 means no adjustment. | no | float | 0.01 | 0.01 |
| hide_table_contents | Suppress showing table contents | no | boolean | false | false |
| audience_name | Audience name to merge an attribute table | no | string | None | master_segment_name |
| foreign_key | Foreign key column name of a master segment used for Audience integration. user_column value is used if not set. | no | string | None | td_canonical_id |
| segment_time_horizon | Time horizon for CLTV segments in 1m/3m/6m/12m/24m | no | string | 12m | 6m |

The discount_rate parameter is based on the concept of discounted cash flow (DCF): future monetary value is discounted by a discount rate to obtain its present value, which adjusts for the cost of capital. Set discount_rate to 0 to disable this adjustment.
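
The effect of a monthly discount rate can be illustrated with a generic DCF calculation. This is a sketch of the concept only, not the notebook's code; present_value and the monthly figures are hypothetical.

```python
def present_value(monthly_values, discount_rate):
    """Discount a series of expected monthly values back to today."""
    return sum(
        value / (1.0 + discount_rate) ** month
        for month, value in enumerate(monthly_values, start=1)
    )


expected = [100.0, 100.0, 100.0]  # hypothetical expected spend per month
undiscounted = present_value(expected, 0.0)
discounted = present_value(expected, 0.01)
print(undiscounted, round(discounted, 2))
```

With a rate of 0 the result is the plain sum of the monthly values. Any positive rate shrinks later months more than earlier ones, so the 24-month CLTV is affected more than the 1-month CLTV.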