The CLTV Prediction solution notebook forecasts customer lifetime value (CLTV) using an open source Python library Lifetimes for implementing the Buy Till You Die (BTYD) model.

BTYD is implemented using:
- The BG/NBD (Beta-Geometric/Negative Binomial Distribution) model fits on transactional history curves describing the distribution of purchase frequencies and engagement drop-off following a prior purchase.
- The Gamma-Gamma model is used to predict average spend per transaction.
This notebook takes the same input format to RFM analysis notebook, a transaction table specified by input_table option, having user_column , tstamp_column , and amount_column columns, as follows:
| user | tstamp | amount |
|---|---|---|
| 3105285968 | 2011-04-05 | 115 |
| 1850985734 | 2011-11-23 | 1037 |
| 274382808 | 2011-04-25 | 17 |
| 358273144 | 2011-04-02 | 60 |
| ... | ... | ... |
Treasure Data accepts various ISO-8601 datetime format supported by dateutil as well as unix timestamp for the tstamp column.
A CLTV notebook uses date-based frequency/recency for CLTV prediction. Timestamp information is processed at the resolution of 24 hours intervals. Hours, minutes, and seconds, are not used.
In output_table:
- Note that frequency, recency, and tenure are measured in days.
- The
tenurerepresents the number of days since the customer initially issued a transaction. - CLTVs (monetary values) for the next 1, 3, 6, 12, 24 months are exported.
- Percentile rank takes value between 0 and 100.
- The
automl_alive_probrepresents probability of customers alive now.
| customerid | frequency | recency | tenure | monetary_value | automl_cltv_1m | automl_cltv_1m_pctile | automl_cltv_3m | automl_cltv_3m_pctile | automl_cltv_6m | automl_cltv_6m_pctile | automl_cltv_12m | automl_cltv_12m_pctile | automl_cltv_24m | automl_cltv_24m_pctile | automl_alive_prob |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12347 | 6 | 365 | 367 | 599.701667 | 9.753 | 82.832 | 28.97 | 82.832 | 57.086 | 82.832 | 110.856 | 82.832 | 209.203 | 82.832 | 0.998192 |
| 12348 | 3 | 283 | 358 | 301.48 | 3.277 | 36.989 | 9.734 | 36.989 | 19.181 | 36.989 | 37.246 | 36.989 | 70.286 | 36.989 | 0.990166 |
| 12352 | 6 | 260 | 296 | 368.256667 | 7.203 | 72.581 | 21.394 | 72.581 | 42.157 | 72.581 | 81.863 | 72.581 | 154.486 | 72.616 | 0.996345 |
| 12356 | 2 | 303 | 325 | 269.905 | 2.628 | 26.487 | 7.805 | 26.487 | 15.38 | 26.487 | 29.864 | 26.452 | 56.353 | 26.487 | 0.990548 |
| 12358 | 1 | 149 | 150 | 683.2 | 6.632 | 69.176 | 19.698 | 69.176 | 38.809 | 69.176 | 75.345 | 69.14 | 142.125 | 69.14 | 0.947 |
| 12359 | 3 | 274 | 331 | 1941.693333 | 19.746 | 95.09 | 58.652 | 95.09 | 115.574 | 95.09 | 224.424 | 95.09 | 423.502 | 95.054 | 0.991769 |
| 12360 | 2 | 148 | 200 | 789.24 | 9.192 | 80.86 | 27.301 | 80.86 | 53.795 | 80.86 | 104.452 | 80.86 | 197.079 | 80.86 | 0.984178 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
When audience_name is specified as an input parameter, the following values will be created as new attributes of the specified parent segment where Xm represents months later.
- automl_cltv_Xm
- automl_cltv_Xm_pctile
- automl_cltv_segment
- automl_alive_prob
The automl_cltv_segment splits automl_cltv_12m into five groups (very low/low/medium/high/very high) using quantiles, and then generates five CDP Segments.


Find a sample workflow in Treasure Boxes.
+run_cltv:
ipynb>:
notebook: CLTV
input_table: ml_datasets.online_retail_txn
output_table: ml_results.online_retail_cltv_result
user_column: customerid
tstamp_column: invoicedate
amount_column: purchaseamount| Parameter Name | Description | Required | Data Type | Default Value | Example value |
|---|---|---|---|---|---|
| input_table | Specify a TD table used for CLTV prediction like dbname.table_name | yes | string (dbname.table_name) | ml_dataset.td_txn | |
| user_column | Specify a column name for user | no | string | user | user |
| tstamp_column | Specify a column name for timestamp | no | string | tstamp | time |
| amount_column | Specify a column name for transaction amount such as purchase amount. Numerical values expected for data in this column. | no | string | amount | purchase_amount |
| output_table | Specify a TD table to export CLTV prediction result as dbname.table_name | no | string (dbname.table_name) | ml_output.rfm | |
| discount_rate | The monthly adjusted discount rate in a range between 0.0 and 1.0. 0 means no adjustment. | no | float | 0.01 | 0.01 |
| hide_table_contents | Suppress showing table contents | no | boolean | false | false |
| audience_name | Audience name to merge an attribute table | no | string | None | master_segment_name |
| foreign_key | Foreign key column name of a master segment used for Audience integration. user_column value is used if not set. | no | string | None | td_canonical_id |
| segment_time_horizon | Time horizon for CLTV segments in 1m/3m/6m/12m/24m | no | string | 12m | 6m |
The discount_rate parameter is based on the concept of DCF (discounted cash flow), where you discount the future monetary value by a discount rate to get the present value of that cash flow, adjusting for cost of capital. You can set 0 to avoid cost of capital.