# Cltv Prediction The CLTV Prediction solution notebook forecasts customer lifetime value (CLTV) using an open source Python library [Lifetimes](https://github.com/CamDavidsonPilon/lifetimes) for implementing the [Buy Till You Die (BTYD)](https://en.wikipedia.org/wiki/Buy_Till_you_Die) model. ![](/assets/112066640.443827cec104e35bc89a87cb7fb1f8ad148fdae7ac73349fd4e927d7adef41b5.3cb60505.png) BTYD is implemented using: * The [BG/NBD](https://brucehardie.com/papers/018/fader_et_al_mksc_05.pdf) (Beta-Geometric/Negative Binomial Distribution) model fits on transactional history curves describing the distribution of purchase frequencies and engagement drop-off following a prior purchase. * The [Gamma-Gamma model](https://www.brucehardie.com/notes/025/gamma_gamma.pdf) is used to predict average spend per transaction. ## Expected Input This notebook takes the same input format to RFM analysis notebook, a transaction table specified by *`input_table`* option, having *`user_column`* , *`tstamp_column`* , and *`amount_column`* columns, as follows: | user | tstamp | amount | | --- | --- | --- | | 3105285968 | 2011-04-05 | 115 | | 1850985734 | 2011-11-23 | 1037 | | 274382808 | 2011-04-25 | 17 | | 358273144 | 2011-04-02 | 60 | | ... | ... | ... | Treasure Data accepts various [ISO-8601 datetime format](https://en.wikipedia.org/wiki/ISO_8601) supported by [dateutil](https://dateutil.readthedocs.io/en/stable/parser.md) as well as unix timestamp for the *`tstamp`* column. A CLTV notebook uses date-based frequency/recency for CLTV prediction. Timestamp information is processed at the resolution of 24 hours intervals. Hours, minutes, and seconds, are not used. ## Expected Outcome In *output_table:* * Note that *frequency,* *recency, and tenure* are measured in days. * The *`tenure`* represents the number of days since the customer initially issued a transaction. * CLTVs (monetary values) for the next 1, 3, 6, 12, 24 months are exported. * Percentile rank takes value between 0 and 100. * The *`automl_alive_prob`* represents probability of customers alive now. | customerid | frequency | recency | tenure | monetary_value | automl_cltv_1m | automl_cltv_1m_pctile | automl_cltv_3m | automl_cltv_3m_pctile | automl_cltv_6m | automl_cltv_6m_pctile | automl_cltv_12m | automl_cltv_12m_pctile | automl_cltv_24m | automl_cltv_24m_pctile | automl_alive_prob | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | 12347 | 6 | 365 | 367 | 599.701667 | 9.753 | 82.832 | 28.97 | 82.832 | 57.086 | 82.832 | 110.856 | 82.832 | 209.203 | 82.832 | 0.998192 | | 12348 | 3 | 283 | 358 | 301.48 | 3.277 | 36.989 | 9.734 | 36.989 | 19.181 | 36.989 | 37.246 | 36.989 | 70.286 | 36.989 | 0.990166 | | 12352 | 6 | 260 | 296 | 368.256667 | 7.203 | 72.581 | 21.394 | 72.581 | 42.157 | 72.581 | 81.863 | 72.581 | 154.486 | 72.616 | 0.996345 | | 12356 | 2 | 303 | 325 | 269.905 | 2.628 | 26.487 | 7.805 | 26.487 | 15.38 | 26.487 | 29.864 | 26.452 | 56.353 | 26.487 | 0.990548 | | 12358 | 1 | 149 | 150 | 683.2 | 6.632 | 69.176 | 19.698 | 69.176 | 38.809 | 69.176 | 75.345 | 69.14 | 142.125 | 69.14 | 0.947 | | 12359 | 3 | 274 | 331 | 1941.693333 | 19.746 | 95.09 | 58.652 | 95.09 | 115.574 | 95.09 | 224.424 | 95.09 | 423.502 | 95.054 | 0.991769 | | 12360 | 2 | 148 | 200 | 789.24 | 9.192 | 80.86 | 27.301 | 80.86 | 53.795 | 80.86 | 104.452 | 80.86 | 197.079 | 80.86 | 0.984178 | | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | When *`audience_name`* is specified as an input parameter, the following values will be created as new attributes of the specified parent segment where `Xm` represents months later. * *automl_cltv_Xm* * *automl_cltv_Xm_pctile* * *automl_cltv_segment* * *automl_alive_prob* The *`automl_cltv_segment`* splits *`automl_cltv_12m`* into five groups (very low/low/medium/high/very high) using quantiles, and then generates five CDP Segments. ### Example Graph Outputs: ![](/assets/112066641.2a1f1649e3d01a383b836d3c906c90ec5ec10a0de7eddc21e89289f7b17c43b4.3cb60505.png) ![](/assets/112066642.3dd47bd3d02ea9eceab9c71758eb5b4c99c56c9b88635e9cdf8678b6386986ae.3cb60505.png) ## Workflow Example Find a sample workflow in [Treasure Boxes](https://github.com/treasure-data/treasure-boxes/blob/automl/machine-learning-box/automl/cltv.dig). ```yaml +run_cltv: ipynb>: notebook: CLTV input_table: ml_datasets.online_retail_txn output_table: ml_results.online_retail_cltv_result user_column: customerid tstamp_column: invoicedate amount_column: purchaseamount ``` ## Parameters | Parameter Name | Description | Required | Data Type | Default Value | Example value | | --- | --- | --- | --- | --- | --- | | input_table | Specify a TD table used for CLTV prediction like dbname.table_name | yes | string (dbname.table_name) | | ml_dataset.td_txn | | user_column | Specify a column name for user | no | string | user | user | | tstamp_column | Specify a column name for timestamp | no | string | tstamp | time | | amount_column | Specify a column name for transaction amount such as purchase amount. Numerical values expected for data in this column. | no | string | amount | purchase_amount | | output_table | Specify a TD table to export CLTV prediction result as dbname.table_name | no | string (dbname.table_name) | | ml_output.rfm | | discount_rate | The monthly adjusted discount rate in a range between 0.0 and 1.0. 0 means no adjustment. | no | float | 0.01 | 0.01 | | hide_table_contents | Suppress showing table contents | no | boolean | false | false | | audience_name | Audience name to merge an attribute table | no | string | None | master_segment_name | | foreign_key | Foreign key column name of a master segment used for Audience integration. user_column value is used if not set. | no | string | None | td_canonical_id | | segment_time_horizon | Time horizon for CLTV segments in 1m/3m/6m/12m/24m | no | string | 12m | 6m | The *`discount_rate`* parameter is based on the concept of [DCF (discounted cash flow)](https://en.wikipedia.org/wiki/Discounted_cash_flow), where you discount the future monetary value by a discount rate to get the present value of that cash flow, adjusting for cost of capital. You can set 0 to avoid cost of capital.