# RFM Analysis

Recency, frequency, monetary value (RFM) is a marketing analysis model that segments a company's customer base by purchasing patterns. Specifically, it evaluates a customer's recency (how long ago they made a purchase), frequency (how often they make purchases), and monetary value (how much money they spend). Ranking by quantile is performed for each recency, frequency, and monetary value. Customers are clustered by RFM scores using the k-means algorithm, into categories such as Loyalists, Champions, and Lost Customers.

## Expected Input

This notebook runs RFM analysis on a given transaction table specified by  *`input_table`* option. The expected input table schema depends on the parameter  *aggregated_input*.

When *aggregated_input* is set to *False* , the *input_table* schema is expected as follows:

| user | tstamp | amount |
|  --- | --- | --- |
| 3105285968 | 2011-04-05 | 115 |
| 1850985734 | 2011-11-23 | 1037 |
| 274382808 | 2011-04-25 | 17 |
| 358273144 | 2011-04-02 | 60 |
| ... | ... | ... |


For each user,

* Recency is defined by days since the latest tstamp
* Frequency is defined by number of transactions
* Monetary is defined by total amount


The max *tstamp* value in  *`input_table`* is used for the reference date to define recency.

When *aggregated_input* is set to *True* , the *input_table* schema is expected as follows:

| **user** | **recency** | **frequency** | **monetary_value** |
|  --- | --- | --- | --- |
| 3105285968 | 10 | 3 | 200.12 |
| 1850985734 | 20 | 5 | 500.3 |
| 274382808 | 30 | 1 | 50.4 |
| ... | ... | ... |  |


If aggregated_input is False, missing value handling is done. If `aggregated_input is True`, missing value handling is not done and you would have to do the preprocessing on your own beforehand.

## Expected Outcome

Ranking by quantile is performed for each *recency* , *frequency* , and *monetary_value*. Quantile ranks in r_quartile/f_quartile/m_quartile ranges from 1 to 4 and the higher the value, the better it is. RFM score is defined by `(r_quartile + f_quartile + m_quartile) / 3`.

`Sample _output_table_`:

| 61612 | 3181 | 4 | 4115 | 3 | 1 | 4 | R3F1M4 | 2.6667 | Promising | 0 | R3F2M2 |
|  --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 9549 | 3227 | 8 | 1456 | 2 | 3 | 2 | R2F3M2 | 2.3333 | Need attention | 1 | R1F1M1 |
| 21495 | 3326 | 6 | 2770 | 1 | 2 | 3 | R1F2M3 | 2.0 | High Value Sleeping | 0 | R3F2M2 |
| 39408 | 3290 | 9 | 4893 | 2 | 4 | 4 | R2F4M4 | 3.3333 | Cannot lose them | 0 | R3F2M2 |
| 8120 | 3216 | 5 | 1056 | 2 | 1 | 1 | R2F1M1 | 1.3333 | Hibernating | 1 | R1F1M1 |
| 45110 | 3818 | 2 | 133 | 1 | 1 | 1 | R1F1M1 | 1.0 | Lost customers | 1 | R1F1M1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| user | recency | frequency | monetary_value | r_quartile | f_quartile | m_quartile | rfm_quartile | rfm_score | rfm_segment | rfm_cluster_id | rfm_cluster_rank |


`Users are clustered by RFM scores using the k-means algorithm. `*`rfm_cluster_rank`* explains cluster traits. When cluster size is 3, the cluster `R3F3M3` represents that *recency* , *frequency* , and *monetary_value* are high and it is a promising cluster. Whereas, `R1F1M1` means that recency, frequency, and *monetary_values* are low.

An optimal number of clusters is automatically calculated from *min_clusters* to *max_clusters*. You can override this calculation by setting the *num_clusters* option.

When  *`audience_name`* is specified, *recency, frequency, monetary_values,__`r_quartile, f_quartile, m_quartile, rfm_score, rfm_segment, rfm_cluster_rank`* in  *`output_table`* are attached as attribute columns of the Master Segment and Segments using  *`rfm_segment`* and  *`rfm_cluster_rank`* are being defined.

*rfm_segment* names follow an industy-standard naming used by multiple players. See [this treemap and pseudo code](https://gist.github.com/myui/0d86fc50cacf4a813393e21eca23bca5#file-rfm_treemap-svg) for *rfm_segment* naming rules.

See the following table for the description of 10 RFM segments.

| Segment name | Description | RFM Quartile values |
|  --- | --- | --- |
| Champions | Ideal customers. Bought recently, buys often and spends the most. | R4F4M4 |
| Loyal Customers | Spends good money. Responsive to promotions. These are very active and very valuable customers. | R4F4M3, R4F3M4, R4F3M3, R3F4M4, R3F4M3, R3F3M4, R3F3M3 |
| Potential Loyalists | Recent customers, spent good amount, bought more than once. | R4F4M2, R4F3M2, R4F2M4, R4F2M3, R4F2M2, R3F4M2, R3F3M2, R3F2M4, R3F2M3, R3F2M2 |
| Promising | Recent shoppers bought relatively recently for high value, or frequently. So, a promising customer segment. | R4F4M1, R4F3M1, R4F2M1, R4F1M4, R4F1M3, R4F1M2, R3F4M1, R3F3M1, R3F2M1, R3F1M4, R3F1M3, R3F1M2 |
| New Customers | Bought recently, but not often. | R4F1M1, R3F1M1 |
| Cannot lose them | Made big purchases and often, but long time ago. Large revenue comes for this segment and better not lose their attention. | R2F4M4, R2F4M3, R2F3M4, R2F3M3, R2F2M4, R2F2M3, R2F1M4, R2F1M3 |
| Need attention | Potential loyalist is losing attention. So, needs attention. | R2F4M2, R2F3M2, R2F2M2 |
| Hibernating | Low spenders, low frequency, purchased long time ago. Not worth give attention to them. | R2F4M1, R2F3M1, R2F2M1, R2F1M2, R2F1M1 |
| High Value Sleeping | Past potential loyalist sleeping. Worth awaking their losing interests again. | R1F4M4, R1F4M3, R1F4M2, R1F3M4, R1F3M3, R1F3M2, R1F2M4, R1F2M3, R1F2M2, R1F1M4, R1F1M3 |
| Lost customers | Lowest recency, frequency, and monetary scores. This segment has the lowest priority. | R1F4M1, R1F3M1, R1F2M1, R1F1M2, R1F1M1 |


### Example Graph Outputs:

![image](/assets/249718477-45bf1e00-7438-4eaf-aded-b380891f9807.4339c4d9d9c3deb6e7284522549c73d8ea6e6e3a479f1eedc5a47e228342b05b.3cb60505.png)

![image](/assets/249718519-9e979577-7e18-48da-89b9-a4a229c7233c.20b0216e9f472269bdd2a17fe4bd45615a3e0682b114ced7ea302252d5c38b99.3cb60505.png)

![Screen Shot 2023-06-29 at 17 35 08](/assets/249718706-4c40d950-0b6b-4807-ad9f-0d43449d8c46.5eda8ed5c6d41cc0daf970567d784f4a0adfef00cc22248ca4800d0332ec21ba.3cb60505.png)

## Workflow Example

Find a sample workflow in [Treasure Boxes](https://github.com/treasure-data/treasure-boxes/blob/automl/machine-learning-box/automl/rfm.dig).


```yaml
+rfm:
  ipynb>:
    notebook: RFM
    input_table: ml_datasets.cosmetics_store
    user_column: user_id
    tstamp_column: event_time
    amount_column: price
```

## Parameters

| parameter name | description | required | data type | default value | example value |
|  --- | --- | --- | --- | --- | --- |
| input_table | specify a TD table used for RFM analysis like dbname.table_name | yes | string (dbname.table_name) |  |  |
| ml_dataset.td_rfm |  |  |  |  |  |
| user_column | specify a column name for user | no | string | user | user |
| tstamp_column | specify a column name for timestamp | no | string | tstamp | time |
| amount_column | specify a column name for transaction amount such as purchase amount | no | string | amount | purchase_amount |
| output_table | specify a TD table to export RFM result as dbname.table_name | no | string (dbname.table_name) |  |  |
| ml_output.rfm |  |  |  |  |  |
| min_clusters | specify a minimum number of clusters | no | integer | 2 | 5 |
| max_clusters | specify a maximum number of clusters | no | integer | 8 | 25 |
| num_clusters | specify a fixed number of clusters | no | integer | None | 3 |
| hide_table_contents | suppress showing table contents | no | boolean | false | false |
| audience_name | Audience name to merge an attribute table | no | string | None | master_segment_name |
| foreign_key | foreign key column name of a master segment used for Audience integration. user_column value is used if not set. | no | string | None | td_canonical_id |
| disable_clustering | disables clustering | no | boolean | false | false |
| aggregated_input | Allows to input aggregated table | no | boolean | false | false |