The [7. Churn Prediction](flow_zone:default) zone builds the churn prediction model from labeled data and features at the customer-product-month level. The model predicts the likelihood that a customer will churn from a given product in the near future. Once trained, it is applied to new data to produce churn probabilities and predictions, which are later consumed by dashboards and action frameworks.

![Screenshot 2025-08-14 at 15.01.38.png](s1aRSCxmCncx)

## Training Data Preparation
**Note:** Churn datasets in retail banking are often **highly imbalanced**: only a small percentage of customer-product relationships result in churn. This imbalance can hurt model training, because classification algorithms tend to favor the majority class (here, non-churned).

To mitigate this, we undersample the majority class to generate a **balanced training set**, using a series of visual recipes before the modeling step.

A first split recipe separates the modeling dataset (known labels) from the inference dataset (unknown labels). A sampling strategy is then applied to the modeling dataset so that the model learns from both classes and is not biased toward the majority class:

- The **split recipe** divides the dataset into:
  - train_data_1: all observations labeled as churned (modeled_churn = 1)
  - train_data_0: all observations labeled as not churned (modeled_churn = 0)
- The **filter recipe** is then used to randomly select a fixed number of rows (e.g., 100,000) from train_data_0, i.e., the non-churned class.
- The **stack recipe** combines the sampled non-churn rows with all churn rows, creating a balanced dataset (modeling_data_prepared) used for model training.
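
For readers who prefer code to visual recipes, the split, filter, and stack steps above can be sketched in pandas. This is only an illustration, not the project's implementation: the `undersample` helper, its parameters, the seed, and the toy data are assumptions, and only the names train_data_1, train_data_0, modeled_churn, and modeling_data_prepared come from the flow.

```python
import pandas as pd


def undersample(df, label_col="modeled_churn", n_majority=100_000, seed=42):
    """Hypothetical helper mirroring the split -> filter -> stack recipes."""
    # Split: separate churned and non-churned observations.
    train_data_1 = df[df[label_col] == 1]
    train_data_0 = df[df[label_col] == 0]

    # Filter: randomly keep a fixed number of non-churned rows
    # (capped at the number of rows actually available).
    n_rows = min(n_majority, len(train_data_0))
    sampled_0 = train_data_0.sample(n=n_rows, random_state=seed)

    # Stack: combine all churn rows with the sampled non-churn rows,
    # then shuffle so the classes are not stored in blocks.
    stacked = pd.concat([train_data_1, sampled_0], ignore_index=True)
    return stacked.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy data (assumption): 50 churned rows out of 1,000.
toy = pd.DataFrame({
    "customer_id": range(1000),
    "modeled_churn": [1] * 50 + [0] * 950,
})
modeling_data_prepared = undersample(toy, n_majority=50)
```

How balanced the result is depends on how the fixed sample size compares with the number of churn rows, so that number is worth checking against the actual churn count.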

## Model Training Process

The binary classification model is trained on modeling_data_prepared using historical behavioral, transactional, and portfolio features. The model outputs both:
- A binary churn prediction (0 or 1) for each customer-product relationship.
- A churn probability score (0.0 to 1.0), which can be used to rank risk and prioritize retention actions.
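
To make the relationship between the two outputs concrete, the sketch below derives a binary prediction from a probability score and ranks relationships by risk. The `ScoredRow` class, the field names, the 0.5 threshold, and the sample scores are all illustrative assumptions, not values taken from the trained model.

```python
from dataclasses import dataclass


@dataclass
class ScoredRow:
    """Hypothetical scored record for one customer-product relationship."""
    customer_id: str
    product_id: str
    churn_probability: float


def to_prediction(probability, threshold=0.5):
    # The binary prediction is the probability cut at a threshold.
    return 1 if probability >= threshold else 0


def rank_by_risk(rows, top_n):
    # Highest churn probability first, so retention actions target
    # the riskiest relationships.
    ordered = sorted(rows, key=lambda r: r.churn_probability, reverse=True)
    return ordered[:top_n]


rows = [
    ScoredRow("C001", "savings", 0.12),
    ScoredRow("C002", "credit_card", 0.81),
    ScoredRow("C003", "mortgage", 0.47),
]
predictions = [to_prediction(r.churn_probability) for r in rows]
top_risk = rank_by_risk(rows, top_n=2)
```

The threshold does not have to be 0.5; it can be chosen to reflect the business cost of each type of error.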

## Hyperparameter Optimisation
[Quick modelling setup](analysis:KlseT3SG) can be used to improve model performance through:
- Hyperparameter tuning
- Class imbalance handling (e.g., class weights, sampling)
- Cross-validation and metrics evaluation

Model evaluation here relies on **ROC AUC** and a **cost matrix**, chosen to handle binary classification on an imbalanced dataset in the banking domain.

- For model tuning, we used ROC AUC, which measures the model's ability to distinguish positives (churners) from negatives (non-churners) across all thresholds.

- For threshold selection, we applied the cost matrix to reflect real-world business impact, where missing a churner is more expensive than falsely flagging a loyal customer.
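
Cost-based threshold selection can be sketched as a simple search: score each candidate threshold by the total cost of its errors and keep the cheapest. The function names, the 5:1 cost ratio, the threshold grid, and the toy labels and scores below are assumptions for illustration; the actual costs should come from the business.

```python
def expected_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Total cost of the errors made at a given threshold."""
    # False negatives: churners scored below the threshold (missed).
    false_negatives = sum(
        1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold
    )
    # False positives: loyal customers scored at or above the threshold.
    false_positives = sum(
        1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold
    )
    return false_negatives * cost_fn + false_positives * cost_fp


def best_threshold(y_true, y_prob, cost_fn=5.0, cost_fp=1.0, steps=99):
    # Evaluate evenly spaced thresholds strictly between 0 and 1 and
    # return the lowest one that achieves the minimum cost.
    candidates = [i / (steps + 1) for i in range(1, steps + 1)]
    return min(
        candidates,
        key=lambda t: expected_cost(y_true, y_prob, t, cost_fn, cost_fp),
    )


# Toy labels and scores (assumption): 3 churners, 5 loyal customers.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.35, 0.05]

threshold = best_threshold(y_true, y_prob)
cost_at_best = expected_cost(y_true, y_prob, threshold, 5.0, 1.0)
cost_at_half = expected_cost(y_true, y_prob, 0.5, 5.0, 1.0)
```

Because a missed churner is weighted more heavily than a false alarm, the selected threshold tends to fall below 0.5, accepting a few extra false positives in exchange for catching more churners.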

**Note:** The model training described above should be tuned to the business needs and the available data in order to produce better churn probabilities. The results can then be consumed by dashboards and action frameworks.