Predicting Bank Customer Churn Using Machine Learning

Module: 25LLP132 – Principles of Artificial Intelligence and Data Analytics
Group: Creative Finance
Project: Predicting Bank Customer Churn Using Machine Learning
Dataset: Kaggle “Churn Modelling” (Churn_Modelling.csv)

This repository contains the reproducible pipeline used in our project.

Business Context

Customer churn is costly: retaining customers is usually cheaper than acquiring new ones.
Our goal is to identify high-risk customers early so the bank can apply targeted retention strategies.

Dataset Summary

Rows: 10,000 customers
Target: Exited (1 = churned, 0 = retained)
Removed non-predictive identifiers: RowNumber, CustomerId, Surname
Features include demographic, financial, and behavioural variables
Churn rate: ~20.37%, so evaluation focuses on Recall, F1, and PR-AUC (not accuracy alone)
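
To illustrate why accuracy alone is misleading at this churn rate, here is a minimal sketch that is not taken from churn_pipeline.py. It assumes the ~20.37% churn rate corresponds to 2,037 churners out of 10,000 rows and scores a trivial "always retained" classifier. The label arrays are synthetic stand-ins, not the real dataset.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    recall_score,
)

# Assumption: ~20.37% of 10,000 customers is taken as 2,037 churners.
n_customers, n_churned = 10_000, 2_037
y_true = np.zeros(n_customers, dtype=int)
y_true[:n_churned] = 1

# A trivial baseline that predicts "retained" for every customer,
# with a constant (uninformative) churn score.
y_pred = np.zeros(n_customers, dtype=int)
y_score = np.full(n_customers, 0.5)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)
pr_auc = average_precision_score(y_true, y_score)

print(f"accuracy={accuracy:.4f} recall={recall:.4f} f1={f1:.4f} pr_auc={pr_auc:.4f}")
# → accuracy=0.7963 recall=0.0000 f1=0.0000 pr_auc=0.2037
```

The baseline looks respectable on accuracy while finding no churners at all, which is why Recall, F1, and PR-AUC are the more informative metrics here.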

Leakage-Safe Pipeline (Key Design)

Preprocess → SMOTE → Model

Preprocess: StandardScaler (numeric) + OneHotEncoder (categorical)
SMOTE is applied inside the pipeline, so oversampling happens only on training folds (prevents leakage)
Split: 70% train / 30% test (stratified)
Validation: split from training only (used for threshold exploration); test set untouched until final evaluation

Models Compared

Logistic Regression (baseline)
Decision Tree (interpretable)
Random Forest (300 trees)
Gradient Boosting (best overall performance in our run)
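
A comparison loop over these four models might look like the sketch below. It is a simplified illustration, not the project's code: the data comes from make_classification with roughly 20% positives, the preprocessing and SMOTE steps are left out to keep the example short, and every hyperparameter other than the 300 trees for Random Forest is an assumption. Results on this synthetic data say nothing about the ranking reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (assumption: stand-in for the preprocessed features).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Only n_estimators=300 comes from the README; other settings are illustrative.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # churn probability
    preds = model.predict(X_test)               # default 0.5 threshold
    results[name] = {
        "recall": recall_score(y_test, preds),
        "f1": f1_score(y_test, preds, zero_division=0),
        "pr_auc": average_precision_score(y_test, scores),
    }

for name, m in results.items():
    print(f"{name:20s} recall={m['recall']:.3f} f1={m['f1']:.3f} pr_auc={m['pr_auc']:.3f}")
```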

How to Run

1) Place the dataset

Download Churn_Modelling.csv from Kaggle and place it in the same folder as churn_pipeline.py.

Note: The dataset is not included in this repository to respect Kaggle licensing.

2) Install dependencies

pip install -r requirements.txt

3) Run

python churn_pipeline.py
