A production-grade machine learning project that segments ecommerce customers with RFM analytics and predicts churn risk with XGBoost, with SHAP explanations, MLflow experiment tracking, a FastAPI inference service, and a Streamlit dashboard.
This project transforms raw retail transactions into customer intelligence. It combines RFM feature engineering, unsupervised segmentation, supervised churn prediction, explainability, and productized delivery through an API and analytics dashboard.
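As an illustration of the RFM step, here is a minimal sketch using only the standard library. It is not the code in `src/data_processing.py`: the tuple layout, the `compute_rfm` helper, the sample rows, and the snapshot date are all hypothetical, and it assumes frequency means distinct invoices and monetary means total spend.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction lines: (customer_id, invoice_id, invoice_date, amount).
transactions = [
    ("C1", "INV1", date(2011, 11, 1), 50.0),
    ("C1", "INV2", date(2011, 12, 5), 30.0),
    ("C2", "INV3", date(2011, 6, 15), 200.0),
    ("C3", "INV4", date(2011, 12, 8), 20.0),
    ("C3", "INV4", date(2011, 12, 8), 15.0),  # second line of the same invoice
]


def compute_rfm(rows, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Assumed definitions: recency is days from the last purchase to the
    snapshot date, frequency counts distinct invoices, monetary sums spend.
    """
    last_purchase = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, amount in rows:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        invoices[customer].add(invoice)
        spend[customer] += amount
    return {
        c: ((snapshot - last_purchase[c]).days, len(invoices[c]), round(spend[c], 2))
        for c in last_purchase
    }


rfm = compute_rfm(transactions, snapshot=date(2011, 12, 10))
for customer, (recency, frequency, monetary) in sorted(rfm.items()):
    print(customer, recency, frequency, monetary)
```

Each customer ends up with one row of three numbers, which is the shape of input that both the segmentation and churn steps would consume.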
                      +------------------------------+
                      | UCI Online Retail Dataset    |
                      +--------------+---------------+
                                     |
                                     v
                      +------------------------------+
                      | data/download_data.py        |
                      | raw ingestion                |
                      +--------------+---------------+
                                     |
                                     v
                      +------------------------------+
                      | src/data_processing.py       |
                      | cleaning + RFM features      |
                      +--------------+---------------+
                                     |
                   +-----------------+-----------------+
                   |                                   |
                   v                                   v
    +------------------------------+    +------------------------------+
    | src/segmentation.py          |    | src/churn_model.py           |
    | K-Means + PCA + t-SNE        |    | XGBoost + Optuna + SHAP      |
    +--------------+---------------+    +--------------+---------------+
                   |                                   |
                   +-----------------+-----------------+
                                     |
                                     v
                      +------------------------------+
                      | models/ + data/processed/    |
                      +--------------+---------------+
                                     |
                   +-----------------+-----------------+
                   |                                   |
                   v                                   v
    +------------------------------+    +------------------------------+
    | api/main.py                  |    | dashboard/app.py             |
    | FastAPI inference service    |    | Streamlit decision layer     |
    +------------------------------+    +------------------------------+
- AUC-ROC: 0.91 (placeholder)
- F1 Score: 0.78 (placeholder)
- Churn Rate: 24% (placeholder)
- Segment Counts:
  - Champions: 1,250
  - Promising: 980
  - At Risk: 720
  - Hibernating: 615
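For readers unfamiliar with these metrics, the following sketch shows how AUC-ROC, F1, and churn rate are defined, computed on a tiny made-up sample. The labels, scores, helper names, and the 0.5 threshold are illustrative assumptions; this is not the project's evaluation code and it does not reproduce the figures listed here.

```python
def roc_auc(y_true, scores):
    """AUC-ROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true, scores, threshold=0.5):
    """F1 score after turning scores into 0/1 predictions at `threshold`."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 1 = churned, 0 = retained; scores are predicted churn probabilities.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

auc = roc_auc(y_true, scores)          # 7 of 9 pairs ranked correctly
f1 = f1_at_threshold(y_true, scores)   # precision 2/3, recall 2/3
churn_rate = sum(y_true) / len(y_true)
print(f"AUC-ROC={auc:.3f}  F1={f1:.3f}  churn rate={churn_rate:.0%}")
```

AUC-ROC depends only on how the scores rank customers, while F1 also depends on the chosen threshold, so the two can move independently when the threshold is tuned.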
git clone <your-repo-url>
cd customer-segmentation
python3 -m venv .venv
source .venv/bin/activate
make install
make train
make api

Open the dashboard in a second terminal with:

make dashboard

Alternatively, run the full stack with Docker Compose:

docker-compose up --build

Services:
- FastAPI: http://localhost:8000
- Streamlit: http://localhost:8501
- MLflow UI: http://localhost:5000
customer-segmentation/
├── data/
│   ├── download_data.py
│   └── processed/
├── notebooks/
│   └── exploration.ipynb
├── src/
│   ├── __init__.py
│   ├── churn_model.py
│   ├── data_processing.py
│   ├── pipeline.py
│   ├── segmentation.py
│   └── utils.py
├── api/
│   ├── main.py
│   ├── model_loader.py
│   └── schemas.py
├── dashboard/
│   └── app.py
├── models/
├── outputs/
│   └── plots/
├── tests/
│   ├── test_api.py
│   ├── test_model.py
│   └── test_processing.py
├── mlflow_tracking/
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .env.example
├── Makefile
└── README.md
- Dashboard overview: placeholder
- Segmentation view: placeholder
- Churn analysis view: placeholder
- Prediction workflow: placeholder
- Downloads and cleans the UCI Online Retail dataset
- Engineers RFM and retention features for each customer
- Segments customers with K-Means and labels them with business-friendly names
- Tunes and trains an XGBoost churn model with Optuna
- Explains predictions with SHAP and tracks experiments with MLflow
- Serves predictions through FastAPI
- Exposes stakeholder-facing analytics in Streamlit
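The segmentation step in the list above (cluster with K-Means, then attach business-friendly names) can be sketched as follows. This is a plain-Python version of Lloyd's algorithm kept dependency-free for illustration; the project lists scikit-learn, whose `KMeans` would normally do this work with k-means++ initialisation. The sample points, the fixed seeds, and the scoring rule in `name_clusters` are assumptions, not the logic in `src/segmentation.py`.

```python
import math

SEGMENT_NAMES = ["Champions", "Promising", "At Risk", "Hibernating"]


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = list(seeds)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


def name_clusters(centroids):
    """Map cluster index to a segment name.

    Assumed rule: rank centroids by frequency + monetary - recency,
    so recent, frequent, high-spend clusters rank first.
    """
    def value_score(c):
        recency, frequency, monetary = c
        return frequency + monetary - recency

    ranked = sorted(range(len(centroids)), key=lambda k: value_score(centroids[k]), reverse=True)
    return {k: SEGMENT_NAMES[rank] for rank, k in enumerate(ranked)}


# Hypothetical customers as (recency, frequency, monetary), already scaled to [0, 1].
points = [
    (0.05, 0.90, 0.90), (0.95, 0.05, 0.05), (0.10, 0.30, 0.30), (0.70, 0.50, 0.50),
    (0.10, 0.80, 0.95), (0.90, 0.10, 0.10), (0.15, 0.35, 0.25), (0.75, 0.45, 0.55),
]

# Fixed seeds keep the sketch deterministic.
labels, centroids = kmeans(points, seeds=points[:4])
names = name_clusters(centroids)
segments = [names[label] for label in labels]
print(segments)
```

Scaling the three features to a common range before clustering matters here, because K-Means uses Euclidean distance and an unscaled monetary column would otherwise dominate the other two.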
make install
make train
make api
make dashboard
make test
make docker-build
make docker-up

- Built an end-to-end customer retention intelligence platform using Python, scikit-learn, XGBoost, Optuna, SHAP, FastAPI, Streamlit, and MLflow.
- Engineered RFM-based customer segmentation and churn risk scoring on ecommerce transactions, translating model outputs into campaign-ready business recommendations.
- Productionized the workflow with reproducible pipelines, containerization, automated tests, model artifacts, and an interactive analytics dashboard.