
Project - Bank Marketing Campaign

In today's competitive market, banks offer many schemes and packages to attract customers. It would help a bank to know which customer segments are likely to be interested in a scheme, so that it can target those groups directly or tailor the scheme to reach a larger customer group. The goal of our classification model is to predict whether a customer will open a term deposit (a deposit offered by a financial institution with a high fixed rate and a fixed maturity date). The data comes from a Portuguese banking institution.

Input Data

The dataset is from the UCI Machine Learning Repository; a sketch for loading it follows the feature list. Its features include:

  • Age
  • Job type
  • Marital status
  • Education
  • Credit default status
  • Average yearly balance
  • Mortgage
  • Personal loan
  • Contact type
  • Day of the month
  • Month of the year
  • Call duration
  • Number of contacts during this campaign
  • Days since last contact
  • Number of contacts before this campaign
  • Previous campaign outcome
  • Output: Opened Term Deposit
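
For reference, the raw data can be loaded with pandas. This is a minimal sketch, assuming the semicolon-separated bank-full.csv file from the standard UCI distribution:

```python
import pandas as pd

# Load the UCI Bank Marketing data; the file name and semicolon separator
# are assumptions based on the standard UCI distribution (bank-full.csv).
df = pd.read_csv("bank-full.csv", sep=";")

# "y" is the target: did the customer open a term deposit ("yes"/"no")?
X = df.drop(columns=["y"])
y = (df["y"] == "yes").astype(int)

print(df.shape)
print(y.value_counts(normalize=True))  # roughly 88% "no" vs 12% "yes"
```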

Machine Learning Model Process

Using a scikit-learn Pipeline and GridSearchCV with five-fold cross-validation, we built five models (a minimal sketch of this setup follows the list):

  • K-Nearest Neighbor (KNN)
  • Support Vector Machine (SVC)
  • Random Forest (RFC)
  • Gradient Boosting (GBC)
  • Deep Neural Network (DNN)
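
As an illustration of the setup, here is a sketch of the Pipeline/GridSearchCV search for one of the five models (KNN). The preprocessing choices and parameter grids are assumptions, chosen so that they include the best values reported in the result tables below:

```python
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

# Load and encode the data (file name and one-hot encoding are assumptions,
# as in the loading sketch above).
df = pd.read_csv("bank-full.csv", sep=";")
X = pd.get_dummies(df.drop(columns=["y"]))
y = (df["y"] == "yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale -> PCA -> classifier, searched with 5-fold cross-validation.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])
param_grid = {
    "pca__n_components": [20, 28, 34],  # assumed grid; the reported best value is 28
    "knn__n_neighbors": [3, 5, 7],      # assumed grid; the reported best value is 5
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```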

Imbalanced data

The dataset is imbalanced, with 88% "no" and 12% "yes" labels. A balanced dataset was created for training and compared against the imbalanced one; one possible balancing approach is sketched below.
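
A balanced training set can be built, for example, by randomly undersampling the majority class; the sketch below uses scikit-learn's resample, and the choice of undersampling (rather than another balancing technique) is an assumption:

```python
import pandas as pd
from sklearn.utils import resample

df = pd.read_csv("bank-full.csv", sep=";")  # as in the loading sketch above

majority = df[df["y"] == "no"]
minority = df[df["y"] == "yes"]

# Randomly undersample the "no" class down to the size of the "yes" class.
majority_down = resample(majority, replace=False,
                         n_samples=len(minority), random_state=42)

balanced = pd.concat([majority_down, minority]).sample(frac=1, random_state=42)
print(balanced["y"].value_counts())  # now roughly 50/50
```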

Comparison of Models and Results

Results baseline (unbalanced) classes

| Classifier | Best Parameters | Precision (class 0 / class 1) | Recall (class 0 / class 1) | Accuracy |
|---|---|---|---|---|
| KNN | neighbors: 5, pca components: 28 | 91% / 49% | 97% / 24% | 88.73% |
| SVM | pca components: 30, C: 1, gamma: 1e-07, kernel: linear | 91% / 61% | 98% / 26% | 89.83% |
| Random Forest Classification | pca components: 34, estimators: 200 | 91% / 58% | 97% / 28% | 89.61% |
| Gradient Boosting Classifier | learning rate: 0.1, pca components: 39, estimators: 200 | 92% / 55% | 96% / 36% | 89.54% |
| DNN | 1 input layer, 2 hidden layers and 1 output layer | 92% / 50% | 96% / 34% | 88.73% |
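
The per-class precision and recall values above (and in the balanced table below) correspond to what a scikit-learn classification report produces on a held-out test set. Continuing from the grid-search sketch earlier, such a report could be generated like this:

```python
from sklearn.metrics import accuracy_score, classification_report

# Continuing from the grid-search sketch above: evaluate the best estimator
# on the held-out test set to obtain per-class precision/recall and accuracy.
y_pred = search.best_estimator_.predict(X_test)

print(classification_report(y_test, y_pred, digits=2))
print("Accuracy:", accuracy_score(y_test, y_pred))
```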

Results balanced classes

| Classifier | Best Parameters | Precision (class 0 / class 1) | Recall (class 0 / class 1) | Accuracy |
|---|---|---|---|---|
| KNN | neighbors: 5, pca components: 28 | 75% / 80% | 80% / 75% | 77.60% |
| SVM | pca components: 30, C: 1, gamma: 1e-07, kernel: linear | 81% / 83% | 82% / 81% | 81.82% |
| Random Forest Classification | pca components: 34, estimators: 200 | 83% / 83% | 82% / 84% | 82.99% |
| Gradient Boosting Classifier | learning rate: 0.1, pca components: 39, estimators: 200 | 83% / 83% | 83% / 83% | 82.96% |
| DNN | 1 input layer, 2 hidden layers and 1 output layer | 88% / 83% | 81% / 90% | 85.15% |
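
The DNN rows refer to a network with one input layer, two hidden layers and one output layer. A minimal Keras sketch of such an architecture is shown below; the layer widths, activations and training settings are assumptions, not the project's exact values:

```python
from tensorflow import keras
from tensorflow.keras import layers

# "1 input layer, 2 hidden layers, 1 output layer"; widths, activations,
# and training settings below are assumptions.
X_train_arr = X_train.to_numpy(dtype="float32")  # X_train from the earlier split
y_train_arr = y_train.to_numpy()

model = keras.Sequential([
    keras.Input(shape=(X_train_arr.shape[1],)),
    layers.Dense(64, activation="relu"),    # hidden layer 1 (assumed width)
    layers.Dense(32, activation="relu"),    # hidden layer 2 (assumed width)
    layers.Dense(1, activation="sigmoid"),  # output: probability of a "yes"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train_arr, y_train_arr, epochs=20, batch_size=256, validation_split=0.1)
```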

Conclusion:

The Deep Neural Network outperformed all of the other classifiers.

Balancing the classes reduced overall accuracy, but it also reduced overfitting and substantially increased recall on the positive ("yes") class.

To watch the project walkthrough, click Video.

About

This is my final project for the University of Toronto 3253 Machine Learning course.
