In today's competitive marketing landscape, banks design many schemes and packages to attract customers. It would help a bank to know which customer segments are interested in a given scheme, so that it can target those groups specifically or customize the scheme to reach a larger customer base. The goal of our classification model is to predict whether a customer will open a term deposit (a deposit that a financial institution offers with a high fixed rate and a fixed maturity date). The data comes from a Portuguese banking institution.
The dataset is from the UCI Machine Learning Repository; a sketch of loading it follows the feature list below. Its features include:
- Age
- Job type
- Marital status
- Education
- Credit default status
- Average yearly balance
- Mortgage
- Personal loan
- Contact type
- Day of the month
- Month of the year
- Call duration
- Number of contacts during this campaign
- Days since last contact in a previous campaign
- Number of contacts before this campaign
- Previous campaign outcome
- Output: Opened Term Deposit
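A minimal sketch of loading the data, assuming the UCI `bank-full.csv` file with its semicolon-separated format (the file name and path are assumptions here, and the encoding step is just one reasonable choice):

```python
import pandas as pd

# Load the UCI Bank Marketing data; the file is semicolon-separated.
# (File name/path assumed -- adjust to where the dataset was downloaded.)
df = pd.read_csv("bank-full.csv", sep=";")

# One-hot encode categorical features and split off the target column "y" (yes/no).
X = pd.get_dummies(df.drop(columns=["y"]), drop_first=True)
y = (df["y"] == "yes").astype(int)
```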
Using `Pipeline` and `GridSearchCV` with five folds, we built five models (a sketch follows the list below):
- K-Nearest Neighbor (KNN)
- Support Vector Machine (SVC)
- Random Forest (RFC)
- Gradient Boosting (GBC)
- Deep Neural Network (DNN)
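As an illustration, here is a minimal sketch of the pipeline and five-fold grid search for the SVM model; the scaling step and the exact parameter grids are assumptions, with candidate values loosely based on the best parameters reported in the tables below:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Pipeline: scale features -> reduce dimensionality with PCA -> classify.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", SVC()),
])

# Example grid for the SVM model (assumed; the grids actually searched may differ).
param_grid = {
    "pca__n_components": [28, 30, 34, 39],
    "clf__C": [0.1, 1, 10],
    "clf__gamma": [1e-7, 1e-5, 1e-3],
    "clf__kernel": ["linear", "rbf"],
}

# Five-fold cross-validated grid search, as used for each classifier.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy", n_jobs=-1)
# search.fit(X_train, y_train)
# print(search.best_params_, search.best_score_)
```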
The data set is imbalanced: about 88% of customers did not open a term deposit ("no") and 12% did ("yes"). A balanced data set was therefore created for training and compared against the imbalanced one.
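The exact balancing technique is not specified in this summary; one common approach is random undersampling of the majority class, sketched below (function name and strategy are assumptions):

```python
import pandas as pd
from sklearn.utils import resample

def balance_by_undersampling(X, y, random_state=42):
    """Randomly undersample the majority class so both classes have equal counts.
    (One possible balancing strategy; the method actually used is not specified.)"""
    data = X.copy()
    data["_target"] = y.values
    majority = data[data["_target"] == 0]
    minority = data[data["_target"] == 1]
    majority_down = resample(majority, replace=False,
                             n_samples=len(minority), random_state=random_state)
    balanced = pd.concat([majority_down, minority]).sample(frac=1, random_state=random_state)
    return balanced.drop(columns=["_target"]), balanced["_target"]
```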
Results on the imbalanced data set:

| Classifier | Best Parameters | Accuracy |
|---|---|---|
| KNN | neighbors: 5, PCA components: 28 | 88.73% |
| SVM | PCA components: 30, C: 1, gamma: 1e-07, kernel: linear | 89.83% |
| Random Forest | PCA components: 34, estimators: 200 | 89.61% |
| Gradient Boosting | learning rate: 0.1, PCA components: 39, estimators: 200 | 89.54% |
| DNN | 1 input layer, 2 hidden layers, 1 output layer | 88.73% |
Results on the balanced data set:

| Classifier | Best Parameters | Accuracy |
|---|---|---|
| KNN | neighbors: 5, PCA components: 28 | 77.60% |
| SVM | PCA components: 30, C: 1, gamma: 1e-07, kernel: linear | 81.82% |
| Random Forest | PCA components: 34, estimators: 200 | 82.99% |
| Gradient Boosting | learning rate: 0.1, PCA components: 39, estimators: 200 | 82.96% |
| DNN | 1 input layer, 2 hidden layers, 1 output layer | 85.15% |
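The DNN's layer widths, activations, and training settings are not listed above; a minimal Keras sketch of a network with one input layer, two hidden layers, and one output layer could look like the following (all sizes and hyperparameters are assumptions):

```python
from tensorflow import keras

def build_dnn(n_features):
    """Sketch of a DNN with 1 input layer, 2 hidden layers, and 1 output layer.
    Layer widths, activations, and optimizer settings are assumptions."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # binary: opened term deposit or not
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model = build_dnn(X_train.shape[1])
# model.fit(X_train, y_train, epochs=20, batch_size=64, validation_split=0.1)
```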
On the balanced data set, the Deep Neural Network outperformed all other classifiers.
Balancing the classes reduced overall accuracy, but it also reduced overfitting and increased recall.

