Random Forest Model for Loan Action Prediction

This project contains a complete machine learning pipeline that trains a Random Forest classifier to predict the action taken on a mortgage/loan application (the `action_taken` field) from the application data.

Files Created

Main Scripts

random_forest_model.py - Main script that loads data, preprocesses it, trains the Random Forest model, and evaluates performance
model_analysis.py - Analysis script that loads the trained model and provides additional insights
requirements.txt - Python package dependencies

Generated Files (after running)

random_forest_model.pkl - Saved trained Random Forest model
label_encoders.pkl - Saved label encoders for categorical variables
feature_importance.png - Feature importance plot (if matplotlib display is available)
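
As a rough illustration of how the saved `.pkl` artifacts could be written and read back, the sketch below round-trips a small model through the standard `pickle` module. It assumes the files are plain pickles; if the scripts use `joblib` instead, `joblib.dump` and `joblib.load` would be the matching calls. The helper names `save_model` and `load_model` and the synthetic data are illustrative, not taken from the project scripts.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def save_model(model, path):
    """Serialize a fitted model to disk (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(model, handle)


def load_model(path):
    """Load a model previously written by save_model (hypothetical helper)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Synthetic stand-in data; the real project trains on TrainingSet.csv.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "random_forest_model.pkl"
    save_model(model, path)
    restored = load_model(path)

# The restored model should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Pickle files can execute arbitrary code when loaded, so only load `.pkl` files from a source you trust.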

Dataset Structure

The project uses three CSV files:

TrainingSet.csv (60,000 samples) - Used to train the model
TestSet.csv (20,000 samples) - Used to test model performance
ValidationSet.csv (20,000 samples) - Used for additional validation
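
A minimal sketch of reading the three splits with pandas follows. The file names and the `action_taken` target come from this README; the `load_splits` helper and the assumption that all three files sit in one directory with a shared column layout are illustrative.

```python
from pathlib import Path

import pandas as pd

# File names as listed above.
SPLIT_FILES = {
    "train": "TrainingSet.csv",
    "test": "TestSet.csv",
    "validation": "ValidationSet.csv",
}


def load_splits(data_dir=".", target="action_taken"):
    """Return {split_name: (features, target)} for the three CSV files.

    Hypothetical helper: assumes every file contains the target column.
    """
    splits = {}
    for name, filename in SPLIT_FILES.items():
        frame = pd.read_csv(Path(data_dir) / filename)
        splits[name] = (frame.drop(columns=[target]), frame[target])
    return splits
```

With the CSV files in the working directory, `X_train, y_train = load_splits()["train"]` would give the training features and labels.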

Target Variable: `action_taken`

Code 1: Loan originated (50.9% of test data)
Code 2: Application approved but not accepted (1.2%)
Code 3: Application denied (20.8%)
Code 4: Application withdrawn by applicant (12.1%)
Code 5: File closed for incompleteness (1.1%)
Code 6: Purchased loan (13.9%)
Code 8: Preapproval request denied
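
The class shares listed above can be recomputed from any split with a few lines of pandas. The sketch below copies the code descriptions from this README; the `class_shares` helper is hypothetical, and the eight-row sample is made up purely to show the call.

```python
import pandas as pd

# Code-to-description mapping copied from the list above.
ACTION_LABELS = {
    1: "Loan originated",
    2: "Application approved but not accepted",
    3: "Application denied",
    4: "Application withdrawn by applicant",
    5: "File closed for incompleteness",
    6: "Purchased loan",
    8: "Preapproval request denied",
}


def class_shares(target: pd.Series) -> pd.DataFrame:
    """Return the percentage share of each action_taken code."""
    shares = target.value_counts(normalize=True).sort_index() * 100
    labels = [ACTION_LABELS.get(code, "unknown") for code in shares.index]
    return pd.DataFrame(
        {"label": labels, "percent": shares.round(1).to_numpy()},
        index=shares.index,
    )


# Made-up sample, not the real data.
sample = pd.Series([1, 1, 3, 4, 1, 6, 3, 1], name="action_taken")
summary = class_shares(sample)
print(summary)
```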

Model Performance

Accuracy Scores

Test Accuracy: 98.15%
Validation Accuracy: 98.07%
Cross-validation Accuracy: 98.06% (±0.22%)

Top Important Features

hoepa_status (13.7% importance) - High-cost mortgage indicator
denial_reason_1 (8.1% importance) - Primary reason for denial
initially_payable_to_institution (6.5% importance) - Institution payment indicator
interest_rate (5.4% importance) - Loan interest rate
applicant_credit_score_type (5.4% importance) - Type of credit score used
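
A ranking like the one above is typically read from the fitted forest's `feature_importances_` attribute. The sketch below shows one way to do that on synthetic data; the generated feature names are placeholders, so the printed values will not match the figures in this README.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder column names.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)

top5 = importances.head(5)
for name, value in top5.items():
    print(f"{name}: {value:.1%}")
```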

How to Run

1. Install Dependencies

pip install -r requirements.txt

2. Train and Test the Model

python random_forest_model.py

3. Analyze Results

python model_analysis.py

Model Details

Random Forest Configuration

Number of trees: 100
Maximum depth: 10 (to prevent overfitting)
Minimum samples to split: 5
Minimum samples in leaf: 2
Features: 98 (after preprocessing)
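
Expressed with scikit-learn, the configuration above might look like the following sketch. The four hyperparameters come from this README; `random_state` and the synthetic data are assumptions added so the example runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=100,     # number of trees
    max_depth=10,         # cap on tree depth, to limit overfitting
    min_samples_split=5,  # minimum samples required to split a node
    min_samples_leaf=2,   # minimum samples required in a leaf
    random_state=42,      # assumption: the seed is not stated in this README
)

# Synthetic stand-in for the 98 preprocessed features.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, n_classes=3, random_state=0
)
model.fit(X, y)

# No tree in the fitted forest should exceed the configured depth.
deepest = max(tree.get_depth() for tree in model.estimators_)
print(deepest)
```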

Data Preprocessing

Missing Value Handling: Median for numeric, mode for categorical
Categorical Encoding: Label encoding for all categorical variables
Feature Engineering: Automatic detection of numeric vs categorical columns
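
The preprocessing steps above can be sketched as a single function. This is an illustration under assumptions, not the code in `random_forest_model.py`: the `preprocess` helper, the `example_category` column, and the choice to impute each frame from its own statistics are all hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def preprocess(frame, encoders=None):
    """Impute missing values and label-encode categorical columns.

    Pass the encoders returned by the training call when transforming
    test or validation data so the integer codes stay consistent.
    Categories unseen during fitting will raise a ValueError.
    """
    frame = frame.copy()
    fitting = encoders is None
    encoders = {} if fitting else encoders

    # Detect column types automatically.
    numeric_cols = frame.select_dtypes(include="number").columns
    categorical_cols = frame.columns.difference(numeric_cols)

    for col in numeric_cols:
        frame[col] = frame[col].fillna(frame[col].median())

    for col in categorical_cols:
        # Fill with the most frequent value, then encode as integers.
        filled = frame[col].fillna(frame[col].mode().iloc[0])
        values = filled.astype(str).to_numpy()
        if fitting:
            encoders[col] = LabelEncoder().fit(values)
        frame[col] = encoders[col].transform(values)

    return frame, encoders


# Made-up rows, only to show the call.
demo = pd.DataFrame({
    "interest_rate": [3.5, None, 4.5, 5.5],
    "example_category": ["a", "a", None, "b"],
})
clean, encoders = preprocess(demo)
print(clean)
```

The returned `encoders` dictionary is the kind of object that would be saved as `label_encoders.pkl`.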

Model Evaluation

Classification report with precision, recall, and F1-scores
Confusion matrix analysis
5-fold cross-validation
Feature importance ranking
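
The evaluation steps listed above map onto standard scikit-learn calls, sketched below on synthetic data. The split sizes and seeds are assumptions, and reporting the spread as one standard deviation is a guess at how the ± figure in this README was computed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in data with three classes.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(
    n_estimators=100, max_depth=10, min_samples_split=5,
    min_samples_leaf=2, random_state=0,
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Per-class precision, recall, and F1, plus overall accuracy.
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.2%}")
print(classification_report(y_test, y_pred, zero_division=0))

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_test, y_pred)
print(matrix)

# 5-fold cross-validation on the training split.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
print(f"CV accuracy: {cv_scores.mean():.2%} (±{cv_scores.std():.2%})")
```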

Key Insights

High Performance: The model reaches roughly 98% accuracy on the test set (98.15%), on the validation set (98.07%), and in 5-fold cross-validation (98.06%)
Feature Importance: HOEPA status and denial reasons are the most predictive features
Class Imbalance: Some action types, such as Code 5 (1.1% of test data), have few samples and are harder to predict
Robust Model: Accuracy varies by less than 0.1 percentage points across the cross-validation, validation, and test results

Next Steps

To improve the model further, consider:

Hyperparameter tuning using GridSearchCV or RandomizedSearchCV
Handling class imbalance with techniques like SMOTE or class weights
Feature selection to reduce dimensionality
Ensemble methods combining multiple algorithms
Deep learning approaches for complex feature interactions
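
As a starting point for the first two suggestions, the sketch below combines `class_weight="balanced"` with `RandomizedSearchCV`. The search grid, the macro-F1 scoring choice, and the imbalanced synthetic data are all assumptions chosen for illustration, not tuned settings for this dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data standing in for the loan dataset.
X, y = make_classification(
    n_samples=400, n_features=12, n_informative=6, n_classes=3,
    weights=[0.7, 0.2, 0.1], random_state=0,
)

# Illustrative search space around the current configuration.
param_distributions = {
    "n_estimators": [25, 50, 100],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    # "balanced" reweights classes inversely to their frequency.
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_distributions=param_distributions,
    n_iter=5,            # number of sampled configurations
    cv=3,
    scoring="f1_macro",  # weights rare classes equally with common ones
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(f"Best macro F1: {search.best_score_:.3f}")
```

Macro-averaged F1 is used here because plain accuracy can stay high even when rare classes are predicted poorly.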
