A supervised machine learning project focused on classifying vehicles based solely on geometric features extracted from their silhouettes.
This repository contains the full workflow: data exploration, preprocessing, model training, evaluation, cross-validation, stacking, and comparative analysis across multiple algorithms. Everything is implemented inside a single, well-structured Jupyter Notebook.
This project explores how geometric silhouette features can be used to classify vehicles into three categories:
- Car
- Van
- Bus
The dataset contains numerical descriptors extracted from the outline of each vehicle, captured from different angles. These features encode geometric properties such as compactness, circularity, aspect ratios, rectangularity, variance measures, skewness, and hollows ratio.
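This README names the descriptors but does not define them. The sketch below computes two of them for a polygonal outline as an illustration. It assumes the commonly used definitions compactness = perimeter² / area and circularity = (mean radius)² / area, with the radius measured from the vertex centroid. The exact formulas and scaling behind the Kaggle columns may differ, and the helper names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(points: List[Point]) -> float:
    """Sum of edge lengths, including the closing edge."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def compactness(points: List[Point]) -> float:
    """Assumed definition: perimeter**2 / area (equals 4*pi for a circle, larger otherwise)."""
    return polygon_perimeter(points) ** 2 / polygon_area(points)


def circularity(points: List[Point]) -> float:
    """Assumed definition: (mean vertex distance from the vertex centroid)**2 / area."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    mean_radius = sum(math.dist((cx, cy), p) for p in points) / len(points)
    return mean_radius ** 2 / polygon_area(points)


# Example: a 4 x 1 rectangle as a crude "long vehicle" outline.
outline = [(0.0, 0.0), (4.0, 0.0), (4.0, 1.0), (0.0, 1.0)]
print(compactness(outline))  # 10**2 / 4 = 25.0
```

A circle attains the minimum compactness of 4π. Values further above 4π indicate a more elongated or irregular outline, which is the kind of shape difference these features are meant to capture.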
The goal is to build a supervised machine learning model capable of predicting the correct vehicle class based solely on these silhouette-based measurements.
- Full end‑to‑end supervised machine learning pipeline
- Extensive Exploratory Data Analysis with visual and statistical insights
- Robust preprocessing: imputation, scaling, encoding, redundancy analysis
- Evaluation of multiple model variants (basic + regularized)
- Ensemble methods (Random Forest, Gradient Boosting, XGBoost) ranked among the top performers
- Assessment of cross‑validation and stacking to determine their usefulness for this task
- Detailed confusion matrices (3×3 and pairwise 2×2)
- Clean, reproducible, and well‑structured repository
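The preprocessing step can be sketched with scikit-learn. This sketch makes several assumptions that the README does not confirm: that scikit-learn is the library in use, that missing values are filled with the column median, that features are standardized, and that the target is encoded with `LabelEncoder`. The notebook's actual choices may differ. The matrix values are toy numbers, not rows from the dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler


def build_preprocessor() -> Pipeline:
    """Impute missing descriptors with the column median, then standardize each column."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])


# Toy feature matrix (illustrative values) with one missing entry.
X = np.array([[95.0, 48.0], [91.0, np.nan], [104.0, 50.0], [93.0, 41.0]])
y = np.array(["van", "car", "bus", "car"])

preprocessor = build_preprocessor()
X_clean = preprocessor.fit_transform(X)  # NaN replaced by the median (48.0), then scaled

encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)  # classes sorted alphabetically: bus=0, car=1, van=2
```

In a real workflow, fit the preprocessor on the training split only and then apply it to the test split. This prevents medians and scaling statistics from leaking information out of the test data.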
Dataset used: Vehicle Silhouettes Dataset (Kaggle)
🔗 https://www.kaggle.com/datasets/rajansharma780/vehicle
Target: class → {car, van, bus}
Features: numerical geometric descriptors, including:
compactness, circularity, distance_circularity, radius_ratio, pr.axis_aspect_ratio, max.length_aspect_ratio, scatter_ratio, elongatedness, pr.axis_rectangularity, max.length_rectangularity, scaled_variance, scaled_radius_of_gyration, skewness_about (x3), hollows_ratio.
A chain of car repair shops requested a model capable of identifying the type of vehicle based on its silhouette.
The task is a multiclass classification problem: predict whether a silhouette corresponds to a car, van, or bus.
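With three classes that need not be equally represented, a stratified split keeps the car/van/bus proportions the same in the training and test sets. The README does not say whether the split was stratified. The pure-Python sketch below assumes it was and uses a hypothetical `stratified_split` helper. It mirrors what scikit-learn's `train_test_split(..., stratify=y)` does.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), sampling the same fraction from each class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)  # fixed seed for reproducibility
    train, test = [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```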
This model could support automated intake systems, vehicle identification pipelines, or pre‑processing stages in computer vision workflows.
All steps are implemented inside a single Jupyter Notebook (VehicleSilhouetteClassificationProject.ipynb), organized into the following sections:
- Data Loading and Initial Inspection
- Data Cleaning and Preprocessing
- Exploratory Data Analysis (EDA)
- Feature Redundancy Analysis
- Train/Test Split
- Model Training
- Model Evaluation
- Cross‑Validation Assessment
- Stacking Assessment
- Conclusions
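The Feature Redundancy Analysis step is listed but not described. One common approach, assumed here, is to flag pairs of features whose absolute Pearson correlation exceeds a threshold. The 0.9 cutoff and the `redundant_pairs` helper are hypothetical. With pandas, `DataFrame.corr()` would produce the full correlation matrix in one call.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def redundant_pairs(
    columns: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Feature pairs with |r| >= threshold, strongest correlation first."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return sorted(pairs, key=lambda p: -abs(p[2]))
```

A flagged pair does not mean that one feature must be dropped. It marks candidates to inspect, since near-duplicate features mostly affect linear models and the interpretability of feature importances.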
Ensemble methods such as Random Forest, Gradient Boosting, and XGBoost consistently achieved the highest F1‑scores and ROC AUC values.
Margin‑based models (SVM, Logistic Regression) also performed strongly.
Simpler models such as Decision Tree and kNN generalized less well.
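A model comparison of this kind can be run with stratified k-fold cross-validation and macro-averaged metrics. This sketch assumes scikit-learn and keeps default hyperparameters. It uses a synthetic three-class problem from `make_classification` in place of the silhouette data. XGBoost is left out to avoid an extra dependency. The scores it produces say nothing about the project's real results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the silhouette features (not the real dataset).
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=6, n_classes=3, random_state=0
)

models = {
    # Scale-sensitive models get a scaler inside the pipeline.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    # Tree-based models are left unscaled.
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["f1_macro", "roc_auc_ovr"])
    results[name] = (scores["test_f1_macro"].mean(), scores["test_roc_auc_ovr"].mean())

for name, (f1, auc) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:20s} F1(macro)={f1:.3f}  ROC AUC(ovr)={auc:.3f}")
```

Keeping the scaler inside each pipeline means that, within every fold, it is fit only on that fold's training portion. Tree splits are unaffected by monotonic rescaling of a feature, so the tree-based models can skip scaling.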
A stacking ensemble was also evaluated, achieving performance comparable to the best individual models.
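A stacking ensemble like the one evaluated here can be sketched with scikit-learn's `StackingClassifier`. The base learners (Random Forest and SVM), the logistic-regression meta-model, and the synthetic data are all assumptions for illustration. They do not reflect the notebook's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for the silhouette features.
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

base_learners = [
    ("random_forest", RandomForestClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # the meta-model is trained on out-of-fold predictions of the base learners
)
stack.fit(X_train, y_train)
stacked_f1 = f1_score(y_test, stack.predict(X_test), average="macro")
print(f"stacked macro F1: {stacked_f1:.3f}")
```

Stacking helps only when the base learners make different mistakes. When they already agree on most samples, as strong models often do, the meta-model tends to land close to the best individual model.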
Most misclassifications occurred at the car–van boundary, the hardest pair of classes to separate.
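To zoom in on the car–van boundary, a pairwise 2×2 view can be read off the 3×3 confusion matrix by keeping only the rows and columns of the two classes. This is one plausible reading of the pairwise matrices mentioned in this README, and the helper names are hypothetical. `sklearn.metrics.confusion_matrix(y_true, y_pred, labels=...)` would build the 3×3 matrix directly.

```python
from collections import Counter
from typing import List, Sequence

CLASSES = ("car", "van", "bus")


def confusion_matrix_3x3(
    y_true: Sequence[str], y_pred: Sequence[str], classes: Sequence[str] = CLASSES
) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes, in the order of `classes`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def pairwise_2x2(
    matrix: List[List[int]], classes: Sequence[str], a: str, b: str
) -> List[List[int]]:
    """Sub-matrix for classes a and b: how often each is predicted as itself or as the other."""
    i, j = classes.index(a), classes.index(b)
    return [[matrix[i][i], matrix[i][j]], [matrix[j][i], matrix[j][j]]]
```

The 2×2 view ignores predictions outside the pair, such as a car predicted as a bus. Its counts therefore need not add up to the total number of cars and vans.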
- README.md
- requirements.txt
- .gitignore
- data/ (empty)
- notebooks/
  - VehicleSilhouetteClassificationProject.ipynb
This project was developed by Maria Petralia (MaPi) as part of her Data Science & AI training journey.
With a background in Computer Science and experience in software and data solutions, she focuses on building clear, rigorous, and well‑documented machine learning workflows.
- GitHub: https://github.com/MapiAI
- LinkedIn: https://www.linkedin.com/in/mariapetralia/