
📖 Repo Description: Exploratory data analysis (EDA) and machine learning on the Sample Superstore dataset. Includes data cleaning and preprocessing; sales, profit, discount, and regional visualizations (line charts, bar plots, scatter plots, pie charts, choropleth maps); and predictive modeling using Linear Regression for profit estimation.


HabibFaisal-py/Superstore-Sales-Analysis-Profit-Prediction


📊 Superstore Sales Analysis & Profit Prediction

📌 Overview

This project performs Exploratory Data Analysis (EDA) and Profit Prediction on the Sample Superstore dataset. The goal is to uncover sales & profit trends, visualize business insights, and build a simple predictive model using Linear Regression.

🗂 Dataset

The dataset (Sample - Superstore.csv) contains information about customer orders, including:

Order details: Order ID, Order Date, Ship Date, Ship Mode

Customer details: Customer ID, Segment, Region, State, City

Transaction details: Sales, Quantity, Discount, Profit
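A minimal loading sketch, assuming pandas and the column names listed above. The `load_superstore` helper is hypothetical (not taken from the repository), and two details are assumptions about the file: that it is Latin-1 encoded (adjust `encoding` if your copy is UTF-8) and that dates are written month/day/year.

```python
import io

import pandas as pd

# Date columns named in the dataset description above.
DATE_COLUMNS = ["Order Date", "Ship Date"]


def load_superstore(source, encoding="latin-1"):
    """Read the Superstore CSV and parse its date columns.

    `source` can be a file path or a binary file-like object. The default
    encoding and the month/day/year format are assumptions about the file.
    """
    df = pd.read_csv(source, encoding=encoding)
    for col in DATE_COLUMNS:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")
    return df


if __name__ == "__main__":
    # Hypothetical one-row sample, encoded to bytes to mimic reading a file.
    sample = (
        "Order ID,Order Date,Ship Date,Ship Mode,Segment,Region,State,City,"
        "Sales,Quantity,Discount,Profit\n"
        "TEST-0001,11/8/2016,11/11/2016,Second Class,Consumer,South,"
        "Kentucky,Henderson,261.96,2,0,41.9136\n"
    )
    df = load_superstore(io.BytesIO(sample.encode("latin-1")))
    print(df[["Order Date", "Ship Date", "Sales", "Profit"]])
```

With the real file, the call would be `load_superstore("Sample - Superstore.csv")`.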

🛠️ Features

✔️ Data Cleaning & Preprocessing (missing values, duplicates, invalid data removal)

✔️ Time-series sales trend analysis

✔️ Visualizations using Matplotlib, Seaborn & Plotly

✔️ Profit vs Discount correlation analysis

✔️ Sales distribution by Region, Segment, and State

✔️ Choropleth Map for sales across USA states

✔️ Linear Regression model for profit prediction
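The README does not spell out which rows count as invalid, so the validity rules in this cleaning sketch are assumptions chosen for illustration: positive `Sales` and `Quantity`, a `Discount` between 0 and 1, and a `Ship Date` no earlier than the `Order Date`. `clean_superstore` and `REQUIRED` are hypothetical names, not the repository's code. Profit is deliberately left unconstrained, since loss-making orders are legitimate data.

```python
import pandas as pd

# Columns whose absence makes a row unusable for the analysis (assumption).
REQUIRED = ["Order ID", "Order Date", "Sales", "Quantity", "Discount", "Profit"]


def clean_superstore(df):
    """Drop duplicates, rows missing required fields, and invalid rows.

    Validity rules (assumptions): Sales > 0, Quantity > 0, 0 <= Discount <= 1,
    and Ship Date >= Order Date when both date columns are present.
    """
    out = df.drop_duplicates()
    out = out.dropna(subset=[c for c in REQUIRED if c in out.columns])
    valid = (out["Sales"] > 0) & (out["Quantity"] > 0) & out["Discount"].between(0, 1)
    if {"Order Date", "Ship Date"}.issubset(out.columns):
        valid &= out["Ship Date"] >= out["Order Date"]
    # Keep only rows passing every rule and renumber them from 0.
    return out[valid].reset_index(drop=True)
```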

📈 Visualizations

Line Chart: Sales trend over time

Bar Chart: Sales by Segment

Scatter Plot: Discount vs Profit

Pie Chart: Regional sales distribution

Choropleth Map: State-wise sales across the USA
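Each chart above is drawn from a simple aggregation. The helpers below are a hypothetical pandas sketch of that data preparation (the function names and the month-start resampling are assumptions, and `Order Date` is assumed to be already parsed to datetime). They return Series that a plotting library can render, and `discount_profit_corr` gives a single Pearson coefficient to accompany the Discount vs Profit scatter plot.

```python
import pandas as pd


def monthly_sales(df):
    """Total Sales per calendar month, indexed by month start (line chart data).

    Months with no orders appear with a total of 0.
    """
    return df.set_index("Order Date")["Sales"].resample("MS").sum()


def sales_by(df, column):
    """Total Sales per category of `column`, largest first (bar chart data)."""
    return df.groupby(column)["Sales"].sum().sort_values(ascending=False)


def regional_share(df):
    """Each Region's fraction of total Sales (pie chart data)."""
    totals = df.groupby("Region")["Sales"].sum()
    return totals / totals.sum()


def discount_profit_corr(df):
    """Pearson correlation between Discount and Profit (scatter plot companion)."""
    return df["Discount"].corr(df["Profit"])
```

With Matplotlib installed, `monthly_sales(df).plot()` and `regional_share(df).plot.pie()` would draw the line and pie charts directly from these Series.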

🤖 Machine Learning

A simple Linear Regression model was applied to predict profit from sales:

X (Feature): Sales

y (Target): Profit

The model was trained and predictions were compared against actual profit values.
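The README names the feature and target but not the fitting details. The sketch below fits the same one-feature ordinary least squares line with `np.polyfit`, which for a single feature yields the same slope and intercept as scikit-learn's `LinearRegression`. The random 80/20 split, the seed, the R² check, and all function names are assumptions (the README does not say whether a hold-out set was used); the output columns mirror the Sample Output table.

```python
import numpy as np
import pandas as pd


def fit_profit_line(sales, profit):
    """Ordinary least squares fit of Profit = slope * Sales + intercept."""
    slope, intercept = np.polyfit(sales, profit, deg=1)
    return slope, intercept


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def compare_predictions(df, test_fraction=0.2, seed=0):
    """Fit on a random training split and compare predictions on the rest."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = max(1, int(len(df) * test_fraction))
    test, train = df.iloc[idx[:n_test]], df.iloc[idx[n_test:]]
    slope, intercept = fit_profit_line(train["Sales"], train["Profit"])
    result = test[["Sales", "Profit"]].copy()
    result["Predicted_Profit"] = slope * result["Sales"] + intercept
    return result, slope, intercept
```

A single straight line cannot capture loss-making orders driven by heavy discounts, so a low R² on real data would not be surprising for this feature set.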

🖼️ Sample Output

    Sales   Profit  Predicted_Profit
0  261.96    41.91             43.12
1  731.94   219.58            207.44
2   14.62     6.87              5.27
3  957.58  -383.03            272.45
4   22.37     2.51              6.75

🧑‍💻 Tech Stack

Python (Pandas, NumPy, Scikit-learn)

Matplotlib & Seaborn (Data Visualization)

Plotly (Interactive Choropleth Map)
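Plotly's `choropleth` with `locationmode="USA-states"` matches two-letter state codes, while the dataset stores full state names such as Kentucky, so some mapping step is needed. This is a sketch of one way to do it, assuming the `State` and `Sales` columns; `STATE_CODES`, `state_sales`, and `plot_state_choropleth` are hypothetical names, and states that do not match the dictionary are dropped.

```python
import pandas as pd

# Two-letter USPS codes for the 50 states plus DC.
STATE_CODES = {
    "Alabama": "AL", "Alaska": "AK", "Arizona": "AZ", "Arkansas": "AR",
    "California": "CA", "Colorado": "CO", "Connecticut": "CT", "Delaware": "DE",
    "District of Columbia": "DC", "Florida": "FL", "Georgia": "GA", "Hawaii": "HI",
    "Idaho": "ID", "Illinois": "IL", "Indiana": "IN", "Iowa": "IA",
    "Kansas": "KS", "Kentucky": "KY", "Louisiana": "LA", "Maine": "ME",
    "Maryland": "MD", "Massachusetts": "MA", "Michigan": "MI", "Minnesota": "MN",
    "Mississippi": "MS", "Missouri": "MO", "Montana": "MT", "Nebraska": "NE",
    "Nevada": "NV", "New Hampshire": "NH", "New Jersey": "NJ", "New Mexico": "NM",
    "New York": "NY", "North Carolina": "NC", "North Dakota": "ND", "Ohio": "OH",
    "Oklahoma": "OK", "Oregon": "OR", "Pennsylvania": "PA", "Rhode Island": "RI",
    "South Carolina": "SC", "South Dakota": "SD", "Tennessee": "TN", "Texas": "TX",
    "Utah": "UT", "Vermont": "VT", "Virginia": "VA", "Washington": "WA",
    "West Virginia": "WV", "Wisconsin": "WI", "Wyoming": "WY",
}


def state_sales(df):
    """Total Sales per state plus the two-letter code Plotly expects.

    Rows whose State is not in STATE_CODES are dropped.
    """
    totals = df.groupby("State", as_index=False)["Sales"].sum()
    totals["Code"] = totals["State"].map(STATE_CODES)
    return totals.dropna(subset=["Code"])


def plot_state_choropleth(df):
    """Build the state-level sales map (requires plotly to be installed)."""
    import plotly.express as px  # imported lazily so state_sales works without plotly

    return px.choropleth(
        state_sales(df),
        locations="Code",
        locationmode="USA-states",
        color="Sales",
        scope="usa",
        hover_name="State",
    )
```

Calling `plot_state_choropleth(df).show()` would then open the interactive map.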

🚀 How to Run

Clone this repository:

git clone https://github.com/HabibFaisal-py/Superstore-Sales-Analysis-Profit-Prediction.git
cd Superstore-Sales-Analysis-Profit-Prediction

Install dependencies:

pip install -r requirements.txt

Run the script:

python superstore_analysis.py

📌 Future Improvements

Add advanced ML models (Random Forest, XGBoost) for better predictions

Build a dashboard using Streamlit/Power BI for real-time analysis

Include customer segmentation using clustering

🙌 Acknowledgments

Dataset: Sample Superstore (Tableau Public / Kaggle)
