Skip to content

Case Study: Online Retail Sales Analysis #20

@Pankaj-Str

Description

@Pankaj-Str

This case study helps you build strong skills in data cleaning, data manipulation, aggregation, feature engineering, and exploratory data analysis (EDA) using Pandas. You will analyze an online retail sales dataset to understand customer purchasing behavior, sales performance, and revenue trends.


Problem Statement

You are working as a data analyst for an e-commerce company. Management wants insights into sales trends, customer behavior, product performance, and potential revenue issues to improve business decisions.

You have been provided with a dataset named online_retail_sales.csv, which contains detailed records of customer orders.


Dataset Overview (online_retail_sales.csv)

Columns Description

Column Name Data Type Description
Order_ID int Unique order identifier
Customer_ID int Unique customer identifier
Order_Date str Date of order (YYYY-MM-DD)
Product str Name of the product
Category str Product category (Electronics, Clothing, Home, etc.)
Quantity int Number of units purchased
Unit_Price float Price per unit (USD)
Total_Amount float Total order amount
Payment_Method str Credit Card, UPI, PayPal, Cash, etc.
Order_Status str Completed, Cancelled, Returned
City str Customer city

Assignment Tasks

1. Data Exploration and Cleaning

  1. Load the dataset into a Pandas DataFrame and display the first 5 rows.
  2. Check the dataset shape, column names, data types, and summary statistics.
  3. Identify missing values in each column.
  4. Handle missing values appropriately:
    • Numerical columns: fill with mean or median.
    • Categorical columns: fill with mode or drop.
  5. Convert Order_Date into datetime format.
  6. Extract Year, Month, and Day from Order_Date into new columns.

2. Data Selection and Filtering

  1. Retrieve all orders placed in March 2024.
  2. Find all orders where:
    • Total_Amount > 1000
    • Category is Electronics.
  3. Select only Completed orders.
  4. Retrieve orders placed from Mumbai or Delhi.

3. Data Manipulation and Feature Engineering

  1. Create a new column Discounted_Amount:
    • Apply a 10 percent discount on orders where Total_Amount > 800.
  2. Create a column Price_Level based on Total_Amount:
    • Low: Below 200
    • Medium: 200 to 800
    • High: Above 800
  3. Create a new column Revenue_Per_Item:
    • Revenue_Per_Item = Total_Amount / Quantity
  4. Drop the Product column if more than 40 percent of values are missing.

4. Aggregation and Business Insights

  1. Find the total revenue per Category.
  2. Calculate the average order value per Payment_Method.
  3. Identify the top 5 cities by total sales.
  4. Find the number of cancelled and returned orders per Category.
  5. Determine the monthly sales trend.

5. Customer Behavior Analysis

  1. Identify customers who placed more than 5 orders in a single month.
  2. Find customers whose average order value exceeds 1500.
  3. Identify customers who placed orders from multiple cities.
  4. Find repeat customers with more than one order.

6. Anomaly and Risk Indicators

  1. Identify orders where:
    • Quantity is greater than 20
    • Total_Amount is less than 100
  2. Find returned orders with Total_Amount > 2000.
  3. Detect customers who placed multiple orders on the same day.

7. Data Merging and Joining

You are given another dataset named customer_details.csv with the following columns:

Column Name Description
Customer_ID Unique customer identifier
Age Customer age
Gender Male or Female
Membership_Level Silver, Gold, or Platinum

Tasks

  1. Merge online_retail_sales.csv with customer_details.csv using Customer_ID.
  2. Find the average order value by Membership_Level.
  3. Identify the membership group generating the highest revenue.
  4. Analyze purchasing behavior by age group.

Bonus Tasks (Optional)

  • Visualize monthly revenue using a line chart.
  • Create a bar chart for revenue by category.
  • Plot the top 10 customers by total spending.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions