This case study analyzes 43,000+ Amazon delivery records from March 2022 to identify key factors affecting on-time delivery performance. Using the 6-step Data Analysis Life Cycle (Ask, Prepare, Process, Analyze, Share, Act), the analysis uncovers critical insights about traffic impact, agent performance, and weather effects on delivery times.
Key Finding: Traffic jam conditions cause 22 additional minutes of delay compared to normal traffic, presenting the highest opportunity for operational improvement.
Analyze shipment data to understand current on-time delivery performance and identify key factors (traffic, weather, agent rating) influencing delivery time to recommend efficiency improvements.
Stakeholders: Dispatch Team, Logistics Manager
- Source: Kaggle (publicly available Amazon delivery dataset)
- Records: 43,000 initial deliveries → 30,624 cleaned records
- Timeframe: March 2022
- Key Columns: Order_ID, Delivery_Time, Traffic_Conditions, Weather, Agent_Rating
Defined business question: What factors most significantly impact delivery times?
- Data source: Kaggle public dataset
- 43,000 initial records
- Data limitations documented
Data Cleaning in Excel:
- Filtered to March 2022 records only
- Removed 35 invalid rows (Agent_Rating > 5)
- Replaced 75 NaN cells with "Unknown"
- Verified: 0 duplicates, 0 empty values in key columns
- Final: 30,624 clean records
Used Excel pivot tables and AVERAGEIF functions to calculate:
- Average delivery time by traffic condition
- Average delivery time by agent rating
- Average delivery time by weather condition
Key Findings:
| Factor | Best Performance | Worst Performance | Difference |
|---|---|---|---|
| Traffic | Low Traffic: 101 min | Traffic Jam: 147 min | +46 min |
| Agent Rating | 4.5-5.0: 110 min | 1.0-2.0: 155 min | +45 min |
| Weather | Clear: 118 min | Fog: 152 min | +34 min |
- Dashboard: View Google Slides Presentation
- Visualization outputs saved as PNG files (see Visualizations folder)
Recommendations:
- Validate traffic data classification (data quality issue identified)
- Document high-performing agent tactics for team training
- Integrate traffic-aware routing into dispatch algorithms
- Increase overnight/early-morning delivery shifts to avoid peak traffic
- Data Cleaning: Filter, sort, error detection, null value handling
- Data Analysis: Pivot Tables, AVERAGEIF, conditional analysis
- Data Visualization: Chart creation, trend analysis
- Documentation: Case study write-up, methodology explanation
- Problem-Solving: Data quality issue identification and recommendation
amazon_delivery.xlsx- Complete dataset with cleaning log and pivot tablesReadme Amazon Delivery optimization project.Comprehensive case study report following the complete 6-step Data Analysis Life Cycle (Ask, Prepare, Process, Analyze, Share, Act). Documents the full methodology from business objective definition through data cleaning validation and actionable recommendations.Visualizations/- Charts and graphs supporting findingsREADME.md- This file
- Traffic is the dominant factor affecting delivery times (46-minute impact)
- Data quality issue discovered: Traffic jam volume nearly equals low traffic volume, suggesting possible logging error
- Agent performance matters: High-rated agents (4.5-5) deliver ~45 minutes faster
- Weather is secondary: Only 34-minute variance between best and worst conditions
This project reinforced the importance of:
- Thorough data cleaning and validation before analysis
- Critical evaluation of data quality (discovering the traffic anomaly)
- Presenting findings to non-technical stakeholders
- Recommending actionable solutions based on data insights
- LinkedIn: https://www.linkedin.com/in/kims009/
- Email: kimskh009@gmail.com
Project completed as capstone for Google Data Analytics Certificate