Multi-file data processing with Pandas - Joins, DateTime, and String operations
Analysis of loan default risk using customer and loan data from multiple CSV files. Implements data joining, temporal analysis, and string cleaning operations.
- Read multiple CSV files (customers.csv, loans.csv)
- Join customer + loan data on customer_id
- Inner, left, and right joins
- Handle missing values
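
The loading and joining steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the two small DataFrames stand in for customers.csv and loans.csv, their values are made up, and the median/zero fill is just one assumed way to handle missing values.

```python
import pandas as pd

# Tiny in-memory stand-ins for customers.csv and loans.csv (values are made up).
# With the real files: pd.read_csv("customers.csv") and pd.read_csv("loans.csv").
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "monthly_income": [50000.0, 30000.0, None],
})
loans = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "loan_emi": [10000.0, 15000.0, 8000.0],
})

# Inner join: only customer_ids present in both tables.
inner = customers.merge(loans, on="customer_id", how="inner")

# Left join: every customer; loan columns are NaN where no loan exists.
left = customers.merge(loans, on="customer_id", how="left")

# Right join: every loan; customer columns are NaN where the customer is unknown.
right = customers.merge(loans, on="customer_id", how="right")

# Count missing values per column, then fill them (assumed strategy:
# median income for missing incomes, 0 for customers without a loan).
missing_per_column = left.isna().sum()
left_filled = left.fillna({
    "monthly_income": left["monthly_income"].median(),
    "loan_emi": 0.0,
})
```

Comparing the row counts of the three results is a quick way to see which customer_id values are missing from one file or the other.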
- Convert string dates to datetime
- Extract date components (year, month, day)
- Filter data by date ranges
- Calculate date differences
- Find recent loans (last 6 months)
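
A short sketch of the date steps above, under some assumptions: the column name loan_date and the sample values are hypothetical, and a fixed reference date replaces "today" so the result is reproducible.

```python
import pandas as pd

# Hypothetical loan records; "loan_date" is an assumed column name.
loans = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "loan_date": ["2023-03-10", "2024-05-20", "not a date"],
})

# Convert strings to datetime; unparseable values become NaT instead of raising.
loans["loan_date"] = pd.to_datetime(
    loans["loan_date"], format="%Y-%m-%d", errors="coerce"
)

# Extract date components through the .dt accessor.
loans["year"] = loans["loan_date"].dt.year
loans["month"] = loans["loan_date"].dt.month
loans["day"] = loans["loan_date"].dt.day

# Filter by a date range (both ends inclusive).
start, end = pd.Timestamp("2024-01-01"), pd.Timestamp("2024-06-30")
in_range = loans[loans["loan_date"].between(start, end)]

# Fixed reference date for reproducibility; pd.Timestamp.today() in practice.
as_of = pd.Timestamp("2024-06-30")

# Date difference in whole days between the reference date and each loan.
loans["days_since_loan"] = (as_of - loans["loan_date"]).dt.days

# Recent loans: taken within the six months before the reference date.
cutoff = as_of - pd.DateOffset(months=6)
recent = loans[loans["loan_date"] >= cutoff]
```

Rows whose date could not be parsed hold NaT, so they drop out of both the range filter and the recent-loan filter.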
- Clean city names: .str.upper()
- Remove whitespace: .str.strip()
- Pattern matching: .str.contains()
- Split full names
- Replace values: 'Bombay' → 'Mumbai'
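
The string-cleaning steps above might look like this in practice. The column names city and full_name and the sample rows are assumptions for illustration, and .str.split() is one plausible way to split names, not necessarily the one used in the project.

```python
import pandas as pd

# Hypothetical customer records with messy text fields.
customers = pd.DataFrame({
    "full_name": ["Asha Rao", "Vikram Singh", "Meera Nair"],
    "city": ["  bombay ", "Delhi", "new delhi  "],
})

# Remove surrounding whitespace, then normalise case.
customers["city"] = customers["city"].str.strip().str.upper()

# Replace the old city name; the values are upper-case at this point.
customers["city"] = customers["city"].replace({"BOMBAY": "MUMBAI"})

# Pattern matching: flag every city whose name contains "DELHI".
is_delhi = customers["city"].str.contains("DELHI", na=False)

# Split full names on the first space into first and last name.
name_parts = customers["full_name"].str.split(" ", n=1, expand=True)
customers["first_name"] = name_parts[0]
customers["last_name"] = name_parts[1]
```

Stripping and upper-casing before the replacement means a single mapping entry covers every spelling variant such as " bombay " or "Bombay".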
- DTI ratio = (loan_emi / monthly_income) * 100
- Identify high-risk customers (DTI > 40%)
- Group by credit score ranges
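
The risk calculation above can be sketched as follows. The DTI formula and the 40% threshold come from the list; the credit_score column name, the score band edges, and the sample values are assumptions chosen only for illustration.

```python
import pandas as pd

# Hypothetical joined customer + loan records.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "monthly_income": [50000.0, 30000.0, 80000.0, 20000.0],
    "loan_emi": [10000.0, 15000.0, 20000.0, 9000.0],
    "credit_score": [720, 580, 810, 640],
})

# DTI ratio = (loan_emi / monthly_income) * 100
df["dti_ratio"] = (df["loan_emi"] / df["monthly_income"]) * 100

# High-risk customers: DTI above 40%.
df["high_risk"] = df["dti_ratio"] > 40

# Assumed credit score bands; intervals are (300, 600], (600, 700], (700, 900].
bins = [300, 600, 700, 900]
labels = ["poor", "fair", "good"]
df["score_band"] = pd.cut(df["credit_score"], bins=bins, labels=labels)

# Summarise each band: customer count, mean DTI, share of high-risk customers.
summary = df.groupby("score_band", observed=False).agg(
    customers=("customer_id", "count"),
    avg_dti=("dti_ratio", "mean"),
    high_risk_share=("high_risk", "mean"),
)
```

Taking the mean of the boolean high_risk column gives the fraction of high-risk customers in each band directly.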
- Python 3.x
- Pandas
- NumPy
- Multi-file joining (merge operations)
- DateTime handling (.dt accessor)
- String manipulation (.str accessor)
- Missing data handling
- Feature calculation (DTI ratio)
- customers.csv: 2000 records
- loans.csv: 2000 records