A production-ready web application for automated data cleaning, exploratory data analysis (EDA), and baseline ML model training with downloadable reports.
- CSV Upload: Drag & drop CSV files with preview
- Automated Cleaning: Missing value handling, type inference, duplicate detection
- EDA Reports: Comprehensive data profiling with ydata-profiling
- ML Models: Train baseline models (scikit-learn) with metrics
- Download Artifacts: Cleaned CSV and interactive HTML reports
- Real-time Progress: Track job status with live updates
- Production-Ready: Docker Compose, Celery workers, PostgreSQL, Redis, MinIO/S3
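The cleaning step listed above (missing values, type inference, duplicate detection) can be sketched with pandas; `basic_clean` is a hypothetical helper for illustration, not the actual worker code:

```python
import io

import pandas as pd

def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass: duplicates, missing values, type inference."""
    df = df.drop_duplicates()  # drop exact duplicate rows
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())  # impute numerics with the median
        else:
            df[col] = df[col].fillna("unknown")  # placeholder for categoricals
    return df.convert_dtypes()  # infer tighter column types

csv = "age,city\n34,Paris\n,Paris\n34,Paris\n41,\n"
cleaned = basic_clean(pd.read_csv(io.StringIO(csv)))
print(cleaned.shape)  # prints (3, 2)
```

The real pipeline runs inside the Celery worker and writes the cleaned CSV to object storage; the imputation strategy here is just one reasonable default.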
- FastAPI: Modern Python web framework
- Celery: Distributed task queue for background jobs
- PostgreSQL: Metadata storage
- Redis: Cache and message broker
- MinIO: S3-compatible object storage (local dev)
- Libraries: pandas, scikit-learn, ydata-profiling, joblib
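A minimal sketch of the baseline-model step these libraries enable. Synthetic data stands in for an uploaded CSV, and persisting the model with joblib is an assumption based on the dependency list:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an uploaded, cleaned dataset; the real worker would
# load the CSV from object storage and pick features/target from its columns.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train a simple baseline and report one metric.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"baseline accuracy: {acc:.2f}")
```

In the actual service the metrics would be stored alongside the job record so the frontend can display them.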
- React + TypeScript: Modern UI with Vite
- Material-UI: Component library
- React Query: Data fetching and caching
- React Router: Client-side routing
- Docker Compose: Local development orchestration
- GitHub Actions: CI/CD pipeline (ready for deployment)
- S3/RDS/ElastiCache: Production AWS services (guide included)
- Docker & Docker Compose
- Node.js 18+ (for frontend)
- Python 3.11+ (optional, for local backend dev without Docker)
# From project root
docker-compose build
docker-compose up

This starts:
- PostgreSQL (port 5432)
- Redis (port 6379)
- MinIO (ports 9000, 9001)
- FastAPI Backend (port 8000)
- Celery Worker
cd project-frontend
npm install
npm run dev

Frontend runs at: http://localhost:5173
- Open http://localhost:5173
- Upload a CSV file
- Watch real-time job progress
- View analysis and download artifacts
- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Docs (Swagger): http://localhost:8000/docs
- MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
Major_project/
├── backend/
│   ├── app/
│   │   ├── api/          # API endpoints
│   │   ├── core/         # Configuration
│   │   ├── worker/       # Celery tasks
│   │   ├── main.py       # FastAPI app
│   │   └── storage.py    # Job store
│   ├── Dockerfile
│   ├── requirements.txt
│   └── README.md
├── project-frontend/
│   ├── src/
│   │   ├── api/          # API client
│   │   ├── components/   # React components
│   │   ├── features/     # Feature modules
│   │   ├── hooks/        # Custom hooks
│   │   └── mocks/        # MSW mocks
│   ├── public/
│   │   └── mock/         # Mock artifacts
│   └── package.json
├── docker-compose.yml
├── .env.example
├── PRODUCTION_GUIDE.md
└── README.md
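The job store in `storage.py` backs the real-time progress feature. A hypothetical in-memory version (the real service keeps job metadata in PostgreSQL) illustrates the status lifecycle the frontend polls:

```python
import threading
import uuid
from dataclasses import dataclass

# Hypothetical in-memory job store, mirroring the role of backend/app/storage.py.
# It only illustrates the lifecycle queued -> running -> done/failed.

@dataclass
class Job:
    job_id: str
    filename: str
    status: str = "queued"
    progress: int = 0

class JobStore:
    def __init__(self):
        self._jobs = {}
        # The API process and Celery worker may touch job state concurrently.
        self._lock = threading.Lock()

    def create(self, filename):
        job = Job(job_id=str(uuid.uuid4()), filename=filename)
        with self._lock:
            self._jobs[job.job_id] = job
        return job

    def update(self, job_id, status=None, progress=None):
        with self._lock:
            job = self._jobs[job_id]
            if status is not None:
                job.status = status
            if progress is not None:
                job.progress = progress
            return job

    def get(self, job_id):
        with self._lock:
            return self._jobs[job_id]

store = JobStore()
job = store.create("sales.csv")
store.update(job.job_id, status="running", progress=50)
store.update(job.job_id, status="done", progress=100)
print(store.get(job.job_id).status)  # prints done
```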
# Without Docker (requires Redis and PostgreSQL running)
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload

cd project-frontend
npm run dev # Development server
npm run build # Production build
npm run preview # Preview production build

# All services
docker-compose logs -f
# Specific service
docker-compose logs -f backend
docker-compose logs -f worker

Open http://localhost:8000/docs for interactive API documentation (Swagger UI).
curl -X POST "http://localhost:8000/api/v1/upload" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@your-dataset.csv"

Response:
{
"job_id": "uuid-here",
"filename": "your-dataset.csv"
}

Check job status:

curl "http://localhost:8000/api/v1/jobs/{job_id}/status"

See PRODUCTION_GUIDE.md for detailed production deployment instructions including:
- AWS infrastructure setup (RDS, ElastiCache, S3, ECS)
- Authentication with Auth0/Cognito
- CI/CD with GitHub Actions
- Monitoring and logging
- Cost optimization strategies
- Set up AWS account and create resources
- Configure Auth0 or Cognito for authentication
- Set up S3 bucket for artifacts
- Deploy backend to ECS/Fargate or Render
- Deploy frontend to Vercel/Netlify
- Configure environment variables and secrets
- Set up monitoring (Sentry, CloudWatch)
- Configure CI/CD pipeline
- Authentication: JWT-based auth with Auth0/Cognito (production)
- CORS: Configured for allowed origins
- File Validation: Type and size limits on uploads
- Presigned URLs: Short-lived artifact access (production)
- Environment Variables: Secrets managed via AWS Secrets Manager
- Application Logs: Docker logs, CloudWatch (production)
- Error Tracking: Sentry integration ready
- Health Checks: `/` endpoint returns API status
- Job Metrics: Track job duration, success/failure rates
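Any client can track a job by polling the status endpoint. A minimal sketch with the standard library only; the terminal status names ("done", "failed") and the injectable `fetch` parameter are assumptions for illustration and testing:

```python
import json
import time
from urllib.request import urlopen

API_URL = "http://localhost:8000"  # backend from docker-compose

def poll_job_status(job_id, fetch=None, interval=1.0, timeout=60.0):
    """Poll GET /api/v1/jobs/{job_id}/status until a terminal state.

    `fetch` is injectable so the loop can be tested without a server.
    """
    if fetch is None:
        def fetch(jid):
            with urlopen(f"{API_URL}/api/v1/jobs/{jid}/status") as resp:
                return json.loads(resp.read())
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(job_id)
        if payload.get("status") in ("done", "failed"):
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        time.sleep(interval)

# Stubbed fetcher: walks queued -> running -> done without touching the network.
responses = iter([{"status": "queued"}, {"status": "running"}, {"status": "done"}])
result = poll_job_status("demo-job", fetch=lambda _jid: next(responses), interval=0.0)
print(result["status"])  # prints done
```

The frontend does the equivalent with React Query's refetch interval rather than a blocking loop.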
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
DATABASE_URL=postgresql://postgres:postgres@postgres:5432/autodata
REDIS_URL=redis://redis:6379/0
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/1
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=autodata-artifacts
CORS_ORIGINS=http://localhost:5173,http://localhost:5174

VITE_API_URL=http://localhost:8000

# Clean up and restart
docker-compose down -v
docker-compose build --no-cache
docker-compose up

If ports are in use, modify docker-compose.yml to use different ports.
- Check `VITE_API_URL` in `project-frontend/.env`
- Verify the backend is running: http://localhost:8000
- Check CORS settings in `backend/app/core/config.py`
- FastAPI Documentation
- Celery Documentation
- React Query Documentation
- Material-UI Documentation
- Docker Compose Documentation
MIT License - see LICENSE file for details
- ydata-profiling for EDA reports
- FastAPI for the amazing Python web framework
- Material-UI for beautiful React components