AutoDataLab

A production-ready web application for automated data cleaning, exploratory data analysis (EDA), and baseline ML model training with downloadable reports.

🎯 Features

  • CSV Upload: Drag & drop CSV files with preview
  • Automated Cleaning: Missing value handling, type inference, duplicate detection
  • EDA Reports: Comprehensive data profiling with ydata-profiling
  • ML Models: Train baseline models (scikit-learn) with metrics
  • Download Artifacts: Cleaned CSV and interactive HTML reports
  • Real-time Progress: Track job status with live updates
  • Production-Ready: Docker Compose, Celery workers, PostgreSQL, Redis, MinIO/S3

πŸ—οΈ Tech Stack

Backend

  • FastAPI: Modern Python web framework
  • Celery: Distributed task queue for background jobs
  • PostgreSQL: Metadata storage
  • Redis: Cache and message broker
  • MinIO: S3-compatible object storage (local dev)
  • Libraries: pandas, scikit-learn, ydata-profiling, joblib
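The cleaning stage that these libraries power can be sketched with pandas alone. The helper below is illustrative only, not the repository's actual worker code; the fill heuristics (median for numeric, mode otherwise) are assumptions:

```python
import pandas as pd

def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass: duplicates, missing values, type inference."""
    df = df.copy()
    # Drop exact duplicate rows.
    df = df.drop_duplicates()
    # Fill missing values: median for numeric columns, mode for everything else.
    for col in df.columns:
        if df[col].isna().any():
            if pd.api.types.is_numeric_dtype(df[col]):
                df[col] = df[col].fillna(df[col].median())
            else:
                mode = df[col].mode()
                if not mode.empty:
                    df[col] = df[col].fillna(mode.iloc[0])
    # Re-infer dtypes now that gaps are filled (e.g. floats back to integers).
    return df.convert_dtypes()

raw = pd.DataFrame({"age": [30, None, 30, 40], "city": ["NY", "NY", "NY", None]})
cleaned = basic_clean(raw)
```

In the real pipeline this would feed into ydata-profiling for the EDA report and scikit-learn for the baseline models.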

Frontend

  • React + TypeScript: Modern UI with Vite
  • Material-UI: Component library
  • React Query: Data fetching and caching
  • React Router: Client-side routing

Infrastructure

  • Docker Compose: Local development orchestration
  • GitHub Actions: CI/CD pipeline (ready for deployment)
  • S3/RDS/ElastiCache: Production AWS services (guide included)

πŸš€ Quick Start (Local Development)

Prerequisites

  • Docker & Docker Compose
  • Node.js 18+ (for frontend)
  • Python 3.11+ (optional, for local backend dev without Docker)

1. Start Backend Services

# From project root
docker-compose build
docker-compose up

This starts:

  • PostgreSQL (port 5432)
  • Redis (port 6379)
  • MinIO (ports 9000, 9001)
  • FastAPI Backend (port 8000)
  • Celery Worker

2. Start Frontend

cd project-frontend
npm install
npm run dev

Frontend runs at: http://localhost:5173

3. Test the System

  1. Open http://localhost:5173
  2. Upload a CSV file
  3. Watch real-time job progress
  4. View analysis and download artifacts

4. Access Services

  • Frontend: http://localhost:5173
  • API docs (Swagger UI): http://localhost:8000/docs
  • MinIO console: http://localhost:9001 (default credentials: minioadmin / minioadmin)

πŸ“ Project Structure

Major_project/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ app/
β”‚   β”‚   β”œβ”€β”€ api/          # API endpoints
β”‚   β”‚   β”œβ”€β”€ core/         # Configuration
β”‚   β”‚   β”œβ”€β”€ worker/       # Celery tasks
β”‚   β”‚   β”œβ”€β”€ main.py       # FastAPI app
β”‚   β”‚   └── storage.py    # Job store
β”‚   β”œβ”€β”€ Dockerfile
β”‚   β”œβ”€β”€ requirements.txt
β”‚   └── README.md
β”œβ”€β”€ project-frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ api/          # API client
β”‚   β”‚   β”œβ”€β”€ components/   # React components
β”‚   β”‚   β”œβ”€β”€ features/     # Feature modules
β”‚   β”‚   β”œβ”€β”€ hooks/        # Custom hooks
β”‚   β”‚   └── mocks/        # MSW mocks
β”‚   β”œβ”€β”€ public/
β”‚   β”‚   └── mock/         # Mock artifacts
β”‚   └── package.json
β”œβ”€β”€ docker-compose.yml
β”œβ”€β”€ .env.example
β”œβ”€β”€ PRODUCTION_GUIDE.md
└── README.md

πŸ”§ Development

Backend Development

# Without Docker (requires Redis and PostgreSQL running)
cd backend
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload

Frontend Development

cd project-frontend
npm run dev     # Development server
npm run build   # Production build
npm run preview # Preview production build

View Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f backend
docker-compose logs -f worker

πŸ§ͺ API Testing

Open http://localhost:8000/docs for interactive API documentation (Swagger UI).

Example: Upload File

curl -X POST "http://localhost:8000/api/v1/upload" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your-dataset.csv"

Response:

{
  "job_id": "uuid-here",
  "filename": "your-dataset.csv"
}

Example: Check Job Status

curl "http://localhost:8000/api/v1/jobs/{job_id}/status"
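From a script, the status check is usually wrapped in a polling loop. The helper below is a generic sketch: it assumes only that the endpoint returns JSON with a `status` field, and the "done" / "failed" state names are assumptions, not values documented by this README:

```python
import time

def wait_for_job(fetch_status, timeout=300.0, interval=2.0):
    """Poll fetch_status() (a callable returning the status dict) until the
    job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        # Terminal state names are assumed, not documented.
        if status.get("status") in ("done", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")

# Against the real API this might look like (requests assumed installed):
#   import requests
#   result = wait_for_job(
#       lambda: requests.get(
#           f"http://localhost:8000/api/v1/jobs/{job_id}/status"
#       ).json()
#   )
```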

🌐 Production Deployment

See PRODUCTION_GUIDE.md for detailed production deployment instructions including:

  • AWS infrastructure setup (RDS, ElastiCache, S3, ECS)
  • Authentication with Auth0/Cognito
  • CI/CD with GitHub Actions
  • Monitoring and logging
  • Cost optimization strategies

Quick Production Checklist

  • Set up AWS account and create resources
  • Configure Auth0 or Cognito for authentication
  • Set up S3 bucket for artifacts
  • Deploy backend to ECS/Fargate or Render
  • Deploy frontend to Vercel/Netlify
  • Configure environment variables and secrets
  • Set up monitoring (Sentry, CloudWatch)
  • Configure CI/CD pipeline

πŸ” Security

  • Authentication: JWT-based auth with Auth0/Cognito (production)
  • CORS: Configured for allowed origins
  • File Validation: Type and size limits on uploads
  • Presigned URLs: Short-lived artifact access (production)
  • Environment Variables: Secrets managed via AWS Secrets Manager
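The "File Validation" point above can be illustrated with a pre-check run before a file is accepted. Both the helper and the limits (CSV only, 50 MB) are hypothetical, not taken from the backend code:

```python
import pathlib

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # hypothetical 50 MB cap
ALLOWED_EXTENSIONS = {".csv"}

def validate_upload(filename: str, size_bytes: int) -> None:
    """Reject uploads that are not CSV or that exceed the size limit."""
    suffix = pathlib.Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"file too large: {size_bytes} bytes")
```

In a FastAPI endpoint, the raised `ValueError` would typically be translated into an HTTP 400/413 response.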

πŸ“Š Monitoring

  • Application Logs: Docker logs, CloudWatch (production)
  • Error Tracking: Sentry integration ready
  • Health Checks: / endpoint returns API status
  • Job Metrics: Track job duration, success/failure rates
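The "Job Metrics" item above could be captured by wrapping each worker task in a small timing decorator. This is an illustrative sketch, not code from the repository; a real deployment would ship these records to CloudWatch or a metrics store rather than a list:

```python
import functools
import time

def track_job(metrics: list):
    """Decorator that records duration and outcome of each call into metrics."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = fn(*args, **kwargs)
                outcome = "success"
                return result
            except Exception:
                outcome = "failure"
                raise
            finally:
                metrics.append({
                    "job": fn.__name__,
                    "duration_s": time.monotonic() - start,
                    "outcome": outcome,
                })
        return wrapper
    return decorator
```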

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ Environment Variables

Backend (.env)

DATABASE_URL=postgresql://postgres:postgres@postgres:5432/autodata
REDIS_URL=redis://redis:6379/0
CELERY_BROKER_URL=redis://redis:6379/0
CELERY_RESULT_BACKEND=redis://redis:6379/1
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET=autodata-artifacts
CORS_ORIGINS=http://localhost:5173,http://localhost:5174

Frontend (.env)

VITE_API_URL=http://localhost:8000
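On the backend these variables are typically read once at startup. The stdlib-only reader below is a sketch using the defaults documented above; the app's actual configuration lives in backend/app/core/config.py and may be structured differently:

```python
import os

def load_settings(env=os.environ):
    """Read the documented environment variables, with local-dev defaults."""
    return {
        "database_url": env.get(
            "DATABASE_URL",
            "postgresql://postgres:postgres@postgres:5432/autodata",
        ),
        "redis_url": env.get("REDIS_URL", "redis://redis:6379/0"),
        # CORS_ORIGINS is a comma-separated list in the .env file.
        "cors_origins": [
            origin.strip()
            for origin in env.get("CORS_ORIGINS", "http://localhost:5173").split(",")
            if origin.strip()
        ],
    }
```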

πŸ› οΈ Troubleshooting

Docker Issues

# Clean up and restart
docker-compose down -v
docker-compose build --no-cache
docker-compose up

Port Conflicts

If ports are in use, modify docker-compose.yml to use different ports.

Frontend Not Connecting to Backend

  1. Check VITE_API_URL in project-frontend/.env
  2. Verify backend is running: http://localhost:8000
  3. Check CORS settings in backend/app/core/config.py

πŸ“„ License

MIT License - see LICENSE file for details

πŸ™ Acknowledgments

  • ydata-profiling for EDA reports
  • FastAPI for the amazing Python web framework
  • Material-UI for beautiful React components
