A web app for uploading CSV files, describing what you want cleaned in plain English, and getting back cleaned data plus the Python code that did it. Useful for data scientists and students.
First-time setup:
make setup
Then add your Gemini API key to backend/.env:
GEMINI_API_KEY=your_key_here
Get a key from Google AI Studio.
Run the app:
make dev
This starts the Python backend (port 8000) and the Next.js frontend (port 3000). Open http://localhost:3000 in your browser.
Run backend and frontend separately (e.g. in two terminals):
make start-backend # backend only
make start-frontend # frontend only
Or without Make:
# Terminal 1 - backend
cd backend && source venv/bin/activate && python main.py
# Terminal 2 - frontend
cd frontend && npm run dev
- Upload CSVs: One or more files at a time.
- Chat to clean: Say things like “remove rows with missing values,” “delete the notes column,” or “convert age to numeric.” A multi-agent system (orchestrator, exploration, code execution, documentation) uses Gemini to figure out what you mean and run pandas code.
- Safe execution: Generated code runs in a sandbox; only pandas/numpy-style operations are allowed.
- Instant analysis: After upload, you get row/column counts, missing value checks, basic stats, and a preview of the first few rows.
- Suggestions: The app suggests actions (e.g. drop nulls, fill missing values, remove duplicate columns) based on the data; you can apply them with one click.
- Download: Get the cleaned CSV and the Python script that produced it.
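The "Chat to clean" and "Download" features revolve around pandas code generated from plain-English requests. As a rough illustration only (the agents' real output will differ), here is what a downloaded script for the three example requests might look like, assuming a CSV with hypothetical `name`, `age`, and `notes` columns and made-up sample rows:

```python
import io

import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three example requests from the feature list, in order."""
    # "delete the notes column" (no error if it is already gone)
    df = df.drop(columns=["notes"], errors="ignore")
    # "convert age to numeric": values that can't be parsed become NaN
    df["age"] = pd.to_numeric(df["age"], errors="coerce")
    # "remove rows with missing values"
    return df.dropna().reset_index(drop=True)


if __name__ == "__main__":
    raw = io.StringIO(
        "name,age,notes\n"
        "Ana,34,vip\n"
        "Ben,unknown,\n"
        "Cy,,late\n"
        "Di,29,\n"
    )
    cleaned = clean(pd.read_csv(raw))
    print(cleaned)
```

Order matters here: dropping `notes` before `dropna()` keeps Di, whose only missing value was in the deleted column; running `dropna()` first would have discarded that row too.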
Frontend (Next.js + React)
|
API Routes (Next.js)
|
Python Backend (FastAPI)
|
Multi-Agent System (Gemini)
|
Data Processing (Pandas)
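The upload analysis and one-click suggestions from the feature list presumably come out of the pandas layer at the bottom of this stack. A minimal sketch of that kind of profiling follows; `profile` and `suggest` are hypothetical names, and the suggestion heuristics are assumptions, not the app's actual rules.

```python
import io
from typing import List

import pandas as pd


def profile(df: pd.DataFrame, preview_rows: int = 5) -> dict:
    """Summarize an uploaded frame: shape, missing values, stats, preview."""
    missing = df.isna().sum()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        # Only report columns that actually have gaps.
        "missing": {col: int(n) for col, n in missing.items() if n > 0},
        # describe() covers numeric columns by default.
        "stats": df.describe().to_dict(),
        "preview": df.head(preview_rows).to_dict(orient="records"),
    }


def suggest(df: pd.DataFrame) -> List[str]:
    """Rule-of-thumb cleaning suggestions (hypothetical heuristics)."""
    suggestions = []
    if df.isna().any().any():
        suggestions.append("drop rows with missing values")
        suggestions.append("fill missing values")
    # A column counts as a duplicate if its values match an earlier column.
    # Transposing is fine for a sketch but costly on very large frames.
    dup_mask = df.T.duplicated()
    duplicates = [col for col, is_dup in dup_mask.items() if is_dup]
    if duplicates:
        suggestions.append(f"remove duplicate columns: {duplicates}")
    return suggestions


if __name__ == "__main__":
    df = pd.read_csv(io.StringIO("a,b,c\n1,1,x\n2,2,\n3,3,z\n"))
    print(profile(df)["missing"])
    print(suggest(df))
```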
- Frontend: Next.js 14, TypeScript, Tailwind, React Dropzone, Papa Parse, Lucide icons.
- Backend: FastAPI, Pandas, Google Gemini, Uvicorn.
- Optional: Supabase for conversation history (schema in scripts/supabase_schema.sql).
clean-your-data/
├── frontend/ # Next.js app
├── backend/ # FastAPI + agents
│ ├── agents/ # conversational, exploration, code execution, docs, etc.
│ └── main.py
├── scripts/ # setup, start_dev, Supabase schema
├── docs/ # specs, enhancements, ideas, logging, etc.
└── README.md
Prerequisites: Python 3.8+, Node.js 18+.
Backend env (backend/.env):
GEMINI_API_KEY=...
SUPABASE_URL=... # optional
SUPABASE_KEY=... # optional
Frontend env (frontend/.env.local):
PYTHON_BACKEND_URL=http://localhost:8000
# Supabase vars optional
Supabase (optional): Create a project, run scripts/supabase_schema.sql, then set the env vars. The app works without it; only conversation persistence is skipped.
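How the backend actually loads backend/.env isn't described here. As a stdlib-only sketch, assuming plain KEY=VALUE lines like the ones above, this checks that the required Gemini key is present and reports whether both Supabase settings are set, which is what decides between Supabase and in-memory conversation history. `parse_env` and `supabase_enabled` are hypothetical helpers, not part of the project.

```python
from typing import Dict

REQUIRED = ("GEMINI_API_KEY",)
OPTIONAL_SUPABASE = ("SUPABASE_URL", "SUPABASE_KEY")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and full-line # comments."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing " # comment", as in "SUPABASE_URL=... # optional".
        value = value.split(" #", 1)[0].strip()
        env[key.strip()] = value
    return env


def supabase_enabled(env: Dict[str, str]) -> bool:
    """Raise if a required key is missing; return True only if Supabase is fully configured."""
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing required setting(s) in backend/.env: {missing}")
    return all(env.get(key) for key in OPTIONAL_SUPABASE)
```

For example, `supabase_enabled(parse_env(Path("backend/.env").read_text()))` (with `from pathlib import Path`) returns False for a file that only sets `GEMINI_API_KEY`.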
- Backend: POST /process (main chat endpoint), GET /health.
- Frontend proxy: POST /api/process, POST /api/suggestions.
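The request and response formats of POST /process aren't documented here, so this sketch only touches GET /health. It assumes the endpoint answers HTTP 200 when the backend is up, and that the base URL matches PYTHON_BACKEND_URL; `backend_is_healthy` is a hypothetical helper.

```python
import urllib.request

# Assumption: the same value as PYTHON_BACKEND_URL in frontend/.env.local.
BACKEND_URL = "http://localhost:8000"


def backend_is_healthy(base_url: str = BACKEND_URL, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError both subclass OSError, so this covers refused
        # connections, timeouts, DNS failures, and non-2xx responses.
        return False
```

Calling `backend_is_healthy()` after `make dev` is a quick way to tell a backend that isn't running apart from a wrong PYTHON_BACKEND_URL.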
- CSV only.
- Code execution is limited to pandas/numpy.
- Conversation state in memory unless Supabase is configured.
- Set up for local development; harden before production.
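How the sandbox enforces the pandas/numpy limit isn't specified. One common layer is a static check of the generated code before it runs; the sketch below uses Python's `ast` module to reject imports outside an allow-list and calls to a few dangerous builtins. The allow-list, blocked names, and `check_generated_code` are assumptions for illustration. A check like this is easy to bypass on its own, which is one reason to harden the setup before production.

```python
import ast
from typing import List

ALLOWED_MODULES = {"pandas", "numpy"}  # assumption: mirrors the limit above
BLOCKED_CALLS = {"eval", "exec", "open", "compile", "__import__"}


def check_generated_code(source: str) -> List[str]:
    """Return policy violations found in `source`; an empty list means allowed."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        # Collect imported module names ("" for relative imports like "from . import x").
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] not in ALLOWED_MODULES:
                problems.append(f"import of {name!r} not allowed")
        # Flag direct calls to blocked builtins such as exec(...).
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                problems.append(f"call to {node.func.id}() not allowed")
    return problems
```

Code that fails to parse raises `SyntaxError` from `ast.parse`, which a caller would treat as a rejection too.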
| Issue | What to try |
|---|---|
| “Module not found” | Activate the backend venv and pip install -r requirements.txt in backend/. |
| Frontend won’t start | npm install in frontend/, use Node 18+. |
| Backend connection errors | Ensure backend runs on 8000 and PYTHON_BACKEND_URL in frontend/.env.local is correct. |
| Gemini errors | Check GEMINI_API_KEY and Google AI Studio quota. |
Extra notes and specs live in docs/:
- docs/specs.md: product/feature specs
- docs/ENHANCEMENTS.md, docs/FINAL_ENHANCEMENTS.md
- docs/ideas.md, docs/prompt.md
- docs/LOGGING.md, docs/frontend.md
License: MIT. See the LICENSE file.


