A comprehensive platform built with FastAPI that combines advanced exploratory data analysis (EDA), secure code execution, and intelligent LLM integrations for modern data science workflows.
- Multi-format Data Ingestion: Support for CSV, JSON, Excel, Parquet, and database connections
- Intelligent EDA Engine: Automated exploratory data analysis with domain-specific insights
- Granular Analysis Components: Modular analysis for categorical, numerical, geospatial, and text data
- Real-time Data Quality Monitoring: Comprehensive data profiling and quality metrics
- Sandboxed Environment: Isolated code execution with resource limits and security controls
- Pattern-based Security: Static analysis and runtime protection against malicious operations
- Process Isolation: Multi-process architecture for safe user code execution
- Resource Management: CPU and memory limits to prevent abuse
- Multi-Provider Support: OpenAI, Anthropic Claude, DeepSeek, and local models (Ollama)
- Intelligent Model Switching: Automatic model selection based on task type (code, math, analysis)
- Context-Aware Queries: Data-driven context injection for relevant AI responses
- Specialized Models: DeepSeek Coder for programming tasks, DeepSeek Math for analysis
- JWT Authentication: Secure token-based authentication with role-based access control
- Multi-user Support: User management with admin dashboard and permissions
- Async Architecture: Modern FastAPI with async/await for high performance
- Database Flexibility: Support for SQLite, PostgreSQL with async operations
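The intelligent model switching described above boils down to a task-to-model routing table. The sketch below is a minimal illustration, not the platform's actual router (which lives in `core/llm`); the mapping and fallback model are assumptions based on the providers listed in this README.

```python
# Hypothetical task-to-model routing table (illustrative; the real router
# in core/llm may use different names and richer selection logic).
TASK_MODEL_MAP = {
    "code": "deepseek-coder",
    "math": "deepseek-math",
    "analysis": "gpt-3.5-turbo",
}

def select_model(task_type: str, default: str = "gpt-3.5-turbo") -> str:
    """Pick a provider model for the given task type, falling back to a default."""
    return TASK_MODEL_MAP.get(task_type, default)
```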
```
MLOps Platform
├── Authentication Layer (JWT + RBAC)
├── Data Ingestion Engine
├── EDA Analysis Engine
├── LLM Service Layer
├── Security Sandbox
└── Admin Dashboard
```
- FastAPI Backend: Modern async web framework with automatic API documentation
- Advanced EDA: Domain-specific analysis with granular components
- LLM Router: Intelligent routing to appropriate AI models
- Security Layer: Multi-layered protection for code execution
- Data Pipeline: Robust ingestion and processing workflow
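A multi-format ingestion pipeline like the one described is typically a dispatch on file extension. The sketch below covers only CSV and JSON with the standard library; the actual pipeline in `core/data_ingestion` presumably uses pandas readers for Excel and Parquet as well.

```python
import csv
import io
import json
from pathlib import Path

def load_rows(filename: str, text: str) -> list[dict]:
    """Parse uploaded text into rows by file extension (illustrative sketch)."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    if suffix == ".json":
        data = json.loads(text)
        return data if isinstance(data, list) else [data]
    raise ValueError(f"Unsupported format: {suffix}")
```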
- Templates: Jinja2 views in `templates/` power data ingestion, EDA dashboards, admin pages, and login flows
- Static Assets: Modular JavaScript and CSS bundles in `static/` for ingestion, preview, chat, and shared UI components
- Page Routing: `core.pages.routes` wires HTML routes, while `core.templates` centralizes template setup
- Frontend Security: `core.auth.page_security` enforces per-page access and dynamic context rendering
- Authentication (`core/auth`): JWT issuance, OAuth2 dependencies, page security helpers, and template routes
- Admin (`core/admin`): Async services for system stats, ingestion oversight, maintenance utilities, and admin APIs
- Data Ingestion (`core/data_ingestion`): Async upload pipeline, metadata management, schema validation, and router APIs
- EDA (`core/eda`): Preview services, text analytics, sandboxed execution, and advanced analysis orchestrators
- LLM (`core/llm`): Provider abstractions (OpenAI, Claude, DeepSeek, local), context builders, and chat endpoints
- Database (`core/database`): Async engine factory, migration utilities, repository helpers, and model declarations
- Exceptions & Middleware: `core.exceptions.handlers` plus `middleware/` for structured logging and global error handling
- Utilities (`core/utils`): File helpers, logging adapters, maintenance routines, and audit logging
```bash
git clone https://github.com/flyingruverhorse/Automated-EDA.git
cd Automated-EDA

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements-fastapi.txt

# Copy environment template
cp .env.example .env

# Configure your settings
# Required: Set SECRET_KEY, database credentials, LLM API keys

# Initialize database
alembic upgrade head

# Create admin user (optional)
python -c "from core.auth.auth_core import create_dummy_users; create_dummy_users()"

# Development server
python run_fastapi.py

# Production server
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
```

The application will be available at http://localhost:8000.
```env
# Application
SECRET_KEY=your-secret-key-here
DEBUG=false
ENVIRONMENT=production

# Database
DATABASE_TYPE=postgresql  # or sqlite
DB_HOST=localhost
DB_NAME=db
DB_USER=username
DB_PASSWORD=password

# OpenAI
OPENAI_API_KEY=your-openai-key
OPENAI_DEFAULT_MODEL=gpt-3.5-turbo

# DeepSeek (Code-specialized)
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_CODE_MODEL=deepseek-coder
DEEPSEEK_MATH_MODEL=deepseek-math

# Anthropic Claude
ANTHROPIC_API_KEY=your-claude-key
CLAUDE_DEFAULT_MODEL=claude-3-haiku-20240307

# Local LLM (Ollama)
LOCAL_LLM_URL=http://localhost:11434
LOCAL_LLM_MODEL=llama2
```

- Config Management: `config.py` uses `pydantic-settings` with a cached `get_settings()` accessor and environment validation
- Profiles: `DevelopmentSettings`, `ProductionSettings`, and `TestingSettings` toggle middleware, logging, caching, and debugging defaults
- Feature Flags: Data lineage, schema drift, retention, and LLM behaviour exposed as `ENABLE_*` toggles
- Helper APIs: Utility methods for pandas configuration, database URLs, logging setup, and ML feature sizing
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
```
POST /api/auth/login                 # User login
POST /api/auth/refresh               # Token refresh
POST /api/auth/logout                # User logout

POST /data/upload                    # Upload datasets
GET  /data/sources                   # List data sources
GET  /data/sources/{id}              # Get source details

GET  /eda/api/sources/{id}/preview   # Data preview
POST /eda/api/sources/{id}/quality   # Quality report
POST /advanced-eda/analyze/{id}      # Advanced analysis

POST /llm/query                      # Chat with AI
POST /llm/recommend-model            # Get model recommendation
POST /llm/context/{id}               # Data-aware queries
```

- Static Analysis: AST parsing for dangerous pattern detection
- Import Restrictions: Whitelist of allowed libraries only
- Process Isolation: Subprocess execution with timeout limits
- Resource Limits: Memory and CPU constraints
- Network Blocking: Prevention of external connections
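AST-based pattern detection of the kind described above amounts to walking the parsed tree and rejecting disallowed imports and calls. This is a minimal sketch; the blocklists are illustrative assumptions, not the platform's actual security rules.

```python
import ast

BLOCKED_IMPORTS = {"os", "subprocess", "socket"}        # illustrative blocklist
BLOCKED_CALLS = {"eval", "exec", "open", "__import__"}  # illustrative blocklist

def check_user_code(source: str) -> list[str]:
    """Return a list of violations found in the code (empty list = clean)."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            # Flag direct calls to blocked builtins, e.g. eval(...) or exec(...)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in BLOCKED_CALLS:
                    violations.append(f"blocked call: {node.func.id}")
            continue
        for name in names:
            if name.split(".")[0] in BLOCKED_IMPORTS:
                violations.append(f"blocked import: {name}")
    return violations
```

A real sandbox would pair this static pass with runtime controls (subprocess isolation, timeouts, and resource limits), since static analysis alone can be evaded.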
- JWT Tokens: Secure token-based authentication
- Role-Based Access: User, admin, and custom permissions
- Session Management: Secure session handling
- API Rate Limiting: Protection against abuse
```python
# Categorical data analysis
from core.eda.advanced_eda.granular_components.categorical import CategoricalAnalysis

# Geospatial analysis
from core.eda.advanced_eda.granular_components.geospatial import GeospatialAnalysis

# Time series analysis
from core.eda.advanced_eda.granular_components.time_series import TimeSeriesAnalysis
```

```javascript
// Switch to code-optimized model
await window.LLMChat.switchToCodeModel();

// Get model recommendation
await window.LLMChat.getModelRecommendation('code', 'deepseek');
```

```bash
# Build image
docker build -t mlops-platform .

# Run container
docker run -p 8000:8000 \
  -e SECRET_KEY=your-secret \
  -e DATABASE_URL=your-db-url \
  mlops-platform
```

```bash
# Using Gunicorn + Uvicorn
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 120 \
  --keepalive 2
```

- Health Checks: `/health` endpoint for service monitoring
- Performance Metrics: Request timing and resource usage
- Error Tracking: Comprehensive logging and error handling
- Admin Dashboard: Real-time system statistics
```python
# Structured logging with rotation
LOGGING = {
    'level': 'INFO',
    'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    'file': 'logs/mlops.log',
    'max_bytes': 10_000_000,
    'backup_count': 5
}
```
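A `LOGGING` dict like the one above maps directly onto a rotating file handler. The helper below is a sketch of how such a dict might be applied; the project's actual logging setup lives in `core/utils` and may differ.

```python
import logging
import os
from logging.handlers import RotatingFileHandler

def configure_logging(cfg: dict) -> logging.Logger:
    """Apply a LOGGING-style dict to a named logger (illustrative sketch)."""
    os.makedirs(os.path.dirname(cfg["file"]) or ".", exist_ok=True)
    handler = RotatingFileHandler(
        cfg["file"],
        maxBytes=cfg["max_bytes"],      # rotate once the file reaches this size
        backupCount=cfg["backup_count"],  # keep this many rotated files
    )
    handler.setFormatter(logging.Formatter(cfg["format"]))
    logger = logging.getLogger("mlops")
    logger.setLevel(cfg["level"])
    logger.addHandler(handler)
    return logger
```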
```bash
# Unit tests
pytest tests/unit/

# Integration tests
pytest tests/integration/

# Security tests
pytest tests/security/

# Full test suite with coverage
pytest --cov=core --cov-report=html
```

- Unit Tests: Core functionality testing
- Integration Tests: End-to-end workflow testing
- Security Tests: Sandbox and authentication testing
- Performance Tests: Load and stress testing
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Pre-commit hooks
pre-commit install

# Code formatting
black .
isort .

# Type checking
mypy core/
```

```
├── core/                # Core application modules
│   ├── auth/            # Authentication & authorization
│   ├── data_ingestion/  # Data upload & management
│   ├── eda/             # Exploratory data analysis
│   ├── llm/             # LLM integration
│   ├── admin/           # Admin dashboard
│   └── database/        # Database models & connections
├── static/              # Frontend assets
├── templates/           # Jinja2 templates
├── tests/               # Test suite
├── docs/                # Documentation
└── config.py            # Application configuration
```
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Black: Code formatting
- isort: Import sorting
- MyPy: Type checking
- Pytest: Testing framework
- Pre-commit: Git hooks for quality
This project is licensed under the MIT License - see the LICENSE file for details.
- API Docs: Available at `/docs` when running in debug mode
- User Guide: See the `docs/` directory for detailed guides
- Security Guide: `docs/EDA_SECURITY_GUIDE.md`
- Issues: GitHub Issues for bug reports and feature requests
- Discussions: GitHub Discussions for questions and community
- Wiki: Comprehensive documentation and examples
- FastAPI: Modern web framework for Python APIs
- Pandas: Data manipulation and analysis library
- SQLAlchemy: SQL toolkit and ORM
- Jupyter: Interactive computing environment
- All Contributors: Thanks to everyone who has contributed to this project
Made with ❤️ for the Data Science Community
Transform your data workflows with intelligent automation and secure analysis.