
Advanced Data Analysis & LLM Integration


A comprehensive platform built with FastAPI that combines advanced exploratory data analysis (EDA), secure code execution, and intelligent LLM integrations for modern data science workflows.

πŸš€ Key Features

πŸ“Š Advanced Data Analysis

  • Multi-format Data Ingestion: Support for CSV, JSON, Excel, Parquet, and database connections
  • Intelligent EDA Engine: Automated exploratory data analysis with domain-specific insights
  • Granular Analysis Components: Modular analysis for categorical, numerical, geospatial, and text data
  • Real-time Data Quality Monitoring: Comprehensive data profiling and quality metrics
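Multi-format ingestion typically dispatches on the file suffix to pick a loader. The mapping below is an illustrative sketch, not the platform's actual pipeline, which also covers database connections and schema validation:

```python
from pathlib import Path

# Illustrative suffix-to-loader mapping (loader names follow the
# pandas read_* convention; this is an assumption, not the real config).
READERS = {
    ".csv": "read_csv",
    ".json": "read_json",
    ".xlsx": "read_excel",
    ".parquet": "read_parquet",
}

def resolve_reader(path: str) -> str:
    """Return the loader name for a file, or raise on unknown formats."""
    suffix = Path(path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported format: {suffix!r}")
```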

πŸ”’ Secure Code Execution

  • Sandboxed Environment: Isolated code execution with resource limits and security controls
  • Pattern-based Security: Static analysis and runtime protection against malicious operations
  • Process Isolation: Multi-process architecture for safe user code execution
  • Resource Management: CPU and memory limits to prevent abuse
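The process-isolation idea can be sketched with the standard library alone: run user code in a child interpreter with a hard timeout, so a hang or crash never takes down the server. This is a minimal illustration; the platform layers static analysis and memory/CPU limits on top of this step:

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout: float = 5.0) -> str:
    """Execute user code in a separate Python process with a hard timeout."""
    # "-I" runs the child in isolated mode: no user site-packages,
    # no PYTHON* environment variables, no current-directory imports.
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired on a hang
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout

print(run_sandboxed("print(2 + 2)"))  # → 4
```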

πŸ€– LLM Integration

  • Multi-Provider Support: OpenAI, Anthropic Claude, DeepSeek, and local models (Ollama)
  • Intelligent Model Switching: Automatic model selection based on task type (code, math, analysis)
  • Context-Aware Queries: Data-driven context injection for relevant AI responses
  • Specialized Models: DeepSeek Coder for programming tasks, DeepSeek Math for analysis
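Task-based model selection can be reduced to a routing table. The sketch below uses model names that appear in this README's configuration section, but the routing logic itself is an assumption about how the switcher might work, not the platform's actual implementation:

```python
# Hypothetical routing table; the real router may weigh more signals
# (context size, provider availability, user preference).
MODEL_BY_TASK = {
    "code": "deepseek-coder",
    "math": "deepseek-math",
    "analysis": "gpt-3.5-turbo",
}

def pick_model(task_type: str, default: str = "gpt-3.5-turbo") -> str:
    """Return the model best suited to the task type, or a safe default."""
    return MODEL_BY_TASK.get(task_type, default)
```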

πŸ‘₯ Enterprise Features

  • JWT Authentication: Secure token-based authentication with role-based access control
  • Multi-user Support: User management with admin dashboard and permissions
  • Async Architecture: Modern FastAPI with async/await for high performance
  • Database Flexibility: Support for SQLite, PostgreSQL with async operations

πŸ—οΈ Architecture

MLOps Platform
β”œβ”€β”€ πŸ” Authentication Layer (JWT + RBAC)
β”œβ”€β”€ πŸ“Š Data Ingestion Engine
β”œβ”€β”€ πŸ” EDA Analysis Engine
β”œβ”€β”€ πŸ€– LLM Service Layer
β”œβ”€β”€ πŸ›‘οΈ Security Sandbox
└── πŸ‘₯ Admin Dashboard

Core Components

  • FastAPI Backend: Modern async web framework with automatic API documentation
  • Advanced EDA: Domain-specific analysis with granular components
  • LLM Router: Intelligent routing to appropriate AI models
  • Security Layer: Multi-layered protection for code execution
  • Data Pipeline: Robust ingestion and processing workflow

Web Experience

  • Templates: Jinja2 views in templates/ power data ingestion, EDA dashboards, admin pages, and login flows
  • Static Assets: Modular JavaScript and CSS bundles in static/ for ingestion, preview, chat, and shared UI components
  • Page Routing: core.pages.routes wires HTML routes, while core.templates centralizes template setup
  • Frontend Security: core.auth.page_security enforces per-page access and dynamic context rendering

🧩 Module Overview

  • Authentication (core/auth): JWT issuance, OAuth2 dependencies, page security helpers, and template routes
  • Admin (core/admin): Async services for system stats, ingestion oversight, maintenance utilities, and admin APIs
  • Data Ingestion (core/data_ingestion): Async upload pipeline, metadata management, schema validation, and router APIs
  • EDA (core/eda): Preview services, text analytics, sandboxed execution, and advanced analysis orchestrators
  • LLM (core/llm): Provider abstractions (OpenAI, Claude, DeepSeek, local), context builders, and chat endpoints
  • Database (core/database): Async engine factory, migration utilities, repository helpers, and model declarations
  • Exceptions & Middleware: core.exceptions.handlers plus middleware/ for structured logging and global error handling
  • Utilities (core/utils): File helpers, logging adapters, maintenance routines, and audit logging

⚑ Quick Start

1. Clone Repository

git clone https://github.com/flyingriverhorse/Automated-EDA.git
cd Automated-EDA

2. Environment Setup

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements-fastapi.txt

3. Configuration

# Copy environment template
cp .env.example .env

# Configure your settings
# Required: Set SECRET_KEY, database credentials, LLM API keys

4. Database Setup

# Initialize database
alembic upgrade head

# Create admin user (optional)
python -c "from core.auth.auth_core import create_dummy_users; create_dummy_users()"

5. Run Application

# Development server
python run_fastapi.py

# Production server
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

The application will be available at http://localhost:8000

πŸ”§ Configuration

Environment Variables

Core Settings

# Application
SECRET_KEY=your-secret-key-here
DEBUG=false
ENVIRONMENT=production

# Database
DATABASE_TYPE=postgresql  # or sqlite
DB_HOST=localhost
DB_NAME=db
DB_USER=username
DB_PASSWORD=password

LLM Configuration

# OpenAI
OPENAI_API_KEY=your-openai-key
OPENAI_DEFAULT_MODEL=gpt-3.5-turbo

# DeepSeek (Code-specialized)
DEEPSEEK_API_KEY=your-deepseek-key
DEEPSEEK_CODE_MODEL=deepseek-coder
DEEPSEEK_MATH_MODEL=deepseek-math

# Anthropic Claude
ANTHROPIC_API_KEY=your-claude-key
CLAUDE_DEFAULT_MODEL=claude-3-haiku-20240307

# Local LLM (Ollama)
LOCAL_LLM_URL=http://localhost:11434
LOCAL_LLM_MODEL=llama2

βš™οΈ Environment Profiles

  • Config Management: config.py uses pydantic-settings with cached get_settings() accessor and environment validation
  • Profiles: DevelopmentSettings, ProductionSettings, and TestingSettings toggle middleware, logging, caching, and debugging defaults
  • Feature Flags: Data lineage, schema drift, retention, and LLM behaviour are exposed as ENABLE_* toggles
  • Helper APIs: Utility methods for pandas configuration, database URLs, logging setup, and ML feature sizing
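The cached-accessor pattern behind get_settings() can be sketched with the standard library alone (the project itself uses pydantic-settings; the field names below are taken from the environment variables above, everything else is illustrative):

```python
import os
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Settings:
    secret_key: str
    debug: bool
    environment: str

@lru_cache
def get_settings() -> Settings:
    # Read the environment once; every later call returns the same
    # cached, immutable object, so settings stay consistent per process.
    return Settings(
        secret_key=os.environ.get("SECRET_KEY", "dev-only"),
        debug=os.environ.get("DEBUG", "false").lower() == "true",
        environment=os.environ.get("ENVIRONMENT", "development"),
    )
```

Because the accessor is cached, FastAPI dependencies can call get_settings() freely without re-parsing the environment on every request.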

πŸ“š API Documentation

Interactive Documentation

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Key Endpoints

Authentication

POST /api/auth/login     # User login
POST /api/auth/refresh   # Token refresh
POST /api/auth/logout    # User logout

Data Management

POST /data/upload        # Upload datasets
GET  /data/sources       # List data sources
GET  /data/sources/{id}  # Get source details

EDA & Analysis

GET  /eda/api/sources/{id}/preview   # Data preview
POST /eda/api/sources/{id}/quality   # Quality report
POST /advanced-eda/analyze/{id}      # Advanced analysis

LLM Integration

POST /llm/query              # Chat with AI
POST /llm/recommend-model    # Get model recommendation
POST /llm/context/{id}       # Data-aware queries

πŸ›‘οΈ Security Features

Code Execution Security

  • Static Analysis: AST parsing for dangerous pattern detection
  • Import Restrictions: Whitelist of allowed libraries only
  • Process Isolation: Subprocess execution with timeout limits
  • Resource Limits: Memory and CPU constraints
  • Network Blocking: Prevention of external connections
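The static-analysis and import-whitelist checks can be combined into one AST pass. This is a minimal sketch with an assumed whitelist, not the platform's actual rule set:

```python
import ast

# Illustrative whitelist; the real platform's allowed-library list differs.
ALLOWED_IMPORTS = {"math", "statistics", "json"}

def check_imports(code: str) -> None:
    """Parse code and reject any import outside the whitelist."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [(node.module or "").split(".")[0]]
        else:
            continue
        for name in names:
            if name not in ALLOWED_IMPORTS:
                raise ValueError(f"import of {name!r} is not allowed")
```

Running the check before execution means disallowed code is rejected without ever being run.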

Authentication & Authorization

  • JWT Tokens: Secure token-based authentication
  • Role-Based Access: User, admin, and custom permissions
  • Session Management: Secure session handling
  • API Rate Limiting: Protection against abuse
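The token flow can be illustrated with a stdlib-only HMAC-signed token in the JWT header.payload.signature shape (a teaching sketch; production code should use a maintained JWT library, and these helper names are hypothetical):

```python
import base64
import hashlib
import hmac
import json
import time

def _b64(data: bytes) -> str:
    # URL-safe base64 without padding, as JWT uses.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(sub: str, secret: bytes, ttl: int = 3600) -> str:
    """Sign a minimal JWT-style token: header.payload.signature."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"sub": sub, "exp": int(time.time()) + ttl}).encode())
    sig = _b64(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(token: str, secret: bytes) -> dict:
    """Check the signature and expiry, returning the claims on success."""
    header, payload, sig = token.split(".")
    expected = _b64(hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    pad = "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload + pad))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```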

πŸ”¬ Advanced Features

Domain-Specific Analysis

# Categorical data analysis
from core.eda.advanced_eda.granular_components.categorical import CategoricalAnalysis

# Geospatial analysis
from core.eda.advanced_eda.granular_components.geospatial import GeospatialAnalysis

# Time series analysis
from core.eda.advanced_eda.granular_components.time_series import TimeSeriesAnalysis

LLM Model Switching

// Switch to code-optimized model
await window.LLMChat.switchToCodeModel();

// Get model recommendation
await window.LLMChat.getModelRecommendation('code', 'deepseek');

πŸš€ Deployment

Docker Deployment

# Build image
docker build -t mlops-platform .

# Run container
docker run -p 8000:8000 \
  -e SECRET_KEY=your-secret \
  -e DATABASE_URL=your-db-url \
  mlops-platform

Production Deployment

# Using Gunicorn + Uvicorn
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 120 \
  --keepalive 2

πŸ“Š Monitoring & Observability

Built-in Monitoring

  • Health Checks: /health endpoint for service monitoring
  • Performance Metrics: Request timing and resource usage
  • Error Tracking: Comprehensive logging and error handling
  • Admin Dashboard: Real-time system statistics

Logging Configuration

# Structured logging with rotation
LOGGING = {
    'level': 'INFO',
    'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    'file': 'logs/mlops.log',
    'max_bytes': 10_000_000,
    'backup_count': 5
}
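A dictionary like the one above maps directly onto the standard library's rotating handler. The wiring below is an illustrative sketch (the log path here points at a temp directory so the example is self-contained):

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

LOGGING = {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    "file": os.path.join(tempfile.gettempdir(), "mlops.log"),  # demo path
    "max_bytes": 10_000_000,
    "backup_count": 5,
}

def setup_logging(cfg: dict) -> logging.Logger:
    """Attach a size-rotated file handler configured from the dict."""
    handler = RotatingFileHandler(
        cfg["file"], maxBytes=cfg["max_bytes"], backupCount=cfg["backup_count"]
    )
    handler.setFormatter(logging.Formatter(cfg["format"]))
    logger = logging.getLogger("mlops")
    logger.setLevel(cfg["level"])
    logger.addHandler(handler)
    return logger
```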

πŸ§ͺ Testing

Run Test Suite

# Unit tests
pytest tests/unit/

# Integration tests
pytest tests/integration/

# Security tests
pytest tests/security/

# Full test suite with coverage
pytest --cov=core --cov-report=html

Test Categories

  • Unit Tests: Core functionality testing
  • Integration Tests: End-to-end workflow testing
  • Security Tests: Sandbox and authentication testing
  • Performance Tests: Load and stress testing

πŸ› οΈ Development

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Pre-commit hooks
pre-commit install

# Code formatting
black .
isort .

# Type checking
mypy core/

Project Structure

β”œβ”€β”€ core/                   # Core application modules
β”‚   β”œβ”€β”€ auth/              # Authentication & authorization
β”‚   β”œβ”€β”€ data_ingestion/    # Data upload & management
β”‚   β”œβ”€β”€ eda/               # Exploratory data analysis
β”‚   β”œβ”€β”€ llm/               # LLM integration
β”‚   β”œβ”€β”€ admin/             # Admin dashboard
β”‚   └── database/          # Database models & connections
β”œβ”€β”€ static/                # Frontend assets
β”œβ”€β”€ templates/             # Jinja2 templates
β”œβ”€β”€ tests/                 # Test suite
β”œβ”€β”€ docs/                  # Documentation
└── config.py              # Application configuration

🀝 Contributing

Contribution Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Code Standards

  • Black: Code formatting
  • isort: Import sorting
  • MyPy: Type checking
  • Pytest: Testing framework
  • Pre-commit: Git hooks for quality

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ†˜ Support

Documentation

  • API Docs: Available at /docs when running in debug mode
  • User Guide: See docs/ directory for detailed guides
  • Security Guide: docs/EDA_SECURITY_GUIDE.md

Getting Help

  • Issues: GitHub Issues for bug reports and feature requests
  • Discussions: GitHub Discussions for questions and community
  • Wiki: Comprehensive documentation and examples

πŸ™ Acknowledgments

  • FastAPI: Modern web framework for Python APIs
  • Pandas: Data manipulation and analysis library
  • SQLAlchemy: SQL toolkit and ORM
  • Jupyter: Interactive computing environment
  • All Contributors: Thanks to everyone who has contributed to this project

Made with ❀️ for the Data Science Community

Transform your data workflows with intelligent automation and secure analysis.