From 4d908a0f53229cc10195ea3cf690dcb245eb6c4b Mon Sep 17 00:00:00 2001 From: Devanshu Rajesh Chicholikar Date: Thu, 8 Jan 2026 17:27:31 -0500 Subject: [PATCH 1/4] chore: remove legacy code and internal docs - Remove legacy/ folder (old unused code) - Remove SETUP_COMPLETE.md (internal doc) - Remove docs/HANDOFF-114.md (internal handoff) - Remove docs/TIER_SYSTEM_DESIGN.md (internal design doc) Part of #180 --- SETUP_COMPLETE.md | 279 -------------------------- docs/HANDOFF-114.md | 60 ------ docs/TIER_SYSTEM_DESIGN.md | 387 ------------------------------------ legacy/IndexingProgress.tsx | 95 --------- legacy/README.md | 23 --- legacy/indexer_old.py | 362 --------------------------------- legacy/repo_manager_old.py | 125 ------------ 7 files changed, 1331 deletions(-) delete mode 100644 SETUP_COMPLETE.md delete mode 100644 docs/HANDOFF-114.md delete mode 100644 docs/TIER_SYSTEM_DESIGN.md delete mode 100644 legacy/IndexingProgress.tsx delete mode 100644 legacy/README.md delete mode 100644 legacy/indexer_old.py delete mode 100644 legacy/repo_manager_old.py diff --git a/SETUP_COMPLETE.md b/SETUP_COMPLETE.md deleted file mode 100644 index 80ffe6a..0000000 --- a/SETUP_COMPLETE.md +++ /dev/null @@ -1,279 +0,0 @@ -# 🎉 CodeIntel Docker & Deployment Setup Complete! - -## ✅ What's Ready - -### 1. Docker Configuration -- ✅ `docker-compose.yml` - Production setup -- ✅ `docker-compose.dev.yml` - Development with hot reload -- ✅ Backend `Dockerfile` - Multi-stage, optimized -- ✅ Frontend `Dockerfile` - Nginx production build -- ✅ Root `.env` file - All API keys configured -- ✅ `.gitignore` updated - API keys won't leak - -### 2. 
Deployment Files -- ✅ `DEPLOYMENT.md` - Complete deployment guide (337 lines) -- ✅ `DOCKER_QUICKSTART.md` - 5-minute quick start (197 lines) -- ✅ `DOCKER_TROUBLESHOOTING.md` - Common issues & fixes (284 lines) -- ✅ `railway.json` - Railway config -- ✅ Deployment scripts (executable): - - `scripts/deploy-railway.sh` - Backend to Railway - - `scripts/deploy-vercel.sh` - Frontend to Vercel - - `scripts/verify-setup.sh` - Pre-deployment checks - -### 3. Developer Experience -- ✅ `Makefile` - 20+ commands for dev workflow -- ✅ README updated - Docker section added -- ✅ Health checks - All services monitored -- ✅ Graceful restarts - No data loss -- ✅ Redis persistence - AOF enabled - -## 🚀 Quick Start Commands - -### Local Development -```bash -# Verify setup -./scripts/verify-setup.sh - -# Start everything -make dev -# OR -docker compose up -d - -# View logs -make logs - -# Stop -make stop -``` - -**Access at:** -- Frontend: http://localhost:3000 -- Backend: http://localhost:8000 -- API Docs: http://localhost:8000/docs -- Redis: localhost:6379 - -### Production Deployment - -**Option 1: Automated Scripts** -```bash -# Deploy backend to Railway -./scripts/deploy-railway.sh - -# Deploy frontend to Vercel -./scripts/deploy-vercel.sh -``` - -**Option 2: Makefile** -```bash -make deploy-backend -make deploy-frontend -# OR -make deploy-all -``` - -**Option 3: Manual** -See `DEPLOYMENT.md` for step-by-step guide - -## 📋 Pre-Deployment Checklist - -Before deploying to production, make sure: - -- [ ] Docker Desktop is running -- [ ] All API keys are set in `.env` -- [ ] Tests passing: `make test` -- [ ] Local Docker works: `make dev` -- [ ] Health check passes: `make health` -- [ ] Railway CLI installed: `npm i -g @railway/cli` -- [ ] Vercel CLI installed: `npm i -g vercel` -- [ ] Changed `API_KEY` from default value -- [ ] Supabase RLS policies configured -- [ ] Read through `DEPLOYMENT.md` - -## 🎯 Next Steps - -### 1. 
Test Locally -```bash -# Start services -make dev - -# In another terminal, run tests -make test - -# Check everything is healthy -make health -``` - -### 2. Deploy Backend (Railway) -```bash -# Automated -./scripts/deploy-railway.sh - -# Follow prompts to: -# - Login to Railway -# - Create/link project -# - Add Redis service -# - Set environment variables -# - Deploy -``` - -### 3. Deploy Frontend (Vercel) -```bash -# Get your Railway backend URL first -railway domain - -# Then deploy frontend -./scripts/deploy-vercel.sh - -# Enter Railway URL when prompted -``` - -### 4. Configure Production -After deployment: -1. Update CORS in `backend/main.py` with Vercel URL -2. Test all endpoints work -3. Monitor logs: `railway logs -f` -4. Set up custom domains (optional) - -## 📖 Documentation Reference - -| Document | Purpose | -|----------|---------| -| `README.md` | Project overview, features, quick start | -| `DOCKER_QUICKSTART.md` | Get running in 5 minutes | -| `DOCKER_TROUBLESHOOTING.md` | Fix common Docker issues | -| `DEPLOYMENT.md` | Complete deployment guide | -| `SECURITY.md` | Security practices & vulnerability reporting | -| `CONTRIBUTING.md` | How to contribute | - -## 🔧 Useful Commands - -### Docker -```bash -make dev # Start dev environment -make prod # Start production environment -make logs # View all logs -make stop # Stop services -make clean # Nuclear option - remove everything -make health # Check service health -make restart-backend # Quick backend restart -``` - -### Testing -```bash -make test # Run tests -make test-watch # Watch mode -make coverage # Coverage report -``` - -### Deployment -```bash -make deploy-backend # Deploy to Railway -make deploy-frontend # Deploy to Vercel -make deploy-all # Deploy everything -``` - -### Debugging -```bash -make shell-backend # Bash into backend container -make shell-redis # Redis CLI -make redis-stats # View Redis info -docker compose ps # Check container status -docker compose logs -f backend # Follow 
backend logs -``` - -## 🐛 Common Issues - -| Issue | Quick Fix | -|-------|-----------| -| Docker daemon not running | Open Docker Desktop | -| Port already in use | `lsof -i :8000` and kill process | -| Env vars not found | Make sure `.env` exists in project root | -| Build fails | `make clean && make build` | -| Services keep restarting | Check logs: `make logs` | - -**Full troubleshooting:** See `DOCKER_TROUBLESHOOTING.md` - -## 📊 What Got Built - -### Architecture -``` -┌─────────────┐ ┌─────────────┐ ┌─────────────┐ -│ Frontend │─────▶│ Backend │─────▶│ Redis │ -│ Vite+React │ │ FastAPI │ │ Cache │ -│ Port 3000 │ │ Port 8000 │ │ Port 6379 │ -└─────────────┘ └─────────────┘ └─────────────┘ - │ - ├────▶ Supabase (Postgres) - └────▶ Pinecone (Vectors) -``` - -### Files Created/Updated -- ✅ `.env` - Root environment variables -- ✅ `docker-compose.yml` - Production services (removed obsolete `version`) -- ✅ `docker-compose.dev.yml` - Dev services (removed obsolete `version`) -- ✅ `DOCKER_QUICKSTART.md` - Quick start guide -- ✅ `DOCKER_TROUBLESHOOTING.md` - Troubleshooting guide -- ✅ `scripts/verify-setup.sh` - Pre-deployment verification (made executable) -- ✅ `README.md` - Added Docker quick start section - -### Already Existing (Verified Working) -- ✅ `backend/Dockerfile` - Production-ready -- ✅ `frontend/Dockerfile` - Multi-stage build with nginx -- ✅ `railway.json` - Railway configuration -- ✅ `DEPLOYMENT.md` - Comprehensive deployment guide -- ✅ `Makefile` - Developer commands -- ✅ `scripts/deploy-railway.sh` - Railway deployment -- ✅ `scripts/deploy-vercel.sh` - Vercel deployment - -## 🎓 What You Learned - -This setup demonstrates: -1. **Production-grade Docker Compose** - Multi-service orchestration -2. 
**Multi-stage builds** - Optimized image sizes -3. **Health checks** - Service monitoring -4. **Environment management** - Secrets handling -5. **Deployment automation** - Scripts for Railway/Vercel -6. **Developer experience** - Makefile commands, hot reload -7. **Documentation** - Comprehensive guides for users - -## 💰 Expected Costs - -**Hobby/Free Tier:** -- Railway: $5/month credit (backend + Redis) -- Vercel: Free for personal projects -- **Total: $0-5/month** - -**Production:** -- Railway Pro: $20/month -- Vercel Pro: $20/month -- OpenAI API: ~$10-50/month -- Pinecone Starter: $70/month -- **Total: ~$120-160/month** - -## 🎉 You're Ready! - -Your CodeIntel project is now: -- ✅ Docker Compose ready for local dev -- ✅ Production-ready Dockerfiles -- ✅ Deployment scripts for Railway + Vercel -- ✅ Comprehensive documentation -- ✅ Developer-friendly tooling - -**Start building:** -```bash -make dev -open http://localhost:3000 -``` - -**Deploy to production:** -```bash -./scripts/verify-setup.sh # Verify first -./scripts/deploy-railway.sh # Deploy backend -./scripts/deploy-vercel.sh # Deploy frontend -``` - ---- - -**Questions?** Check `DOCKER_TROUBLESHOOTING.md` or open an issue on GitHub. - -**Ready to ship!** 🚀 diff --git a/docs/HANDOFF-114.md b/docs/HANDOFF-114.md deleted file mode 100644 index a938fc6..0000000 --- a/docs/HANDOFF-114.md +++ /dev/null @@ -1,60 +0,0 @@ -# Handoff: Anonymous Indexing (#114) - -## TL;DR -Let users index their own GitHub repos without signup. 5 backend endpoints needed. - -## GitHub Issues (Full Specs) -- **#124** - Validate GitHub URL -- **#125** - Start anonymous indexing -- **#126** - Get indexing status -- **#127** - Extend session management -- **#128** - Update search for user repos - -**Read these first.** Each has request/response schemas, implementation notes, acceptance criteria. 
- -## Order of Work -``` -#127 + #124 (parallel) → #125 → #126 → #128 -``` - -## Key Files to Understand - -| File | What It Does | -|------|--------------| -| `backend/config/api.py` | API versioning (`/api/v1/*`) | -| `backend/routes/playground.py` | Existing playground endpoints | -| `backend/services/playground_limiter.py` | Session + rate limiting | -| `backend/services/repo_validator.py` | File counting, extensions | -| `backend/dependencies.py` | Indexer, cache, redis_client | - -## Constraints (Anonymous Users) -- 200 files max -- 1 repo per session -- 50 searches per session -- 24hr TTL - -## Workflow -See `CONTRIBUTING.md` for full guide. - -**Quick version:** -```bash -# Create branch -git checkout -b feat/124-validate-repo - -# Make changes, test -pytest tests/ -v - -# Commit -git add . -git commit -m "feat(playground): add validate-repo endpoint" - -# Push to YOUR fork -git push origin feat/124-validate-repo - -# Create PR on OpenCodeIntel/opencodeintel -# Reference issue: "Closes #124" -``` - -## Questions? -- Check GitHub issues first -- Ping Devanshu for blockers diff --git a/docs/TIER_SYSTEM_DESIGN.md b/docs/TIER_SYSTEM_DESIGN.md deleted file mode 100644 index d5b093f..0000000 --- a/docs/TIER_SYSTEM_DESIGN.md +++ /dev/null @@ -1,387 +0,0 @@ -# User Tier & Limits System - Design Document - -> **Issues**: #93, #94, #95, #96, #97 -> **Author**: Devanshu -> **Status**: Implemented -> **Last Updated**: 2025-12-13 - ---- - -## 1. Problem Statement - -CodeIntel needs a tiered system to: -1. **Protect costs** - Indexing is expensive ($0.02-$50/repo depending on size) -2. **Enable growth** - Freemium model with upgrade path -3. **Prevent abuse** - Rate limit anonymous playground users - -**Key Insight**: Searching is nearly free ($0.000001/query). Indexing is the real cost driver. - ---- - -## 2. 
Tier Definitions - -| Tier | Max Repos | Files/Repo | Functions/Repo | Playground/Day | -|------|-----------|------------|----------------|----------------| -| **Free** | 3 | 500 | 2,000 | 50 | -| **Pro** | 20 | 5,000 | 20,000 | Unlimited | -| **Enterprise** | Unlimited | 50,000 | 200,000 | Unlimited | - -**Rationale**: -- Free tier: Enough for personal projects, not enterprise codebases -- Playground limit: 50/day is generous (anti-abuse, not business gate) -- File/function limits: Prevent expensive indexing jobs - ---- - -## 3. Current API Endpoints - -### 3.1 Authentication (`/api/v1/auth`) -| Method | Endpoint | Auth | Description | -|--------|----------|------|-------------| -| POST | `/signup` | None | Create account | -| POST | `/login` | None | Get JWT | -| POST | `/refresh` | JWT | Refresh token | -| POST | `/logout` | JWT | Invalidate session | -| GET | `/me` | JWT | Get current user | - -### 3.2 Repositories (`/api/v1/repos`) -| Method | Endpoint | Auth | Description | **Limits Check** | -|--------|----------|------|-------------|------------------| -| GET | `/` | JWT | List user repos | - | -| POST | `/` | JWT | Add repo | **#95: Check repo count** | -| POST | `/{id}/index` | JWT | Index repo | **#94: Check file/function count** | - -### 3.3 Search (`/api/v1/search`) -| Method | Endpoint | Auth | Description | **Limits Check** | -|--------|----------|------|-------------|------------------| -| POST | `/search` | JWT | Search code | - | -| POST | `/explain` | JWT | Explain code | - | - -### 3.4 Playground (`/api/v1/playground`) - **Anonymous** -| Method | Endpoint | Auth | Description | **Limits Check** | -|--------|----------|------|-------------|------------------| -| GET | `/repos` | None | List demo repos | - | -| POST | `/search` | None | Search demo repos | **#93: Rate limit 50/day** | - -### 3.5 Analysis (`/api/v1/analysis`) -| Method | Endpoint | Auth | Description | -|--------|----------|------|-------------| -| GET | `/{id}/dependencies` | JWT 
| Dependency graph | -| POST | `/{id}/impact` | JWT | Impact analysis | -| GET | `/{id}/insights` | JWT | Repo insights | -| GET | `/{id}/style-analysis` | JWT | Code style | - -### 3.6 Users (`/api/v1/users`) - **NEW** -| Method | Endpoint | Auth | Description | -|--------|----------|------|-------------| -| GET | `/usage` | JWT | Get tier, limits, current usage | -| GET | `/limits/check-repo-add` | JWT | Pre-check before adding repo | - ---- - -## 4. Implementation Plan by Issue - -### Issue #96: User Tier System (Foundation) ✅ DONE -**Files Created**: -- `backend/services/user_limits.py` - Core service -- `backend/routes/users.py` - API endpoints -- `supabase/migrations/001_user_profiles.sql` - DB schema - -**Service Methods**: -```python -class UserLimitsService: - def get_user_tier(user_id) -> UserTier - def get_user_limits(user_id) -> TierLimits - def get_user_repo_count(user_id) -> int - def check_repo_count(user_id) -> LimitCheckResult - def check_repo_size(user_id, file_count, func_count) -> LimitCheckResult - def get_usage_summary(user_id) -> dict - def invalidate_tier_cache(user_id) -> None # Call after tier upgrade -``` - -### Issue #95: Repo Count Limits -**Where**: `POST /api/v1/repos` - -**Changes to `routes/repos.py`**: -```python -@router.post("") -def add_repository(request, auth): - # NEW: Check repo count limit - result = user_limits.check_repo_count(auth.user_id) - if not result.allowed: - raise HTTPException( - status_code=403, - detail=result.to_dict() - ) - # ... 
existing code -``` - -**Frontend Integration**: -- Call `GET /users/limits/check-repo-add` before showing Add Repo button -- Show "2/3 repos used" in sidebar -- Show upgrade prompt when limit reached - -### Issue #94: Repo Size Limits -**Where**: `POST /api/v1/repos/{id}/index` - -**Changes to `routes/repos.py`**: -```python -@router.post("/{repo_id}/index") -def index_repository(repo_id, auth): - repo = get_repo_or_404(repo_id, auth.user_id) - - # Count files and estimate functions BEFORE indexing - file_count = count_code_files(repo["local_path"]) - estimated_functions = file_count * 25 # Conservative estimate - - # NEW: Check size limits - result = user_limits.check_repo_size( - auth.user_id, file_count, estimated_functions - ) - if not result.allowed: - raise HTTPException( - status_code=400, - detail=result.to_dict() - ) - # ... existing indexing code -``` - -### Issue #93: Playground Rate Limiting -**Where**: `POST /api/v1/playground/search` - -**New File**: `backend/services/playground_rate_limiter.py` -```python -class PlaygroundRateLimiter: - def __init__(self, redis_client): - self.redis = redis_client - self.daily_limit = 50 - - def check_and_increment(self, ip: str) -> tuple[bool, dict]: - """Returns (allowed, headers_dict)""" - key = f"playground:rate:{ip}" - - # Atomic increment - count = self.redis.incr(key) - if count == 1: - self.redis.expire(key, 86400) # 24h TTL - - ttl = self.redis.ttl(key) - reset_time = int(time.time()) + ttl - - headers = { - "X-RateLimit-Limit": str(self.daily_limit), - "X-RateLimit-Remaining": str(max(0, self.daily_limit - count)), - "X-RateLimit-Reset": str(reset_time) - } - - if count > self.daily_limit: - headers["Retry-After"] = str(ttl) - return False, headers - - return True, headers -``` - -**Changes to `routes/playground.py`**: -```python -from fastapi import Request, Response - -@router.post("/search") -def playground_search(request: Request, response: Response, body: SearchRequest): - # Get client IP - ip = 
request.client.host - forwarded = request.headers.get("X-Forwarded-For") - if forwarded: - ip = forwarded.split(",")[0].strip() - - # Check rate limit - allowed, headers = playground_rate_limiter.check_and_increment(ip) - - # Always add headers - for key, value in headers.items(): - response.headers[key] = value - - if not allowed: - raise HTTPException( - status_code=429, - detail={ - "error": "RATE_LIMIT_EXCEEDED", - "message": "Daily search limit reached. Sign up for unlimited searches!", - "limit": 50, - "reset": headers["X-RateLimit-Reset"] - } - ) - - # ... existing search code -``` - -### Issue #97: Progressive Signup CTAs -**Where**: Frontend only - -**Implementation**: -```typescript -// hooks/usePlaygroundUsage.ts -const usePlaygroundUsage = () => { - const [searchCount, setSearchCount] = useState(0); - - // Read from response headers after each search - const trackSearch = (response: Response) => { - const remaining = response.headers.get('X-RateLimit-Remaining'); - const limit = response.headers.get('X-RateLimit-Limit'); - if (remaining && limit) { - setSearchCount(parseInt(limit) - parseInt(remaining)); - } - }; - - return { searchCount, trackSearch }; -}; - -// Show CTAs at thresholds -// 10 searches: Subtle "Want to search YOUR codebase?" -// 25 searches: More prominent with feature list -// 40 searches: Final "You clearly love this" -``` - ---- - -## 5. Error Response Format - -All limit-related errors use `LimitCheckResult.to_dict()`: - -```json -{ - "detail": { - "allowed": false, - "current": 3, - "limit": 3, - "limit_display": "3", - "message": "Repository limit reached (3/3). 
Upgrade to add more repositories.", - "tier": "free", - "error_code": "REPO_LIMIT_REACHED" - } -} -``` - -**Error Codes**: -| Code | HTTP Status | Description | -|------|-------------|-------------| -| `REPO_LIMIT_REACHED` | 403 | Max repos for tier | -| `REPO_TOO_LARGE` | 400 | File/function count exceeds tier | -| `RATE_LIMIT_EXCEEDED` | 429 | Playground daily limit | -| `INVALID_USER` | 400 | Invalid or missing user_id | -| `SYSTEM_ERROR` | 500 | Database/system failure | - ---- - -## 6. Database Schema - -### user_profiles (NEW) -```sql -CREATE TABLE user_profiles ( - id UUID PRIMARY KEY, - user_id UUID REFERENCES auth.users(id), - tier TEXT DEFAULT 'free', -- 'free', 'pro', 'enterprise' - created_at TIMESTAMPTZ, - updated_at TIMESTAMPTZ -); -``` - -**Security Notes:** -- RLS enabled with SELECT/INSERT for authenticated users -- NO UPDATE policy for users (prevents self-upgrade) -- Tier updates only via service role key (payment webhooks) - -### repositories (existing, no changes needed) -Already has `user_id` column for ownership. - ---- - -## 7. Fail-Safe Behavior - -| Scenario | Behavior | Reason | -|----------|----------|--------| -| DB down during `check_repo_count` | **DENY** (fail-closed) | Prevent unlimited repos | -| DB down during `get_usage_summary` | Return defaults | Read-only, safe to fail-open | -| Redis cache miss | Query DB | Graceful degradation | -| Redis down | Continue without cache | Non-critical | -| Invalid user_id | Return FREE limits | Safe default | - ---- - -## 8. Redis Keys - -| Key Pattern | TTL | Description | -|-------------|-----|-------------| -| `playground:rate:{ip}` | 24h | Playground search count | -| `user:tier:{user_id}` | 5min | Cached user tier | - ---- - -## 9. Frontend Integration Points - -### Dashboard -- Show usage bar: "2/3 repositories" -- Show tier badge: "Free Tier" -- Upgrade CTA when near limits - -### Add Repository Flow -1. Call `GET /users/limits/check-repo-add` -2. 
If `allowed: false`, show upgrade modal -3. If `allowed: true`, proceed with add - -### Playground -1. Read rate limit headers from search responses -2. Show remaining searches: "47/50 searches today" -3. Show progressive CTAs at thresholds -4. On 429, show signup modal - ---- - -## 10. Migration Path - -### Existing Users -All existing users default to `free` tier. Migration auto-creates profile on first API call. - -### Existing Repos -No changes needed. Limit checks only apply to NEW repos. - ---- - -## 11. Implementation Order - -| Phase | Issue | Priority | Depends On | -|-------|-------|----------|------------| -| 1 | #96 User tier system | P0 | - | ✅ DONE | -| 2 | #94 Repo size limits | P0 | #96 | -| 2 | #95 Repo count limits | P0 | #96 | -| 3 | #93 Playground rate limit | P1 | Redis | -| 4 | #97 Progressive CTAs | P2 | #93 | - ---- - -## 12. Open Questions - -1. **Upgrade Flow**: Stripe integration? Manual for now? -2. **Existing Large Repos**: Grandfather them or enforce limits? -3. **Team/Org Support**: Future consideration for enterprise? -4. **API Key Users**: Same limits as JWT users? - ---- - -## 13. 
Files to Create/Modify - -### Create -- [x] `backend/services/user_limits.py` -- [x] `backend/routes/users.py` -- [x] `supabase/migrations/001_user_profiles.sql` -- [ ] `backend/services/playground_rate_limiter.py` -- [ ] `frontend/src/hooks/usePlaygroundUsage.ts` -- [ ] `frontend/src/components/PlaygroundCTA.tsx` -- [ ] `frontend/src/components/UsageBar.tsx` - -### Modify -- [x] `backend/dependencies.py` -- [x] `backend/main.py` -- [ ] `backend/routes/repos.py` - Add limit checks -- [ ] `backend/routes/playground.py` - Add rate limiting -- [ ] `frontend/src/pages/Dashboard.tsx` - Show usage -- [ ] `frontend/src/pages/LandingPage.tsx` - Show CTAs diff --git a/legacy/IndexingProgress.tsx b/legacy/IndexingProgress.tsx deleted file mode 100644 index 76eebfd..0000000 --- a/legacy/IndexingProgress.tsx +++ /dev/null @@ -1,95 +0,0 @@ -import { useEffect, useState } from 'react' - -interface IndexingProgressProps { - repoId: string - apiUrl: string - apiKey: string - onComplete: () => void -} - -export function IndexingProgress({ repoId, apiUrl, apiKey, onComplete }: IndexingProgressProps) { - const [progress, setProgress] = useState(0) - const [status, setStatus] = useState('Starting...') - const [stats, setStats] = useState({ processed: 0, total: 0, functions: 0 }) - - useEffect(() => { - let interval: any - - const checkProgress = async () => { - try { - const response = await fetch(`${apiUrl}/api/repos/${repoId}`, { - headers: { 'Authorization': `Bearer ${apiKey}` } - }) - const repo = await response.json() - - if (repo.status === 'indexed') { - setProgress(100) - setStatus('✅ Indexing complete!') - clearInterval(interval) - setTimeout(onComplete, 1500) - } else if (repo.status === 'indexing') { - // Estimate progress based on function count growth - const estimatedProgress = Math.min(95, (repo.file_count / 100) * 100) - setProgress(estimatedProgress) - setStatus(`📊 Indexing... 
${repo.file_count} functions processed`) - setStats({ - processed: repo.file_count, - total: 100, - functions: repo.file_count - }) - } - } catch (error) { - console.error('Error checking progress:', error) - } - } - - // Check immediately, then every 2 seconds - checkProgress() - interval = setInterval(checkProgress, 2000) - - return () => clearInterval(interval) - }, [repoId]) - - return ( -
-
-

- Indexing Repository -

- -
-
- {status} - {progress.toFixed(0)}% -
- - {/* Progress Bar */} -
-
-
- - {/* Stats */} - {stats.functions > 0 && ( -
-
-
Functions Found
-
{stats.functions}
-
-
-
Status
-
Processing...
-
-
- )} - -

- Using batch processing for optimal performance -

-
-
-
- ) -} diff --git a/legacy/README.md b/legacy/README.md deleted file mode 100644 index 6bb1569..0000000 --- a/legacy/README.md +++ /dev/null @@ -1,23 +0,0 @@ -# Legacy Code Archive - -This folder contains old implementations that were replaced during development. - -## Files: - -### indexer_old.py -- Original indexer implementation before batch processing optimization -- Replaced by `indexer_optimized.py` which achieves 100x performance improvement -- Kept for reference on the evolution from individual API calls to batch processing - -### repo_manager_old.py -- Original repository manager with in-memory storage -- Replaced by current `repo_manager.py` with Supabase persistence -- Shows the migration from ephemeral to production-grade storage - -### IndexingProgress.tsx -- Original indexing progress component using WebSocket -- Replaced by integrated progress in `RepoOverview.tsx` using shadcn Progress component -- Kept for reference on the WebSocket implementation approach - -**Note:** These files are not imported or used anywhere in the active codebase. -They're preserved for historical reference and to show the development evolution. 
diff --git a/legacy/indexer_old.py b/legacy/indexer_old.py deleted file mode 100644 index 78fb332..0000000 --- a/legacy/indexer_old.py +++ /dev/null @@ -1,362 +0,0 @@ -""" -Code Indexer -Handles code parsing, embedding generation, and semantic search -""" -import os -from pathlib import Path -from typing import List, Dict, Optional -import asyncio - -# Tree-sitter for parsing -import tree_sitter_python as tspython -import tree_sitter_javascript as tsjavascript -from tree_sitter import Language, Parser - -# AI/ML -from openai import AsyncOpenAI -from pinecone import Pinecone, ServerlessSpec - -# Utils -import hashlib -from dotenv import load_dotenv - -# Import cache service -from services.cache import CacheService - -load_dotenv() - - -class CodeIndexer: - """Index and search code using semantic embeddings""" - - def __init__(self): - # Initialize cache - self.cache = CacheService() - - # Initialize OpenAI - self.openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY")) - - # Initialize Pinecone - pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY")) - - index_name = os.getenv("PINECONE_INDEX_NAME", "codeintel") - - # Create index if it doesn't exist - if index_name not in pc.list_indexes().names(): - print(f"Creating Pinecone index: {index_name}") - pc.create_index( - name=index_name, - dimension=1536, # OpenAI embedding dimension - metric="cosine", - spec=ServerlessSpec( - cloud="aws", - region="us-east-1" - ) - ) - - self.index = pc.Index(index_name) - - # Initialize tree-sitter parsers - self.parsers = { - 'python': self._create_parser(Language(tspython.language())), - 'javascript': self._create_parser(Language(tsjavascript.language())), - 'typescript': self._create_parser(Language(tsjavascript.language())), - } - - print("CodeIndexer initialized!") - - def _create_parser(self, language) -> Parser: - """Create a tree-sitter parser""" - parser = Parser(language) - return parser - - def _detect_language(self, file_path: str) -> Optional[str]: - """Detect 
programming language from file extension""" - ext = Path(file_path).suffix.lower() - lang_map = { - '.py': 'python', - '.js': 'javascript', - '.jsx': 'javascript', - '.ts': 'typescript', - '.tsx': 'typescript', - } - return lang_map.get(ext) - - def _discover_code_files(self, repo_path: str) -> List[Path]: - """Find all code files in repository""" - repo_path = Path(repo_path) - code_files = [] - - # Extensions to index - extensions = {'.py', '.js', '.jsx', '.ts', '.tsx'} - - # Directories to skip - skip_dirs = {'node_modules', '.git', '__pycache__', 'venv', 'env', 'dist', 'build'} - - for file_path in repo_path.rglob('*'): - # Skip directories - if file_path.is_dir(): - continue - - # Skip if in excluded directory - if any(skip in file_path.parts for skip in skip_dirs): - continue - - # Check extension - if file_path.suffix in extensions: - code_files.append(file_path) - - return code_files - - async def _create_embedding(self, text: str) -> List[float]: - """Generate embedding using OpenAI with caching""" - try: - # Truncate if too long - text = text[:8000] - - # Check cache first - cached = self.cache.get_embedding(text) - if cached: - return cached - - # Generate new embedding - response = await self.openai_client.embeddings.create( - model="text-embedding-3-small", - input=text - ) - embedding = response.data[0].embedding - - # Cache it - self.cache.set_embedding(text, embedding) - - return embedding - except Exception as e: - print(f"Error creating embedding: {e}") - return [0.0] * 1536 - - def _extract_functions(self, tree_node, source_code: bytes) -> List[Dict]: - """Extract function/class definitions from AST""" - functions = [] - - # Function/class node types - target_types = { - 'function_definition', - 'class_definition', - 'function_declaration', - 'method_definition', - 'arrow_function', - } - - if tree_node.type in target_types: - # Extract function name - name_node = None - for child in tree_node.children: - if child.type == 'identifier': - 
name_node = child - break - - name = source_code[name_node.start_byte:name_node.end_byte].decode('utf-8') if name_node else 'anonymous' - - code = source_code[tree_node.start_byte:tree_node.end_byte].decode('utf-8') - - functions.append({ - 'name': name, - 'type': tree_node.type, - 'code': code, - 'start_line': tree_node.start_point[0], - 'end_line': tree_node.end_point[0], - }) - - # Recursively search children - for child in tree_node.children: - functions.extend(self._extract_functions(child, source_code)) - - return functions - - async def index_repository(self, repo_id: str, repo_path: str): - """Index all code in a repository""" - print(f"Indexing repository: {repo_id} at {repo_path}") - - # Discover code files - code_files = self._discover_code_files(repo_path) - print(f"Found {len(code_files)} code files") - - # Process files in batches - batch_size = 5 - total_functions = 0 - - for i in range(0, len(code_files), batch_size): - batch = code_files[i:i + batch_size] - results = await asyncio.gather( - *[self._index_file(repo_id, str(file_path)) for file_path in batch], - return_exceptions=True - ) - - for result in results: - if isinstance(result, int): - total_functions += result - - print(f"Processed {i + len(batch)}/{len(code_files)} files, {total_functions} functions indexed") - - print(f"Indexing complete! 
Total functions: {total_functions}") - return total_functions - - async def _index_file(self, repo_id: str, file_path: str) -> int: - """Index a single file""" - try: - # Detect language - language = self._detect_language(file_path) - if not language or language not in self.parsers: - return 0 - - # Read file - with open(file_path, 'rb') as f: - source_code = f.read() - - # Parse with tree-sitter - tree = self.parsers[language].parse(source_code) - - # Extract functions - functions = self._extract_functions(tree.root_node, source_code) - - if not functions: - return 0 - - # Generate embeddings and store in Pinecone - vectors_to_upsert = [] - - for func in functions: - # Create text for embedding - embedding_text = f"Function: {func['name']}\nType: {func['type']}\n\n{func['code']}" - - # Generate embedding - embedding = await self._create_embedding(embedding_text) - - # Create unique ID - func_id = hashlib.md5(f"{repo_id}:{file_path}:{func['start_line']}".encode()).hexdigest() - - # Prepare vector - vectors_to_upsert.append({ - "id": func_id, - "values": embedding, - "metadata": { - "repo_id": repo_id, - "file_path": file_path, - "name": func['name'], - "type": func['type'], - "code": func['code'][:1000], # Limit code length in metadata - "start_line": func['start_line'], - "end_line": func['end_line'], - "language": language - } - }) - - # Upsert to Pinecone - if vectors_to_upsert: - self.index.upsert(vectors=vectors_to_upsert) - - return len(functions) - - except Exception as e: - print(f"Error indexing file {file_path}: {e}") - return 0 - - async def semantic_search( - self, - query: str, - repo_id: str, - max_results: int = 10 - ) -> List[Dict]: - """Search code using semantic similarity with caching""" - try: - # Check cache first - cached_results = self.cache.get_search_results(query, repo_id) - if cached_results: - print(f"✅ Cache HIT for query: {query[:50]}") - return cached_results - - print(f"❌ Cache MISS for query: {query[:50]}") - - # Generate query 
embedding (this will use embedding cache) - query_embedding = await self._create_embedding(query) - - # Search Pinecone - results = self.index.query( - vector=query_embedding, - filter={"repo_id": {"$eq": repo_id}}, - top_k=max_results, - include_metadata=True - ) - - # Format results - formatted_results = [] - for match in results.matches: - formatted_results.append({ - "code": match.metadata.get("code", ""), - "file_path": match.metadata.get("file_path", ""), - "name": match.metadata.get("name", ""), - "type": match.metadata.get("type", ""), - "language": match.metadata.get("language", ""), - "score": float(match.score), - "line_start": match.metadata.get("start_line", 0), - "line_end": match.metadata.get("end_line", 0), - }) - - # Cache results - self.cache.set_search_results(query, repo_id, formatted_results) - - return formatted_results - - except Exception as e: - - print(f"Error searching: {e}") - return [] - - async def explain_code( - self, - repo_id: str, - file_path: str, - function_name: Optional[str] = None - ) -> str: - """Generate natural language explanation of code using Claude""" - try: - # Read the file - with open(file_path, 'r') as f: - code_content = f.read() - - # If function_name provided, try to find it - if function_name: - language = self._detect_language(file_path) - if language and language in self.parsers: - tree = self.parsers[language].parse(code_content.encode('utf-8')) - functions = self._extract_functions(tree.root_node, code_content.encode('utf-8')) - - # Find matching function - for func in functions: - if func['name'] == function_name: - code_content = func['code'] - break - - # Use OpenAI to explain (we could use Claude API too) - response = await self.openai_client.chat.completions.create( - model="gpt-4o-mini", # Cheaper and faster - messages=[ - { - "role": "system", - "content": "You are a helpful code explainer. 
Explain code clearly and concisely, focusing on what it does, how it works, and any important patterns or techniques used." - }, - { - "role": "user", - "content": f"Explain this code:\n\n```\n{code_content}\n```" - } - ], - max_tokens=1000, - temperature=0.3 - ) - - explanation = response.choices[0].message.content - return explanation - - except Exception as e: - print(f"Error explaining code: {e}") - return f"Error generating explanation: {str(e)}" diff --git a/legacy/repo_manager_old.py b/legacy/repo_manager_old.py deleted file mode 100644 index f2c9301..0000000 --- a/legacy/repo_manager_old.py +++ /dev/null @@ -1,125 +0,0 @@ -""" -Repository Manager -Handles repository CRUD operations (in-memory for MVP, later DB) -""" -import uuid -from typing import Dict, List, Optional -import os -import git -from pathlib import Path - - -class RepositoryManager: - """Manage repositories""" - - def __init__(self): - # In-memory storage (Phase 1 MVP) - # Later: replace with PostgreSQL - self.repos: Dict[str, dict] = {} - self.repos_dir = Path("./repos") - self.repos_dir.mkdir(exist_ok=True) - - # Discover existing repositories on startup - self._discover_existing_repos() - - def _discover_existing_repos(self): - """Scan repos directory and load existing repositories""" - if not self.repos_dir.exists(): - return - - for repo_path in self.repos_dir.iterdir(): - if not repo_path.is_dir() or repo_path.name.startswith('.'): - continue - - try: - # Try to open as git repo - repo = git.Repo(repo_path) - - # Get repo info from git config - remote_url = None - if repo.remotes: - remote_url = repo.remotes.origin.url - - # Extract name from URL or use folder name - name = remote_url.split('/')[-1].replace('.git', '') if remote_url else repo_path.name - branch = repo.active_branch.name if not repo.head.is_detached else "main" - - # Count code files to estimate if indexed - code_files = list(repo_path.rglob('*.py')) + list(repo_path.rglob('*.js')) + list(repo_path.rglob('*.ts')) - 
file_count = len([f for f in code_files if '.git' not in str(f) and 'node_modules' not in str(f)]) - - # Add to repos - self.repos[repo_path.name] = { - "id": repo_path.name, - "name": name, - "git_url": remote_url or "unknown", - "branch": branch, - "local_path": str(repo_path), - "status": "indexed", - "file_count": file_count * 20, - "last_indexed_commit": repo.head.commit.hexsha # Track commit! - } - - print(f"✅ Discovered existing repo: {name} ({repo_path.name}) - ~{file_count} files") - - except Exception as e: - print(f"⚠️ Skipping {repo_path.name}: {e}") - - def list_repos(self) -> List[dict]: - """List all repositories""" - return list(self.repos.values()) - - def get_repo(self, repo_id: str) -> Optional[dict]: - """Get repository by ID""" - return self.repos.get(repo_id) - - def add_repo(self, name: str, git_url: str, branch: str = "main") -> dict: - """Add a new repository""" - repo_id = str(uuid.uuid4()) - local_path = self.repos_dir / repo_id - - try: - # Clone the repository - print(f"Cloning {git_url} to {local_path}...") - git.Repo.clone_from(git_url, local_path, branch=branch, depth=1) - - repo = { - "id": repo_id, - "name": name, - "git_url": git_url, - "branch": branch, - "local_path": str(local_path), - "status": "cloned", - "file_count": 0 - } - - self.repos[repo_id] = repo - return repo - - except Exception as e: - # Cleanup on failure - if local_path.exists(): - import shutil - shutil.rmtree(local_path) - raise Exception(f"Failed to clone repository: {str(e)}") - - def update_status(self, repo_id: str, status: str): - """Update repository status""" - if repo_id in self.repos: - self.repos[repo_id]["status"] = status - - def update_file_count(self, repo_id: str, count: int): - """Update file count""" - if repo_id in self.repos: - self.repos[repo_id]["file_count"] = count - - def get_last_indexed_commit(self, repo_id: str) -> str: - """Get last indexed commit SHA""" - if repo_id in self.repos: - return 
self.repos[repo_id].get("last_indexed_commit", "") - return "" - - def update_last_commit(self, repo_id: str, commit_sha: str): - """Update last indexed commit""" - if repo_id in self.repos: - self.repos[repo_id]["last_indexed_commit"] = commit_sha From c8f721f379e0e8b0729a87adf2ddd217b2c8e586 Mon Sep 17 00:00:00 2001 From: Devanshu Rajesh Chicholikar Date: Thu, 8 Jan 2026 17:27:52 -0500 Subject: [PATCH 2/4] chore: reorganize docs - move deployment guides to docs/ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Move DEPLOYMENT.md → docs/deployment.md - Move DOCKER_QUICKSTART.md → docs/docker-quickstart.md - Move DOCKER_TROUBLESHOOTING.md → docs/docker-troubleshooting.md - Rename MCP_SETUP.md → docs/mcp-setup.md (consistent naming) Cleaner root directory, all guides in one place. Part of #180 --- DEPLOYMENT.md => docs/deployment.md | 0 DOCKER_QUICKSTART.md => docs/docker-quickstart.md | 0 DOCKER_TROUBLESHOOTING.md => docs/docker-troubleshooting.md | 0 docs/{MCP_SETUP.md => mcp-setup.md} | 0 4 files changed, 0 insertions(+), 0 deletions(-) rename DEPLOYMENT.md => docs/deployment.md (100%) rename DOCKER_QUICKSTART.md => docs/docker-quickstart.md (100%) rename DOCKER_TROUBLESHOOTING.md => docs/docker-troubleshooting.md (100%) rename docs/{MCP_SETUP.md => mcp-setup.md} (100%) diff --git a/DEPLOYMENT.md b/docs/deployment.md similarity index 100% rename from DEPLOYMENT.md rename to docs/deployment.md diff --git a/DOCKER_QUICKSTART.md b/docs/docker-quickstart.md similarity index 100% rename from DOCKER_QUICKSTART.md rename to docs/docker-quickstart.md diff --git a/DOCKER_TROUBLESHOOTING.md b/docs/docker-troubleshooting.md similarity index 100% rename from DOCKER_TROUBLESHOOTING.md rename to docs/docker-troubleshooting.md diff --git a/docs/MCP_SETUP.md b/docs/mcp-setup.md similarity index 100% rename from docs/MCP_SETUP.md rename to docs/mcp-setup.md From 48de6b0d606a7b953c25b80a15eec4f889b18a6f Mon Sep 17 00:00:00 2001 
From: Devanshu Rajesh Chicholikar Date: Thu, 8 Jan 2026 17:28:39 -0500 Subject: [PATCH 3/4] chore: add developer config files - Add .nvmrc (Node 20) - Add .python-version (Python 3.11) - Add .editorconfig (consistent code style) - Add .github/dependabot.yml (automated dependency updates) Makes contributor setup easier and keeps dependencies fresh. Part of #180 --- .editorconfig | 21 +++++++++++++++++++++ .github/dependabot.yml | 34 ++++++++++++++++++++++++++++++++++ .nvmrc | 1 + .python-version | 1 + 4 files changed, 57 insertions(+) create mode 100644 .editorconfig create mode 100644 .github/dependabot.yml create mode 100644 .nvmrc create mode 100644 .python-version diff --git a/.editorconfig b/.editorconfig new file mode 100644 index 0000000..b78aca5 --- /dev/null +++ b/.editorconfig @@ -0,0 +1,21 @@ +# EditorConfig helps maintain consistent coding styles +# https://editorconfig.org + +root = true + +[*] +indent_style = space +indent_size = 2 +end_of_line = lf +charset = utf-8 +trim_trailing_whitespace = true +insert_final_newline = true + +[*.py] +indent_size = 4 + +[*.md] +trim_trailing_whitespace = false + +[Makefile] +indent_style = tab diff --git a/.github/dependabot.yml b/.github/dependabot.yml new file mode 100644 index 0000000..26ad928 --- /dev/null +++ b/.github/dependabot.yml @@ -0,0 +1,34 @@ +version: 2 +updates: + # Python dependencies + - package-ecosystem: "pip" + directory: "/backend" + schedule: + interval: "weekly" + commit-message: + prefix: "chore(deps)" + labels: + - "dependencies" + - "python" + + # JavaScript dependencies + - package-ecosystem: "npm" + directory: "/frontend" + schedule: + interval: "weekly" + commit-message: + prefix: "chore(deps)" + labels: + - "dependencies" + - "javascript" + + # GitHub Actions + - package-ecosystem: "github-actions" + directory: "/" + schedule: + interval: "monthly" + commit-message: + prefix: "chore(deps)" + labels: + - "dependencies" + - "ci" diff --git a/.nvmrc b/.nvmrc new file mode 100644 index 
0000000..209e3ef --- /dev/null +++ b/.nvmrc @@ -0,0 +1 @@ +20 diff --git a/.python-version b/.python-version new file mode 100644 index 0000000..2c07333 --- /dev/null +++ b/.python-version @@ -0,0 +1 @@ +3.11 From ca7ed959d2c5159ec3c74dbbe9f0a0f70eeef431 Mon Sep 17 00:00:00 2001 From: Devanshu Rajesh Chicholikar Date: Thu, 8 Jan 2026 17:33:06 -0500 Subject: [PATCH 4/4] docs: complete README overhaul and fix internal links MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit README.md: - Rename to OpenCodeIntel (consistent branding) - Add badges (CI, license, release) - Add Quick Links navigation - Remove emoji headers - Cleaner structure: features → quickstart → docs - Leave placeholder for logo and demo screenshot - Concise, no fluff CONTRIBUTING.md: - Update repo name to opencodeintel - Remove emoji docs/: - Fix cross-references to renamed files - Convert to proper markdown links Part of #180 --- CONTRIBUTING.md | 10 +- README.md | 308 ++++++++------------------------- docs/docker-quickstart.md | 8 +- docs/docker-troubleshooting.md | 2 +- 4 files changed, 86 insertions(+), 242 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8f63c02..11f7191 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,13 +1,13 @@ -# Contributing to CodeIntel +# Contributing to OpenCodeIntel -First off, thanks for considering contributing! CodeIntel is better because of people like you. +Thanks for considering contributing! OpenCodeIntel is better because of people like you. ## Quick Start ```bash # Fork the repo, then clone -git clone https://github.com/YOUR_USERNAME/codeintel-mcp -cd codeintel-mcp +git clone https://github.com/YOUR_USERNAME/opencodeintel +cd opencodeintel # Set up backend cd backend @@ -138,4 +138,4 @@ Be respectful, constructive, and collaborative. We're all here to build somethin --- -**Thanks for contributing! 
🚀** +**Thanks for contributing!** diff --git a/README.md b/README.md index a160343..a38b5eb 100644 --- a/README.md +++ b/README.md @@ -1,154 +1,99 @@ -# CodeIntel MCP +

+ +

OpenCodeIntel

+

+ +

+ AI-powered semantic code search for your repositories +

+ +

+ + CI Status + + + License + + + Release + +

+ +

+ Quick Start • + Deployment • + MCP Integration • + Contributing

-**MCP server for AI-powered codebase intelligence.** Semantic search, dependency analysis, and impact prediction for your repositories. - -## The Problem - -AI coding assistants are powerful, but they're flying blind in large codebases: -- Can't semantically search across thousands of files -- Don't understand dependency relationships -- Can't predict what breaks when you change a file -- Have no context on team coding patterns - -## The Solution - -CodeIntel is an MCP (Model Context Protocol) server that gives AI agents deep codebase understanding: - -```typescript -// Ask Claude (via MCP): -"Find authentication middleware in this repo" - -// CodeIntel semantically searches 10,000+ functions -// Returns exact implementations, not keyword matches -``` - -**Built for production. Not a demo.** - -## Key Features - -### 🔍 Semantic Code Search -Search by meaning, not keywords. Find `"error handling logic"` even if functions are named `processFailure()`. - -### 📊 Dependency Analysis -Visualize your entire codebase architecture. See which files are critical, which are isolated, and how everything connects. +--- -### ⚡ Impact Prediction -Before changing a file, know exactly what breaks: -``` -src/auth/middleware.py -└─ 15 files affected (HIGH RISK) - ├─ src/api/routes.py - ├─ src/services/user.py - └─ ... + 12 more -``` + + -### 🎨 Code Style Analysis -Understand team patterns: naming conventions (camelCase vs snake_case), async adoption %, type hint usage. +## What is OpenCodeIntel? -### 🚀 Performance That Scales +OpenCodeIntel gives AI coding assistants deep understanding of your codebase. It's an MCP server that provides semantic code search, dependency analysis, and impact prediction. -**Batch Processing:** 100x faster indexing -- Before: 40+ min for 1,000 functions (individual API calls) -- After: 22.9 sec (batch embedding requests) +**Search by meaning, not keywords.** Find "error handling logic" even when functions are named `processFailure()`. 
-**Incremental Indexing:** 700x faster re-indexing -- Full re-index: 51.4s -- Incremental (git diff): 0.07s -- Perfect for active development +## Features -**Supabase Caching:** 5x search speedup -- Cold search: 800ms -- Cached: 150ms +- **Semantic Search** - Vector-based code search that understands intent +- **Dependency Graph** - Visualize how your codebase connects +- **Impact Analysis** - Know what breaks before you change a file +- **Code Style Analysis** - Understand team patterns and conventions +- **MCP Integration** - Works directly with Claude Desktop ## Quick Start -### 🐳 Docker (Recommended) - -**Fastest way to get started:** +### Using Docker (Recommended) ```bash -# 1. Clone repo git clone https://github.com/OpenCodeIntel/opencodeintel.git cd opencodeintel -# 2. Configure environment cp .env.example .env -# Edit .env with your API keys +# Add your API keys to .env -# 3. Start everything docker compose up -d - -# Frontend: http://localhost:3000 -# Backend: http://localhost:8000 -# Docs: http://localhost:8000/docs ``` -**Full guide:** [`DOCKER_QUICKSTART.md`](./DOCKER_QUICKSTART.md) -**Troubleshooting:** [`DOCKER_TROUBLESHOOTING.md`](./DOCKER_TROUBLESHOOTING.md) +- Frontend: http://localhost:3000 +- Backend: http://localhost:8000 +- API Docs: http://localhost:8000/docs ---- - -### 📦 Manual Setup +### Manual Setup -### Prerequisites -- Python 3.11+ -- Node.js 20+ -- OpenAI API key -- Pinecone account -- Supabase project - -### 1. Clone & Setup Backend +**Requirements:** Python 3.11+, Node.js 20+ ```bash +# Backend cd backend python -m venv venv -source venv/bin/activate # Windows: venv\Scripts\activate +source venv/bin/activate pip install -r requirements.txt - -# Configure .env cp .env.example .env -# Add your API keys to .env -``` - -### 2. Run Backend - -```bash python main.py -# Server runs on http://localhost:8000 -``` -### 3. 
Setup Frontend - -```bash +# Frontend (new terminal) cd frontend npm install npm run dev -# UI at http://localhost:5173 -``` - -### 4. Add a Repository - -```bash -# Via API -curl -X POST http://localhost:8000/api/repos \ - -H "Authorization: Bearer dev-secret-key" \ - -H "Content-Type: application/json" \ - -d '{"name": "zustand", "git_url": "https://github.com/pmndrs/zustand"}' - -# Or use the web UI ``` ## MCP Integration -CodeIntel works as an MCP server with Claude Desktop. **[📚 Full MCP Setup Guide](./docs/MCP_SETUP.md)** +Connect OpenCodeIntel to Claude Desktop for AI-powered code assistance. -**Quick Setup:** +Add to your Claude Desktop config: ```json -// Add to Claude Desktop config { "mcpServers": { - "codeintel": { + "opencodeintel": { "command": "python", "args": ["/path/to/opencodeintel/mcp-server/server.py"], "env": { @@ -160,142 +105,41 @@ CodeIntel works as an MCP server with Claude Desktop. **[📚 Full MCP Setup Gui } ``` -**Available MCP Tools:** -| Tool | Description | -|------|-------------| -| `search_code` | Semantic code search - finds code by meaning | -| `list_repositories` | View all indexed repos | -| `get_dependency_graph` | Visualize architecture and file connections | -| `analyze_code_style` | Team conventions and patterns | -| `analyze_impact` | Know what breaks before you change it | -| `get_repository_insights` | High-level codebase overview | - -Now ask Claude: *"What's the authentication logic in the user service?"* and it searches your actual codebase. +**Available tools:** `search_code`, `list_repositories`, `get_dependency_graph`, `analyze_code_style`, `analyze_impact`, `get_repository_insights` -**[→ Complete setup guide with troubleshooting](./docs/MCP_SETUP.md)** +See [MCP Setup Guide](./docs/mcp-setup.md) for detailed instructions. 
## Architecture ``` -┌─────────────┐ -│ Frontend │ React + TypeScript + Tailwind -│ (Vite app) │ Dependency graphs, search UI -└──────┬──────┘ - │ -┌──────▼──────┐ -│ FastAPI │ Python backend -│ Backend │ /api/search, /api/repos/{id}/dependencies -└──────┬──────┘ - │ - ├─────► Pinecone (vector search) - ├─────► OpenAI (embeddings) - ├─────► Supabase (persistence) - └─────► Redis (caching) -``` - -**Tech Stack:** -- **Backend:** FastAPI, tree-sitter (AST parsing), OpenAI embeddings -- **Vector DB:** Pinecone for semantic search -- **Database:** Supabase (PostgreSQL) for metadata + caching -- **Cache:** Redis for 5x search speedup -- **Frontend:** React, TypeScript, Tailwind CSS, shadcn/ui, ReactFlow - -## Performance Benchmarks - -Real numbers from indexing the Zustand repository (1,174 functions): - -| Metric | Value | -|--------|-------| -| Full indexing | 29.5s (39.7 functions/sec) | -| Incremental re-index | 0.07s (700x faster) | -| Batch embedding | 22.9s for 1,174 functions | -| Search (cold) | 800ms | -| Search (cached) | 150ms | - -## Use Cases - -**For AI Agents (via MCP):** -- Semantic code search during pair programming -- Understanding unfamiliar codebases -- Finding implementation patterns -- Impact analysis before refactoring - -**For Development Teams:** -- Onboarding new engineers (visualize architecture) -- Code review prep (see change blast radius) -- Tech debt identification (find highly coupled files) -- Pattern enforcement (analyze style consistency) - -## What Makes This Different - -**Most code search tools:** Keyword matching (grep, GitHub search) -**CodeIntel:** Understands *meaning* - finds `error handling` even if the function is called `processFailure()` - -**Most dependency tools:** Static analysis only -**CodeIntel:** Combines AST parsing + semantic 
understanding + impact prediction - -**Most demos:** In-memory, doesn't scale -**CodeIntel:** Production-grade with Supabase persistence, Redis caching, incremental indexing - -## Deployment - -### 🐳 Local Development (Docker) -```bash -# Start all services -make dev - -# Or using docker compose -docker compose -f docker-compose.dev.yml up -d - -# Services available at: -# - Backend: http://localhost:8000 -# - Frontend: http://localhost:3000 -# - API Docs: http://localhost:8000/docs -``` - -### ☁️ Production Deployment - -**Backend + Redis → Railway** -```bash -# Automated deployment -./scripts/deploy-railway.sh - -# Or manually: -railway login -railway init -railway up +Frontend (React + TypeScript) + ↓ +Backend (FastAPI + Python) + ↓ +┌────┴────┬────────────┐ +Pinecone Supabase Redis +(vectors) (database) (cache) ``` -**Frontend → Vercel** -```bash -# Automated deployment -./scripts/deploy-vercel.sh +**Stack:** FastAPI, React, TypeScript, Pinecone, Supabase, Redis, tree-sitter -# Or manually: -cd frontend -vercel --prod -``` +## Documentation -**📚 Full deployment guide:** See [DEPLOYMENT.md](DEPLOYMENT.md) for complete instructions, environment variables, and troubleshooting. +| Guide | Description | +|-------|-------------| +| [Docker Quickstart](./docs/docker-quickstart.md) | Get running in 5 minutes | +| [Deployment](./docs/deployment.md) | Production deployment guide | +| [MCP Setup](./docs/mcp-setup.md) | Claude Desktop integration | +| [Docker Troubleshooting](./docs/docker-troubleshooting.md) | Common issues and fixes | ## Contributing -Built in a focused 2-week sprint to demonstrate production-grade AI development tooling. +We welcome contributions! See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines. -Contributions welcome! 
Areas for improvement: -- Support for more languages (currently: Python, JS/TS) -- Advanced graph algorithms (find circular dependencies, suggest refactorings) -- GitHub integration (PR impact analysis) -- Team analytics (who writes what patterns) +**Quick links:** +- [Open Issues](https://github.com/OpenCodeIntel/opencodeintel/issues) +- [Good First Issues](https://github.com/OpenCodeIntel/opencodeintel/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) ## License -MIT License - use it, fork it, build on it. - -## Built With - -Commitment to shipping production-grade AI tools. Not a side project. Not a demo. Real infrastructure that scales. - ---- - -**Questions?** Open an issue or reach out. +MIT License - see [LICENSE](./LICENSE) for details. diff --git a/docs/docker-quickstart.md b/docs/docker-quickstart.md index f0b113c..b888b83 100644 --- a/docs/docker-quickstart.md +++ b/docs/docker-quickstart.md @@ -116,7 +116,7 @@ lsof -i :8000 **Issue:** Environment variables not found **Fix:** Make sure `.env` exists in project root (not just backend/) -**Full troubleshooting guide:** See `DOCKER_TROUBLESHOOTING.md` +**Full troubleshooting guide:** See [docker-troubleshooting.md](./docker-troubleshooting.md) ## Development Mode @@ -131,7 +131,7 @@ docker compose -f docker-compose.dev.yml up ## Next Steps -- 📖 Read full deployment guide: `DEPLOYMENT.md` +- Read full deployment guide: [deployment.md](./deployment.md) - 🚀 Deploy to Railway: `./scripts/deploy-railway.sh` - 🌐 Deploy to Vercel: `./scripts/deploy-vercel.sh` - 🧪 Run tests: See `backend/README.md` @@ -186,11 +186,11 @@ Once local dev works, deploy to production: ./scripts/deploy-vercel.sh ``` -Full deployment guide: `DEPLOYMENT.md` +Full deployment guide: [deployment.md](./deployment.md) --- **Need help?** -- 📖 Check `DOCKER_TROUBLESHOOTING.md` +- Check [docker-troubleshooting.md](./docker-troubleshooting.md) - 🐛 Open an issue: https://github.com/OpenCodeIntel/opencodeintel/issues - 📝 
See full docs: `README.md` diff --git a/docs/docker-troubleshooting.md b/docs/docker-troubleshooting.md index 2e922ee..62f3806 100644 --- a/docs/docker-troubleshooting.md +++ b/docs/docker-troubleshooting.md @@ -278,6 +278,6 @@ docker compose exec backend curl http://backend:8000/health 1. Check GitHub Issues: https://github.com/OpenCodeIntel/opencodeintel/issues 2. Run verification script: `./scripts/verify-setup.sh` -3. Check DEPLOYMENT.md for step-by-step instructions +3. Check [deployment.md](./deployment.md) for step-by-step instructions 4. Make sure Docker Desktop has enough resources (Settings → Resources) - Recommended: 4GB RAM, 2 CPUs minimum