From 4d908a0f53229cc10195ea3cf690dcb245eb6c4b Mon Sep 17 00:00:00 2001
From: Devanshu Rajesh Chicholikar
Date: Thu, 8 Jan 2026 17:27:31 -0500
Subject: [PATCH 1/4] chore: remove legacy code and internal docs

- Remove legacy/ folder (old unused code)
- Remove SETUP_COMPLETE.md (internal doc)
- Remove docs/HANDOFF-114.md (internal handoff)
- Remove docs/TIER_SYSTEM_DESIGN.md (internal design doc)
Part of #180
---
SETUP_COMPLETE.md | 279 --------------------------
docs/HANDOFF-114.md | 60 ------
docs/TIER_SYSTEM_DESIGN.md | 387 ------------------------------------
legacy/IndexingProgress.tsx | 95 ---------
legacy/README.md | 23 ---
legacy/indexer_old.py | 362 ---------------------------------
legacy/repo_manager_old.py | 125 ------------
7 files changed, 1331 deletions(-)
delete mode 100644 SETUP_COMPLETE.md
delete mode 100644 docs/HANDOFF-114.md
delete mode 100644 docs/TIER_SYSTEM_DESIGN.md
delete mode 100644 legacy/IndexingProgress.tsx
delete mode 100644 legacy/README.md
delete mode 100644 legacy/indexer_old.py
delete mode 100644 legacy/repo_manager_old.py
diff --git a/SETUP_COMPLETE.md b/SETUP_COMPLETE.md
deleted file mode 100644
index 80ffe6a..0000000
--- a/SETUP_COMPLETE.md
+++ /dev/null
@@ -1,279 +0,0 @@
-# CodeIntel Docker & Deployment Setup Complete!
-
-## ✅ What's Ready
-
-### 1. Docker Configuration
-- ✅ `docker-compose.yml` - Production setup
-- ✅ `docker-compose.dev.yml` - Development with hot reload
-- ✅ Backend `Dockerfile` - Multi-stage, optimized
-- ✅ Frontend `Dockerfile` - Nginx production build
-- ✅ Root `.env` file - All API keys configured
-- ✅ `.gitignore` updated - API keys won't leak
-
-### 2. Deployment Files
-- ✅ `DEPLOYMENT.md` - Complete deployment guide (337 lines)
-- ✅ `DOCKER_QUICKSTART.md` - 5-minute quick start (197 lines)
-- ✅ `DOCKER_TROUBLESHOOTING.md` - Common issues & fixes (284 lines)
-- ✅ `railway.json` - Railway config
-- ✅ Deployment scripts (executable):
-  - `scripts/deploy-railway.sh` - Backend to Railway
-  - `scripts/deploy-vercel.sh` - Frontend to Vercel
-  - `scripts/verify-setup.sh` - Pre-deployment checks
-
-### 3. Developer Experience
-- ✅ `Makefile` - 20+ commands for dev workflow
-- ✅ README updated - Docker section added
-- ✅ Health checks - All services monitored
-- ✅ Graceful restarts - No data loss
-- ✅ Redis persistence - AOF enabled
-
-## Quick Start Commands
-
-### Local Development
-```bash
-# Verify setup
-./scripts/verify-setup.sh
-
-# Start everything
-make dev
-# OR
-docker compose up -d
-
-# View logs
-make logs
-
-# Stop
-make stop
-```
-
-**Access at:**
-- Frontend: http://localhost:3000
-- Backend: http://localhost:8000
-- API Docs: http://localhost:8000/docs
-- Redis: localhost:6379
-
-### Production Deployment
-
-**Option 1: Automated Scripts**
-```bash
-# Deploy backend to Railway
-./scripts/deploy-railway.sh
-
-# Deploy frontend to Vercel
-./scripts/deploy-vercel.sh
-```
-
-**Option 2: Makefile**
-```bash
-make deploy-backend
-make deploy-frontend
-# OR
-make deploy-all
-```
-
-**Option 3: Manual**
-See `DEPLOYMENT.md` for step-by-step guide
-
-## Pre-Deployment Checklist
-
-Before deploying to production, make sure:
-
-- [ ] Docker Desktop is running
-- [ ] All API keys are set in `.env`
-- [ ] Tests passing: `make test`
-- [ ] Local Docker works: `make dev`
-- [ ] Health check passes: `make health`
-- [ ] Railway CLI installed: `npm i -g @railway/cli`
-- [ ] Vercel CLI installed: `npm i -g vercel`
-- [ ] Changed `API_KEY` from default value
-- [ ] Supabase RLS policies configured
-- [ ] Read through `DEPLOYMENT.md`
-
-## Next Steps
-
-### 1. Test Locally
-```bash
-# Start services
-make dev
-
-# In another terminal, run tests
-make test
-
-# Check everything is healthy
-make health
-```
-
-### 2. Deploy Backend (Railway)
-```bash
-# Automated
-./scripts/deploy-railway.sh
-
-# Follow prompts to:
-# - Login to Railway
-# - Create/link project
-# - Add Redis service
-# - Set environment variables
-# - Deploy
-```
-
-### 3. Deploy Frontend (Vercel)
-```bash
-# Get your Railway backend URL first
-railway domain
-
-# Then deploy frontend
-./scripts/deploy-vercel.sh
-
-# Enter Railway URL when prompted
-```
-
-### 4. Configure Production
-After deployment:
-1. Update CORS in `backend/main.py` with Vercel URL
-2. Test all endpoints work
-3. Monitor logs: `railway logs -f`
-4. Set up custom domains (optional)
-
-## Documentation Reference
-
-| Document | Purpose |
-|----------|---------|
-| `README.md` | Project overview, features, quick start |
-| `DOCKER_QUICKSTART.md` | Get running in 5 minutes |
-| `DOCKER_TROUBLESHOOTING.md` | Fix common Docker issues |
-| `DEPLOYMENT.md` | Complete deployment guide |
-| `SECURITY.md` | Security practices & vulnerability reporting |
-| `CONTRIBUTING.md` | How to contribute |
-
-## Useful Commands
-
-### Docker
-```bash
-make dev # Start dev environment
-make prod # Start production environment
-make logs # View all logs
-make stop # Stop services
-make clean # Nuclear option - remove everything
-make health # Check service health
-make restart-backend # Quick backend restart
-```
-
-### Testing
-```bash
-make test # Run tests
-make test-watch # Watch mode
-make coverage # Coverage report
-```
-
-### Deployment
-```bash
-make deploy-backend # Deploy to Railway
-make deploy-frontend # Deploy to Vercel
-make deploy-all # Deploy everything
-```
-
-### Debugging
-```bash
-make shell-backend # Bash into backend container
-make shell-redis # Redis CLI
-make redis-stats # View Redis info
-docker compose ps # Check container status
-docker compose logs -f backend # Follow backend logs
-```
-
-## Common Issues
-
-| Issue | Quick Fix |
-|-------|-----------|
-| Docker daemon not running | Open Docker Desktop |
-| Port already in use | `lsof -i :8000` and kill process |
-| Env vars not found | Make sure `.env` exists in project root |
-| Build fails | `make clean && make build` |
-| Services keep restarting | Check logs: `make logs` |
-
-**Full troubleshooting:** See `DOCKER_TROUBLESHOOTING.md`
-
-## What Got Built
-
-### Architecture
-```
-┌─────────────┐      ┌─────────────┐      ┌─────────────┐
-│  Frontend   │─────▶│   Backend   │─────▶│    Redis    │
-│ Vite+React  │      │   FastAPI   │      │    Cache    │
-│  Port 3000  │      │  Port 8000  │      │  Port 6379  │
-└─────────────┘      └─────────────┘      └─────────────┘
-                            │
-                            ├────▶ Supabase (Postgres)
-                            └────▶ Pinecone (Vectors)
-```
-
-### Files Created/Updated
-- ✅ `.env` - Root environment variables
-- ✅ `docker-compose.yml` - Production services (removed obsolete `version`)
-- ✅ `docker-compose.dev.yml` - Dev services (removed obsolete `version`)
-- ✅ `DOCKER_QUICKSTART.md` - Quick start guide
-- ✅ `DOCKER_TROUBLESHOOTING.md` - Troubleshooting guide
-- ✅ `scripts/verify-setup.sh` - Pre-deployment verification (made executable)
-- ✅ `README.md` - Added Docker quick start section
-
-### Already Existing (Verified Working)
-- ✅ `backend/Dockerfile` - Production-ready
-- ✅ `frontend/Dockerfile` - Multi-stage build with nginx
-- ✅ `railway.json` - Railway configuration
-- ✅ `DEPLOYMENT.md` - Comprehensive deployment guide
-- ✅ `Makefile` - Developer commands
-- ✅ `scripts/deploy-railway.sh` - Railway deployment
-- ✅ `scripts/deploy-vercel.sh` - Vercel deployment
-
-## What You Learned
-
-This setup demonstrates:
-1. **Production-grade Docker Compose** - Multi-service orchestration
-2. **Multi-stage builds** - Optimized image sizes
-3. **Health checks** - Service monitoring
-4. **Environment management** - Secrets handling
-5. **Deployment automation** - Scripts for Railway/Vercel
-6. **Developer experience** - Makefile commands, hot reload
-7. **Documentation** - Comprehensive guides for users
-
-## Expected Costs
-
-**Hobby/Free Tier:**
-- Railway: $5/month credit (backend + Redis)
-- Vercel: Free for personal projects
-- **Total: $0-5/month**
-
-**Production:**
-- Railway Pro: $20/month
-- Vercel Pro: $20/month
-- OpenAI API: ~$10-50/month
-- Pinecone Starter: $70/month
-- **Total: ~$120-160/month**
-
-## You're Ready!
-
-Your CodeIntel project is now:
-- ✅ Docker Compose ready for local dev
-- ✅ Production-ready Dockerfiles
-- ✅ Deployment scripts for Railway + Vercel
-- ✅ Comprehensive documentation
-- ✅ Developer-friendly tooling
-
-**Start building:**
-```bash
-make dev
-open http://localhost:3000
-```
-
-**Deploy to production:**
-```bash
-./scripts/verify-setup.sh # Verify first
-./scripts/deploy-railway.sh # Deploy backend
-./scripts/deploy-vercel.sh # Deploy frontend
-```
-
----
-
-**Questions?** Check `DOCKER_TROUBLESHOOTING.md` or open an issue on GitHub.
-
-**Ready to ship!**
diff --git a/docs/HANDOFF-114.md b/docs/HANDOFF-114.md
deleted file mode 100644
index a938fc6..0000000
--- a/docs/HANDOFF-114.md
+++ /dev/null
@@ -1,60 +0,0 @@
-# Handoff: Anonymous Indexing (#114)
-
-## TL;DR
-Let users index their own GitHub repos without signup. 5 backend endpoints needed.
-
-## GitHub Issues (Full Specs)
-- **#124** - Validate GitHub URL
-- **#125** - Start anonymous indexing
-- **#126** - Get indexing status
-- **#127** - Extend session management
-- **#128** - Update search for user repos
-
-**Read these first.** Each has request/response schemas, implementation notes, acceptance criteria.
-
-## Order of Work
-```
-#127 + #124 (parallel) → #125 → #126 → #128
-```
-
-## Key Files to Understand
-
-| File | What It Does |
-|------|--------------|
-| `backend/config/api.py` | API versioning (`/api/v1/*`) |
-| `backend/routes/playground.py` | Existing playground endpoints |
-| `backend/services/playground_limiter.py` | Session + rate limiting |
-| `backend/services/repo_validator.py` | File counting, extensions |
-| `backend/dependencies.py` | Indexer, cache, redis_client |
-
-## Constraints (Anonymous Users)
-- 200 files max
-- 1 repo per session
-- 50 searches per session
-- 24hr TTL
-
-## Workflow
-See `CONTRIBUTING.md` for full guide.
-
-**Quick version:**
-```bash
-# Create branch
-git checkout -b feat/124-validate-repo
-
-# Make changes, test
-pytest tests/ -v
-
-# Commit
-git add .
-git commit -m "feat(playground): add validate-repo endpoint"
-
-# Push to YOUR fork
-git push origin feat/124-validate-repo
-
-# Create PR on OpenCodeIntel/opencodeintel
-# Reference issue: "Closes #124"
-```
-
-## Questions?
-- Check GitHub issues first
-- Ping Devanshu for blockers
diff --git a/docs/TIER_SYSTEM_DESIGN.md b/docs/TIER_SYSTEM_DESIGN.md
deleted file mode 100644
index d5b093f..0000000
--- a/docs/TIER_SYSTEM_DESIGN.md
+++ /dev/null
@@ -1,387 +0,0 @@
-# User Tier & Limits System - Design Document
-
-> **Issues**: #93, #94, #95, #96, #97
-> **Author**: Devanshu
-> **Status**: Implemented
-> **Last Updated**: 2025-12-13
-
----
-
-## 1. Problem Statement
-
-CodeIntel needs a tiered system to:
-1. **Protect costs** - Indexing is expensive ($0.02-$50/repo depending on size)
-2. **Enable growth** - Freemium model with upgrade path
-3. **Prevent abuse** - Rate limit anonymous playground users
-
-**Key Insight**: Searching is nearly free ($0.000001/query). Indexing is the real cost driver.
-
----
-
-## 2. Tier Definitions
-
-| Tier | Max Repos | Files/Repo | Functions/Repo | Playground/Day |
-|------|-----------|------------|----------------|----------------|
-| **Free** | 3 | 500 | 2,000 | 50 |
-| **Pro** | 20 | 5,000 | 20,000 | Unlimited |
-| **Enterprise** | Unlimited | 50,000 | 200,000 | Unlimited |
-
-**Rationale**:
-- Free tier: Enough for personal projects, not enterprise codebases
-- Playground limit: 50/day is generous (anti-abuse, not business gate)
-- File/function limits: Prevent expensive indexing jobs
-
----
-
-## 3. Current API Endpoints
-
-### 3.1 Authentication (`/api/v1/auth`)
-| Method | Endpoint | Auth | Description |
-|--------|----------|------|-------------|
-| POST | `/signup` | None | Create account |
-| POST | `/login` | None | Get JWT |
-| POST | `/refresh` | JWT | Refresh token |
-| POST | `/logout` | JWT | Invalidate session |
-| GET | `/me` | JWT | Get current user |
-
-### 3.2 Repositories (`/api/v1/repos`)
-| Method | Endpoint | Auth | Description | **Limits Check** |
-|--------|----------|------|-------------|------------------|
-| GET | `/` | JWT | List user repos | - |
-| POST | `/` | JWT | Add repo | **#95: Check repo count** |
-| POST | `/{id}/index` | JWT | Index repo | **#94: Check file/function count** |
-
-### 3.3 Search (`/api/v1/search`)
-| Method | Endpoint | Auth | Description | **Limits Check** |
-|--------|----------|------|-------------|------------------|
-| POST | `/search` | JWT | Search code | - |
-| POST | `/explain` | JWT | Explain code | - |
-
-### 3.4 Playground (`/api/v1/playground`) - **Anonymous**
-| Method | Endpoint | Auth | Description | **Limits Check** |
-|--------|----------|------|-------------|------------------|
-| GET | `/repos` | None | List demo repos | - |
-| POST | `/search` | None | Search demo repos | **#93: Rate limit 50/day** |
-
-### 3.5 Analysis (`/api/v1/analysis`)
-| Method | Endpoint | Auth | Description |
-|--------|----------|------|-------------|
-| GET | `/{id}/dependencies` | JWT | Dependency graph |
-| POST | `/{id}/impact` | JWT | Impact analysis |
-| GET | `/{id}/insights` | JWT | Repo insights |
-| GET | `/{id}/style-analysis` | JWT | Code style |
-
-### 3.6 Users (`/api/v1/users`) - **NEW**
-| Method | Endpoint | Auth | Description |
-|--------|----------|------|-------------|
-| GET | `/usage` | JWT | Get tier, limits, current usage |
-| GET | `/limits/check-repo-add` | JWT | Pre-check before adding repo |
-
----
-
-## 4. Implementation Plan by Issue
-
-### Issue #96: User Tier System (Foundation) ✅ DONE
-**Files Created**:
-- `backend/services/user_limits.py` - Core service
-- `backend/routes/users.py` - API endpoints
-- `supabase/migrations/001_user_profiles.sql` - DB schema
-
-**Service Methods**:
-```python
-class UserLimitsService:
-    def get_user_tier(user_id) -> UserTier
-    def get_user_limits(user_id) -> TierLimits
-    def get_user_repo_count(user_id) -> int
-    def check_repo_count(user_id) -> LimitCheckResult
-    def check_repo_size(user_id, file_count, func_count) -> LimitCheckResult
-    def get_usage_summary(user_id) -> dict
-    def invalidate_tier_cache(user_id) -> None  # Call after tier upgrade
-```
-
-### Issue #95: Repo Count Limits
-**Where**: `POST /api/v1/repos`
-
-**Changes to `routes/repos.py`**:
-```python
-@router.post("")
-def add_repository(request, auth):
-    # NEW: Check repo count limit
-    result = user_limits.check_repo_count(auth.user_id)
-    if not result.allowed:
-        raise HTTPException(
-            status_code=403,
-            detail=result.to_dict()
-        )
-    # ... existing code
-```
-
-**Frontend Integration**:
-- Call `GET /users/limits/check-repo-add` before showing Add Repo button
-- Show "2/3 repos used" in sidebar
-- Show upgrade prompt when limit reached
-
-### Issue #94: Repo Size Limits
-**Where**: `POST /api/v1/repos/{id}/index`
-
-**Changes to `routes/repos.py`**:
-```python
-@router.post("/{repo_id}/index")
-def index_repository(repo_id, auth):
-    repo = get_repo_or_404(repo_id, auth.user_id)
-
-    # Count files and estimate functions BEFORE indexing
-    file_count = count_code_files(repo["local_path"])
-    estimated_functions = file_count * 25  # Conservative estimate
-
-    # NEW: Check size limits
-    result = user_limits.check_repo_size(
-        auth.user_id, file_count, estimated_functions
-    )
-    if not result.allowed:
-        raise HTTPException(
-            status_code=400,
-            detail=result.to_dict()
-        )
-    # ... existing indexing code
-```
-
-### Issue #93: Playground Rate Limiting
-**Where**: `POST /api/v1/playground/search`
-
-**New File**: `backend/services/playground_rate_limiter.py`
-```python
-class PlaygroundRateLimiter:
-    def __init__(self, redis_client):
-        self.redis = redis_client
-        self.daily_limit = 50
-
-    def check_and_increment(self, ip: str) -> tuple[bool, dict]:
-        """Returns (allowed, headers_dict)"""
-        key = f"playground:rate:{ip}"
-
-        # Atomic increment
-        count = self.redis.incr(key)
-        if count == 1:
-            self.redis.expire(key, 86400)  # 24h TTL
-
-        ttl = self.redis.ttl(key)
-        reset_time = int(time.time()) + ttl
-
-        headers = {
-            "X-RateLimit-Limit": str(self.daily_limit),
-            "X-RateLimit-Remaining": str(max(0, self.daily_limit - count)),
-            "X-RateLimit-Reset": str(reset_time)
-        }
-
-        if count > self.daily_limit:
-            headers["Retry-After"] = str(ttl)
-            return False, headers
-
-        return True, headers
-```
-
-**Changes to `routes/playground.py`**:
-```python
-from fastapi import Request, Response
-
-@router.post("/search")
-def playground_search(request: Request, response: Response, body: SearchRequest):
-    # Get client IP
-    ip = request.client.host
-    forwarded = request.headers.get("X-Forwarded-For")
-    if forwarded:
-        ip = forwarded.split(",")[0].strip()
-
-    # Check rate limit
-    allowed, headers = playground_rate_limiter.check_and_increment(ip)
-
-    # Always add headers
-    for key, value in headers.items():
-        response.headers[key] = value
-
-    if not allowed:
-        raise HTTPException(
-            status_code=429,
-            detail={
-                "error": "RATE_LIMIT_EXCEEDED",
-                "message": "Daily search limit reached. Sign up for unlimited searches!",
-                "limit": 50,
-                "reset": headers["X-RateLimit-Reset"]
-            }
-        )
-
-    # ... existing search code
-```
-
-### Issue #97: Progressive Signup CTAs
-**Where**: Frontend only
-
-**Implementation**:
-```typescript
-// hooks/usePlaygroundUsage.ts
-const usePlaygroundUsage = () => {
-  const [searchCount, setSearchCount] = useState(0);
-
-  // Read from response headers after each search
-  const trackSearch = (response: Response) => {
-    const remaining = response.headers.get('X-RateLimit-Remaining');
-    const limit = response.headers.get('X-RateLimit-Limit');
-    if (remaining && limit) {
-      setSearchCount(parseInt(limit) - parseInt(remaining));
-    }
-  };
-
-  return { searchCount, trackSearch };
-};
-
-// Show CTAs at thresholds
-// 10 searches: Subtle "Want to search YOUR codebase?"
-// 25 searches: More prominent with feature list
-// 40 searches: Final "You clearly love this"
-```
-
----
-
-## 5. Error Response Format
-
-All limit-related errors use `LimitCheckResult.to_dict()`:
-
-```json
-{
-  "detail": {
-    "allowed": false,
-    "current": 3,
-    "limit": 3,
-    "limit_display": "3",
-    "message": "Repository limit reached (3/3). Upgrade to add more repositories.",
-    "tier": "free",
-    "error_code": "REPO_LIMIT_REACHED"
-  }
-}
-```
-
-**Error Codes**:
-| Code | HTTP Status | Description |
-|------|-------------|-------------|
-| `REPO_LIMIT_REACHED` | 403 | Max repos for tier |
-| `REPO_TOO_LARGE` | 400 | File/function count exceeds tier |
-| `RATE_LIMIT_EXCEEDED` | 429 | Playground daily limit |
-| `INVALID_USER` | 400 | Invalid or missing user_id |
-| `SYSTEM_ERROR` | 500 | Database/system failure |
-
----
-
-## 6. Database Schema
-
-### user_profiles (NEW)
-```sql
-CREATE TABLE user_profiles (
-    id UUID PRIMARY KEY,
-    user_id UUID REFERENCES auth.users(id),
-    tier TEXT DEFAULT 'free',  -- 'free', 'pro', 'enterprise'
-    created_at TIMESTAMPTZ,
-    updated_at TIMESTAMPTZ
-);
-```
-
-**Security Notes:**
-- RLS enabled with SELECT/INSERT for authenticated users
-- NO UPDATE policy for users (prevents self-upgrade)
-- Tier updates only via service role key (payment webhooks)
-
-### repositories (existing, no changes needed)
-Already has `user_id` column for ownership.
-
----
-
-## 7. Fail-Safe Behavior
-
-| Scenario | Behavior | Reason |
-|----------|----------|--------|
-| DB down during `check_repo_count` | **DENY** (fail-closed) | Prevent unlimited repos |
-| DB down during `get_usage_summary` | Return defaults | Read-only, safe to fail-open |
-| Redis cache miss | Query DB | Graceful degradation |
-| Redis down | Continue without cache | Non-critical |
-| Invalid user_id | Return FREE limits | Safe default |
-
----
-
-## 8. Redis Keys
-
-| Key Pattern | TTL | Description |
-|-------------|-----|-------------|
-| `playground:rate:{ip}` | 24h | Playground search count |
-| `user:tier:{user_id}` | 5min | Cached user tier |
-
----
-
-## 9. Frontend Integration Points
-
-### Dashboard
-- Show usage bar: "2/3 repositories"
-- Show tier badge: "Free Tier"
-- Upgrade CTA when near limits
-
-### Add Repository Flow
-1. Call `GET /users/limits/check-repo-add`
-2. If `allowed: false`, show upgrade modal
-3. If `allowed: true`, proceed with add
-
-### Playground
-1. Read rate limit headers from search responses
-2. Show remaining searches: "47/50 searches today"
-3. Show progressive CTAs at thresholds
-4. On 429, show signup modal
-
----
-
-## 10. Migration Path
-
-### Existing Users
-All existing users default to `free` tier. Migration auto-creates profile on first API call.
-
-### Existing Repos
-No changes needed. Limit checks only apply to NEW repos.
-
----
-
-## 11. Implementation Order
-
-| Phase | Issue | Priority | Depends On |
-|-------|-------|----------|------------|
-| 1 | #96 User tier system | P0 | - | ✅ DONE |
-| 2 | #94 Repo size limits | P0 | #96 |
-| 2 | #95 Repo count limits | P0 | #96 |
-| 3 | #93 Playground rate limit | P1 | Redis |
-| 4 | #97 Progressive CTAs | P2 | #93 |
-
----
-
-## 12. Open Questions
-
-1. **Upgrade Flow**: Stripe integration? Manual for now?
-2. **Existing Large Repos**: Grandfather them or enforce limits?
-3. **Team/Org Support**: Future consideration for enterprise?
-4. **API Key Users**: Same limits as JWT users?
-
----
-
-## 13. Files to Create/Modify
-
-### Create
-- [x] `backend/services/user_limits.py`
-- [x] `backend/routes/users.py`
-- [x] `supabase/migrations/001_user_profiles.sql`
-- [ ] `backend/services/playground_rate_limiter.py`
-- [ ] `frontend/src/hooks/usePlaygroundUsage.ts`
-- [ ] `frontend/src/components/PlaygroundCTA.tsx`
-- [ ] `frontend/src/components/UsageBar.tsx`
-
-### Modify
-- [x] `backend/dependencies.py`
-- [x] `backend/main.py`
-- [ ] `backend/routes/repos.py` - Add limit checks
-- [ ] `backend/routes/playground.py` - Add rate limiting
-- [ ] `frontend/src/pages/Dashboard.tsx` - Show usage
-- [ ] `frontend/src/pages/LandingPage.tsx` - Show CTAs
diff --git a/legacy/IndexingProgress.tsx b/legacy/IndexingProgress.tsx
deleted file mode 100644
index 76eebfd..0000000
--- a/legacy/IndexingProgress.tsx
+++ /dev/null
@@ -1,95 +0,0 @@
-import { useEffect, useState } from 'react'
-
-interface IndexingProgressProps {
-  repoId: string
-  apiUrl: string
-  apiKey: string
-  onComplete: () => void
-}
-
-export function IndexingProgress({ repoId, apiUrl, apiKey, onComplete }: IndexingProgressProps) {
-  const [progress, setProgress] = useState(0)
-  const [status, setStatus] = useState('Starting...')
-  const [stats, setStats] = useState({ processed: 0, total: 0, functions: 0 })
-
-  useEffect(() => {
-    let interval: any
-
-    const checkProgress = async () => {
-      try {
-        const response = await fetch(`${apiUrl}/api/repos/${repoId}`, {
-          headers: { 'Authorization': `Bearer ${apiKey}` }
-        })
-        const repo = await response.json()
-
-        if (repo.status === 'indexed') {
-          setProgress(100)
-          setStatus('✅ Indexing complete!')
-          clearInterval(interval)
-          setTimeout(onComplete, 1500)
-        } else if (repo.status === 'indexing') {
-          // Estimate progress based on function count growth
-          const estimatedProgress = Math.min(95, (repo.file_count / 100) * 100)
-          setProgress(estimatedProgress)
-          setStatus(`Indexing... ${repo.file_count} functions processed`)
-          setStats({
-            processed: repo.file_count,
-            total: 100,
-            functions: repo.file_count
-          })
-        }
-      } catch (error) {
-        console.error('Error checking progress:', error)
-      }
-    }
-
-    // Check immediately, then every 2 seconds
-    checkProgress()
-    interval = setInterval(checkProgress, 2000)
-
-    return () => clearInterval(interval)
-  }, [repoId])
-
- return (
-
-
-
- Indexing Repository
-
-
-
-
- {status}
- {progress.toFixed(0)}%
-
-
- {/* Progress Bar */}
-
-
- {/* Stats */}
- {stats.functions > 0 && (
-
-
-
Functions Found
-
{stats.functions}
-
-
-
Status
-
Processing...
-
-
- )}
-
-
- Using batch processing for optimal performance
-
-
-
-
- )
-}
diff --git a/legacy/README.md b/legacy/README.md
deleted file mode 100644
index 6bb1569..0000000
--- a/legacy/README.md
+++ /dev/null
@@ -1,23 +0,0 @@
-# Legacy Code Archive
-
-This folder contains old implementations that were replaced during development.
-
-## Files:
-
-### indexer_old.py
-- Original indexer implementation before batch processing optimization
-- Replaced by `indexer_optimized.py` which achieves 100x performance improvement
-- Kept for reference on the evolution from individual API calls to batch processing
-
-### repo_manager_old.py
-- Original repository manager with in-memory storage
-- Replaced by current `repo_manager.py` with Supabase persistence
-- Shows the migration from ephemeral to production-grade storage
-
-### IndexingProgress.tsx
-- Original indexing progress component using WebSocket
-- Replaced by integrated progress in `RepoOverview.tsx` using shadcn Progress component
-- Kept for reference on the WebSocket implementation approach
-
-**Note:** These files are not imported or used anywhere in the active codebase.
-They're preserved for historical reference and to show the development evolution.
diff --git a/legacy/indexer_old.py b/legacy/indexer_old.py
deleted file mode 100644
index 78fb332..0000000
--- a/legacy/indexer_old.py
+++ /dev/null
@@ -1,362 +0,0 @@
-"""
-Code Indexer
-Handles code parsing, embedding generation, and semantic search
-"""
-import os
-from pathlib import Path
-from typing import List, Dict, Optional
-import asyncio
-
-# Tree-sitter for parsing
-import tree_sitter_python as tspython
-import tree_sitter_javascript as tsjavascript
-from tree_sitter import Language, Parser
-
-# AI/ML
-from openai import AsyncOpenAI
-from pinecone import Pinecone, ServerlessSpec
-
-# Utils
-import hashlib
-from dotenv import load_dotenv
-
-# Import cache service
-from services.cache import CacheService
-
-load_dotenv()
-
-
-class CodeIndexer:
-    """Index and search code using semantic embeddings"""
-
-    def __init__(self):
-        # Initialize cache
-        self.cache = CacheService()
-
-        # Initialize OpenAI
-        self.openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
-
-        # Initialize Pinecone
-        pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
-
-        index_name = os.getenv("PINECONE_INDEX_NAME", "codeintel")
-
-        # Create index if it doesn't exist
-        if index_name not in pc.list_indexes().names():
-            print(f"Creating Pinecone index: {index_name}")
-            pc.create_index(
-                name=index_name,
-                dimension=1536,  # OpenAI embedding dimension
-                metric="cosine",
-                spec=ServerlessSpec(
-                    cloud="aws",
-                    region="us-east-1"
-                )
-            )
-
-        self.index = pc.Index(index_name)
-
-        # Initialize tree-sitter parsers
-        self.parsers = {
-            'python': self._create_parser(Language(tspython.language())),
-            'javascript': self._create_parser(Language(tsjavascript.language())),
-            'typescript': self._create_parser(Language(tsjavascript.language())),
-        }
-
-        print("CodeIndexer initialized!")
-
-    def _create_parser(self, language) -> Parser:
-        """Create a tree-sitter parser"""
-        parser = Parser(language)
-        return parser
-
-    def _detect_language(self, file_path: str) -> Optional[str]:
-        """Detect programming language from file extension"""
-        ext = Path(file_path).suffix.lower()
-        lang_map = {
-            '.py': 'python',
-            '.js': 'javascript',
-            '.jsx': 'javascript',
-            '.ts': 'typescript',
-            '.tsx': 'typescript',
-        }
-        return lang_map.get(ext)
-
-    def _discover_code_files(self, repo_path: str) -> List[Path]:
-        """Find all code files in repository"""
-        repo_path = Path(repo_path)
-        code_files = []
-
-        # Extensions to index
-        extensions = {'.py', '.js', '.jsx', '.ts', '.tsx'}
-
-        # Directories to skip
-        skip_dirs = {'node_modules', '.git', '__pycache__', 'venv', 'env', 'dist', 'build'}
-
-        for file_path in repo_path.rglob('*'):
-            # Skip directories
-            if file_path.is_dir():
-                continue
-
-            # Skip if in excluded directory
-            if any(skip in file_path.parts for skip in skip_dirs):
-                continue
-
-            # Check extension
-            if file_path.suffix in extensions:
-                code_files.append(file_path)
-
-        return code_files
-
-    async def _create_embedding(self, text: str) -> List[float]:
-        """Generate embedding using OpenAI with caching"""
-        try:
-            # Truncate if too long
-            text = text[:8000]
-
-            # Check cache first
-            cached = self.cache.get_embedding(text)
-            if cached:
-                return cached
-
-            # Generate new embedding
-            response = await self.openai_client.embeddings.create(
-                model="text-embedding-3-small",
-                input=text
-            )
-            embedding = response.data[0].embedding
-
-            # Cache it
-            self.cache.set_embedding(text, embedding)
-
-            return embedding
-        except Exception as e:
-            print(f"Error creating embedding: {e}")
-            return [0.0] * 1536
-
-    def _extract_functions(self, tree_node, source_code: bytes) -> List[Dict]:
-        """Extract function/class definitions from AST"""
-        functions = []
-
-        # Function/class node types
-        target_types = {
-            'function_definition',
-            'class_definition',
-            'function_declaration',
-            'method_definition',
-            'arrow_function',
-        }
-
-        if tree_node.type in target_types:
-            # Extract function name
-            name_node = None
-            for child in tree_node.children:
-                if child.type == 'identifier':
-                    name_node = child
-                    break
-
-            name = source_code[name_node.start_byte:name_node.end_byte].decode('utf-8') if name_node else 'anonymous'
-
-            code = source_code[tree_node.start_byte:tree_node.end_byte].decode('utf-8')
-
-            functions.append({
-                'name': name,
-                'type': tree_node.type,
-                'code': code,
-                'start_line': tree_node.start_point[0],
-                'end_line': tree_node.end_point[0],
-            })
-
-        # Recursively search children
-        for child in tree_node.children:
-            functions.extend(self._extract_functions(child, source_code))
-
-        return functions
-
-    async def index_repository(self, repo_id: str, repo_path: str):
-        """Index all code in a repository"""
-        print(f"Indexing repository: {repo_id} at {repo_path}")
-
-        # Discover code files
-        code_files = self._discover_code_files(repo_path)
-        print(f"Found {len(code_files)} code files")
-
-        # Process files in batches
-        batch_size = 5
-        total_functions = 0
-
-        for i in range(0, len(code_files), batch_size):
-            batch = code_files[i:i + batch_size]
-            results = await asyncio.gather(
-                *[self._index_file(repo_id, str(file_path)) for file_path in batch],
-                return_exceptions=True
-            )
-
-            for result in results:
-                if isinstance(result, int):
-                    total_functions += result
-
-            print(f"Processed {i + len(batch)}/{len(code_files)} files, {total_functions} functions indexed")
-
-        print(f"Indexing complete! Total functions: {total_functions}")
-        return total_functions
-
-    async def _index_file(self, repo_id: str, file_path: str) -> int:
-        """Index a single file"""
-        try:
-            # Detect language
-            language = self._detect_language(file_path)
-            if not language or language not in self.parsers:
-                return 0
-
-            # Read file
-            with open(file_path, 'rb') as f:
-                source_code = f.read()
-
-            # Parse with tree-sitter
-            tree = self.parsers[language].parse(source_code)
-
-            # Extract functions
-            functions = self._extract_functions(tree.root_node, source_code)
-
-            if not functions:
-                return 0
-
-            # Generate embeddings and store in Pinecone
-            vectors_to_upsert = []
-
-            for func in functions:
-                # Create text for embedding
-                embedding_text = f"Function: {func['name']}\nType: {func['type']}\n\n{func['code']}"
-
-                # Generate embedding
-                embedding = await self._create_embedding(embedding_text)
-
-                # Create unique ID
-                func_id = hashlib.md5(f"{repo_id}:{file_path}:{func['start_line']}".encode()).hexdigest()
-
-                # Prepare vector
-                vectors_to_upsert.append({
-                    "id": func_id,
-                    "values": embedding,
-                    "metadata": {
-                        "repo_id": repo_id,
-                        "file_path": file_path,
-                        "name": func['name'],
-                        "type": func['type'],
-                        "code": func['code'][:1000],  # Limit code length in metadata
-                        "start_line": func['start_line'],
-                        "end_line": func['end_line'],
-                        "language": language
-                    }
-                })
-
-            # Upsert to Pinecone
-            if vectors_to_upsert:
-                self.index.upsert(vectors=vectors_to_upsert)
-
-            return len(functions)
-
-        except Exception as e:
-            print(f"Error indexing file {file_path}: {e}")
-            return 0
-
- async def semantic_search(
- self,
- query: str,
- repo_id: str,
- max_results: int = 10
- ) -> List[Dict]:
- """Search code using semantic similarity with caching"""
- try:
- # Check cache first
- cached_results = self.cache.get_search_results(query, repo_id)
- if cached_results:
- print(f"✅ Cache HIT for query: {query[:50]}")
- return cached_results
-
- print(f"❌ Cache MISS for query: {query[:50]}")
-
- # Generate query embedding (this will use embedding cache)
- query_embedding = await self._create_embedding(query)
-
- # Search Pinecone
- results = self.index.query(
- vector=query_embedding,
- filter={"repo_id": {"$eq": repo_id}},
- top_k=max_results,
- include_metadata=True
- )
-
- # Format results
- formatted_results = []
- for match in results.matches:
- formatted_results.append({
- "code": match.metadata.get("code", ""),
- "file_path": match.metadata.get("file_path", ""),
- "name": match.metadata.get("name", ""),
- "type": match.metadata.get("type", ""),
- "language": match.metadata.get("language", ""),
- "score": float(match.score),
- "line_start": match.metadata.get("start_line", 0),
- "line_end": match.metadata.get("end_line", 0),
- })
-
- # Cache results
- self.cache.set_search_results(query, repo_id, formatted_results)
-
- return formatted_results
-
- except Exception as e:
-
- print(f"Error searching: {e}")
- return []
-
- async def explain_code(
- self,
- repo_id: str,
- file_path: str,
- function_name: Optional[str] = None
- ) -> str:
- """Generate natural language explanation of code using Claude"""
- try:
- # Read the file
- with open(file_path, 'r') as f:
- code_content = f.read()
-
- # If function_name provided, try to find it
- if function_name:
- language = self._detect_language(file_path)
- if language and language in self.parsers:
- tree = self.parsers[language].parse(code_content.encode('utf-8'))
- functions = self._extract_functions(tree.root_node, code_content.encode('utf-8'))
-
- # Find matching function
- for func in functions:
- if func['name'] == function_name:
- code_content = func['code']
- break
-
- # Use OpenAI to explain (we could use Claude API too)
- response = await self.openai_client.chat.completions.create(
- model="gpt-4o-mini", # Cheaper and faster
- messages=[
- {
- "role": "system",
- "content": "You are a helpful code explainer. Explain code clearly and concisely, focusing on what it does, how it works, and any important patterns or techniques used."
- },
- {
- "role": "user",
- "content": f"Explain this code:\n\n```\n{code_content}\n```"
- }
- ],
- max_tokens=1000,
- temperature=0.3
- )
-
- explanation = response.choices[0].message.content
- return explanation
-
- except Exception as e:
- print(f"Error explaining code: {e}")
- return f"Error generating explanation: {str(e)}"
diff --git a/legacy/repo_manager_old.py b/legacy/repo_manager_old.py
deleted file mode 100644
index f2c9301..0000000
--- a/legacy/repo_manager_old.py
+++ /dev/null
@@ -1,125 +0,0 @@
-"""
-Repository Manager
-Handles repository CRUD operations (in-memory for MVP, later DB)
-"""
-import uuid
-from typing import Dict, List, Optional
-import os
-import git
-from pathlib import Path
-
-
-class RepositoryManager:
- """Manage repositories"""
-
- def __init__(self):
- # In-memory storage (Phase 1 MVP)
- # Later: replace with PostgreSQL
- self.repos: Dict[str, dict] = {}
- self.repos_dir = Path("./repos")
- self.repos_dir.mkdir(exist_ok=True)
-
- # Discover existing repositories on startup
- self._discover_existing_repos()
-
- def _discover_existing_repos(self):
- """Scan repos directory and load existing repositories"""
- if not self.repos_dir.exists():
- return
-
- for repo_path in self.repos_dir.iterdir():
- if not repo_path.is_dir() or repo_path.name.startswith('.'):
- continue
-
- try:
- # Try to open as git repo
- repo = git.Repo(repo_path)
-
- # Get repo info from git config
- remote_url = None
- if repo.remotes:
- remote_url = repo.remotes.origin.url
-
- # Extract name from URL or use folder name
- name = remote_url.split('/')[-1].replace('.git', '') if remote_url else repo_path.name
- branch = repo.active_branch.name if not repo.head.is_detached else "main"
-
- # Count code files to estimate if indexed
- code_files = list(repo_path.rglob('*.py')) + list(repo_path.rglob('*.js')) + list(repo_path.rglob('*.ts'))
- file_count = len([f for f in code_files if '.git' not in str(f) and 'node_modules' not in str(f)])
-
- # Add to repos
- self.repos[repo_path.name] = {
- "id": repo_path.name,
- "name": name,
- "git_url": remote_url or "unknown",
- "branch": branch,
- "local_path": str(repo_path),
- "status": "indexed",
- "file_count": file_count * 20,
- "last_indexed_commit": repo.head.commit.hexsha # Track commit!
- }
-
- print(f"✅ Discovered existing repo: {name} ({repo_path.name}) - ~{file_count} files")
-
- except Exception as e:
- print(f"⚠️ Skipping {repo_path.name}: {e}")
-
- def list_repos(self) -> List[dict]:
- """List all repositories"""
- return list(self.repos.values())
-
- def get_repo(self, repo_id: str) -> Optional[dict]:
- """Get repository by ID"""
- return self.repos.get(repo_id)
-
- def add_repo(self, name: str, git_url: str, branch: str = "main") -> dict:
- """Add a new repository"""
- repo_id = str(uuid.uuid4())
- local_path = self.repos_dir / repo_id
-
- try:
- # Clone the repository
- print(f"Cloning {git_url} to {local_path}...")
- git.Repo.clone_from(git_url, local_path, branch=branch, depth=1)
-
- repo = {
- "id": repo_id,
- "name": name,
- "git_url": git_url,
- "branch": branch,
- "local_path": str(local_path),
- "status": "cloned",
- "file_count": 0
- }
-
- self.repos[repo_id] = repo
- return repo
-
- except Exception as e:
- # Cleanup on failure
- if local_path.exists():
- import shutil
- shutil.rmtree(local_path)
- raise Exception(f"Failed to clone repository: {str(e)}")
-
- def update_status(self, repo_id: str, status: str):
- """Update repository status"""
- if repo_id in self.repos:
- self.repos[repo_id]["status"] = status
-
- def update_file_count(self, repo_id: str, count: int):
- """Update file count"""
- if repo_id in self.repos:
- self.repos[repo_id]["file_count"] = count
-
- def get_last_indexed_commit(self, repo_id: str) -> str:
- """Get last indexed commit SHA"""
- if repo_id in self.repos:
- return self.repos[repo_id].get("last_indexed_commit", "")
- return ""
-
- def update_last_commit(self, repo_id: str, commit_sha: str):
- """Update last indexed commit"""
- if repo_id in self.repos:
- self.repos[repo_id]["last_indexed_commit"] = commit_sha
From c8f721f379e0e8b0729a87adf2ddd217b2c8e586 Mon Sep 17 00:00:00 2001
From: Devanshu Rajesh Chicholikar
Date: Thu, 8 Jan 2026 17:27:52 -0500
Subject: [PATCH 2/4] chore: reorganize docs - move deployment guides to docs/
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
- Move DEPLOYMENT.md → docs/deployment.md
- Move DOCKER_QUICKSTART.md → docs/docker-quickstart.md
- Move DOCKER_TROUBLESHOOTING.md → docs/docker-troubleshooting.md
- Rename MCP_SETUP.md → docs/mcp-setup.md (consistent naming)
Cleaner root directory, all guides in one place.
Part of #180
---
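
The renames in this patch follow one convention: guides live under `docs/` with lowercase, hyphenated names. A minimal Python sketch of that mapping, for illustration only; `docs_path` is a hypothetical helper and is not part of the repository:

```python
from pathlib import PurePosixPath


def docs_path(old_path: str) -> str:
    """Map a legacy guide path to its docs/ location (hypothetical helper)."""
    p = PurePosixPath(old_path)
    # Lowercase the stem and swap underscores for hyphens; keep the suffix.
    new_name = p.stem.lower().replace("_", "-") + p.suffix
    # Every guide ends up directly under docs/, wherever it started.
    return str(PurePosixPath("docs") / new_name)
```

For example, `docs_path("DOCKER_QUICKSTART.md")` yields `docs/docker-quickstart.md`, matching the rename list below.
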
DEPLOYMENT.md => docs/deployment.md | 0
DOCKER_QUICKSTART.md => docs/docker-quickstart.md | 0
DOCKER_TROUBLESHOOTING.md => docs/docker-troubleshooting.md | 0
docs/{MCP_SETUP.md => mcp-setup.md} | 0
4 files changed, 0 insertions(+), 0 deletions(-)
rename DEPLOYMENT.md => docs/deployment.md (100%)
rename DOCKER_QUICKSTART.md => docs/docker-quickstart.md (100%)
rename DOCKER_TROUBLESHOOTING.md => docs/docker-troubleshooting.md (100%)
rename docs/{MCP_SETUP.md => mcp-setup.md} (100%)
diff --git a/DEPLOYMENT.md b/docs/deployment.md
similarity index 100%
rename from DEPLOYMENT.md
rename to docs/deployment.md
diff --git a/DOCKER_QUICKSTART.md b/docs/docker-quickstart.md
similarity index 100%
rename from DOCKER_QUICKSTART.md
rename to docs/docker-quickstart.md
diff --git a/DOCKER_TROUBLESHOOTING.md b/docs/docker-troubleshooting.md
similarity index 100%
rename from DOCKER_TROUBLESHOOTING.md
rename to docs/docker-troubleshooting.md
diff --git a/docs/MCP_SETUP.md b/docs/mcp-setup.md
similarity index 100%
rename from docs/MCP_SETUP.md
rename to docs/mcp-setup.md
From 48de6b0d606a7b953c25b80a15eec4f889b18a6f Mon Sep 17 00:00:00 2001
From: Devanshu Rajesh Chicholikar
Date: Thu, 8 Jan 2026 17:28:39 -0500
Subject: [PATCH 3/4] chore: add developer config files
- Add .nvmrc (Node 20)
- Add .python-version (Python 3.11)
- Add .editorconfig (consistent code style)
- Add .github/dependabot.yml (automated dependency updates)
Makes contributor setup easier and keeps dependencies fresh.
Part of #180
---
.editorconfig | 21 +++++++++++++++++++++
.github/dependabot.yml | 34 ++++++++++++++++++++++++++++++++++
.nvmrc | 1 +
.python-version | 1 +
4 files changed, 57 insertions(+)
create mode 100644 .editorconfig
create mode 100644 .github/dependabot.yml
create mode 100644 .nvmrc
create mode 100644 .python-version
diff --git a/.editorconfig b/.editorconfig
new file mode 100644
index 0000000..b78aca5
--- /dev/null
+++ b/.editorconfig
@@ -0,0 +1,21 @@
+# EditorConfig helps maintain consistent coding styles
+# https://editorconfig.org
+
+root = true
+
+[*]
+indent_style = space
+indent_size = 2
+end_of_line = lf
+charset = utf-8
+trim_trailing_whitespace = true
+insert_final_newline = true
+
+[*.py]
+indent_size = 4
+
+[*.md]
+trim_trailing_whitespace = false
+
+[Makefile]
+indent_style = tab
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
new file mode 100644
index 0000000..26ad928
--- /dev/null
+++ b/.github/dependabot.yml
@@ -0,0 +1,34 @@
+version: 2
+updates:
+ # Python dependencies
+ - package-ecosystem: "pip"
+ directory: "/backend"
+ schedule:
+ interval: "weekly"
+ commit-message:
+ prefix: "chore(deps)"
+ labels:
+ - "dependencies"
+ - "python"
+
+ # JavaScript dependencies
+ - package-ecosystem: "npm"
+ directory: "/frontend"
+ schedule:
+ interval: "weekly"
+ commit-message:
+ prefix: "chore(deps)"
+ labels:
+ - "dependencies"
+ - "javascript"
+
+ # GitHub Actions
+ - package-ecosystem: "github-actions"
+ directory: "/"
+ schedule:
+ interval: "monthly"
+ commit-message:
+ prefix: "chore(deps)"
+ labels:
+ - "dependencies"
+ - "ci"
diff --git a/.nvmrc b/.nvmrc
new file mode 100644
index 0000000..209e3ef
--- /dev/null
+++ b/.nvmrc
@@ -0,0 +1 @@
+20
diff --git a/.python-version b/.python-version
new file mode 100644
index 0000000..2c07333
--- /dev/null
+++ b/.python-version
@@ -0,0 +1 @@
+3.11
From ca7ed959d2c5159ec3c74dbbe9f0a0f70eeef431 Mon Sep 17 00:00:00 2001
From: Devanshu Rajesh Chicholikar
Date: Thu, 8 Jan 2026 17:33:06 -0500
Subject: [PATCH 4/4] docs: complete README overhaul and fix internal links
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
README.md:
- Rename to OpenCodeIntel (consistent branding)
- Add badges (CI, license, release)
- Add Quick Links navigation
- Remove emoji headers
- Cleaner structure: features → quickstart → docs
- Leave placeholder for logo and demo screenshot
- Concise, no fluff
CONTRIBUTING.md:
- Update repo name to opencodeintel
- Remove emoji
docs/:
- Fix cross-references to renamed files
- Convert to proper markdown links
Part of #180
---
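
The cross-reference fixes in `docs/` share one shape: a legacy filename, sometimes wrapped in backticks, becomes a relative markdown link. A rough Python sketch of that rewrite, assuming the mapping below (taken from the renames earlier in this series) and a hypothetical helper `fix_refs`; the edits in this patch may well have been made by hand:

```python
import re

# Legacy filename -> new name inside docs/ (from the renames in this series).
RENAMED = {
    "DEPLOYMENT.md": "deployment.md",
    "DOCKER_QUICKSTART.md": "docker-quickstart.md",
    "DOCKER_TROUBLESHOOTING.md": "docker-troubleshooting.md",
}

# Match a legacy name, optionally wrapped in backticks.
_PATTERN = re.compile(r"`?\b(" + "|".join(map(re.escape, RENAMED)) + r")\b`?")


def fix_refs(text: str) -> str:
    """Rewrite legacy guide references as relative markdown links."""

    def to_link(match: re.Match) -> str:
        new_name = RENAMED[match.group(1)]
        return f"[{new_name}](./{new_name})"

    return _PATTERN.sub(to_link, text)
```

The links are relative to `docs/`, so this sketch only suits files that live in that directory; `README.md` at the repository root needs a `./docs/` prefix instead.
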
CONTRIBUTING.md | 10 +-
README.md | 308 ++++++++-------------------------
docs/docker-quickstart.md | 8 +-
docs/docker-troubleshooting.md | 2 +-
4 files changed, 86 insertions(+), 242 deletions(-)
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 8f63c02..11f7191 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,13 +1,13 @@
-# Contributing to CodeIntel
+# Contributing to OpenCodeIntel
-First off, thanks for considering contributing! CodeIntel is better because of people like you.
+Thanks for considering contributing! OpenCodeIntel is better because of people like you.
## Quick Start
```bash
# Fork the repo, then clone
-git clone https://github.com/YOUR_USERNAME/codeintel-mcp
-cd codeintel-mcp
+git clone https://github.com/YOUR_USERNAME/opencodeintel
+cd opencodeintel
# Set up backend
cd backend
@@ -138,4 +138,4 @@ Be respectful, constructive, and collaborative. We're all here to build somethin
---
-**Thanks for contributing! π**
+**Thanks for contributing!**
diff --git a/README.md b/README.md
index a160343..a38b5eb 100644
--- a/README.md
+++ b/README.md
@@ -1,154 +1,99 @@
-# CodeIntel MCP
+
+
+
OpenCodeIntel
+
+
+
+ AI-powered semantic code search for your repositories
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+ Quick Start •
+ Deployment •
+ MCP Integration •
+ Contributing
+
-**MCP server for AI-powered codebase intelligence.** Semantic search, dependency analysis, and impact prediction for your repositories.
-
-## The Problem
-
-AI coding assistants are powerful, but they're flying blind in large codebases:
-- Can't semantically search across thousands of files
-- Don't understand dependency relationships
-- Can't predict what breaks when you change a file
-- Have no context on team coding patterns
-
-## The Solution
-
-CodeIntel is an MCP (Model Context Protocol) server that gives AI agents deep codebase understanding:
-
-```typescript
-// Ask Claude (via MCP):
-"Find authentication middleware in this repo"
-
-// CodeIntel semantically searches 10,000+ functions
-// Returns exact implementations, not keyword matches
-```
-
-**Built for production. Not a demo.**
-
-## Key Features
-
-### π Semantic Code Search
-Search by meaning, not keywords. Find `"error handling logic"` even if functions are named `processFailure()`.
-
-### π Dependency Analysis
-Visualize your entire codebase architecture. See which files are critical, which are isolated, and how everything connects.
+---
-### ⚡ Impact Prediction
-Before changing a file, know exactly what breaks:
-```
-src/auth/middleware.py
-├─ 15 files affected (HIGH RISK)
-   ├─ src/api/routes.py
-   ├─ src/services/user.py
-   └─ ... + 12 more
-```
+
+
-### 🎨 Code Style Analysis
-Understand team patterns: naming conventions (camelCase vs snake_case), async adoption %, type hint usage.
+## What is OpenCodeIntel?
-### π Performance That Scales
+OpenCodeIntel gives AI coding assistants deep understanding of your codebase. It's an MCP server that provides semantic code search, dependency analysis, and impact prediction.
-**Batch Processing:** 100x faster indexing
-- Before: 40+ min for 1,000 functions (individual API calls)
-- After: 22.9 sec (batch embedding requests)
+**Search by meaning, not keywords.** Find "error handling logic" even when functions are named `processFailure()`.
-**Incremental Indexing:** 700x faster re-indexing
-- Full re-index: 51.4s
-- Incremental (git diff): 0.07s
-- Perfect for active development
+## Features
-**Supabase Caching:** 5x search speedup
-- Cold search: 800ms
-- Cached: 150ms
+- **Semantic Search** - Vector-based code search that understands intent
+- **Dependency Graph** - Visualize how your codebase connects
+- **Impact Analysis** - Know what breaks before you change a file
+- **Code Style Analysis** - Understand team patterns and conventions
+- **MCP Integration** - Works directly with Claude Desktop
## Quick Start
-### 🐳 Docker (Recommended)
-
-**Fastest way to get started:**
+### Using Docker (Recommended)
```bash
-# 1. Clone repo
git clone https://github.com/OpenCodeIntel/opencodeintel.git
cd opencodeintel
-# 2. Configure environment
cp .env.example .env
-# Edit .env with your API keys
+# Add your API keys to .env
-# 3. Start everything
docker compose up -d
-
-# Frontend: http://localhost:3000
-# Backend: http://localhost:8000
-# Docs: http://localhost:8000/docs
```
-**Full guide:** [`DOCKER_QUICKSTART.md`](./DOCKER_QUICKSTART.md)
-**Troubleshooting:** [`DOCKER_TROUBLESHOOTING.md`](./DOCKER_TROUBLESHOOTING.md)
+- Frontend: http://localhost:3000
+- Backend: http://localhost:8000
+- API Docs: http://localhost:8000/docs
----
-
-### 📦 Manual Setup
+### Manual Setup
-### Prerequisites
-- Python 3.11+
-- Node.js 20+
-- OpenAI API key
-- Pinecone account
-- Supabase project
-
-### 1. Clone & Setup Backend
+**Requirements:** Python 3.11+, Node.js 20+
```bash
+# Backend
cd backend
python -m venv venv
-source venv/bin/activate # Windows: venv\Scripts\activate
+source venv/bin/activate
pip install -r requirements.txt
-
-# Configure .env
cp .env.example .env
-# Add your API keys to .env
-```
-
-### 2. Run Backend
-
-```bash
python main.py
-# Server runs on http://localhost:8000
-```
-### 3. Setup Frontend
-
-```bash
+# Frontend (new terminal)
cd frontend
npm install
npm run dev
-# UI at http://localhost:5173
-```
-
-### 4. Add a Repository
-
-```bash
-# Via API
-curl -X POST http://localhost:8000/api/repos \
- -H "Authorization: Bearer dev-secret-key" \
- -H "Content-Type: application/json" \
- -d '{"name": "zustand", "git_url": "https://github.com/pmndrs/zustand"}'
-
-# Or use the web UI
```
## MCP Integration
-CodeIntel works as an MCP server with Claude Desktop. **[π Full MCP Setup Guide](./docs/MCP_SETUP.md)**
+Connect OpenCodeIntel to Claude Desktop for AI-powered code assistance.
-**Quick Setup:**
+Add to your Claude Desktop config:
```json
-// Add to Claude Desktop config
{
"mcpServers": {
- "codeintel": {
+ "opencodeintel": {
"command": "python",
"args": ["/path/to/opencodeintel/mcp-server/server.py"],
"env": {
@@ -160,142 +105,41 @@ CodeIntel works as an MCP server with Claude Desktop. **[π Full MCP Setup Gui
}
```
-**Available MCP Tools:**
-| Tool | Description |
-|------|-------------|
-| `search_code` | Semantic code search - finds code by meaning |
-| `list_repositories` | View all indexed repos |
-| `get_dependency_graph` | Visualize architecture and file connections |
-| `analyze_code_style` | Team conventions and patterns |
-| `analyze_impact` | Know what breaks before you change it |
-| `get_repository_insights` | High-level codebase overview |
-
-Now ask Claude: *"What's the authentication logic in the user service?"* and it searches your actual codebase.
+**Available tools:** `search_code`, `list_repositories`, `get_dependency_graph`, `analyze_code_style`, `analyze_impact`, `get_repository_insights`
-**[→ Complete setup guide with troubleshooting](./docs/MCP_SETUP.md)**
+See [MCP Setup Guide](./docs/mcp-setup.md) for detailed instructions.
## Architecture
```
-┌─────────────┐
-│  Frontend   │ React + TypeScript + Tailwind
-│ (Vite app)  │ Dependency graphs, search UI
-└──────┬──────┘
-       │
-┌──────▼──────┐
-│   FastAPI   │ Python backend
-│   Backend   │ /api/search, /api/repos/{id}/dependencies
-└──────┬──────┘
-       │
-       ├─────► Pinecone (vector search)
-       ├─────► OpenAI (embeddings)
-       ├─────► Supabase (persistence)
-       └─────► Redis (caching)
-```
-
-**Tech Stack:**
-- **Backend:** FastAPI, tree-sitter (AST parsing), OpenAI embeddings
-- **Vector DB:** Pinecone for semantic search
-- **Database:** Supabase (PostgreSQL) for metadata + caching
-- **Cache:** Redis for 5x search speedup
-- **Frontend:** React, TypeScript, Tailwind CSS, shadcn/ui, ReactFlow
-
-## Performance Benchmarks
-
-Real numbers from indexing the Zustand repository (1,174 functions):
-
-| Metric | Value |
-|--------|-------|
-| Full indexing | 29.5s (39.7 functions/sec) |
-| Incremental re-index | 0.07s (700x faster) |
-| Batch embedding | 22.9s for 1,174 functions |
-| Search (cold) | 800ms |
-| Search (cached) | 150ms |
-
-## Use Cases
-
-**For AI Agents (via MCP):**
-- Semantic code search during pair programming
-- Understanding unfamiliar codebases
-- Finding implementation patterns
-- Impact analysis before refactoring
-
-**For Development Teams:**
-- Onboarding new engineers (visualize architecture)
-- Code review prep (see change blast radius)
-- Tech debt identification (find highly coupled files)
-- Pattern enforcement (analyze style consistency)
-
-## What Makes This Different
-
-**Most code search tools:** Keyword matching (grep, GitHub search)
-**CodeIntel:** Understands *meaning* - finds `error handling` even if the function is called `processFailure()`
-
-**Most dependency tools:** Static analysis only
-**CodeIntel:** Combines AST parsing + semantic understanding + impact prediction
-
-**Most demos:** In-memory, doesn't scale
-**CodeIntel:** Production-grade with Supabase persistence, Redis caching, incremental indexing
-
-## Deployment
-
-### 🐳 Local Development (Docker)
-```bash
-# Start all services
-make dev
-
-# Or using docker compose
-docker compose -f docker-compose.dev.yml up -d
-
-# Services available at:
-# - Backend: http://localhost:8000
-# - Frontend: http://localhost:3000
-# - API Docs: http://localhost:8000/docs
-```
-
-### ☁️ Production Deployment
-
-**Backend + Redis → Railway**
-```bash
-# Automated deployment
-./scripts/deploy-railway.sh
-
-# Or manually:
-railway login
-railway init
-railway up
+Frontend (React + TypeScript)
+     │
+Backend (FastAPI + Python)
+     │
+┌────┴────┬────────────┐
+Pinecone  Supabase     Redis
+(vectors) (database)   (cache)
```
-**Frontend → Vercel**
-```bash
-# Automated deployment
-./scripts/deploy-vercel.sh
+**Stack:** FastAPI, React, TypeScript, Pinecone, Supabase, Redis, tree-sitter
-# Or manually:
-cd frontend
-vercel --prod
-```
+## Documentation
-**π Full deployment guide:** See [DEPLOYMENT.md](DEPLOYMENT.md) for complete instructions, environment variables, and troubleshooting.
+| Guide | Description |
+|-------|-------------|
+| [Docker Quickstart](./docs/docker-quickstart.md) | Get running in 5 minutes |
+| [Deployment](./docs/deployment.md) | Production deployment guide |
+| [MCP Setup](./docs/mcp-setup.md) | Claude Desktop integration |
+| [Docker Troubleshooting](./docs/docker-troubleshooting.md) | Common issues and fixes |
## Contributing
-Built in a focused 2-week sprint to demonstrate production-grade AI development tooling.
+We welcome contributions! See [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.
-Contributions welcome! Areas for improvement:
-- Support for more languages (currently: Python, JS/TS)
-- Advanced graph algorithms (find circular dependencies, suggest refactorings)
-- GitHub integration (PR impact analysis)
-- Team analytics (who writes what patterns)
+**Quick links:**
+- [Open Issues](https://github.com/OpenCodeIntel/opencodeintel/issues)
+- [Good First Issues](https://github.com/OpenCodeIntel/opencodeintel/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22)
## License
-MIT License - use it, fork it, build on it.
-
-## Built With
-
-Commitment to shipping production-grade AI tools. Not a side project. Not a demo. Real infrastructure that scales.
-
----
-
-**Questions?** Open an issue or reach out.
+MIT License - see [LICENSE](./LICENSE) for details.
diff --git a/docs/docker-quickstart.md b/docs/docker-quickstart.md
index f0b113c..b888b83 100644
--- a/docs/docker-quickstart.md
+++ b/docs/docker-quickstart.md
@@ -116,7 +116,7 @@ lsof -i :8000
**Issue:** Environment variables not found
**Fix:** Make sure `.env` exists in project root (not just backend/)
-**Full troubleshooting guide:** See `DOCKER_TROUBLESHOOTING.md`
+**Full troubleshooting guide:** See [docker-troubleshooting.md](./docker-troubleshooting.md)
## Development Mode
@@ -131,7 +131,7 @@ docker compose -f docker-compose.dev.yml up
## Next Steps
-- π Read full deployment guide: `DEPLOYMENT.md`
+- Read full deployment guide: [deployment.md](./deployment.md)
- π Deploy to Railway: `./scripts/deploy-railway.sh`
- π Deploy to Vercel: `./scripts/deploy-vercel.sh`
- 🧪 Run tests: See `backend/README.md`
@@ -186,11 +186,11 @@ Once local dev works, deploy to production:
./scripts/deploy-vercel.sh
```
-Full deployment guide: `DEPLOYMENT.md`
+Full deployment guide: [deployment.md](./deployment.md)
---
**Need help?**
-- π Check `DOCKER_TROUBLESHOOTING.md`
+- Check [docker-troubleshooting.md](./docker-troubleshooting.md)
- π Open an issue: https://github.com/OpenCodeIntel/opencodeintel/issues
- π See full docs: `README.md`
diff --git a/docs/docker-troubleshooting.md b/docs/docker-troubleshooting.md
index 2e922ee..62f3806 100644
--- a/docs/docker-troubleshooting.md
+++ b/docs/docker-troubleshooting.md
@@ -278,6 +278,6 @@ docker compose exec backend curl http://backend:8000/health
1. Check GitHub Issues: https://github.com/OpenCodeIntel/opencodeintel/issues
2. Run verification script: `./scripts/verify-setup.sh`
-3. Check DEPLOYMENT.md for step-by-step instructions
+3. Check [deployment.md](./deployment.md) for step-by-step instructions
4. Make sure Docker Desktop has enough resources (Settings → Resources)
- Recommended: 4GB RAM, 2 CPUs minimum