feat(dna): Add CodeDNA extractor for codebase pattern analysis #205
Merged
DevanshuNEU merged 8 commits on Jan 12, 2026
- Add DNAExtractor service that extracts architectural patterns
- Extract auth patterns (middleware, decorators, ownership checks)
- Extract service patterns (singletons, dependencies.py)
- Extract database patterns (UUID, TIMESTAMPTZ, RLS, cascades)
- Extract error handling and logging patterns
- Extract naming conventions
- Add /repos/{repo_id}/dna endpoint with json/markdown format
- Add get_codebase_dna MCP tool for AI assistants
- Cache DNA in repository_insights table
- Fixed save_to_cache to use existing table schema
- Fixed load_from_cache to read from architecture_patterns JSONB
- DNA stored as {codebase_dna: {...}} inside architecture_patterns
- Add detected_framework field to CodebaseDNA
- Add _detect_framework() for FastAPI/Starlette/Flask/Django/Express/Next/Nest
- Add _extract_middleware_patterns() for framework-specific middleware detection
- Improve _extract_auth_patterns() with Starlette/Flask/Django patterns
- Add middleware_patterns field to output
- Update to_markdown() with framework and middleware sections
- Add TestPattern and ConfigPattern dataclasses
- Add Django ORM, SQLAlchemy, Prisma, Tortoise ORM detection
- Add Django + DRF framework detection
- Add aiohttp, tornado framework detection
- Add Django middleware patterns (MIDDLEWARE, MiddlewareMixin, hooks)
- Add DRF permission_classes and authentication_classes detection
- Add test framework detection (pytest, unittest, django.test)
- Add mock library detection (unittest.mock, responses, pytest-mock)
- Add config pattern detection (dotenv, environs, django-environ, pydantic)
- Add secrets handling detection (AWS Secrets Manager, Vault, env vars)
- Update to_markdown() with test and config sections
- Add file content cache to avoid re-reading files
- Add MAX_FILE_SIZE (1MB) and MAX_FILES (5000) limits
- Add _safe_read_file() with encoding fallbacks (utf-8, latin-1, cp1252)
- Add binary file detection (null bytes check)
- Add symlink handling in _discover_files()
- Add path validation in extract_dna()
- Add performance stats logging (files_read, skipped, errors, duration)
- Add .venv and site-packages to SKIP_DIRS
- Add deduplication for auth_decorators list
- Add logging.getLogger() pattern detection
- Add structlog detection
- Improve log level detection (.info, .debug, etc)
- Use _safe_read_file in logging pattern extraction
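The caching notes above say the DNA is stored as {codebase_dna: {...}} inside the existing architecture_patterns JSONB column. A minimal sketch of that envelope follows. The helper names wrap_for_cache and unwrap_from_cache are hypothetical, and the database read/write itself is left out; the PR's actual save_to_cache and load_from_cache may differ.

```python
from typing import Any, Dict, Optional

# Key named in the commit message; everything else here is illustrative.
DNA_KEY = "codebase_dna"


def wrap_for_cache(
    existing_patterns: Optional[Dict[str, Any]], dna: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a new architecture_patterns value with the DNA nested under DNA_KEY.

    Other keys already present in the column are preserved (an assumption
    about the desired behaviour, not something the PR states).
    """
    merged = dict(existing_patterns or {})
    merged[DNA_KEY] = dna
    return merged


def unwrap_from_cache(
    architecture_patterns: Optional[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return the cached DNA dict, or None when the column has no usable DNA."""
    if not isinstance(architecture_patterns, dict):
        return None
    dna = architecture_patterns.get(DNA_KEY)
    return dna if isinstance(dna, dict) else None
```

Nesting the DNA under its own key lets it share the column with whatever else is already stored there, which fits the note about reusing the existing table schema.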
Summary
Adds a CodeDNA extractor that analyzes codebases to extract architectural patterns, conventions, and constraints. This helps AI assistants understand how to write code consistent with existing patterns.
Changes
New Files
- backend/services/dna_extractor.py - Core DNA extraction service
Modified Files
- backend/routes/analysis.py - Added /repos/{repo_id}/dna endpoint
- backend/dependencies.py - Added dna_extractor singleton
- mcp-server/server.py - Added get_codebase_dna MCP tool
Features
Pattern Detection
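The commit notes mention a _detect_framework() step covering FastAPI, Starlette, Flask, Django, Express, Next, and Nest. One plausible approach, sketched here under assumptions, is to count import signatures across source files and pick the most frequent match. The regexes, the function name detect_framework, and the tie-breaking rule are illustrative and not taken from the PR.

```python
import re
from typing import Dict, Optional, Sequence

# Illustrative import signatures; the real extractor may use different rules.
FRAMEWORK_SIGNATURES = [
    ("FastAPI", re.compile(r"^\s*(?:from|import)\s+fastapi\b", re.M)),
    ("Starlette", re.compile(r"^\s*(?:from|import)\s+starlette\b", re.M)),
    ("Flask", re.compile(r"^\s*(?:from|import)\s+flask\b", re.M)),
    ("Django", re.compile(r"^\s*(?:from|import)\s+django\b", re.M)),
    ("Nest", re.compile(r"""from\s+['"]@nestjs/""")),
    ("Next", re.compile(r"""from\s+['"]next(?:/|['"])""")),
    ("Express", re.compile(r"""require\(\s*['"]express['"]\s*\)|from\s+['"]express['"]""")),
]


def detect_framework(sources: Sequence[str]) -> Optional[str]:
    """Return the framework whose signature appears in the most files, or None."""
    counts: Dict[str, int] = {}
    for text in sources:
        for name, pattern in FRAMEWORK_SIGNATURES:
            if pattern.search(text):
                counts[name] = counts.get(name, 0) + 1
    if not counts:
        return None
    order = [name for name, _ in FRAMEWORK_SIGNATURES]
    # Highest file count wins; ties go to the earlier entry in the list.
    return max(counts, key=lambda n: (counts[n], -order.index(n)))
```

Counting per file rather than stopping at the first hit matters for projects that import more than one of these packages, for example a FastAPI app that also imports Starlette middleware.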
Robustness
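The robustness commit lists a size limit, a null-byte check for binary files, and encoding fallbacks in _safe_read_file(). A minimal sketch of how those pieces could fit together is shown below. The function body, the 8 KB sniff window, and the exact byte value of "1MB" are assumptions; only the constant name, the encoding order, and the null-byte idea come from the commit message.

```python
from pathlib import Path
from typing import Optional

MAX_FILE_SIZE = 1024 * 1024  # "1MB" per the commit; exact byte count is assumed
ENCODINGS = ("utf-8", "latin-1", "cp1252")  # order as listed in the commit


def safe_read_file(path: Path) -> Optional[str]:
    """Return file text, or None for missing, oversized, or binary files."""
    try:
        if not path.is_file() or path.stat().st_size > MAX_FILE_SIZE:
            return None
        raw = path.read_bytes()
    except OSError:
        return None
    # Null bytes near the start are treated as a sign of binary content.
    if b"\x00" in raw[:8192]:
        return None
    for encoding in ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None
```

Python's latin-1 codec maps every byte value to a character, so with this ordering the cp1252 attempt is never reached; it would only take effect if placed before latin-1.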
Output Formats
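The commits describe a to_markdown() method with framework and middleware sections. The sketch below shows what such rendering might look like for just those two fields. The class name CodebaseDNASketch, the heading text, and the fallback wording are hypothetical; only the field names detected_framework and middleware_patterns appear in the commit notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodebaseDNASketch:
    """Reduced stand-in for CodebaseDNA with only two of its fields."""

    detected_framework: Optional[str] = None
    middleware_patterns: List[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the two fields as a small Markdown document."""
        lines = ["# Codebase DNA", ""]
        lines += ["## Framework", self.detected_framework or "Not detected", ""]
        lines.append("## Middleware")
        if self.middleware_patterns:
            lines += [f"- {pattern}" for pattern in self.middleware_patterns]
        else:
            lines.append("None detected")
        return "\n".join(lines)
```

For example, `CodebaseDNASketch("FastAPI", ["app.add_middleware(CORSMiddleware)"]).to_markdown()` yields a Framework section naming FastAPI followed by a one-item Middleware list.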
API
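The PR adds a /repos/{repo_id}/dna endpoint that can return either json or markdown. As a hedged illustration of how a client might address it, the helper below builds the request URL. The query parameter name `format`, the helper name build_dna_url, and the absence of any path prefix are assumptions; check backend/routes/analysis.py for the actual contract.

```python
from urllib.parse import quote, urlencode

SUPPORTED_FORMATS = ("json", "markdown")  # the two formats named in the PR


def build_dna_url(base_url: str, repo_id: str, fmt: str = "json") -> str:
    """Build a URL for the DNA endpoint (parameter name `format` is assumed)."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Percent-encode the repo id so it stays a single path segment.
    path = f"/repos/{quote(repo_id, safe='')}/dna"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'format': fmt})}"
```

A call such as `build_dna_url("http://localhost:8000", "abc-123", "markdown")` produces `http://localhost:8000/repos/abc-123/dna?format=markdown`; the host and repo id here are placeholders.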
MCP Tool
Returns formatted DNA profile that AI assistants can use before generating code.
Known Limitations
These will be addressed in follow-up PRs based on real usage feedback.