fix: style_analyzer and dna_extractor respect include_paths for subset indexing (OPE-119) #293
…t indexing
When users select specific directories via include_paths (e.g. 2 packages
out of 1767 files), the Overview and Code Style pages showed counts for
the full repo instead of the indexed subset. Root cause: dependency_analyzer
respected include_paths but style_analyzer and dna_extractor did not.
Changes:
- style_analyzer.analyze_repository_style() now accepts include_paths
- dna_extractor.extract_dna() now accepts include_paths
- dna_extractor._discover_files() now accepts include_paths
- analysis.py routes pass repo.get('include_paths') to both analyzers
- include_paths sanitization (corrupt jsonb guard) in both services
- Path.parts filtering matches dependency_analyzer pattern (PR OpenCodeIntel#280)
Closes OPE-119
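The Path.parts filtering mentioned in the change list can be sketched as follows. This is a minimal illustration rather than the code from this PR: the helper names `is_included` and `filter_files` are hypothetical, and it assumes file paths are relative to the cloned repo root and that an empty or missing include_paths means "index everything".

```python
from pathlib import Path
from typing import Iterable, List, Optional


def is_included(rel_path: Path, include_paths: Optional[List[str]]) -> bool:
    """Return True if rel_path falls under any include_paths entry.

    Assumption: None or an empty list means no subset was selected,
    so every file is kept.
    """
    if not include_paths:
        return True
    parts = rel_path.parts
    for entry in include_paths:
        prefix = Path(entry).parts
        # Compare whole path components, so "packages/api" does not
        # accidentally match "packages/api-v2" the way a plain string
        # startswith() check would.
        if parts[: len(prefix)] == prefix:
            return True
    return False


def filter_files(files: Iterable[Path], include_paths: Optional[List[str]]) -> List[Path]:
    """Keep only the files that live under one of the selected directories."""
    return [f for f in files if is_included(f, include_paths)]
```

Comparing tuples of path components instead of raw strings is what keeps sibling directories with a shared name prefix out of the counts.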
@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel. A member of the Team first needs to authorize it.
@CodeRabbit review
✅ Actions performed: Review triggered.
🧹 Nitpick comments (1)
backend/services/dna_extractor.py (1)
947-957: Sanitization logic is correct but duplicated with style_analyzer.py.
The defensive sanitization handles corrupt JSONB values appropriately (type filtering, backslash normalization, traversal prevention). However, this block is identical to style_analyzer.py lines 149-159.
Consider extracting a shared utility in backend/utils/ if path sanitization is needed in additional locations in the future.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/services/dna_extractor.py` around lines 947 - 957, The include_paths sanitization block in dna_extractor.py duplicates logic found in style_analyzer.py; extract this into a shared utility function (e.g., sanitize_include_paths or normalize_include_paths) placed under backend/utils (module name like backend.utils.path_utils) that accepts include_paths and returns the cleaned list or None; update dna_extractor.py (the include_paths handling) and style_analyzer.py to import and call that utility instead of duplicating the loop, preserving the exact checks (type is str, normalize backslashes to '/', strip slashes, skip empty entries and any with '..' segments) and behavior of returning cleaned or None.
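A shared helper along the lines the review suggests might look like the sketch below. The function name `sanitize_include_paths` is one of the names the review proposes, not an existing function in the repo, and the early return for non-list input is an assumption about how a corrupt JSONB value should be treated; the per-entry checks follow the list in the comment above.

```python
from typing import Any, List, Optional


def sanitize_include_paths(include_paths: Any) -> Optional[List[str]]:
    """Return a cleaned list of relative directory paths, or None.

    Assumption: anything that is not a list (e.g. a corrupt JSONB value
    stored as a string or dict) is treated as "no subset selected".
    """
    if not isinstance(include_paths, list):
        return None

    cleaned: List[str] = []
    for entry in include_paths:
        # Type filtering: silently drop non-string entries.
        if not isinstance(entry, str):
            continue
        # Normalize Windows-style separators, then strip leading and
        # trailing slashes.
        normalized = entry.replace("\\", "/").strip("/")
        if not normalized:
            continue
        # Traversal prevention: reject any entry with a ".." segment.
        if ".." in normalized.split("/"):
            continue
        cleaned.append(normalized)

    # An empty result collapses to None, matching "returning cleaned or None".
    return cleaned or None
```

Both dna_extractor.py and style_analyzer.py could then call this one function instead of carrying their own copy of the loop.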
📒 Files selected for processing (3)
- backend/routes/analysis.py
- backend/services/dna_extractor.py
- backend/services/style_analyzer.py
Railway changed how startCommand is evaluated; the $PORT variable was being passed as the literal string instead of being shell-expanded, causing uvicorn to crash on every boot and the /health probe to time out across 11 attempts. The Dockerfile's built-in CMD already binds to the EXPOSE'd port with --proxy-headers, so removing the override restores boot. Same railway.json shipped fine for PR #293 two months ago, and no runtime code, Dockerfile, or requirements changed between #293 and the failing #302 deploy (only docs touched). Root cause is a Railway platform behavior change. Hotfix: skipped /oci-design gate (Phase 1F warn) because prod is fully down. Backfilling an ADR or dogfood finding after recovery.
Railway assigns a dynamic $PORT and runs its healthcheck against it. The Dockerfile hardcoded --port 8000, so every deploy after healthcheckPath was added to railway.json failed with "service unavailable" on /health. #316 and #318 have both been stuck for days; prod only survived on the pre-healthcheck #293 image until that replica was knocked out. Bind ${PORT:-8000} (fallback for local/compose) and make the internal healthcheck read the same port.
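The fix described here lives in the Dockerfile as a shell expansion, but the same fallback rule can be illustrated in Python. This is only a sketch: `resolve_port` and `healthcheck_url` are hypothetical names, and the 127.0.0.1 host is an assumption; the /health path and the 8000 default come from the commit message.

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str] = os.environ, default: int = 8000) -> int:
    """Mirror the shell expansion ${PORT:-8000}.

    Like the ':-' form, this falls back when PORT is unset *or* empty.
    """
    raw = env.get("PORT") or str(default)
    return int(raw)


def healthcheck_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the internal healthcheck URL from the same port the server binds to."""
    return f"http://127.0.0.1:{resolve_port(env)}/health"
```

Deriving the healthcheck target from the same function as the bind port is what keeps the two from drifting apart when the platform assigns a dynamic port.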
What
When users select specific directories via include_paths (e.g. 2 packages out of 1767 files in a monorepo), the Overview and Code Style pages showed counts for the full repo instead of the indexed subset.
Root Cause
dependency_analyzer already respected include_paths (PR #280). But style_analyzer and dna_extractor did not -- they scanned the full cloned repo.
Changes
- style_analyzer.analyze_repository_style() now accepts include_paths
- dna_extractor.extract_dna() and _discover_files() now accept include_paths
- analysis.py routes pass repo.get('include_paths') to both
Testing
Closes OPE-119