fix: style_analyzer and dna_extractor respect include_paths for subset indexing (OPE-119) #293

Merged
DevanshuNEU merged 1 commit into OpenCodeIntel:main from DevanshuNEU:fix/include-paths-style-dna-analyzers
Mar 11, 2026

DevanshuNEU (Collaborator) commented Mar 11, 2026

What

When users select specific directories via include_paths (e.g. 2 packages out of 1767 files in a monorepo), the Overview and Code Style pages showed counts for the full repo instead of the indexed subset.

Root Cause

dependency_analyzer already respected include_paths (PR #280). But style_analyzer and dna_extractor did not -- they scanned the full cloned repo.

Changes

Testing

  • 12/12 test_style_analyzer tests pass
  • 25/25 test_analyze_repo tests pass
  • Flake8 clean
  • Optional param with default None -- no breaking changes

Closes OPE-119

Summary by CodeRabbit

  • New Features
    • Users can now specify included paths when analyzing repositories, enabling analysis of specific subdirectories instead of the entire repository.
    • Added automatic file filtering based on supported file types and specified paths.
    • Implemented path validation and sanitization for secure file filtering.

…t indexing

When users select specific directories via include_paths (e.g. 2 packages
out of 1767 files), the Overview and Code Style pages showed counts for
the full repo instead of the indexed subset. Root cause: dependency_analyzer
respected include_paths but style_analyzer and dna_extractor did not.

Changes:
- style_analyzer.analyze_repository_style() now accepts include_paths
- dna_extractor.extract_dna() now accepts include_paths
- dna_extractor._discover_files() now accepts include_paths
- analysis.py routes pass repo.get('include_paths') to both analyzers
- include_paths sanitization (corrupt jsonb guard) in both services
- Path.parts filtering matches dependency_analyzer pattern (PR OpenCodeIntel#280)

Closes OPE-119
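
The `Path.parts` filtering mentioned in the change list could look roughly like the sketch below. This is not the code from the PR: the helper name `filter_by_include_paths` and its signature are hypothetical, and treating an empty list the same as `None` (no filtering) is an assumption.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def filter_by_include_paths(
    files: Iterable[Path],
    repo_root: Path,
    include_paths: Optional[List[str]] = None,
) -> List[Path]:
    """Keep only files under one of the include_paths (hypothetical helper)."""
    files = list(files)
    if not include_paths:
        # Assumption: None or an empty list means "no subset selected".
        return files

    # Split each include path into components: "packages/api" -> ("packages", "api").
    prefixes = [Path(p).parts for p in include_paths]

    kept = []
    for f in files:
        rel_parts = f.relative_to(repo_root).parts
        # Comparing whole components keeps "packages/api" from matching "packages/api-v2".
        if any(rel_parts[: len(prefix)] == prefix for prefix in prefixes):
            kept.append(f)
    return kept
```

Matching on path components rather than on a string prefix is the reason to use `Path.parts` here: a plain `startswith("packages/api")` check would also accept a sibling directory such as `packages/api-v2`.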

vercel Bot commented Mar 11, 2026


@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.


coderabbitai Bot commented Mar 11, 2026

Walkthrough

Introduces support for an optional include_paths parameter across the analysis pipeline to restrict code analysis to specified directories. The parameter is propagated from repository configuration through routes into DNA extraction and style analysis services, with sanitization and file filtering logic implemented in each service.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Route Configuration Propagation: backend/routes/analysis.py | Updated analysis route handlers to pass include_paths from repository config to analyze_repository_style() and extract_dna() function calls. |
| DNA Extraction Service: backend/services/dna_extractor.py | Added include_paths parameter to extract_dna() and _discover_files() methods. Implemented path sanitization (removes unsafe characters, prevents directory traversal, normalizes slashes) and filtering logic to restrict file discovery to specified include paths. |
| Style Analysis Service: backend/services/style_analyzer.py | Added include_paths parameter to analyze_repository_style() with sanitization and validation. Implemented file filtering to restrict analysis to supported extensions and files matching specified include paths. |
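
The propagation described in the table can be pictured as follows. This is only an illustration: the stub classes, the `run_analysis` function, and the `local_path` key are hypothetical, and the real signatures in `backend/routes/analysis.py` and the two services may differ. Only the method names, the `include_paths` keyword, and `repo.get('include_paths')` come from the PR text.

```python
from typing import Any, Dict, List, Optional


class StubStyleAnalyzer:
    """Stand-in for the style analysis service (hypothetical)."""

    def analyze_repository_style(
        self, repo_path: str, include_paths: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        return {"repo_path": repo_path, "include_paths": include_paths}


class StubDNAExtractor:
    """Stand-in for the DNA extraction service (hypothetical)."""

    def extract_dna(
        self, repo_path: str, include_paths: Optional[List[str]] = None
    ) -> Dict[str, Any]:
        return {"repo_path": repo_path, "include_paths": include_paths}


def run_analysis(
    repo: Dict[str, Any],
    style_analyzer: StubStyleAnalyzer,
    dna_extractor: StubDNAExtractor,
) -> Dict[str, Any]:
    # repo.get() returns None when the key is absent, so repositories
    # indexed in full keep the old behaviour.
    include_paths = repo.get("include_paths")
    repo_path = repo["local_path"]  # hypothetical key for the cloned repo location
    return {
        "style": style_analyzer.analyze_repository_style(
            repo_path, include_paths=include_paths
        ),
        "dna": dna_extractor.extract_dna(repo_path, include_paths=include_paths),
    }
```

Because the parameter is optional and defaults to `None`, callers that never pass it are unaffected, which matches the "no breaking changes" note in the Testing section.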

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • #280: Implements the same include_paths-based file discovery filtering pattern in dependency_analyzer service.
  • #283: Modifies analysis.py to propagate repository include_paths configuration into downstream analysis functions.

Poem

🐰 Through directories vast, a filter we weave,
Paths sanitized, safe from deceit,
DNA and style now respect thy bounds,
Selective analysis, precise and neat! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly summarizes the main change: enabling style_analyzer and dna_extractor to respect include_paths for subset indexing, which is the core fix implemented across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |


@DevanshuNEU

Collaborator Author

@CodeRabbit review


coderabbitai Bot commented Mar 11, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
backend/services/dna_extractor.py (1)

947-957: Sanitization logic is correct but duplicated with style_analyzer.py.

The defensive sanitization handles corrupt JSONB values appropriately (type filtering, backslash normalization, traversal prevention). However, this block is identical to style_analyzer.py lines 149-159.

Consider extracting a shared utility in backend/utils/ if path sanitization is needed in additional locations in the future.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/dna_extractor.py` around lines 947 - 957, The include_paths
sanitization block in dna_extractor.py duplicates logic found in
style_analyzer.py; extract this into a shared utility function (e.g.,
sanitize_include_paths or normalize_include_paths) placed under backend/utils
(module name like backend.utils.path_utils) that accepts include_paths and
returns the cleaned list or None; update dna_extractor.py (the include_paths
handling) and style_analyzer.py to import and call that utility instead of
duplicating the loop, preserving the exact checks (type is str, normalize
backslashes to '/', strip slashes, skip empty entries and any with '..'
segments) and behavior of returning cleaned or None.
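
A minimal sketch of the shared utility the review suggests, following the checks listed in the prompt above (string entries only, backslashes normalized to `/`, surrounding slashes stripped, empty entries and `..` segments skipped, cleaned list or `None` returned). The function name is one of the two the review proposes. Returning `None` for a non-list input is an assumption about how the corrupt JSONB guard behaves, not something the PR text states.

```python
from typing import Any, List, Optional


def sanitize_include_paths(include_paths: Any) -> Optional[List[str]]:
    """Return a cleaned list of include paths, or None if nothing usable remains."""
    if not isinstance(include_paths, list):
        # Assumption: a corrupt JSONB value (string, dict, null) disables filtering.
        return None

    cleaned: List[str] = []
    for entry in include_paths:
        if not isinstance(entry, str):
            continue  # drop non-string entries
        # Normalize Windows-style separators, then trim leading/trailing slashes.
        normalized = entry.replace("\\", "/").strip("/")
        if not normalized:
            continue  # drop entries that were empty or only slashes
        if ".." in normalized.split("/"):
            continue  # drop anything that could traverse out of the repo
        cleaned.append(normalized)

    return cleaned or None
```

Both `dna_extractor.py` and `style_analyzer.py` could then call this one function instead of repeating the loop, so any later change to the rules lands in a single place.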

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f1eb1ba0-8d7b-469f-8d2f-372af59d6f94

📥 Commits

Reviewing files that changed from the base of the PR and between 09513de and ccafa4a.

📒 Files selected for processing (3)
  • backend/routes/analysis.py
  • backend/services/dna_extractor.py
  • backend/services/style_analyzer.py


vercel Bot commented Mar 11, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| opencodeintel | Ignored | Preview | Mar 11, 2026 4:37am |

@DevanshuNEU DevanshuNEU merged commit add41ac into OpenCodeIntel:main Mar 11, 2026
8 checks passed
DevanshuNEU added a commit that referenced this pull request May 15, 2026
Railway changed how startCommand is evaluated; the $PORT variable was
being passed as the literal string instead of being shell-expanded,
causing uvicorn to crash on every boot and the /health probe to time
out across 11 attempts. The Dockerfile's built-in CMD already binds
to the EXPOSE'd port with --proxy-headers, so removing the override
restores boot.

Same railway.json shipped fine for PR #293 two months ago, and no
runtime code, Dockerfile, or requirements changed between #293 and
the failing #302 deploy (only docs touched). Root cause is a Railway
platform behavior change.

Hotfix: skipped /oci-design gate (Phase 1F warn) because prod is
fully down. Backfilling an ADR or dogfood finding after recovery.
DevanshuNEU added a commit that referenced this pull request Jun 11, 2026
Railway assigns a dynamic $PORT and runs its healthcheck against it. The
Dockerfile hardcoded --port 8000, so every deploy after healthcheckPath was
added to railway.json failed with "service unavailable" on /health. #316 and
#318 have both been stuck for days; prod only survived on the pre-healthcheck
#293 image until that replica was knocked out. Bind ${PORT:-8000} (fallback for
local/compose) and make the internal healthcheck read the same port.
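
The `${PORT:-8000}` fallback in the commit above is shell syntax. For readers more at home in the backend's language, the same rule can be sketched in Python. The helper `resolve_port` is hypothetical and is not part of the repository. It only mirrors the shell behaviour, where an unset or empty `PORT` falls back to 8000.

```python
import os
from typing import Mapping, Optional


def resolve_port(env: Optional[Mapping[str, str]] = None, default: int = 8000) -> int:
    """Mirror the shell expansion ${PORT:-8000}: unset or empty PORT uses the default."""
    env = os.environ if env is None else env
    raw = env.get("PORT") or str(default)
    # int() raises ValueError for an unexpanded literal such as "$PORT",
    # the kind of value described in the May 15 hotfix commit.
    return int(raw)
```

With this rule a platform-assigned port wins when present, while local and compose runs without `PORT` still bind 8000.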