fix: style_analyzer and dna_extractor respect include_paths for subset indexing (OPE-119) by DevanshuNEU · Pull Request #293 · OpenCodeIntel/opencodeintel

DevanshuNEU · 2026-03-11T04:20:37Z

What

When users select specific directories via include_paths (e.g. 2 packages out of 1767 files in a monorepo), the Overview and Code Style pages showed counts for the full repo instead of the indexed subset.

Root Cause

dependency_analyzer already respected include_paths (PR #280). But style_analyzer and dna_extractor did not -- they scanned the full cloned repo.

Changes

style_analyzer.analyze_repository_style() now accepts include_paths
dna_extractor.extract_dna() and _discover_files() now accept include_paths
analysis.py routes pass repo.get('include_paths') to both
include_paths sanitization (corrupt jsonb guard) in both services
Path.parts filtering matches dependency_analyzer pattern (PR #280, "fix: TypeScript dependency analysis -- proper parser, .js resolution, include_paths (OPE-120)")
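
The Path.parts filtering mentioned in the list above could look roughly like the sketch below. It is an illustration only: `matches_include_paths` and `filter_files` are hypothetical names, not the functions in this PR, and the sketch assumes file paths are relative to the repo root and that include_paths has already been sanitized.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def matches_include_paths(rel_path: Path, include_paths: Optional[List[str]]) -> bool:
    """Return True if rel_path sits under any include path, or if no filter is set."""
    if not include_paths:
        # None or an empty list means "analyze the whole repo".
        return True
    parts = rel_path.parts
    for inc in include_paths:
        inc_parts = Path(inc).parts
        # Compare whole path components, not string prefixes.
        if parts[:len(inc_parts)] == inc_parts:
            return True
    return False


def filter_files(files: Iterable[Path], include_paths: Optional[List[str]] = None) -> List[Path]:
    """Keep only the files that fall inside the selected directories."""
    return [f for f in files if matches_include_paths(f, include_paths)]
```

Comparing `Path.parts` component by component, rather than using `str.startswith`, keeps an include path such as `packages/core` from also matching a sibling directory like `packages/core-utils`.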

Testing

12/12 test_style_analyzer tests pass
25/25 test_analyze_repo tests pass
Flake8 clean
Optional param with default None -- no breaking changes

Closes OPE-119

Summary by CodeRabbit

New Features
- Users can now specify included paths when analyzing repositories, enabling analysis of specific subdirectories instead of the entire repository.
- Added automatic file filtering based on supported file types and specified paths.
- Implemented path validation and sanitization for secure file filtering.

vercel · 2026-03-11T04:20:41Z

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-03-11T04:20:56Z

📝 Walkthrough

Introduces support for an optional include_paths parameter across the analysis pipeline to restrict code analysis to specified directories. The parameter is propagated from repository configuration through routes into DNA extraction and style analysis services, with sanitization and file filtering logic implemented in each service.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Route Configuration Propagation `backend/routes/analysis.py` | Updated analysis route handlers to pass `include_paths` from repository config to `analyze_repository_style()` and `extract_dna()` function calls. |
| DNA Extraction Service `backend/services/dna_extractor.py` | Added `include_paths` parameter to `extract_dna()` and `_discover_files()` methods. Implemented path sanitization (removes unsafe characters, prevents directory traversal, normalizes slashes) and filtering logic to restrict file discovery to specified include paths. |
| Style Analysis Service `backend/services/style_analyzer.py` | Added `include_paths` parameter to `analyze_repository_style()` with sanitization and validation. Implemented file filtering to restrict analysis to supported extensions and files matching specified include paths. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

#280: Implements the same include_paths-based file discovery filtering pattern in dependency_analyzer service.
#283: Modifies analysis.py to propagate repository include_paths configuration into downstream analysis functions.

Poem

🐰 Through directories vast, a filter we weave,
Paths sanitized, safe from deceit,
DNA and style now respect thy bounds,
Selective analysis, precise and neat! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly summarizes the main change: enabling style_analyzer and dna_extractor to respect include_paths for subset indexing, which is the core fix implemented across all modified files. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |

DevanshuNEU · 2026-03-11T04:21:29Z

@CodeRabbit review

coderabbitai · 2026-03-11T04:21:40Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

🧹 Nitpick comments (1)

backend/services/dna_extractor.py (1)
947-957: Sanitization logic is correct but duplicated with style_analyzer.py.

The defensive sanitization handles corrupt JSONB values appropriately (type filtering, backslash normalization, traversal prevention). However, this block is identical to style_analyzer.py lines 149-159.

Consider extracting a shared utility in backend/utils/ if path sanitization is needed in additional locations in the future.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/dna_extractor.py` around lines 947 - 957, The include_paths
sanitization block in dna_extractor.py duplicates logic found in
style_analyzer.py; extract this into a shared utility function (e.g.,
sanitize_include_paths or normalize_include_paths) placed under backend/utils
(module name like backend.utils.path_utils) that accepts include_paths and
returns the cleaned list or None; update dna_extractor.py (the include_paths
handling) and style_analyzer.py to import and call that utility instead of
duplicating the loop, preserving the exact checks (type is str, normalize
backslashes to '/', strip slashes, skip empty entries and any with '..'
segments) and behavior of returning cleaned or None.
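
A minimal sketch of the shared utility suggested above, following the checks the review lists (entry is a str, backslashes normalized to '/', slashes stripped, empty entries and any with '..' segments skipped, cleaned list or None returned). The name `sanitize_include_paths` is one of the names proposed in the comment; treating a non-list value as corrupt and returning None is an assumption of this sketch, not something confirmed by the PR.

```python
from typing import Any, List, Optional


def sanitize_include_paths(include_paths: Any) -> Optional[List[str]]:
    """Return a cleaned list of include paths, or None if nothing usable remains."""
    # Assumption: a corrupt JSONB value that is not a list disables filtering.
    if not isinstance(include_paths, list):
        return None
    cleaned: List[str] = []
    for entry in include_paths:
        if not isinstance(entry, str):
            continue  # drop non-string entries
        normalized = entry.replace("\\", "/").strip("/")
        if not normalized:
            continue  # drop empty entries
        if ".." in normalized.split("/"):
            continue  # drop directory traversal attempts
        cleaned.append(normalized)
    return cleaned or None
```

Both services could then call this helper once at the top of their entry points instead of repeating the loop.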

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f1eb1ba0-8d7b-469f-8d2f-372af59d6f94

📥 Commits

Reviewing files that changed from the base of the PR and between 09513de and ccafa4a.

📒 Files selected for processing (3)

backend/routes/analysis.py
backend/services/dna_extractor.py
backend/services/style_analyzer.py

vercel · 2026-03-11T04:37:17Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| opencodeintel | Ignored | Preview | Mar 11, 2026 4:37am |

Railway changed how startCommand is evaluated; the $PORT variable was being passed as the literal string instead of being shell-expanded, causing uvicorn to crash on every boot and the /health probe to time out across 11 attempts. The Dockerfile's built-in CMD already binds to the EXPOSE'd port with --proxy-headers, so removing the override restores boot. Same railway.json shipped fine for PR #293 two months ago, and no runtime code, Dockerfile, or requirements changed between #293 and the failing #302 deploy (only docs touched). Root cause is a Railway platform behavior change. Hotfix: skipped /oci-design gate (Phase 1F warn) because prod is fully down. Backfilling an ADR or dogfood finding after recovery.

Railway assigns a dynamic $PORT and runs its healthcheck against it. The Dockerfile hardcoded --port 8000, so every deploy after healthcheckPath was added to railway.json failed with "service unavailable" on /health. #316 and #318 have both been stuck for days; prod only survived on the pre-healthcheck #293 image until that replica was knocked out. Bind ${PORT:-8000} (fallback for local/compose) and make the internal healthcheck read the same port.
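
The shell expansion `${PORT:-8000}` described above falls back to 8000 when PORT is unset or empty. A rough Python equivalent is sketched below for illustration; `resolve_port` is a hypothetical helper and not part of either PR, which make the change in the Dockerfile.

```python
import os
from typing import Mapping


def resolve_port(env: Mapping[str, str] = os.environ, default: int = 8000) -> int:
    """Mimic ${PORT:-8000}: use PORT when set and non-empty, else the default."""
    raw = env.get("PORT") or ""
    return int(raw) if raw else default
```

The server bind and the internal healthcheck would both read the port from this one place, which is the property the fix is after.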

coderabbitai Bot reviewed Mar 11, 2026

DevanshuNEU merged commit add41ac into OpenCodeIntel:main Mar 11, 2026
8 checks passed

DevanshuNEU mentioned this pull request May 15, 2026

fix(deploy): remove startCommand from railway.json to restore prod #310

Merged

5 tasks

DevanshuNEU mentioned this pull request Jun 11, 2026

fix: bind uvicorn to Railway's dynamic $PORT (prod deploy healthcheck failure) #319

Merged

DevanshuNEU merged 1 commit into OpenCodeIntel:main from DevanshuNEU:fix/include-paths-style-dna-analyzers
