fix: persist include_paths -- subset repos show 26 nodes not 1000 (OPE-162) by DevanshuNEU · Pull Request #283 · OpenCodeIntel/opencodeintel

DevanshuNEU · 2026-03-06T23:18:31Z

Bug

User indexes packages/sql (19 files) + packages/vitest (7 files) = 26 files.
Dependency graph shows 1000 nodes because it scans the entire Effect-TS clone (1,767 files).

Root cause

include_paths was passed during indexing but never saved to the repositories table.
Analysis routes called build_dependency_graph(repo['local_path']) without include_paths.

Fix

repos.py: Save include_paths to DB when indexing starts
analysis.py: All 3 routes pass repo.get('include_paths') to build_dependency_graph
analysis.py: Added force=true query param to bypass stale cache

Requires

include_paths column already added to repositories table (done by Dev in Supabase)
PR fix: TypeScript dependency analysis -- proper parser, .js resolution, include_paths (OPE-120) #280 merged (build_dependency_graph accepts include_paths param)

After merge

Existing repos need re-index to populate include_paths in DB.
New indexes automatically save include_paths.
Dependency graph for subset repos shows only the subset's files.

2 files, +16 -9 lines. 387 tests pass.

Closes OPE-162

Summary by CodeRabbit

New Features
- Optional force-refresh for dependency graphs to rebuild immediately instead of using cached data.
- Repository path configuration (include paths) now persists in metadata so selections are retained.
Bug Fixes
- Input sanitization for persisted path settings to avoid invalid values affecting analysis.
Tests
- Added test coverage for path-filtering, empty/invalid-path fallbacks, and full-scan behavior.
Chore
- Expanded diagnostic logging across dependency analysis endpoints.

…-162) Root cause of 1000-node hairball: user indexes packages/sql + packages/vitest (26 files) but dependency graph scans entire Effect-TS clone (1,767 files). Fix: 1. Save include_paths to repositories table during indexing 2. All 3 analysis routes now pass repo.get('include_paths') to build_dependency_graph 3. Added force=true query param to dependencies endpoint to bypass stale cache built without include_paths filtering After re-indexing, Effect-TS subset repos will show 26 nodes instead of 1000. Closes OPE-162

vercel · 2026-03-06T23:18:35Z

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-03-06T23:18:49Z

Warning

Rate limit exceeded

@DevanshuNEU has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7ed70edb-80a6-4e77-b1d8-e161749af110

📥 Commits

Reviewing files that changed from the base of the PR and between 04ac019 and 9c9f20f.

📒 Files selected for processing (2)

backend/services/dependency_analyzer.py
backend/tests/test_dependency_analyzer.py

📝 Walkthrough

Walkthrough

Added optional force parameter to dependency-graph endpoints to optionally bypass cache and rebuilt graphs. Propagated and persisted repository include_paths metadata and sanitized it before use; tests added for include_paths edge cases.

Changes

Cohort / File(s)	Summary
Dependency endpoints `backend/routes/analysis.py`	Added `force: bool = False` to `get_dependency_graph`; when `force=true` bypasses cached graph, logs and passes repo `include_paths` into `build_dependency_graph`. `analyze_impact` and `get_repository_insights` updated to propagate `include_paths` when reconstructing graphs.
Repository indexing / persistence `backend/routes/repos.py`	`_run_async_indexing` now persists `include_paths` to the repo record (via Supabase service) and clears stale file-dependency cache before publishing initial progress.
Dependency analyzer sanitization `backend/services/dependency_analyzer.py`	`build_dependency_graph` filters `include_paths` to keep only non-empty strings; treats empty result as `None` to force full scan.
Tests `backend/tests/test_dependency_analyzer.py`	Added tests for include_paths sanitization and fallback behavior (corrupt entries, empty list => full scan).
Manifest `pyproject.toml`, `requirements.txt`	Test-related/dependency manifest updated (listed changes in diff).

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant API as Analysis API
  participant DB as Supabase (repo metadata)
  participant Analyzer as DependencyAnalyzer
  participant Cache as FileDependencyCache

  Client->>API: GET /dependency-graph?force=true
  API->>Cache: check cached graph for repo
  alt cached && force==false
    Cache-->>API: return cached graph
    API-->>Client: cached graph response
  else rebuild (force==true or cache miss)
    API->>DB: fetch repo metadata (includes include_paths)
    DB-->>API: return repo + include_paths
    API->>Analyzer: build_dependency_graph(repo, include_paths)
    Analyzer-->>API: new graph
    API->>Cache: update cached graph
    API-->>Client: fresh graph response (log includes include_paths)
  end

sequenceDiagram
  participant Indexer as Async Indexer
  participant DB as Supabase (repo metadata)
  participant Cache as FileDependencyCache
  participant Pub as ProgressPublisher

  Indexer->>DB: update repo with include_paths
  DB-->>Indexer: ack
  Indexer->>Cache: clear stale file-dependency entries for repo
  Cache-->>Indexer: ack
  Indexer->>Pub: publish initial progress
  Pub-->>Client: progress update

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix: TypeScript dependency analysis -- proper parser, .js resolution, include_paths (OPE-120) #280 — Modifies dependency graph construction to accept/propagate include_paths; directly related to propagation and analyzer changes.
feat: monorepo subset indexing -- include_paths support (OPE-106) #266 — Overlaps with _run_async_indexing changes and include_paths persistence logic.

Poem

🐰 I hopped through caches, nose twitching bright,
I nudged include_paths into the light,
If force is given — no stale crumbs remain,
Fresh graphs sprout up after clearing the grain,
Carrots, logs, and tests — a builder's delight! 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main fix: persisting include_paths to prevent over-indexing of subset repositories. It references the bug (26 nodes vs 1000) and issue number (OPE-162), making it specific and informative.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/analysis.py`:
- Around line 31-39: The dependency cache is currently keyed only by repo_id, so
calls in backend/routes/analysis.py that call
dependency_analyzer.load_from_cache(repo_id) / save_to_cache(repo_id, ...) must
be updated to include the include_paths in the cache key; update the three
places in this file that call load_from_cache/save_to_cache (the
get_dependency_graph/impact/insights routes) to pass a derived cache_key (e.g.,
combine repo_id with a stable, order-independent serialization of
repo.get("include_paths") such as a sorted tuple or JSON string), and then
update dependency_analyzer.load_from_cache and save_to_cache (and the
supabase_service key generation) to accept and use that cache_key so cached
graphs are scoped by include_paths.

In `@backend/routes/repos.py`:
- Around line 605-608: The repo persistence only updates include_paths when
include_paths is truthy, leaving stale filters when async indexing returns to
whole-repo; change the logic around include_paths so
get_supabase_service().update_repository(repo_id, {"include_paths":
include_paths}) is always called (or called with an explicit null/empty value)
so the stored include_paths is cleared when include_paths is None/[] — modify
the block around include_paths in the function that currently performs the
update (referencing include_paths, repo_id, and
get_supabase_service().update_repository) to write the empty/null value instead
of skipping the update.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0fd0edfe-e999-47a8-9c03-aff8a056cfef

📥 Commits

Reviewing files that changed from the base of the PR and between 20604de and 65d19e2.

📒 Files selected for processing (2)

backend/routes/analysis.py
backend/routes/repos.py

… sanitize corrupt DB data Adversarial review found 3 bugs: 1. CRITICAL: Re-indexing full repo didn't clear old include_paths. if include_paths was truthy before save, re-index with None skipped the DB update. Now ALWAYS writes include_paths (None clears it). 2. HIGH: Stale dep cache after re-index with different include_paths. Dependency cache persisted across re-indexes. Now clear_file_dependencies runs at start of every re-index so graph rebuilds with new include_paths. 3. MEDIUM: Corrupt jsonb from DB (e.g. [123, null]) crashed Path().parts. build_dependency_graph now filters non-string entries and falls back to full scan if all entries are invalid. 3 new tests: - test_include_paths_with_corrupt_data: mixed valid/invalid entries - test_include_paths_all_corrupt_scans_everything: all invalid -> full scan - test_include_paths_empty_list_scans_everything: [] same as None

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/services/dependency_analyzer.py`:
- Around line 145-149: The persisted include_paths must be normalized the same
way routes.py does before being treated as valid: replace backslashes with '/',
strip surrounding whitespace and leading/trailing slashes, remove/skip any
entries that contain path traversal segments ('.' or '..' or start with '../' or
'./') or reduce to an empty string, and only keep entries that match the
repository traversal rules used in backend/routes/repos.py (reuse or call that
normalization/traversal helper if available); after normalization, if the list
is empty set include_paths = None so the code falls back to a full scan. Ensure
you update the sanitization block that currently filters only by type/truthiness
to perform these steps (operate on the include_paths variable and keep later
prefix-check logic intact).

In `@backend/tests/test_dependency_analyzer.py`:
- Around line 283-310: The three pytest functions
test_include_paths_with_corrupt_data,
test_include_paths_all_corrupt_scans_everything, and
test_include_paths_empty_list_scans_everything need explicit type hints on their
fixture parameters and a return type of -> None; update their signatures to
annotate analyzer and ts_repo (e.g., analyzer: DependencyAnalyzer, ts_repo:
Path) and add -> None, or if the concrete fixture types are not available import
and use typing.Any for the parameters, and ensure to add the corresponding
import (typing.Any or the concrete types) at the top of the test file.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a889f14c-9ed5-452b-ac28-dcc1d36c6be9

📥 Commits

Reviewing files that changed from the base of the PR and between 65d19e2 and 04ac019.

📒 Files selected for processing (3)

backend/routes/repos.py
backend/services/dependency_analyzer.py
backend/tests/test_dependency_analyzer.py

🚧 Files skipped from review as they are similar to previous changes (1)

backend/routes/repos.py

…defense in depth Sanitizer now matches the validation in IndexConfig.sanitize_paths: - Backslashes replaced with forward slashes - Leading/trailing whitespace and slashes stripped - Path traversal (..) entries rejected - Non-string and empty entries filtered 2 new tests: traversal rejection, backslash normalization. 35 total pass. Skipped findings: - Cache keyed by include_paths: not needed, cache cleared on re-index - repos.py truthy-only write: already fixed in previous commit - Test type hints: entire suite has zero, adding to 3 tests is inconsistent

vercel · 2026-03-06T23:44:18Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
opencodeintel	Ignored	Preview	Mar 6, 2026 11:44pm

coderabbitai Bot reviewed Mar 6, 2026

View reviewed changes

Comment thread backend/routes/analysis.py

Comment thread backend/routes/repos.py Outdated

coderabbitai Bot reviewed Mar 6, 2026

View reviewed changes

Comment thread backend/services/dependency_analyzer.py Outdated

Comment thread backend/tests/test_dependency_analyzer.py

DevanshuNEU merged commit cdc7c47 into OpenCodeIntel:main Mar 6, 2026
7 checks passed

coderabbitai Bot mentioned this pull request Mar 11, 2026

fix: style_analyzer and dna_extractor respect include_paths for subset indexing (OPE-119) #293

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: persist include_paths -- subset repos show 26 nodes not 1000 (OPE-162)#283

fix: persist include_paths -- subset repos show 26 nodes not 1000 (OPE-162)#283
DevanshuNEU merged 3 commits into
OpenCodeIntel:mainfrom
DevanshuNEU:fix/persist-include-paths

DevanshuNEU commented Mar 6, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

vercel Bot commented Mar 6, 2026

Uh oh!

coderabbitai Bot commented Mar 6, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

vercel Bot commented Mar 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

DevanshuNEU commented Mar 6, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug

Root cause

Fix

Requires

After merge

Summary by CodeRabbit

Uh oh!

vercel Bot commented Mar 6, 2026

Uh oh!

coderabbitai Bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

vercel Bot commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

DevanshuNEU commented Mar 6, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 6, 2026 •

edited

Loading

vercel Bot commented Mar 6, 2026 •

edited

Loading