Skip to content

fix: persist include_paths -- subset repos show 26 nodes not 1000 (OPE-162)#283

Merged
DevanshuNEU merged 3 commits into
OpenCodeIntel:mainfrom
DevanshuNEU:fix/persist-include-paths
Mar 6, 2026
Merged

fix: persist include_paths -- subset repos show 26 nodes not 1000 (OPE-162)#283
DevanshuNEU merged 3 commits into
OpenCodeIntel:mainfrom
DevanshuNEU:fix/persist-include-paths

Conversation

@DevanshuNEU

@DevanshuNEU DevanshuNEU commented Mar 6, 2026

Copy link
Copy Markdown
Collaborator

Bug

User indexes packages/sql (19 files) + packages/vitest (7 files) = 26 files.
Dependency graph shows 1000 nodes because it scans the entire Effect-TS clone (1,767 files).

Root cause

include_paths was passed during indexing but never saved to the repositories table.
Analysis routes called build_dependency_graph(repo['local_path']) without include_paths.

Fix

  1. repos.py: Save include_paths to DB when indexing starts
  2. analysis.py: All 3 routes pass repo.get('include_paths') to build_dependency_graph
  3. analysis.py: Added force=true query param to bypass stale cache

Requires

After merge

Existing repos need re-index to populate include_paths in DB.
New indexes automatically save include_paths.
Dependency graph for subset repos shows only the subset's files.

2 files, +16 -9 lines. 387 tests pass.

Closes OPE-162

Summary by CodeRabbit

  • New Features

    • Optional force-refresh for dependency graphs to rebuild immediately instead of using cached data.
    • Repository path configuration (include paths) now persists in metadata so selections are retained.
  • Bug Fixes

    • Input sanitization for persisted path settings to avoid invalid values affecting analysis.
  • Tests

    • Added test coverage for path-filtering, empty/invalid-path fallbacks, and full-scan behavior.
  • Chore

    • Expanded diagnostic logging across dependency analysis endpoints.

…-162)

Root cause of 1000-node hairball: user indexes packages/sql + packages/vitest
(26 files) but dependency graph scans entire Effect-TS clone (1,767 files).

Fix:
1. Save include_paths to repositories table during indexing
2. All 3 analysis routes now pass repo.get('include_paths') to
   build_dependency_graph
3. Added force=true query param to dependencies endpoint to bypass
   stale cache built without include_paths filtering

After re-indexing, Effect-TS subset repos will show 26 nodes instead of 1000.

Closes OPE-162
@vercel

vercel Bot commented Mar 6, 2026

Copy link
Copy Markdown

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Mar 6, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@DevanshuNEU has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7ed70edb-80a6-4e77-b1d8-e161749af110

📥 Commits

Reviewing files that changed from the base of the PR and between 04ac019 and 9c9f20f.

📒 Files selected for processing (2)
  • backend/services/dependency_analyzer.py
  • backend/tests/test_dependency_analyzer.py
📝 Walkthrough

Walkthrough

Added optional force parameter to dependency-graph endpoints to optionally bypass cache and rebuilt graphs. Propagated and persisted repository include_paths metadata and sanitized it before use; tests added for include_paths edge cases.

Changes

Cohort / File(s) Summary
Dependency endpoints
backend/routes/analysis.py
Added force: bool = False to get_dependency_graph; when force=true bypasses cached graph, logs and passes repo include_paths into build_dependency_graph. analyze_impact and get_repository_insights updated to propagate include_paths when reconstructing graphs.
Repository indexing / persistence
backend/routes/repos.py
_run_async_indexing now persists include_paths to the repo record (via Supabase service) and clears stale file-dependency cache before publishing initial progress.
Dependency analyzer sanitization
backend/services/dependency_analyzer.py
build_dependency_graph filters include_paths to keep only non-empty strings; treats empty result as None to force full scan.
Tests
backend/tests/test_dependency_analyzer.py
Added tests for include_paths sanitization and fallback behavior (corrupt entries, empty list => full scan).
Manifest
pyproject.toml, requirements.txt
Test-related/dependency manifest updated (listed changes in diff).

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client
  participant API as Analysis API
  participant DB as Supabase (repo metadata)
  participant Analyzer as DependencyAnalyzer
  participant Cache as FileDependencyCache

  Client->>API: GET /dependency-graph?force=true
  API->>Cache: check cached graph for repo
  alt cached && force==false
    Cache-->>API: return cached graph
    API-->>Client: cached graph response
  else rebuild (force==true or cache miss)
    API->>DB: fetch repo metadata (includes include_paths)
    DB-->>API: return repo + include_paths
    API->>Analyzer: build_dependency_graph(repo, include_paths)
    Analyzer-->>API: new graph
    API->>Cache: update cached graph
    API-->>Client: fresh graph response (log includes include_paths)
  end
Loading
sequenceDiagram
  participant Indexer as Async Indexer
  participant DB as Supabase (repo metadata)
  participant Cache as FileDependencyCache
  participant Pub as ProgressPublisher

  Indexer->>DB: update repo with include_paths
  DB-->>Indexer: ack
  Indexer->>Cache: clear stale file-dependency entries for repo
  Cache-->>Indexer: ack
  Indexer->>Pub: publish initial progress
  Pub-->>Client: progress update
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I hopped through caches, nose twitching bright,
I nudged include_paths into the light,
If force is given — no stale crumbs remain,
Fresh graphs sprout up after clearing the grain,
Carrots, logs, and tests — a builder's delight! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main fix: persisting include_paths to prevent over-indexing of subset repositories. It references the bug (26 nodes vs 1000) and issue number (OPE-162), making it specific and informative.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/analysis.py`:
- Around line 31-39: The dependency cache is currently keyed only by repo_id, so
calls in backend/routes/analysis.py that call
dependency_analyzer.load_from_cache(repo_id) / save_to_cache(repo_id, ...) must
be updated to include the include_paths in the cache key; update the three
places in this file that call load_from_cache/save_to_cache (the
get_dependency_graph/impact/insights routes) to pass a derived cache_key (e.g.,
combine repo_id with a stable, order-independent serialization of
repo.get("include_paths") such as a sorted tuple or JSON string), and then
update dependency_analyzer.load_from_cache and save_to_cache (and the
supabase_service key generation) to accept and use that cache_key so cached
graphs are scoped by include_paths.

In `@backend/routes/repos.py`:
- Around line 605-608: The repo persistence only updates include_paths when
include_paths is truthy, leaving stale filters when async indexing returns to
whole-repo; change the logic around include_paths so
get_supabase_service().update_repository(repo_id, {"include_paths":
include_paths}) is always called (or called with an explicit null/empty value)
so the stored include_paths is cleared when include_paths is None/[] — modify
the block around include_paths in the function that currently performs the
update (referencing include_paths, repo_id, and
get_supabase_service().update_repository) to write the empty/null value instead
of skipping the update.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0fd0edfe-e999-47a8-9c03-aff8a056cfef

📥 Commits

Reviewing files that changed from the base of the PR and between 20604de and 65d19e2.

📒 Files selected for processing (2)
  • backend/routes/analysis.py
  • backend/routes/repos.py

Comment thread backend/routes/analysis.py
Comment thread backend/routes/repos.py Outdated
… sanitize corrupt DB data

Adversarial review found 3 bugs:

1. CRITICAL: Re-indexing full repo didn't clear old include_paths.
   if include_paths was truthy before save, re-index with None skipped
   the DB update. Now ALWAYS writes include_paths (None clears it).

2. HIGH: Stale dep cache after re-index with different include_paths.
   Dependency cache persisted across re-indexes. Now clear_file_dependencies
   runs at start of every re-index so graph rebuilds with new include_paths.

3. MEDIUM: Corrupt jsonb from DB (e.g. [123, null]) crashed Path().parts.
   build_dependency_graph now filters non-string entries and falls back to
   full scan if all entries are invalid.

3 new tests:
- test_include_paths_with_corrupt_data: mixed valid/invalid entries
- test_include_paths_all_corrupt_scans_everything: all invalid -> full scan
- test_include_paths_empty_list_scans_everything: [] same as None

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/services/dependency_analyzer.py`:
- Around line 145-149: The persisted include_paths must be normalized the same
way routes.py does before being treated as valid: replace backslashes with '/',
strip surrounding whitespace and leading/trailing slashes, remove/skip any
entries that contain path traversal segments ('.' or '..' or start with '../' or
'./') or reduce to an empty string, and only keep entries that match the
repository traversal rules used in backend/routes/repos.py (reuse or call that
normalization/traversal helper if available); after normalization, if the list
is empty set include_paths = None so the code falls back to a full scan. Ensure
you update the sanitization block that currently filters only by type/truthiness
to perform these steps (operate on the include_paths variable and keep later
prefix-check logic intact).

In `@backend/tests/test_dependency_analyzer.py`:
- Around line 283-310: The three pytest functions
test_include_paths_with_corrupt_data,
test_include_paths_all_corrupt_scans_everything, and
test_include_paths_empty_list_scans_everything need explicit type hints on their
fixture parameters and a return type of -> None; update their signatures to
annotate analyzer and ts_repo (e.g., analyzer: DependencyAnalyzer, ts_repo:
Path) and add -> None, or if the concrete fixture types are not available import
and use typing.Any for the parameters, and ensure to add the corresponding
import (typing.Any or the concrete types) at the top of the test file.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a889f14c-9ed5-452b-ac28-dcc1d36c6be9

📥 Commits

Reviewing files that changed from the base of the PR and between 65d19e2 and 04ac019.

📒 Files selected for processing (3)
  • backend/routes/repos.py
  • backend/services/dependency_analyzer.py
  • backend/tests/test_dependency_analyzer.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/routes/repos.py

Comment thread backend/services/dependency_analyzer.py Outdated
Comment thread backend/tests/test_dependency_analyzer.py
…defense in depth

Sanitizer now matches the validation in IndexConfig.sanitize_paths:
- Backslashes replaced with forward slashes
- Leading/trailing whitespace and slashes stripped
- Path traversal (..) entries rejected
- Non-string and empty entries filtered

2 new tests: traversal rejection, backslash normalization. 35 total pass.

Skipped findings:
- Cache keyed by include_paths: not needed, cache cleared on re-index
- repos.py truthy-only write: already fixed in previous commit
- Test type hints: entire suite has zero, adding to 3 tests is inconsistent
@vercel

vercel Bot commented Mar 6, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
opencodeintel Ignored Ignored Preview Mar 6, 2026 11:44pm

@DevanshuNEU DevanshuNEU merged commit cdc7c47 into OpenCodeIntel:main Mar 6, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant