fix: persist include_paths -- subset repos show 26 nodes not 1000 (OPE-162)#283
Conversation
…-162)
Root cause of 1000-node hairball: user indexes packages/sql + packages/vitest
(26 files) but dependency graph scans entire Effect-TS clone (1,767 files).
Fix:
1. Save include_paths to repositories table during indexing
2. All 3 analysis routes now pass repo.get('include_paths') to
build_dependency_graph
3. Added force=true query param to dependencies endpoint to bypass
stale cache built without include_paths filtering
After re-indexing, Effect-TS subset repos will show 26 nodes instead of 1000.
Closes OPE-162
|
@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdded optional Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant API as Analysis API
participant DB as Supabase (repo metadata)
participant Analyzer as DependencyAnalyzer
participant Cache as FileDependencyCache
Client->>API: GET /dependency-graph?force=true
API->>Cache: check cached graph for repo
alt cached && force==false
Cache-->>API: return cached graph
API-->>Client: cached graph response
else rebuild (force==true or cache miss)
API->>DB: fetch repo metadata (includes include_paths)
DB-->>API: return repo + include_paths
API->>Analyzer: build_dependency_graph(repo, include_paths)
Analyzer-->>API: new graph
API->>Cache: update cached graph
API-->>Client: fresh graph response (log includes include_paths)
end
sequenceDiagram
participant Indexer as Async Indexer
participant DB as Supabase (repo metadata)
participant Cache as FileDependencyCache
participant Pub as ProgressPublisher
Indexer->>DB: update repo with include_paths
DB-->>Indexer: ack
Indexer->>Cache: clear stale file-dependency entries for repo
Cache-->>Indexer: ack
Indexer->>Pub: publish initial progress
Pub-->>Client: progress update
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/routes/analysis.py`:
- Around line 31-39: The dependency cache is currently keyed only by repo_id, so
calls in backend/routes/analysis.py that call
dependency_analyzer.load_from_cache(repo_id) / save_to_cache(repo_id, ...) must
be updated to include the include_paths in the cache key; update the three
places in this file that call load_from_cache/save_to_cache (the
get_dependency_graph/impact/insights routes) to pass a derived cache_key (e.g.,
combine repo_id with a stable, order-independent serialization of
repo.get("include_paths") such as a sorted tuple or JSON string), and then
update dependency_analyzer.load_from_cache and save_to_cache (and the
supabase_service key generation) to accept and use that cache_key so cached
graphs are scoped by include_paths.
In `@backend/routes/repos.py`:
- Around line 605-608: The repo persistence only updates include_paths when
include_paths is truthy, leaving stale filters when async indexing returns to
whole-repo; change the logic around include_paths so
get_supabase_service().update_repository(repo_id, {"include_paths":
include_paths}) is always called (or called with an explicit null/empty value)
so the stored include_paths is cleared when include_paths is None/[] — modify
the block around include_paths in the function that currently performs the
update (referencing include_paths, repo_id, and
get_supabase_service().update_repository) to write the empty/null value instead
of skipping the update.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 0fd0edfe-e999-47a8-9c03-aff8a056cfef
📒 Files selected for processing (2)
backend/routes/analysis.pybackend/routes/repos.py
… sanitize corrupt DB data Adversarial review found 3 bugs: 1. CRITICAL: Re-indexing full repo didn't clear old include_paths. if include_paths was truthy before save, re-index with None skipped the DB update. Now ALWAYS writes include_paths (None clears it). 2. HIGH: Stale dep cache after re-index with different include_paths. Dependency cache persisted across re-indexes. Now clear_file_dependencies runs at start of every re-index so graph rebuilds with new include_paths. 3. MEDIUM: Corrupt jsonb from DB (e.g. [123, null]) crashed Path().parts. build_dependency_graph now filters non-string entries and falls back to full scan if all entries are invalid. 3 new tests: - test_include_paths_with_corrupt_data: mixed valid/invalid entries - test_include_paths_all_corrupt_scans_everything: all invalid -> full scan - test_include_paths_empty_list_scans_everything: [] same as None
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/services/dependency_analyzer.py`:
- Around line 145-149: The persisted include_paths must be normalized the same
way routes.py does before being treated as valid: replace backslashes with '/',
strip surrounding whitespace and leading/trailing slashes, remove/skip any
entries that contain path traversal segments ('.' or '..' or start with '../' or
'./') or reduce to an empty string, and only keep entries that match the
repository traversal rules used in backend/routes/repos.py (reuse or call that
normalization/traversal helper if available); after normalization, if the list
is empty set include_paths = None so the code falls back to a full scan. Ensure
you update the sanitization block that currently filters only by type/truthiness
to perform these steps (operate on the include_paths variable and keep later
prefix-check logic intact).
In `@backend/tests/test_dependency_analyzer.py`:
- Around line 283-310: The three pytest functions
test_include_paths_with_corrupt_data,
test_include_paths_all_corrupt_scans_everything, and
test_include_paths_empty_list_scans_everything need explicit type hints on their
fixture parameters and a return type of -> None; update their signatures to
annotate analyzer and ts_repo (e.g., analyzer: DependencyAnalyzer, ts_repo:
Path) and add -> None, or if the concrete fixture types are not available import
and use typing.Any for the parameters, and ensure to add the corresponding
import (typing.Any or the concrete types) at the top of the test file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a889f14c-9ed5-452b-ac28-dcc1d36c6be9
📒 Files selected for processing (3)
backend/routes/repos.pybackend/services/dependency_analyzer.pybackend/tests/test_dependency_analyzer.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/routes/repos.py
…defense in depth Sanitizer now matches the validation in IndexConfig.sanitize_paths: - Backslashes replaced with forward slashes - Leading/trailing whitespace and slashes stripped - Path traversal (..) entries rejected - Non-string and empty entries filtered 2 new tests: traversal rejection, backslash normalization. 35 total pass. Skipped findings: - Cache keyed by include_paths: not needed, cache cleared on re-index - repos.py truthy-only write: already fixed in previous commit - Test type hints: entire suite has zero, adding to 3 tests is inconsistent
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
Bug
User indexes packages/sql (19 files) + packages/vitest (7 files) = 26 files.
Dependency graph shows 1000 nodes because it scans the entire Effect-TS clone (1,767 files).
Root cause
include_paths was passed during indexing but never saved to the repositories table.
Analysis routes called build_dependency_graph(repo['local_path']) without include_paths.
Fix
Requires
After merge
Existing repos need re-index to populate include_paths in DB.
New indexes automatically save include_paths.
Dependency graph for subset repos shows only the subset's files.
2 files, +16 -9 lines. 387 tests pass.
Closes OPE-162
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chore