feat: context assembly -- per-task context packaging via MCP (OPE-172) by DevanshuNEU · Pull Request #291 · OpenCodeIntel/opencodeintel

DevanshuNEU · 2026-03-10T18:38:09Z

OPE-172: Dynamic per-task context assembly

The flagship feature. OCI assembles exactly the right context for the specific coding task -- under 1,500 tokens, every time.

New MCP tool: get_context_for_task

Assembly pipeline:

1. Semantic search (existing search_v2): top 5 relevant files
2. Dependency expansion (cached dep graph): 1-hop imports/dependents
3. Rule matching: splits CLAUDE.md/AGENTS.md by ## headers, keeps relevant sections
4. Token budget enforcer: fills by priority tier, truncates to fit
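
For readers unfamiliar with the rule-matching step, here is a minimal sketch of the idea. It is not the code from context_assembler.py: `split_sections`, `match_sections`, and the contents of `ALWAYS_RELEVANT` are hypothetical, and the relevance test (a section body mentions a discovered path or its basename) is an assumption.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the service's always-relevant patterns.
ALWAYS_RELEVANT = ("conventions", "testing")


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split a rules file into (header, body) pairs on '## ' headers."""
    sections: list[tuple[str, str]] = []
    header, body = None, []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if header is not None:
                sections.append((header, "\n".join(body).strip()))
            header, body = line[3:].strip(), []
        elif header is not None:
            # Text before the first '## ' header is ignored.
            body.append(line)
    if header is not None:
        sections.append((header, "\n".join(body).strip()))
    return sections


def match_sections(markdown: str, files: list[str]) -> list[str]:
    """Return headers of sections that mention a file or are always relevant."""
    names = set(files) | {PurePosixPath(f).name for f in files}
    kept = []
    for header, body in split_sections(markdown):
        always = any(p in header.lower() for p in ALWAYS_RELEVANT)
        mentions = any(n in body for n in names)
        if always or mentions:
            kept.append(header)
    return kept
```

Only `## ` headers start a new section in this sketch, so `###` subsections stay inside their parent section.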

New endpoint: POST /api/v1/context/assemble

Files: context_assembler.py (new), context.py (new), dependencies.py, main.py, handlers.py, tools.py, tests

MCP tests: 46 passed. Flake8 clean.

Closes OPE-172

Summary by CodeRabbit

New Features
- Added a POST API to assemble per-task context packages (semantic search + dependency expansion + project rules) with configurable token budgets (100–10,000). Returns context payload and metadata (tokens used, files found, etc.).
- Added a new tool and handler to request assembled context for a given task and repository.
Tests
- Added tests exercising the new context tool/handler and validating request payloads and responses.
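
To make the budget range concrete, here is a small stdlib-only sketch of the request validation. The real `AssembleRequest` model lives in backend/routes/context.py and is not shown in this thread; the class name `AssembleRequestSketch`, the empty-task check, and the default of 1500 are assumptions for illustration. Only the field names and the 100–10,000 range come from the PR.

```python
from dataclasses import dataclass

MIN_BUDGET, MAX_BUDGET = 100, 10_000  # range stated in the summary


@dataclass(frozen=True)
class AssembleRequestSketch:
    """Hypothetical stand-in for the request body of POST /context/assemble."""

    task: str
    repo_id: str
    token_budget: int = 1500  # assumed default, not confirmed by the PR

    def __post_init__(self) -> None:
        if not self.task.strip():
            raise ValueError("task must not be empty")
        if not MIN_BUDGET <= self.token_budget <= MAX_BUDGET:
            raise ValueError(
                f"token_budget must be between {MIN_BUDGET} and {MAX_BUDGET}"
            )
```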

New MCP tool: get_context_for_task

New endpoint: POST /api/v1/context/assemble

Assembly pipeline:

1. Semantic search (existing search_v2) finds top 5 relevant files
2. Dependency expansion adds 1-hop imports/dependents from cached graph
3. Rule matching splits CLAUDE.md/AGENTS.md by ## headers, keeps sections mentioning discovered files + always-relevant sections
4. Token budget enforcer fills by priority tier (search > deps > rules)

Files:

- backend/services/context_assembler.py (new, ~200 lines)
- backend/routes/context.py (new, ~80 lines)
- backend/dependencies.py (add context_assembler singleton)
- backend/main.py (register context_router)
- mcp-server/handlers.py (add get_context_for_task handler)
- mcp-server/tools.py (add tool schema)
- mcp-server/tests/ (updated expected tools + handler test)

MCP tests: 46 passed. Flake8 clean.
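
The 1-hop dependency expansion in the pipeline can be pictured with a plain dictionary graph. This is a sketch only: `expand_one_hop` and the `imports` mapping (file to the files it imports) are hypothetical, and the cached graph used by the service may be shaped differently.

```python
def expand_one_hop(seeds: list[str], imports: dict[str, list[str]]) -> list[str]:
    """Return files exactly one hop from the seeds: imports, then dependents."""
    seed_set = set(seeds)
    found: list[str] = []
    # Outgoing edges: files that a seed imports.
    for seed in seeds:
        for target in imports.get(seed, []):
            if target not in seed_set and target not in found:
                found.append(target)
    # Incoming edges: files that import a seed.
    for source, targets in imports.items():
        if source in seed_set or source in found:
            continue
        if any(t in seed_set for t in targets):
            found.append(source)
    return found
```

Seeds are excluded from the result, and nothing beyond one hop is followed, which keeps the dependency tier small relative to the search tier.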

vercel · 2026-03-10T18:38:15Z

@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-03-10T18:38:26Z

Warning

Rate limit exceeded

@DevanshuNEU has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 13 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 142daa36-1cd6-4562-9cc3-e47d1761117b

📥 Commits

Reviewing files that changed from the base of the PR and between 6301f99 and 24323cb.

📒 Files selected for processing (1)

backend/services/context_assembler.py

📝 Walkthrough

Walkthrough

Adds a context-assembly feature: a POST endpoint /api/v1/context/assemble and a ContextAssembler service that combines semantic search, dependency expansion, and project rules into a token-constrained markdown context package; exposes the feature via MCP tools and tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Backend wiring: `backend/dependencies.py`, `backend/main.py` | Instantiate and expose a module-level `context_assembler` and register the new context router with the API prefix. |
| Context API: `backend/routes/context.py` | New `APIRouter` with POST `/context/assemble`, `AssembleRequest` model, authentication checks, observability hooks, error handling, and invocation of `context_assembler.assemble`. |
| Context service: `backend/services/context_assembler.py` | New `ContextAssembler` class plus helpers and constants (RULES_FILES, ALWAYS_RELEVANT_PATTERNS). Implements semantic search, deduplication, 1-hop dependency expansion, rule-section matching, token estimation, and builds a tiered markdown package within a token budget. |
| MCP integration: `mcp-server/tools.py`, `mcp-server/handlers.py` | Added `get_context_for_task` tool schema and handler `_handle_get_context_for_task`, which POSTs to `/context/assemble` and formats the returned context; handler registered in dispatch table. |
| Tests: `mcp-server/tests/test_handlers.py`, `mcp-server/tests/test_tools.py` | Added test for the `get_context_for_task` dispatch flow (mocking POST) and updated expected tools set to include the new tool. |

Sequence Diagram

sequenceDiagram
    actor Client
    participant API as FastAPI Endpoint
    participant ContextAssembler as ContextAssembler Service
    participant Indexer as Search Indexer
    participant DB as Dependency Store

    Client->>API: POST /api/v1/context/assemble (task, repo_id, token_budget)
    API->>API: authenticate & verify repo access
    API->>ContextAssembler: assemble(task, repo_id, user_id, budget)
    ContextAssembler->>Indexer: semantic_search(task, repo_id)
    Indexer-->>ContextAssembler: relevant files + scores
    ContextAssembler->>DB: query 1-hop dependencies for files
    DB-->>ContextAssembler: dependency file list
    ContextAssembler->>ContextAssembler: dedupe, match rules, estimate tokens
    ContextAssembler->>ContextAssembler: build markdown package (Tier1/2/3 within budget)
    ContextAssembler-->>API: assembled context payload (content + metadata)
    API->>API: log metrics / breadcrumbs
    API-->>Client: HTTP 200 + context result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I scurry through files, rules, and thread,
Hunting tokens where relevance led.
I stitch dependencies, trim with care,
A package for tasks — notes everywhere.
Hooray! A context hop, fresh and fair.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.22%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: implementing per-task context assembly via MCP tool, which is the primary objective of this pull request. |

DevanshuNEU · 2026-03-10T18:43:13Z

@CodeRabbit review

coderabbitai · 2026-03-10T18:43:26Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/context.py`:
- Around line 34-37: The function assemble_context is missing an explicit return
type; update its signature for assemble_context(request: AssembleRequest, auth:
AuthContext = Depends(require_auth)) to include a return annotation -> dict[str,
Any] and ensure Any is imported (from typing import Any) so the signature reads
assemble_context(...) -> dict[str, Any].
- Around line 65-72: The logger.info call in context.py is passing arbitrary
kwargs (repo_id, files, tokens, budget, duration_ms) which triggers a TypeError;
replace that call to use the stdlib logging "extra" dict instead: build an extra
mapping with keys like "repo_id": request.repo_id, "files":
result["files_found"], "tokens": result["tokens_used"], "budget":
request.token_budget and "duration_ms": round(elapsed * 1000) and pass it as
logger.info("Context assembled", extra=extra_dict) so structured data is
attached without causing TypeError (locate the logger.info call shown in the
diff).
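
For reference, the stdlib pattern this finding describes looks roughly like the sketch below. `log_assembled` and the logger name are hypothetical; the field names are taken from the finding. It applies only to `logging.getLogger()` loggers, so a project logger with its own keyword-argument API would not need it.

```python
import logging

logger = logging.getLogger("context_sketch")  # hypothetical logger name


def log_assembled(repo_id: str, result: dict, budget: int, elapsed: float) -> dict:
    """Attach structured fields through `extra` instead of bare kwargs."""
    extra = {
        "repo_id": repo_id,
        "files": result["files_found"],
        "tokens": result["tokens_used"],
        "budget": budget,
        "duration_ms": round(elapsed * 1000),
    }
    # Each key in `extra` becomes an attribute on the emitted LogRecord.
    logger.info("Context assembled", extra=extra)
    return extra
```

Keys in `extra` must not collide with built-in `LogRecord` attributes such as `message` or `filename`; the five names used here do not.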

In `@backend/services/context_assembler.py`:
- Around line 218-269: The _build_package function can exceed the token budget
because it appends the "Relevant files" and "Depends on" tiers before checking
tokens; change the logic in _build_package so you enforce the budget before
appending each tier (Relevant files, Depends on, Rules that apply) by estimating
tokens with _estimate_tokens for the candidate tier block and skipping or
truncating that tier if adding it would exceed budget; for the Relevant files
tier iterate and add individual file lines only while remaining = budget -
_estimate_tokens(current_lines) is positive (truncate long signatures/paths as
needed), for the Depends on tier only include dep_files up to the remaining
token allowance, and keep the existing per-section token check for matched_rules
using remaining to decide include/truncate.
- Around line 94-107: The code currently turns a missing repo["local_path"] into
Path(".") and then calls _read_rules_file(local_path), which can cause the
service to scan the current checkout and inject rules from the wrong repo; fix
by guarding the rules read: after repo = get_repo_or_404(...) check that
repo.get("local_path") is non-empty and points to an existing directory (e.g.,
truthy string and Path(...).exists()), and only call
_read_rules_file(local_path) and _match_rules(...) when that guard passes; keep
the rest of the flow (_search, _expand_deps, _match_rules) unchanged and
reference the local_path variable and _read_rules_file function.
- Around line 101-106: assemble() is async but calls blocking functions
synchronously: _expand_deps() (which calls get_file_dependencies() that uses
synchronous .execute()) and _read_rules_file() (which calls Path.read_text()),
causing event-loop blocking; fix by moving the blocking work off the async
request path—either switch get_file_dependencies()/get_supabase_service() calls
to their async equivalents or run the sync DB call inside an executor, and call
Path.read_text() via asyncio.to_thread() (or use aiofiles) inside
_read_rules_file(); update references in assemble(), _expand_deps(),
get_file_dependencies(), and _read_rules_file() so the I/O is awaited
non-blockingly.
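
A minimal sketch of the `asyncio.to_thread()` option for the file read, assuming Python 3.9+. The helper names and the lookup order (CLAUDE.md, then AGENTS.md) are assumptions, not the code in context_assembler.py.

```python
import asyncio
from pathlib import Path


def read_rules_file_sync(repo_root: Path, names: tuple[str, ...]) -> str:
    """Blocking read: return the first rules file found, or '' if none exists."""
    for name in names:
        candidate = repo_root / name
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    return ""


async def read_rules_file(repo_root: Path) -> str:
    """Run the blocking read in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(
        read_rules_file_sync, repo_root, ("CLAUDE.md", "AGENTS.md")
    )
```

The same wrapper shape works for a synchronous database call: keep the blocking function plain and await it through `asyncio.to_thread()` from the async caller.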

In `@mcp-server/tests/test_handlers.py`:
- Around line 65-83: The test test_context_for_task_dispatches currently only
checks the JSON payload but not the endpoint; update the test to also assert
that the mocked api_post was called with the correct path by checking
mock_post.call_args (or mock_post.call_args_list[0]) and verifying the first
positional argument equals "/context/assemble"; locate the test function
test_context_for_task_dispatches and the use of mock_post
(patch("handlers.api_post", ...)) and add this assertion alongside the existing
payload checks so the dispatch path is validated for
_handle_get_context_for_task / call_tool.
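
The assertion being asked for can be sketched with `unittest.mock` as below. `call_context_tool` is a hypothetical synchronous stand-in for the handler, and passing the payload as a `json` keyword is an assumption; the real handler and `api_post` signature are not shown in this thread.

```python
from unittest.mock import MagicMock


def call_context_tool(api_post, task: str, repo_id: str):
    """Hypothetical stand-in for the handler: POST to the assemble endpoint."""
    return api_post("/context/assemble", json={"task": task, "repo_id": repo_id})


mock_post = MagicMock(return_value={"context": "..."})
call_context_tool(mock_post, "add auth", "repo-1")

# call_args holds the most recent call; .args and .kwargs need Python 3.8+.
path = mock_post.call_args.args[0]
payload = mock_post.call_args.kwargs["json"]
```

Checking `path` as well as `payload` catches a handler that sends the right body to the wrong route.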

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 09911ce1-3dbf-412f-b1ef-12d12fdcd695

📥 Commits

Reviewing files that changed from the base of the PR and between 487c61d and 6730648.

📒 Files selected for processing (8)

backend/dependencies.py
backend/main.py
backend/routes/context.py
backend/services/context_assembler.py
mcp-server/handlers.py
mcp-server/tests/test_handlers.py
mcp-server/tests/test_tools.py
mcp-server/tools.py

CI fixes:

- Remove unused import get_repo_or_404 from context.py (F401)
- Remove unused import dependency_analyzer from context_assembler.py (F401)

Correctness:

- Use project StructuredLogger (from services.observability) not stdlib logging.getLogger() -- matches all other routes/services
- Guard local_path: skip rule reading when repo path is empty or missing instead of falling back to Path('') which resolves to cwd
- Add return type annotation -> dict[str, Any] on assemble_context

Async safety:

- Wrap blocking get_file_dependencies() in asyncio.to_thread via _load_deps_sync helper (matches project pattern in repos.py)
- Wrap blocking Path.read_text() in asyncio.to_thread via _read_rules_file_sync helper

Budget enforcement:

- _build_package now checks remaining budget BEFORE appending each tier (files, deps, rules). Individual file/dep entries are added only while budget allows, preventing Tier 1+2 from blowing past the token limit before rules are considered.

Test:

- Add endpoint path assertion: verify api_post called with '/context/assemble' (not just payload check)
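
The local_path guard is worth a short illustration, since `Path('')` silently compares equal to `Path('.')`. The helper below is a sketch under assumptions: `usable_repo_path` is a hypothetical name, and the actual guard in context_assembler.py may differ.

```python
from pathlib import Path
from typing import Optional


def usable_repo_path(local_path: Optional[str]) -> Optional[Path]:
    """Return the repo directory, or None when rule reading should be skipped."""
    if not local_path or not local_path.strip():
        # An empty string would otherwise become Path('.'), i.e. the cwd.
        return None
    path = Path(local_path)
    return path if path.is_dir() else None


# The pitfall the guard avoids: an empty path is the current directory.
assert Path("") == Path(".")
```

Callers can then read rules only when the helper returns a path, and continue with search and dependency results otherwise.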

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

backend/services/context_assembler.py (1)

82-134: Consider singleton pattern for consistency with other services.

The coding guidelines mention following the singleton pattern used in dependency_analyzer.py for new services in services/. Currently, ContextAssembler is instantiated directly in dependencies.py. While this works, adopting the singleton pattern would improve consistency with other services.

♻️ Optional: Add singleton accessor

+_instance: Optional["ContextAssembler"] = None
+
+
+def get_context_assembler() -> "ContextAssembler":
+    """Get or create the singleton ContextAssembler instance."""
+    global _instance
+    if _instance is None:
+        _instance = ContextAssembler()
+    return _instance
+
+
 class ContextAssembler:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@backend/services/context_assembler.py` around lines 82 - 134, The
ContextAssembler is currently a plain class and is being instantiated directly
in dependencies.py; follow the established singleton pattern used by
dependency_analyzer.py by adding a module-level lazy singleton and accessor
(e.g., get_context_assembler()) in context_assembler.py that returns a single
ContextAssembler instance, and update dependencies.py to import and call that
accessor instead of constructing ContextAssembler directly; ensure the accessor
is thread-safe/lazy (simple module-level cache is fine), leave async methods
(assemble/_search/_expand_deps) unchanged, and keep symbol names
ContextAssembler and get_context_assembler to make locating the change
straightforward.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/services/context_assembler.py`:
- Around line 238-270: The tier header strings and the trailing blank lines are
currently appended to tier_lines without deducting their token cost from
remaining, which can let the prompt exceed the budget; before adding a tier,
estimate the token cost of the header (e.g., "### Relevant files" / "### Depends
on") and the trailing blank line using _estimate_tokens, ensure remaining is
reduced by that cost (or skip the entire tier if header cost alone exceeds
remaining), then proceed to loop over entries subtracting each entry's token
cost as done now; apply this change in the blocks that handle found_files and
dep_files (tier_lines, remaining, _estimate_tokens) so headers and trailing
newlines are accounted for consistently.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e66c0be8-e1f7-4238-b2e8-205f697225a7

📥 Commits

Reviewing files that changed from the base of the PR and between 6730648 and 6301f99.

📒 Files selected for processing (3)

backend/routes/context.py
backend/services/context_assembler.py
mcp-server/tests/test_handlers.py

The '### Relevant files' and '### Depends on' headers plus their trailing blank lines were appended without subtracting their token cost from remaining. Three unaccounted headers compound to ~30-40 tokens of budget overrun. Now each tier deducts header_cost before iterating entries.

Skipped singleton accessor nitpick: dependency_analyzer.py, style_analyzer.py, and dna_extractor.py are all plain classes instantiated in dependencies.py -- no get_*() accessor pattern exists in the project. ContextAssembler already matches.
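
The budget accounting described in this commit can be sketched as follows. This is an illustration, not `_build_package` itself: `build_package`, the tier tuples, and the four-characters-per-token heuristic in `estimate_tokens` are assumptions.

```python
def estimate_tokens(text: str) -> int:
    """Assumed heuristic: about four characters per token, rounded up."""
    return (len(text) + 3) // 4


def build_package(tiers: list[tuple[str, list[str]]], budget: int) -> list[str]:
    """Fill tiers in priority order, charging headers and blank lines too."""
    lines: list[str] = []
    remaining = budget
    for header, entries in tiers:
        # Charge the header and its trailing blank line before any entry.
        tier_remaining = remaining - estimate_tokens(header + "\n\n")
        tier_lines: list[str] = []
        for entry in entries:
            cost = estimate_tokens(entry + "\n")
            if cost > tier_remaining:
                break  # truncate this tier at the first entry that does not fit
            tier_lines.append(entry)
            tier_remaining -= cost
        if tier_lines:
            lines.append(header)
            lines.extend(tier_lines)
            lines.append("")
            remaining = tier_remaining
        # A tier with no entries that fit is skipped and costs nothing.
    return lines
```

Rounding each piece up means the per-piece charges never add up to less than the estimate for the joined text, so the assembled package stays within `budget` under this heuristic.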

vercel · 2026-03-10T19:10:59Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
opencodeintel	Ignored	Preview	Mar 10, 2026 7:11pm

DevanshuNEU merged commit f0f339c into OpenCodeIntel:main Mar 10, 2026
8 checks passed
