feat: context assembly -- per-task context packaging via MCP (OPE-172) #291

Merged
DevanshuNEU merged 3 commits into OpenCodeIntel:main from DevanshuNEU:feat/context-assembler-ope-172
Mar 10, 2026

Conversation

@DevanshuNEU

@DevanshuNEU DevanshuNEU commented Mar 10, 2026

Collaborator

OPE-172: Dynamic per-task context assembly

The flagship feature. OCI assembles exactly the right context for the specific coding task -- under 1,500 tokens, every time.

New MCP tool: get_context_for_task

Assembly pipeline:

  1. Semantic search (existing search_v2): top 5 relevant files
  2. Dependency expansion (cached dep graph): 1-hop imports/dependents
  3. Rule matching: splits CLAUDE.md/AGENTS.md by ## headers, keeps relevant sections
  4. Token budget enforcer: fills by priority tier, truncates to fit
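
Step 2 of the pipeline can be pictured with a minimal sketch. It assumes the cached dependency graph is a plain mapping from a file path to the files it imports; that shape, the name `expand_one_hop`, and the sample paths are illustrative assumptions, not the PR's actual code.

```python
from collections import defaultdict


def expand_one_hop(graph: dict[str, list[str]], seeds: list[str]) -> list[str]:
    """Return files exactly one hop from the seeds (imports and dependents).

    `graph` maps a file to the files it imports; this shape is an assumption.
    """
    # Invert the import edges so dependents can be looked up directly.
    dependents: dict[str, list[str]] = defaultdict(list)
    for src, targets in graph.items():
        for target in targets:
            dependents[target].append(src)

    seen = set(seeds)
    expanded: list[str] = []
    for seed in seeds:
        for neighbor in graph.get(seed, []) + dependents.get(seed, []):
            if neighbor not in seen:  # skip the seeds themselves and duplicates
                seen.add(neighbor)
                expanded.append(neighbor)
    return expanded


# Hypothetical graph: main.py -> routes/context.py -> context_assembler.py -> search.py
graph = {
    "routes/context.py": ["services/context_assembler.py"],
    "services/context_assembler.py": ["services/search.py"],
    "main.py": ["routes/context.py"],
}
neighbors = expand_one_hop(graph, ["services/context_assembler.py"])
```

In this sample `main.py` is two hops from the seed, so it stays out of the result; only the direct import and the direct dependent are returned.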

New endpoint: POST /api/v1/context/assemble

Files: context_assembler.py (new), context.py (new), dependencies.py, main.py, handlers.py, tools.py, tests

MCP tests: 46 passed. Flake8 clean.

Closes OPE-172

Summary by CodeRabbit

  • New Features

    • Added a POST API to assemble per-task context packages (semantic search + dependency expansion + project rules) with configurable token budgets (100–10,000). Returns context payload and metadata (tokens used, files found, etc.).
    • Added a new tool and handler to request assembled context for a given task and repository.
  • Tests

    • Added tests exercising the new context tool/handler and validating request payloads and responses.
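
The budget range quoted in the summary can be illustrated with a small validation sketch. It uses a stdlib dataclass rather than the PR's own `AssembleRequest` model, whose definition is not shown in this thread; the class name, the empty-task check, and the 1,500-token default are assumptions.

```python
from dataclasses import dataclass

MIN_BUDGET = 100      # lower bound quoted in the summary
MAX_BUDGET = 10_000   # upper bound quoted in the summary


@dataclass(frozen=True)
class AssembleRequestSketch:
    """Hypothetical stand-in for the PR's AssembleRequest model."""

    task: str
    repo_id: str
    token_budget: int = 1_500  # assumed default, not confirmed by the PR

    def __post_init__(self) -> None:
        if not self.task.strip():
            raise ValueError("task must not be empty")
        if not MIN_BUDGET <= self.token_budget <= MAX_BUDGET:
            raise ValueError(
                f"token_budget must be between {MIN_BUDGET} and {MAX_BUDGET}"
            )


ok = AssembleRequestSketch(task="add rate limiting", repo_id="repo-123")

# A budget below the lower bound is rejected at construction time.
try:
    AssembleRequestSketch(task="add rate limiting", repo_id="repo-123", token_budget=50)
    rejected = False
except ValueError:
    rejected = True
```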

New MCP tool: get_context_for_task
New endpoint: POST /api/v1/context/assemble

Assembly pipeline:
1. Semantic search (existing search_v2) finds top 5 relevant files
2. Dependency expansion adds 1-hop imports/dependents from cached graph
3. Rule matching splits CLAUDE.md/AGENTS.md by ## headers, keeps
   sections mentioning discovered files + always-relevant sections
4. Token budget enforcer fills by priority tier (search > deps > rules)
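
Step 3 can be sketched as follows. The helper names, the sample rules text, and the `ALWAYS_RELEVANT` patterns are hypothetical; the real values of `ALWAYS_RELEVANT_PATTERNS` are not shown in this thread, and the actual matching logic may differ.

```python
# Hypothetical patterns for sections that always apply.
ALWAYS_RELEVANT = ("convention", "style")


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split a rules file into (header, body) pairs at each '## ' header."""
    sections: list[tuple[str, str]] = []
    header, body = None, []
    for line in markdown.splitlines():
        if line.startswith("## "):
            if header is not None:
                sections.append((header, "\n".join(body).strip()))
            header, body = line[3:].strip(), []
        elif header is not None:  # text before the first '## ' is ignored
            body.append(line)
    if header is not None:
        sections.append((header, "\n".join(body).strip()))
    return sections


def match_rules(markdown: str, files: list[str]) -> list[str]:
    """Keep headers of sections that mention a found file or always apply."""
    kept = []
    for header, body in split_sections(markdown):
        text = f"{header}\n{body}".lower()
        mentions_file = any(f.lower() in text for f in files)
        always = any(p in header.lower() for p in ALWAYS_RELEVANT)
        if mentions_file or always:
            kept.append(header)
    return kept


rules = """# Project rules
## Code style
Use flake8.
## Auth
Edit backend/routes/auth.py carefully.
## Deployment
Vercel handles previews.
"""
matched = match_rules(rules, ["backend/routes/auth.py"])
```

Here "Code style" is kept because its header matches an always-relevant pattern, "Auth" because its body names a discovered file, and "Deployment" is dropped.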

Files:
- backend/services/context_assembler.py (new, ~200 lines)
- backend/routes/context.py (new, ~80 lines)
- backend/dependencies.py (add context_assembler singleton)
- backend/main.py (register context_router)
- mcp-server/handlers.py (add get_context_for_task handler)
- mcp-server/tools.py (add tool schema)
- mcp-server/tests/ (updated expected tools + handler test)

MCP tests: 46 passed. Flake8 clean.
@vercel

vercel Bot commented Mar 10, 2026


@DevanshuNEU is attempting to deploy a commit to the Dev's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai

coderabbitai Bot commented Mar 10, 2026


Warning

Rate limit exceeded

@DevanshuNEU has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 13 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 142daa36-1cd6-4562-9cc3-e47d1761117b

📥 Commits

Reviewing files that changed from the base of the PR and between 6301f99 and 24323cb.

📒 Files selected for processing (1)
  • backend/services/context_assembler.py
📝 Walkthrough

Walkthrough

Adds a context-assembly feature: a POST endpoint /api/v1/context/assemble and a ContextAssembler service that combines semantic search, dependency expansion, and project rules into a token-constrained markdown context package; exposes the feature via MCP tools and tests.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Backend wiring: backend/dependencies.py, backend/main.py | Instantiate and expose a module-level context_assembler and register the new context router with the API prefix. |
| Context API: backend/routes/context.py | New APIRouter with POST /context/assemble, AssembleRequest model, authentication checks, observability hooks, error handling, and invocation of context_assembler.assemble. |
| Context service: backend/services/context_assembler.py | New ContextAssembler class plus helpers and constants (RULES_FILES, ALWAYS_RELEVANT_PATTERNS). Implements semantic search, deduplication, 1-hop dependency expansion, rule-section matching, token estimation, and builds a tiered markdown package within a token budget. |
| MCP integration: mcp-server/tools.py, mcp-server/handlers.py | Added get_context_for_task tool schema and handler _handle_get_context_for_task, which POSTs to /context/assemble and formats the returned context; handler registered in dispatch table. |
| Tests: mcp-server/tests/test_handlers.py, mcp-server/tests/test_tools.py | Added test for the get_context_for_task dispatch flow (mocking POST) and updated expected tools set to include the new tool. |

Sequence Diagram

sequenceDiagram
    actor Client
    participant API as FastAPI Endpoint
    participant ContextAssembler as ContextAssembler Service
    participant Indexer as Search Indexer
    participant DB as Dependency Store

    Client->>API: POST /api/v1/context/assemble (task, repo_id, token_budget)
    API->>API: authenticate & verify repo access
    API->>ContextAssembler: assemble(task, repo_id, user_id, budget)
    ContextAssembler->>Indexer: semantic_search(task, repo_id)
    Indexer-->>ContextAssembler: relevant files + scores
    ContextAssembler->>DB: query 1-hop dependencies for files
    DB-->>ContextAssembler: dependency file list
    ContextAssembler->>ContextAssembler: dedupe, match rules, estimate tokens
    ContextAssembler->>ContextAssembler: build markdown package (Tier1/2/3 within budget)
    ContextAssembler-->>API: assembled context payload (content + metadata)
    API->>API: log metrics / breadcrumbs
    API-->>Client: HTTP 200 + context result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I scurry through files, rules, and thread,
Hunting tokens where relevance led.
I stitch dependencies, trim with care,
A package for tasks — notes everywhere.
Hooray! A context hop, fresh and fair.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.22%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: implementing per-task context assembly via MCP tool, which is the primary objective of this pull request. |


@DevanshuNEU

Collaborator Author

@CodeRabbit review

@coderabbitai

coderabbitai Bot commented Mar 10, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/routes/context.py`:
- Around line 34-37: The function assemble_context is missing an explicit return
type; update its signature for assemble_context(request: AssembleRequest, auth:
AuthContext = Depends(require_auth)) to include a return annotation -> dict[str,
Any] and ensure Any is imported (from typing import Any) so the signature reads
assemble_context(...) -> dict[str, Any].
- Around line 65-72: The logger.info call in context.py is passing arbitrary
kwargs (repo_id, files, tokens, budget, duration_ms) which triggers a TypeError;
replace that call to use the stdlib logging "extra" dict instead: build an extra
mapping with keys like "repo_id": request.repo_id, "files":
result["files_found"], "tokens": result["tokens_used"], "budget":
request.token_budget and "duration_ms": round(elapsed * 1000) and pass it as
logger.info("Context assembled", extra=extra_dict) so structured data is
attached without causing TypeError (locate the logger.info call shown in the
diff).
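
The stdlib behaviour this finding relies on can be reproduced in isolation. The sketch below uses only the standard `logging` module; the logger name, the in-memory handler, and the sample values are illustrative, and it says nothing about how the project's own logger behaves.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so the extra fields can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("context_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

# Arbitrary keyword arguments are rejected by the stdlib logger.
try:
    logger.info("Context assembled", repo_id="repo-123")
    kwargs_failed = False
except TypeError:
    kwargs_failed = True

# Structured fields go through `extra` and become LogRecord attributes.
elapsed = 0.0421  # seconds; sample value
logger.info(
    "Context assembled",
    extra={"repo_id": "repo-123", "files": 5, "tokens": 1200,
           "budget": 1500, "duration_ms": round(elapsed * 1000)},
)
record = handler.records[-1]
```

Keys passed through `extra` must not collide with built-in `LogRecord` attributes such as `message` or `name`; the keys used here are safe.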

In `@backend/services/context_assembler.py`:
- Around line 218-269: The _build_package function can exceed the token budget
because it appends the "Relevant files" and "Depends on" tiers before checking
tokens; change the logic in _build_package so you enforce the budget before
appending each tier (Relevant files, Depends on, Rules that apply) by estimating
tokens with _estimate_tokens for the candidate tier block and skipping or
truncating that tier if adding it would exceed budget; for the Relevant files
tier iterate and add individual file lines only while remaining = budget -
_estimate_tokens(current_lines) is positive (truncate long signatures/paths as
needed), for the Depends on tier only include dep_files up to the remaining
token allowance, and keep the existing per-section token check for matched_rules
using remaining to decide include/truncate.
- Around line 94-107: The code currently turns a missing repo["local_path"] into
Path(".") and then calls _read_rules_file(local_path), which can cause the
service to scan the current checkout and inject rules from the wrong repo; fix
by guarding the rules read: after repo = get_repo_or_404(...) check that
repo.get("local_path") is non-empty and points to an existing directory (e.g.,
truthy string and Path(...).exists()), and only call
_read_rules_file(local_path) and _match_rules(...) when that guard passes; keep
the rest of the flow (_search, _expand_deps, _match_rules) unchanged and
reference the local_path variable and _read_rules_file function.
- Around line 101-106: assemble() is async but calls blocking functions
synchronously: _expand_deps() (which calls get_file_dependencies() that uses
synchronous .execute()) and _read_rules_file() (which calls Path.read_text()),
causing event-loop blocking; fix by moving the blocking work off the async
request path—either switch get_file_dependencies()/get_supabase_service() calls
to their async equivalents or run the sync DB call inside an executor, and call
Path.read_text() via asyncio.to_thread() (or use aiofiles) inside
_read_rules_file(); update references in assemble(), _expand_deps(),
get_file_dependencies(), and _read_rules_file() so the I/O is awaited
non-blockingly.
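
The `asyncio.to_thread()` approach suggested for the file read can be sketched as below. The function names and the temporary `CLAUDE.md` file are hypothetical; this shows the general pattern, not the code that was merged.

```python
import asyncio
import tempfile
from pathlib import Path


def read_rules_file_sync(path: Path) -> str:
    """Blocking read; returns an empty string when the file is absent."""
    return path.read_text(encoding="utf-8") if path.is_file() else ""


async def read_rules_file(path: Path) -> str:
    """Run the blocking read in a worker thread so the event loop stays free."""
    return await asyncio.to_thread(read_rules_file_sync, path)


async def main() -> str:
    # Write a throwaway rules file, then read it back without blocking the loop.
    with tempfile.TemporaryDirectory() as tmp:
        rules = Path(tmp) / "CLAUDE.md"
        rules.write_text("## Code style\nUse flake8.\n", encoding="utf-8")
        return await read_rules_file(rules)


content = asyncio.run(main())
```

`asyncio.to_thread` requires Python 3.9 or later; on older versions `loop.run_in_executor` plays the same role.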

In `@mcp-server/tests/test_handlers.py`:
- Around line 65-83: The test test_context_for_task_dispatches currently only
checks the JSON payload but not the endpoint; update the test to also assert
that the mocked api_post was called with the correct path by checking
mock_post.call_args (or mock_post.call_args_list[0]) and verifying the first
positional argument equals "/context/assemble"; locate the test function
test_context_for_task_dispatches and the use of mock_post
(patch("handlers.api_post", ...)) and add this assertion alongside the existing
payload checks so the dispatch path is validated for
_handle_get_context_for_task / call_tool.
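
The kind of assertion being requested can be sketched with `unittest.mock`. The handler below is a hypothetical stand-in that receives `api_post` as a parameter and passes the payload positionally; the real handler's signature and patching style may differ.

```python
import asyncio
from unittest.mock import AsyncMock


async def handle_get_context_for_task(api_post, args: dict) -> dict:
    """Hypothetical handler: forwards the task to the assemble endpoint."""
    payload = {"task": args["task"], "repo_id": args["repo_id"]}
    return await api_post("/context/assemble", payload)


mock_post = AsyncMock(return_value={"context": "### Relevant files", "tokens_used": 42})
result = asyncio.run(
    handle_get_context_for_task(mock_post, {"task": "fix login", "repo_id": "r1"})
)

# First positional argument is the endpoint path; the second is the payload.
called_path = mock_post.call_args.args[0]
called_payload = mock_post.call_args.args[1]
```

Checking `called_path` catches a handler that sends a correct payload to the wrong endpoint, which a payload-only assertion would miss.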

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 09911ce1-3dbf-412f-b1ef-12d12fdcd695

📥 Commits

Reviewing files that changed from the base of the PR and between 487c61d and 6730648.

📒 Files selected for processing (8)
  • backend/dependencies.py
  • backend/main.py
  • backend/routes/context.py
  • backend/services/context_assembler.py
  • mcp-server/handlers.py
  • mcp-server/tests/test_handlers.py
  • mcp-server/tests/test_tools.py
  • mcp-server/tools.py

CI fixes:
- Remove unused import get_repo_or_404 from context.py (F401)
- Remove unused import dependency_analyzer from context_assembler.py (F401)

Correctness:
- Use project StructuredLogger (from services.observability) not stdlib
  logging.getLogger() -- matches all other routes/services
- Guard local_path: skip rule reading when repo path is empty or missing
  instead of falling back to Path('') which resolves to cwd
- Add return type annotation -> dict[str, Any] on assemble_context
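
A minimal sketch of the `local_path` guard, assuming the repo record is a dict with an optional `local_path` key. The helper name `resolve_rules_dir` and the sample paths are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_rules_dir(repo: dict) -> Optional[Path]:
    """Return the repo checkout directory, or None when rules must be skipped."""
    raw = repo.get("local_path")
    if not raw:  # missing or empty: Path("") would silently mean the cwd
        return None
    path = Path(raw)
    return path if path.is_dir() else None


# An empty path compares equal to the current directory, hence the guard.
empty_path_is_cwd = Path("") == Path(".")

with tempfile.TemporaryDirectory() as tmp:
    found = resolve_rules_dir({"local_path": tmp})
    found_matches = found == Path(tmp)

missing = resolve_rules_dir({"local_path": ""})
absent = resolve_rules_dir({})
gone = resolve_rules_dir({"local_path": "/nonexistent/opencodeintel-sketch"})
```

Returning `None` lets the caller skip rule reading entirely instead of scanning whatever directory the service happens to run from.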

Async safety:
- Wrap blocking get_file_dependencies() in asyncio.to_thread via
  _load_deps_sync helper (matches project pattern in repos.py)
- Wrap blocking Path.read_text() in asyncio.to_thread via
  _read_rules_file_sync helper

Budget enforcement:
- _build_package now checks remaining budget BEFORE appending each
  tier (files, deps, rules). Individual file/dep entries are added
  only while budget allows, preventing Tier 1+2 from blowing past
  the token limit before rules are considered.
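
The per-entry budget check can be sketched as a greedy fill over prioritized tiers. This is an illustration under stated assumptions: the four-characters-per-token estimate, the `fill_tiers` name, and the sample entries are hypothetical, and tier headers are left out for brevity.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (assumed heuristic)."""
    return max(1, len(text) // 4)


def fill_tiers(
    tiers: list[tuple[str, list[str]]], budget: int
) -> tuple[list[str], int]:
    """Add entries tier by tier, highest priority first, while the budget allows."""
    lines: list[str] = []
    remaining = budget
    for _name, entries in tiers:
        for entry in entries:
            cost = estimate_tokens(entry)
            if cost > remaining:
                break  # stop this tier; later tiers are still tried
            lines.append(entry)
            remaining -= cost
    return lines, budget - remaining


tiers = [
    ("files", ["- backend/routes/context.py",
               "- backend/services/context_assembler.py"]),
    ("deps", ["- backend/dependencies.py"]),
    ("rules", ["Use flake8 and keep functions small."]),
]
lines, used = fill_tiers(tiers, budget=18)
```

With a budget of 18 estimated tokens, both file entries fit and the dependency and rule entries are skipped, so the total never exceeds the budget.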

Test:
- Add endpoint path assertion: verify api_post called with
  '/context/assemble' (not just payload check)

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/services/context_assembler.py (1)

82-134: Consider singleton pattern for consistency with other services.

The coding guidelines mention following the singleton pattern used in dependency_analyzer.py for new services in services/. Currently, ContextAssembler is instantiated directly in dependencies.py. While this works, adopting the singleton pattern would improve consistency with other services.

♻️ Optional: Add singleton accessor
+_instance: Optional["ContextAssembler"] = None
+
+
+def get_context_assembler() -> "ContextAssembler":
+    """Get or create the singleton ContextAssembler instance."""
+    global _instance
+    if _instance is None:
+        _instance = ContextAssembler()
+    return _instance
+
+
 class ContextAssembler:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/context_assembler.py` around lines 82 - 134, The
ContextAssembler is currently a plain class and is being instantiated directly
in dependencies.py; follow the established singleton pattern used by
dependency_analyzer.py by adding a module-level lazy singleton and accessor
(e.g., get_context_assembler()) in context_assembler.py that returns a single
ContextAssembler instance, and update dependencies.py to import and call that
accessor instead of constructing ContextAssembler directly; ensure the accessor
is thread-safe/lazy (simple module-level cache is fine), leave async methods
(assemble/_search/_expand_deps) unchanged, and keep symbol names
ContextAssembler and get_context_assembler to make locating the change
straightforward.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/services/context_assembler.py`:
- Around line 238-270: The tier header strings and the trailing blank lines are
currently appended to tier_lines without deducting their token cost from
remaining, which can let the prompt exceed the budget; before adding a tier,
estimate the token cost of the header (e.g., "### Relevant files" / "### Depends
on") and the trailing blank line using _estimate_tokens, ensure remaining is
reduced by that cost (or skip the entire tier if header cost alone exceeds
remaining), then proceed to loop over entries subtracting each entry's token
cost as done now; apply this change in the blocks that handle found_files and
dep_files (tier_lines, remaining, _estimate_tokens) so headers and trailing
newlines are accounted for consistently.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e66c0be8-e1f7-4238-b2e8-205f697225a7

📥 Commits

Reviewing files that changed from the base of the PR and between 6730648 and 6301f99.

📒 Files selected for processing (3)
  • backend/routes/context.py
  • backend/services/context_assembler.py
  • mcp-server/tests/test_handlers.py

The '### Relevant files' and '### Depends on' headers plus their
trailing blank lines were appended without subtracting their token
cost from remaining. Three unaccounted headers compound to ~30-40
tokens of budget overrun. Now each tier deducts header_cost before
iterating entries.
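
Header accounting of this kind can be sketched as below. The `add_tier` name, the four-characters-per-token estimate, and the choice to refund the header when no entry fits are assumptions for illustration, not the merged implementation.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (assumed heuristic)."""
    return max(1, len(text) // 4)


def add_tier(
    header: str, entries: list[str], remaining: int
) -> tuple[list[str], int]:
    """Append a tier only if its header fits, charging the header before entries."""
    header_cost = estimate_tokens(header + "\n\n")  # header plus trailing blank line
    if not entries or header_cost >= remaining:
        return [], remaining  # skip the tier entirely; nothing is charged
    remaining -= header_cost
    kept: list[str] = []
    for entry in entries:
        cost = estimate_tokens(entry)
        if cost > remaining:
            break
        kept.append(entry)
        remaining -= cost
    if not kept:
        return [], remaining + header_cost  # no entry fit: refund and drop the header
    return [header, *kept, ""], remaining


tier_lines, left = add_tier(
    "### Relevant files", ["- backend/routes/context.py"], remaining=12
)
skipped, unchanged = add_tier(
    "### Relevant files", ["- backend/routes/context.py"], remaining=5
)
```

Charging the header first means a tier with a nearly exhausted budget is dropped as a whole rather than emitted as a bare header.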

Skipped singleton accessor nitpick: dependency_analyzer.py,
style_analyzer.py, and dna_extractor.py are all plain classes
instantiated in dependencies.py -- no get_*() accessor pattern
exists in the project. ContextAssembler already matches.
@vercel

vercel Bot commented Mar 10, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
opencodeintel Ignored Ignored Preview Mar 10, 2026 7:11pm

@DevanshuNEU DevanshuNEU merged commit f0f339c into OpenCodeIntel:main Mar 10, 2026
8 checks passed