docs: tool framework design — multi-type tool support by ashwing · Pull Request #67 · vllm-project/agentic-api

ashwing · 2026-06-19T03:39:43Z

Summary

Design proposal for a generic tool framework that handles heterogeneous tool types (function, mcp, web_search, file_search, code_interpreter) through a single pipeline with type-specific handlers.

Key architectural decisions:

ResponsesTool becomes a #[serde(tag = "type")] enum (backward-compatible)
Request-scoped ToolRegistry routes post-inference function_call items by name lookup
ToolHandler trait — new types implement it without touching the executor loop
function type is client-owned (requires_action); all other types are gateway-executed
LoopDecision::ContinuePartial handles mixed requests (gateway + client tools in one call)
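To make the routing idea concrete, here is a minimal std-only sketch of a name-keyed registry with type-specific handlers. It is an illustration under assumptions, not the design doc's actual API: the `Routed` enum, the `declare`, `install`, and `route` methods, the `execute` signature, and `EchoHandler` are all hypothetical names. Only the tool types and the client-owned versus gateway-executed split come from the summary above.

```rust
use std::collections::HashMap;

/// Tool types named in the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ToolType {
    Function,
    Mcp,
    WebSearch,
    FileSearch,
    CodeInterpreter,
}

impl ToolType {
    /// `function` is client-owned; every other type is gateway-executed.
    pub fn is_client_owned(self) -> bool {
        matches!(self, ToolType::Function)
    }
}

/// Hypothetical minimal handler surface: one handler per tool type.
pub trait ToolHandler {
    fn execute(&self, name: &str, arguments: &str) -> Result<String, String>;
}

/// Hypothetical result of routing one function_call item.
#[derive(Debug, PartialEq)]
pub enum Routed {
    /// Client-owned: hand the call back to the caller (requires_action).
    RequiresAction,
    /// Gateway-executed: the handler's output or error.
    Output(Result<String, String>),
    /// The name was never declared in this request.
    UnknownTool,
}

/// Request-scoped registry: tool name -> tool type -> handler.
#[derive(Default)]
pub struct ToolRegistry {
    by_name: HashMap<String, ToolType>,
    handlers: HashMap<ToolType, Box<dyn ToolHandler>>,
}

impl ToolRegistry {
    /// Records that `name` was declared with the given tool type.
    pub fn declare(&mut self, name: &str, tool_type: ToolType) {
        self.by_name.insert(name.to_string(), tool_type);
    }

    /// Installs the handler for a gateway-executed tool type.
    pub fn install(&mut self, tool_type: ToolType, handler: Box<dyn ToolHandler>) {
        self.handlers.insert(tool_type, handler);
    }

    /// Routes a post-inference function_call by name lookup.
    pub fn route(&self, name: &str, arguments: &str) -> Routed {
        match self.by_name.get(name) {
            None => Routed::UnknownTool,
            Some(t) if t.is_client_owned() => Routed::RequiresAction,
            Some(t) => match self.handlers.get(t) {
                Some(h) => Routed::Output(h.execute(name, arguments)),
                None => Routed::Output(Err(format!("no handler for {:?}", t))),
            },
        }
    }
}

/// Stand-in handler that echoes the call; a real handler would do I/O.
pub struct EchoHandler;

impl ToolHandler for EchoHandler {
    fn execute(&self, name: &str, arguments: &str) -> Result<String, String> {
        Ok(format!("{name}({arguments})"))
    }
}
```

In this sketch, adding a new tool type means adding an enum variant and installing a handler; `route` itself does not change, which is the property the ToolHandler trait is meant to provide.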

References ADR-01 D7 (MCP as primary tool interface) and ADR-03 D3 (tool registry in agentic-core).

Looking for feedback on:

The function type as client-owned passthrough vs gateway-executed — is this the right split?
ContinuePartial semantics for mixed tool requests
The ToolHandler trait surface area — too much? too little?
PR decomposition and ordering

Test Plan

Doc-only change, no code
cargo build --workspace && cargo test --workspace passes (no functional changes)

Introduces a design doc for a generic tool framework that handles the full lifecycle of heterogeneous tool types (function, mcp, web_search, file_search, code_interpreter) through a single pipeline with type-specific handlers.

Key ideas:
- ResponsesTool becomes a tagged enum (backward-compatible serde)
- Request-scoped ToolRegistry routes function_call items by name
- ToolHandler trait allows adding new types without touching the loop
- function type is client-owned (requires_action); all others gateway-executed

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

Enumerates five alternatives (reject, ignore+warn, search MCP, require executor, configurable per-request) and explains why passthrough with requires_action was chosen. Signed-off-by: Ashwin Giridharan <girida@amazon.com>

ashwing · 2026-06-19T03:48:40Z

@franciscojavierarceo @noobHappylife @maralbahari — would appreciate your eyes on this design proposal.

Key questions I'd love feedback on:

Is function as client-owned passthrough (requires_action) the right split? (see Alternatives section)
Does the ToolHandler trait surface area feel right for extensibility?
Does the ContinuePartial loop decision make sense for mixed tool requests?

This is meant to be the framework that MCP, web_search, file_search all plug into.

maralbahari · 2026-06-19T06:33:16Z

@ashwing thank you for the design document.
The overall flow sounds correct: the user controls the tool choices, agentic-api normalizes the tools into function_calls the way the model sees them, and inference then determines which function_call gets executed, either in agentic-api or on the client side.
How the function_calls are executed depends on our design. If we execute multiple tools in parallel, we need a timeout so that any job that does not complete in the given time is dropped.
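As a rough illustration of the parallel-with-timeout idea, the sketch below runs each call on its own thread and drops any result that has not arrived by a shared deadline. It uses only the standard library; the `Job` alias and `run_parallel_with_timeout` are hypothetical names, and an async runtime would likely be used in practice.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// A boxed tool call that produces its output as a string (hypothetical alias).
pub type Job = Box<dyn FnOnce() -> String + Send + 'static>;

/// Starts every job at once and waits until `timeout` has elapsed.
/// Slot `i` holds `Some(output)` if job `i` finished in time, else `None`.
/// A timed-out thread is not cancelled; its late result is simply discarded.
pub fn run_parallel_with_timeout(jobs: Vec<Job>, timeout: Duration) -> Vec<Option<String>> {
    let total = jobs.len();
    let (tx, rx) = mpsc::channel::<(usize, String)>();

    for (idx, job) in jobs.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // The receiver may already be gone after a timeout; ignore that error.
            let _ = tx.send((idx, job()));
        });
    }
    // Drop the original sender so the channel closes once all workers finish.
    drop(tx);

    // All jobs start together, so one shared deadline acts as a per-call timeout.
    let deadline = Instant::now() + timeout;
    let mut results: Vec<Option<String>> = vec![None; total];
    let mut pending = total;
    while pending > 0 {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok((idx, output)) => {
                results[idx] = Some(output);
                pending -= 1;
            }
            // Deadline reached, or every worker has exited.
            Err(_) => break,
        }
    }
    results
}
```

A caller could then turn each `None` slot into a timeout error for that tool call while still returning the outputs that did complete.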

There is a case where two function_call executions run in parallel: one takes place in agentic-api (the gateway) and is bound by a timeout, and one must be executed on the client side (referred to in this design as function) and requires action by the user. We then have several possibilities: the user declines and all tools stop, or the user responds with another prompt and does not complete the execution of the current function. Would this case be handled in ContinuePartial, or does ContinuePartial apply only when the user confirms the function?

There is some complication there, which I feel we could avoid by implementing in small stages without the partial case and leaving that for the last stage. Note that tool calls are not actually part of the MVP, so we do not need to rush into this stage; we need to plan the stages that get us to the full agentic loop.

I think one starting point for getting the loop sequence right is to observe and record multi-turn conversation cassettes from OpenAI with some tool choices, to clarify the map of the execution_loop with simple cases, and then build from there.
The steps I can think of that we would benefit from implementing first, to get closer to a complete agentic-loop iteration with tool options:

record multi-turn conversation with function_calls from OpenAI (cassettes).
filtering tool options and tool normalizer
implement a simple execution_loop with simple LoopDecision based on the recorded cassettes.
mcp executor
support of other tools like file_search (currently in progress using OGX)
refine execution_loop and LoopDecision with new tools

ashwing · 2026-06-23T19:01:47Z

@maralbahari On ContinuePartial — fair point about the decline/redirect edge case. I'll defer that to a follow-up and start with the clean two-way split: gateway-owned tools loop, client-owned tools halt with requires_action. Mixed case is additive once both paths work.
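The two-way split can be sketched as a small decision function over the owners of the calls in one turn. This is only an illustration: `Owner`, `MixedTurn`, `decide`, and the `Continue` variant are hypothetical names, and returning an error for a mixed turn is a placeholder assumption, since the thread only says the mixed case is deferred.

```rust
/// Who executes a tool call (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Owner {
    Gateway,
    Client,
}

/// Loop outcomes for the clean two-way split; ContinuePartial is left out.
#[derive(Debug, PartialEq, Eq)]
pub enum LoopDecision {
    /// Gateway-owned calls were present: execute them and run another turn.
    Continue,
    /// Client-owned calls were present: halt and return requires_action.
    RequiresAction,
    /// No tool calls: the response is final.
    Done,
}

/// Placeholder for the deferred mixed case (assumption, not the agreed behavior).
#[derive(Debug, PartialEq, Eq)]
pub struct MixedTurn;

/// Classifies one turn from the owners of its function_call items.
pub fn decide(owners: &[Owner]) -> Result<LoopDecision, MixedTurn> {
    let any_gateway = owners.contains(&Owner::Gateway);
    let any_client = owners.contains(&Owner::Client);
    match (any_gateway, any_client) {
        (false, false) => Ok(LoopDecision::Done),
        (true, false) => Ok(LoopDecision::Continue),
        (false, true) => Ok(LoopDecision::RequiresAction),
        // Gateway and client tools in one call: deferred to a follow-up.
        (true, true) => Err(MixedTurn),
    }
}
```

Once both paths work, the `(true, true)` arm is the single place where a ContinuePartial variant would be introduced.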

Cassettes with tool calls are already part of the plan — we'll record multi-turn OpenAI sessions with function + MCP tools as the first PR, same approach as #66.

On ordering — I'd push back on full sequential. PR A (types + registry + normalize) is pure interfaces with no execution logic. Once that lands, PR B (dispatch) and PR C (MCP handler) can develop in parallel — B tests against mock handlers, C tests against a mock MCP server, neither imports the other. They only meet at the integration layer after both land. Serializing them adds wait time for no technical reason — decoupling "how to route" from "how to execute" is the whole point of the trait-based design.

Proposed: cassettes → types/registry/normalize → dispatch + MCP in parallel → integration.

Record 8 realistic multi-turn scenarios from a "data pipeline debug" story against both Qwen3-30B-A3B-FP8 and gpt-oss-20b. These replace the toy weather/stock cassettes with conversations that exercise the tool types from PR vllm-project#67 (function, mcp, web_search, code_interpreter equivalents). Scenarios cover: full investigation (5 turns), investigate-and-restart, quick triage, parallel compare, deep runbook analysis, web+internal search, mixed gateway+client tools, and streaming multi-turn. Also fixes FunctionToolCall deserialization for gpt-oss which emits status:null — a custom serde deserializer defaults it to "completed". Includes the Python recording script used to generate the cassettes. Signed-off-by: Ashwin Giridharan <girida@amazon.com>

Record 8 realistic multi-turn scenarios from a "data pipeline debug" story against both Qwen3-30B-A3B-FP8 and gpt-oss-20b. These replace the toy weather/stock cassettes with conversations that exercise the tool types from PR vllm-project#67 (function, mcp, web_search, code_interpreter equivalents). Scenarios cover: full investigation (5 turns), investigate-and-restart, quick triage, parallel compare, deep runbook analysis, web+internal search, mixed gateway+client tools, and streaming multi-turn. Also fixes FunctionToolCall.status to use MessageStatus enum instead of a raw String, with a custom serde deserializer that defaults null (emitted by gpt-oss) to MessageStatus::Completed. Cassettes recorded using tests/cassettes/record_cassette.py against live vLLM instances. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
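The null-status fix amounts to a defaulting rule, sketched here without serde so it stays self-contained. The actual change is a custom serde deserializer; `status_from_wire` is a hypothetical std-only stand-in, and the `InProgress` and `Incomplete` variants and their wire strings are assumptions (only `MessageStatus::Completed` is named in the commit message).

```rust
/// Status of a function tool call. Only `Completed` is confirmed by the
/// commit message; the other variants are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageStatus {
    InProgress,
    Completed,
    Incomplete,
}

/// Maps the raw wire value to a status, treating a missing or null value
/// (as emitted by gpt-oss) as `Completed`.
pub fn status_from_wire(raw: Option<&str>) -> Result<MessageStatus, String> {
    match raw {
        // null on the wire: default instead of failing deserialization.
        None => Ok(MessageStatus::Completed),
        Some("in_progress") => Ok(MessageStatus::InProgress),
        Some("completed") => Ok(MessageStatus::Completed),
        Some("incomplete") => Ok(MessageStatus::Incomplete),
        Some(other) => Err(format!("unknown status: {other}")),
    }
}
```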

ashwing · 2026-06-30T02:08:28Z

@maralbahari @franciscojavierarceo Can we merge this if there are no further comments?

maralbahari · 2026-07-01T05:48:15Z

> @maralbahari @franciscojavierarceo Can we merge this if there are no further comments?

Looks good to me.
@franciscojavierarceo, do you have any comments on this?

Implements the gateway-side agentic tool loop on top of the ToolRegistry and GatewayExecutor traits landed in PR A (vllm-project#80):

- executor/dispatch.rs: LoopDecision enum (#[non_exhaustive]) + dispatch_tools(): classifies FunctionToolCall items via ToolRegistry::gateway_owned(), executes in parallel with 30s per-call timeout, maps failures to error-JSON FunctionCallOutput items (never aborts the loop on tool error).
- executor/agentic_loop.rs: execute_loop(): multi-turn orchestrator that clears all three persistence triggers before looping and restores original IDs on the final payload. Rejects stream=true (StreamTee is a future PR). Hard guard of 128 iterations, soft cap via max_iterations param (default: 10).

Client-owned function tools (ToolType::Function) return Done for now; RequiresAction and ContinuePartial are deferred per staging agreement in PR vllm-project#67. LoopDecision is #[non_exhaustive] to make the addition safe. MCP tool names are absent from the registry until PR C adds discovery; any function_call for an MCP tool name is treated as client-owned.

244 tests pass; cargo clippy --workspace --all-targets -- -D warnings clean.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
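The iteration limits and the error mapping described in this commit message can be sketched as follows. This is not the code from PR B: `effective_cap`, `tool_output`, and `run_loop` are hypothetical names, the `{"error": ...}` shape is an assumption (the message only says "error-JSON"), and the escaping covers just quotes and backslashes. Only the constants 128 and 10 come from the text above.

```rust
/// Hard guard on loop iterations, from the commit message.
pub const HARD_ITERATION_GUARD: usize = 128;
/// Default soft cap when max_iterations is not supplied.
pub const DEFAULT_MAX_ITERATIONS: usize = 10;

/// The soft cap can lower the limit but never exceed the hard guard.
pub fn effective_cap(max_iterations: Option<usize>) -> usize {
    max_iterations
        .unwrap_or(DEFAULT_MAX_ITERATIONS)
        .min(HARD_ITERATION_GUARD)
}

/// Turns a tool failure into an error-JSON output instead of aborting.
/// The JSON shape is assumed; only `\` and `"` are escaped in this sketch.
pub fn tool_output(result: Result<String, String>) -> String {
    match result {
        Ok(output) => output,
        Err(msg) => {
            let escaped = msg.replace('\\', "\\\\").replace('"', "\\\"");
            format!("{{\"error\":\"{}\"}}", escaped)
        }
    }
}

/// Runs `turn` until it reports that no further turn is needed or the cap
/// is reached. `turn` receives the 1-based turn number and returns `true`
/// when another turn is required.
/// Returns (turns executed, whether the loop finished on its own).
pub fn run_loop<F>(max_iterations: Option<usize>, mut turn: F) -> (usize, bool)
where
    F: FnMut(usize) -> bool,
{
    let cap = effective_cap(max_iterations);
    let mut turns = 0;
    while turns < cap {
        turns += 1;
        if !turn(turns) {
            return (turns, true);
        }
    }
    // Cap reached while the model still wanted to call tools.
    (turns, false)
}
```

Keeping the cap computation in one function makes it easy to test that a caller-supplied `max_iterations` above 128 is clamped rather than honored.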

ashwing added 2 commits June 18, 2026 20:38

docs: add alternatives considered for function tool handling

793675b


ashwing marked this pull request as ready for review June 19, 2026 03:48

ashwing requested review from bbrowning, franciscojavierarceo, jiahuei, leseb, maralbahari, noobHappylife, qandrew and tjtanaa as code owners June 19, 2026 03:48

This was referenced Jun 23, 2026

docs: Codex integration design #68

Merged

test: stateful multi-turn tool-call cassettes with context retention #77

Merged

Merge branch 'main' into docs/tool-framework-design

188eace

Merge branch 'main' into docs/tool-framework-design

7286c49

ashwing self-assigned this Jun 30, 2026

This was referenced Jul 2, 2026

feat: add dispatch_tools + execute_loop (PR B — tool dispatch layer) #83

Open

feat: add tool dispatch layer — ToolContext, traits, and LoopDecision #51

Closed

ashwing wants to merge 4 commits into vllm-project:main from ashwing:docs/tool-framework-design


2 participants
