docs: tool framework design — multi-type tool support#67
Introduces a design doc for a generic tool framework that handles the full lifecycle of heterogeneous tool types (function, mcp, web_search, file_search, code_interpreter) through a single pipeline with type-specific handlers. Key ideas:
- ResponsesTool becomes a tagged enum (backward-compatible serde)
- Request-scoped ToolRegistry routes function_call items by name
- ToolHandler trait allows adding new types without touching the loop
- function type is client-owned (requires_action); all others gateway-executed

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
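The routing idea in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the design doc: the method signatures, the `dispatch` helper, and the `EchoWebSearch` handler are all hypothetical, and the rule that unknown names fall back to client-owned is an assumption.

```rust
use std::collections::HashMap;

/// Tool types named in the design; everything except `Function` is gateway-executed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ToolType {
    Function,
    Mcp,
    WebSearch,
    FileSearch,
    CodeInterpreter,
}

/// Hypothetical handler interface: one implementation per gateway-executed type.
pub trait ToolHandler {
    fn tool_type(&self) -> ToolType;
    /// Runs the call and returns its output as a string (JSON in practice).
    fn execute(&self, name: &str, arguments: &str) -> Result<String, String>;
}

/// Request-scoped registry: maps tool names declared on the request to their
/// type, and tool types to the handler that executes them.
#[derive(Default)]
pub struct ToolRegistry {
    by_name: HashMap<String, ToolType>,
    handlers: HashMap<ToolType, Box<dyn ToolHandler>>,
}

impl ToolRegistry {
    pub fn register_tool(&mut self, name: &str, tool_type: ToolType) {
        self.by_name.insert(name.to_string(), tool_type);
    }

    pub fn register_handler(&mut self, handler: Box<dyn ToolHandler>) {
        self.handlers.insert(handler.tool_type(), handler);
    }

    /// True only for names registered with a non-`Function` type.
    /// Assumption: unknown names are treated as client-owned.
    pub fn gateway_owned(&self, name: &str) -> bool {
        matches!(self.by_name.get(name), Some(t) if *t != ToolType::Function)
    }

    /// Routes a function_call by name. `None` means the call is not
    /// gateway-owned (or has no handler) and goes back to the client.
    pub fn dispatch(&self, name: &str, arguments: &str) -> Option<Result<String, String>> {
        let tool_type = self.by_name.get(name)?;
        if *tool_type == ToolType::Function {
            return None;
        }
        let handler = self.handlers.get(tool_type)?;
        Some(handler.execute(name, arguments))
    }
}

/// Stand-in handler used only for illustration.
pub struct EchoWebSearch;

impl ToolHandler for EchoWebSearch {
    fn tool_type(&self) -> ToolType {
        ToolType::WebSearch
    }
    fn execute(&self, name: &str, arguments: &str) -> Result<String, String> {
        Ok(format!("{name}:{arguments}"))
    }
}
```

Under these assumptions, supporting a new tool type means adding a `ToolType` variant and one `ToolHandler` implementation; the code that calls `dispatch` stays unchanged.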
Enumerates five alternatives (reject, ignore+warn, search MCP, require executor, configurable per-request) and explains why passthrough with requires_action was chosen. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
@franciscojavierarceo @noobHappylife @maralbahari — would appreciate your eyes on this design proposal. Key questions I'd love feedback on:
This is meant to be the framework that MCP, web_search, file_search all plug into.
@ashwing Thank you for the design document. There is some complication there, which I feel we could implement in small stages without the partial complication, leaving that for the last stage. Note that tool calls are not actually part of the MVP. With that said, we do not need to rush into this stage; we need to plan the stages to reach the full agentic loop. I think one starting point for getting the loop sequence right would be to observe and record multi-turn conversation cassettes from OpenAI with some tool choices to clarify the map of
@maralbahari On cassettes: cassettes with tool calls are already part of the plan. We'll record multi-turn OpenAI sessions with function + MCP tools as the first PR, same approach as #66.

On ordering: I'd push back on full sequential. PR A (types + registry + normalize) is pure interfaces with no execution logic. Once that lands, PR B (dispatch) and PR C (MCP handler) can develop in parallel: B tests against mock handlers, C tests against a mock MCP server, and neither imports the other. They only meet at the integration layer after both land. Serializing them adds wait time for no technical reason; decoupling "how to route" from "how to execute" is the whole point of the trait-based design.

Proposed: cassettes → types/registry/normalize → dispatch + MCP in parallel → integration.
Record 8 realistic multi-turn scenarios from a "data pipeline debug" story against both Qwen3-30B-A3B-FP8 and gpt-oss-20b. These replace the toy weather/stock cassettes with conversations that exercise the tool types from PR vllm-project#67 (function, mcp, web_search, code_interpreter equivalents). Scenarios cover: full investigation (5 turns), investigate-and-restart, quick triage, parallel compare, deep runbook analysis, web+internal search, mixed gateway+client tools, and streaming multi-turn. Also fixes FunctionToolCall deserialization for gpt-oss which emits status:null — a custom serde deserializer defaults it to "completed". Includes the Python recording script used to generate the cassettes. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Record 8 realistic multi-turn scenarios from a "data pipeline debug" story against both Qwen3-30B-A3B-FP8 and gpt-oss-20b. These replace the toy weather/stock cassettes with conversations that exercise the tool types from PR vllm-project#67 (function, mcp, web_search, code_interpreter equivalents). Scenarios cover: full investigation (5 turns), investigate-and-restart, quick triage, parallel compare, deep runbook analysis, web+internal search, mixed gateway+client tools, and streaming multi-turn. Also fixes FunctionToolCall.status to use MessageStatus enum instead of a raw String, with a custom serde deserializer that defaults null (emitted by gpt-oss) to MessageStatus::Completed. Cassettes recorded using tests/cassettes/record_cassette.py against live vLLM instances. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
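The null-status fix described above boils down to one defaulting rule. The sketch below shows that rule in isolation with the standard library only; it is not the serde deserializer from the commit. The `status_from_wire` helper and the non-`Completed` variants and their wire strings are assumptions, since the source only confirms `MessageStatus::Completed`.

```rust
/// Assumed variant set; the source only confirms `MessageStatus::Completed`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageStatus {
    InProgress,
    Completed,
    Incomplete,
}

/// Maps the wire value of `status` to `MessageStatus`.
/// `None` models a JSON `null`, which gpt-oss emits; it defaults to `Completed`.
pub fn status_from_wire(raw: Option<&str>) -> Result<MessageStatus, String> {
    match raw {
        None => Ok(MessageStatus::Completed),
        Some("in_progress") => Ok(MessageStatus::InProgress),
        Some("completed") => Ok(MessageStatus::Completed),
        Some("incomplete") => Ok(MessageStatus::Incomplete),
        // Unknown strings stay an error so typos are not silently accepted.
        Some(other) => Err(format!("unknown status: {other}")),
    }
}
```

In a serde-based type, a rule like this would typically live in a function referenced through `#[serde(deserialize_with = "...")]` on the `status` field.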
@maralbahari @franciscojavierarceo Can we merge this if there are no further comments?
Looks good to me.
Implements the gateway-side agentic tool loop on top of the ToolRegistry and GatewayExecutor traits landed in PR A (vllm-project#80):
- executor/dispatch.rs: LoopDecision enum (#[non_exhaustive]) + dispatch_tools(), which classifies FunctionToolCall items via ToolRegistry::gateway_owned(), executes in parallel with 30s per-call timeout, and maps failures to error-JSON FunctionCallOutput items (never aborts the loop on tool error).
- executor/agentic_loop.rs: execute_loop(), a multi-turn orchestrator that clears all three persistence triggers before looping and restores original IDs on the final payload. Rejects stream=true (StreamTee is a future PR). Hard guard of 128 iterations, soft cap via max_iterations param (default: 10).

Client-owned function tools (ToolType::Function) return Done for now; RequiresAction and ContinuePartial are deferred per staging agreement in PR vllm-project#67. LoopDecision is #[non_exhaustive] to make the addition safe.

MCP tool names are absent from the registry until PR C adds discovery; any function_call for an MCP tool name is treated as client-owned.

244 tests pass; cargo clippy --workspace --all-targets -- -D warnings clean.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
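The iteration caps and the "never abort on tool error" rule can be illustrated with a small synchronous sketch. It is not the code in executor/agentic_loop.rs: the `ToolCall` and `ToolOutput` structs, the closure-based signature, and the way the soft cap is clamped by the hard guard are assumptions, and parallel execution, timeouts, persistence triggers, and ID restoration are left out.

```rust
/// Hard upper bound, taken from the commit message.
pub const HARD_ITERATION_GUARD: usize = 128;
/// Default soft cap, taken from the commit message.
pub const DEFAULT_MAX_ITERATIONS: usize = 10;

/// Hypothetical stand-in for a function_call item.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub call_id: String,
    pub name: String,
    pub arguments: String,
}

/// Hypothetical stand-in for a function_call_output item.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolOutput {
    pub call_id: String,
    pub output: String,
}

/// Tool failures become error-JSON outputs rather than aborting the loop.
pub fn to_output(call: &ToolCall, result: Result<String, String>) -> ToolOutput {
    let output = match result {
        Ok(body) => body,
        Err(msg) => {
            // Escape backslashes and quotes so the message stays valid JSON.
            let escaped = msg.replace('\\', "\\\\").replace('"', "\\\"");
            format!("{{\"error\":\"{escaped}\"}}")
        }
    };
    ToolOutput { call_id: call.call_id.clone(), output }
}

/// Runs model turns until the model stops requesting tools or a cap is hit.
/// `infer` stands in for one model call: it receives the previous turn's tool
/// outputs and returns new tool calls (empty means a final answer).
/// `run_tool` stands in for a gateway handler. Returns the number of turns.
pub fn execute_loop<I, T>(max_iterations: Option<usize>, mut infer: I, mut run_tool: T) -> usize
where
    I: FnMut(&[ToolOutput]) -> Vec<ToolCall>,
    T: FnMut(&ToolCall) -> Result<String, String>,
{
    // Assumption: the soft cap can never exceed the hard guard.
    let cap = max_iterations
        .unwrap_or(DEFAULT_MAX_ITERATIONS)
        .min(HARD_ITERATION_GUARD);
    let mut outputs: Vec<ToolOutput> = Vec::new();
    let mut turns = 0;
    while turns < cap {
        let calls = infer(outputs.as_slice());
        turns += 1;
        if calls.is_empty() {
            break;
        }
        outputs = calls.iter().map(|c| to_output(c, run_tool(c))).collect();
    }
    turns
}
```

With these assumptions, a model that keeps requesting tools stops after 10 turns by default, and a caller asking for 1000 iterations is clamped to 128.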
Summary
Design proposal for a generic tool framework that handles heterogeneous tool types (function, mcp, web_search, file_search, code_interpreter) through a single pipeline with type-specific handlers.
Key architectural decisions:
- ResponsesTool becomes a #[serde(tag = "type")] enum (backward-compatible)
- ToolRegistry routes post-inference function_call items by name lookup
- ToolHandler trait: new types implement it without touching the executor loop
- function type is client-owned (requires_action); all other types are gateway-executed
- LoopDecision::ContinuePartial handles mixed requests (gateway + client tools in one call)

References ADR-01 D7 (MCP as primary tool interface) and ADR-03 D3 (tool registry in agentic-core).
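One possible reading of the client-owned versus gateway-executed split, including the mixed case, is sketched below. The `Done`, `RequiresAction`, and `ContinuePartial` names come from this PR; the `Continue` variant, the payload fields, and the `decide` function are hypothetical, and the actual ContinuePartial semantics are still an open question in this proposal.

```rust
/// Hypothetical shape of the loop decision; only the variant names `Done`,
/// `RequiresAction`, and `ContinuePartial` appear in the PR.
#[derive(Debug, Clone, PartialEq, Eq)]
#[non_exhaustive]
pub enum LoopDecision {
    /// No tool calls: the response is final.
    Done,
    /// Assumed variant: every call is gateway-owned, so run them and loop again.
    Continue,
    /// Every call is client-owned: hand them back to the client.
    RequiresAction { client_calls: Vec<String> },
    /// Mixed request: the gateway runs its share, the client gets the rest.
    ContinuePartial {
        gateway_calls: Vec<String>,
        client_calls: Vec<String>,
    },
}

/// Classifies the tool-call names from one model turn.
/// `is_gateway_owned` stands in for a registry lookup by name.
pub fn decide(call_names: &[&str], is_gateway_owned: impl Fn(&str) -> bool) -> LoopDecision {
    let (gateway_calls, client_calls): (Vec<String>, Vec<String>) = call_names
        .iter()
        .map(|n| n.to_string())
        .partition(|n| is_gateway_owned(n.as_str()));

    match (gateway_calls.is_empty(), client_calls.is_empty()) {
        (true, true) => LoopDecision::Done,
        (false, true) => LoopDecision::Continue,
        (true, false) => LoopDecision::RequiresAction { client_calls },
        (false, false) => LoopDecision::ContinuePartial { gateway_calls, client_calls },
    }
}
```

Marking the enum `#[non_exhaustive]` forces downstream crates to keep a wildcard arm in their matches, so adding a variant later does not break them.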
Looking for feedback on:
- function type as client-owned passthrough vs gateway-executed: is this the right split?
- ContinuePartial semantics for mixed tool requests
- ToolHandler trait surface area: too much? too little?

Test Plan
- cargo build --workspace && cargo test --workspace passes (no functional changes)