docs: tool framework design — multi-type tool support#67

Open
ashwing wants to merge 4 commits into vllm-project:main from ashwing:docs/tool-framework-design

ashwing commented Jun 19, 2026

Collaborator

Summary

Design proposal for a generic tool framework that handles heterogeneous tool types (function, mcp, web_search, file_search, code_interpreter) through a single pipeline with type-specific handlers.

Key architectural decisions:

  • ResponsesTool becomes a #[serde(tag = "type")] enum (backward-compatible)
  • Request-scoped ToolRegistry routes post-inference function_call items by name lookup
  • ToolHandler trait — new types implement it without touching the executor loop
  • function type is client-owned (requires_action); all other types are gateway-executed
  • LoopDecision::ContinuePartial handles mixed requests (gateway + client tools in one call)
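A minimal sketch of how these decisions could fit together, assuming a std-only setup: the `ToolType` variants mirror the tool types listed above, while the `ToolHandler` method names, the `EchoSearch` handler, and the rule that unregistered names count as client-owned are illustrative assumptions, not the design doc's actual surface. The serde tagging is omitted so the example compiles without external crates.

```rust
use std::collections::HashMap;

/// Tool types named in the proposal. The design describes a
/// `#[serde(tag = "type")]` enum; serde is left out here to stay std-only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolType {
    Function,
    Mcp,
    WebSearch,
    FileSearch,
    CodeInterpreter,
}

impl ToolType {
    /// `function` is client-owned; every other type is gateway-executed.
    pub fn is_gateway_owned(self) -> bool {
        !matches!(self, ToolType::Function)
    }
}

/// Hypothetical handler surface: one implementation per tool type.
pub trait ToolHandler {
    fn tool_type(&self) -> ToolType;
    /// Runs one call with raw JSON arguments; returns output or an error message.
    fn execute(&self, name: &str, arguments: &str) -> Result<String, String>;
}

/// Request-scoped registry: built per request, maps tool name to handler.
#[derive(Default)]
pub struct ToolRegistry {
    handlers: HashMap<String, Box<dyn ToolHandler>>,
}

impl ToolRegistry {
    pub fn register(&mut self, name: &str, handler: Box<dyn ToolHandler>) {
        self.handlers.insert(name.to_string(), handler);
    }

    /// Routes a post-inference function_call item by name lookup.
    pub fn lookup(&self, name: &str) -> Option<&dyn ToolHandler> {
        self.handlers.get(name).map(|h| &**h)
    }

    /// Assumption: names missing from the registry are treated as client-owned.
    pub fn gateway_owned(&self, name: &str) -> bool {
        self.lookup(name)
            .map_or(false, |h| h.tool_type().is_gateway_owned())
    }
}

/// Toy gateway-side handler, for illustration only.
pub struct EchoSearch;

impl ToolHandler for EchoSearch {
    fn tool_type(&self) -> ToolType {
        ToolType::WebSearch
    }

    fn execute(&self, _name: &str, arguments: &str) -> Result<String, String> {
        Ok(format!("echo: {}", arguments))
    }
}
```

Adding a new tool type in this sketch means adding a variant and a handler; the lookup and routing code stays unchanged.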

References ADR-01 D7 (MCP as primary tool interface) and ADR-03 D3 (tool registry in agentic-core).

Looking for feedback on:

  1. The function type as client-owned passthrough vs gateway-executed — is this the right split?
  2. ContinuePartial semantics for mixed tool requests
  3. The ToolHandler trait surface area — too much? too little?
  4. PR decomposition and ordering

Test Plan

  • Doc-only change, no code
  • cargo build --workspace && cargo test --workspace passes (no functional changes)

ashwing added 2 commits June 18, 2026 20:38
Introduces a design doc for a generic tool framework that handles the
full lifecycle of heterogeneous tool types (function, mcp, web_search,
file_search, code_interpreter) through a single pipeline with
type-specific handlers.

Key ideas:
- ResponsesTool becomes a tagged enum (backward-compatible serde)
- Request-scoped ToolRegistry routes function_call items by name
- ToolHandler trait allows adding new types without touching the loop
- function type is client-owned (requires_action); all others gateway-executed

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

Enumerates five alternatives (reject, ignore+warn, search MCP, require
executor, configurable per-request) and explains why passthrough with
requires_action was chosen.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

ashwing commented Jun 19, 2026

Collaborator Author

@franciscojavierarceo @noobHappylife @maralbahari — would appreciate your eyes on this design proposal.

Key questions I'd love feedback on:

  1. Is function as client-owned passthrough (requires_action) the right split? (see Alternatives section)
  2. Does the ToolHandler trait surface area feel right for extensibility?
  3. Does the ContinuePartial loop decision make sense for mixed tool requests?

This is meant to be the framework that MCP, web_search, file_search all plug into.

@maralbahari

Collaborator

@ashwing thank you for the design document.
The overall flow sounds correct: the user controls the tool choices, agentic-api normalizes the tools into function_calls as the model sees them, and inference then determines which function_call gets executed, either on agentic-api or on the client side.
How the function_calls are executed depends on our design. If, say, multiple tools are executed in parallel, we would need a timeout so that any job that does not complete in the given time is dropped.
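One way the parallel-with-timeout idea could look, as a rough std-only sketch: each job runs on its own thread and anything that has not reported back by a shared deadline is dropped. The names `Job` and `run_with_timeout` are hypothetical, and a single shared deadline is an assumption; a per-call timeout would be an equally valid reading.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// A boxed tool job that produces its output as a string.
pub type Job = Box<dyn FnOnce() -> String + Send + 'static>;

/// Runs every job on its own thread and collects the results that arrive
/// before the shared deadline. Slots of late jobs stay `None`.
pub fn run_with_timeout(jobs: Vec<Job>, timeout: Duration) -> Vec<Option<String>> {
    let (tx, rx) = mpsc::channel();
    let count = jobs.len();

    for (index, job) in jobs.into_iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // The receiver may be gone if we already timed out; ignore that.
            let _ = tx.send((index, job()));
        });
    }
    // Drop our own sender so the channel closes once all workers finish.
    drop(tx);

    let deadline = Instant::now() + timeout;
    let mut results: Vec<Option<String>> = vec![None; count];
    let mut received = 0;

    while received < count {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok((index, output)) => {
                results[index] = Some(output);
                received += 1;
            }
            // Deadline passed, or every sender has disconnected.
            Err(_) => break,
        }
    }
    results
}
```

Note that the late threads are only abandoned here, not cancelled; real cancellation would need cooperation from the tool implementation or an async runtime.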

There is also the case where two function_call executions are needed in parallel: one takes place in agentic-api (the gateway) and is bound by a timeout, and one is executed on the client side (referred to in this design as function) and requires action by the user. We then have several possibilities: the user declines and all tools stop, or the user responds with another prompt and doesn't complete the execution of the current function. Would this case be handled in ContinuePartial, or does ContinuePartial apply only when the user confirms the function?

There is some complication there, which I feel we could handle by implementing in small stages without the partial case and leaving that for the last stage. Note that tool calls are not actually part of the MVP. That being said, we do not need to rush into this stage; we need to plan the stages to reach the full agentic loop.

I think one starting point for getting the loop sequence right is to observe and record multi-turn conversation cassettes from OpenAI with some tool choices, to clarify the map of the execution_loop with simple cases, and then build from there.
The steps I can think of that would be worth implementing first, to get us closer to a complete agentic-loop iteration with tool options:

  • Record multi-turn conversations with function_calls from OpenAI (cassettes).
  • Add tool-option filtering and the tool normalizer.
  • Implement a simple execution_loop with a simple LoopDecision based on the recorded cassettes.
  • Implement the MCP executor.
  • Support other tools such as file_search (currently in progress using OGX).
  • Refine the execution_loop and LoopDecision as new tools are added.

ashwing commented Jun 23, 2026

Collaborator Author

@maralbahari On ContinuePartial — fair point about the decline/redirect edge case. I'll defer that to a follow-up and start with the clean two-way split: gateway-owned tools loop, client-owned tools halt with requires_action. Mixed case is additive once both paths work.
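To make the two-way split concrete, here is a small sketch under stated assumptions: `Done` and `RequiresAction` are names used in this thread, while `Continue`, `FunctionCall`, and `decide` are hypothetical. Since the mixed case is deferred, the sketch simply halts with `RequiresAction` whenever any client-owned call is present; that choice is an assumption, not settled semantics.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a post-inference function_call item.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionCall {
    pub call_id: String,
    pub name: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LoopDecision {
    /// No tool calls were emitted: the response is final.
    Done,
    /// Every call is gateway-owned: execute them and loop again.
    Continue(Vec<FunctionCall>),
    /// At least one client-owned call: halt and hand these back to the client.
    RequiresAction(Vec<FunctionCall>),
}

/// Classifies one turn's calls against the set of gateway-owned tool names.
pub fn decide(calls: Vec<FunctionCall>, gateway_tools: &HashSet<&str>) -> LoopDecision {
    if calls.is_empty() {
        return LoopDecision::Done;
    }
    let (gateway, client): (Vec<_>, Vec<_>) = calls
        .into_iter()
        .partition(|call| gateway_tools.contains(call.name.as_str()));

    if client.is_empty() {
        LoopDecision::Continue(gateway)
    } else {
        // Assumption: until ContinuePartial exists, any client-owned call halts the loop.
        LoopDecision::RequiresAction(client)
    }
}
```

A later `ContinuePartial` variant would slot into the final branch, which is where the gateway-owned half of a mixed turn is currently discarded.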

Cassettes with tool calls are already part of the plan — we'll record multi-turn OpenAI sessions with function + MCP tools as the first PR, same approach as #66.

On ordering — I'd push back on full sequential. PR A (types + registry + normalize) is pure interfaces with no execution logic. Once that lands, PR B (dispatch) and PR C (MCP handler) can develop in parallel — B tests against mock handlers, C tests against a mock MCP server, neither imports the other. They only meet at the integration layer after both land. Serializing them adds wait time for no technical reason — decoupling "how to route" from "how to execute" is the whole point of the trait-based design.
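As an illustration of why dispatch can be tested without a real MCP handler, the sketch below routes calls through a trait object and verifies the routing with a recording mock. The one-method `ToolHandler` trait, `MockHandler`, and `dispatch` signature are all assumptions made for this example.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical execution interface; dispatch depends only on this trait.
pub trait ToolHandler {
    fn execute(&self, arguments: &str) -> Result<String, String>;
}

/// Mock used by dispatch tests: returns a canned reply and records the
/// arguments it was called with. No server or network is involved.
pub struct MockHandler {
    pub reply: Result<String, String>,
    pub seen: Rc<RefCell<Vec<String>>>,
}

impl ToolHandler for MockHandler {
    fn execute(&self, arguments: &str) -> Result<String, String> {
        self.seen.borrow_mut().push(arguments.to_string());
        self.reply.clone()
    }
}

/// Routing layer: knows tool names, not how any tool is executed.
/// Returns `None` when no handler is registered under `name`.
pub fn dispatch(
    handlers: &HashMap<String, Box<dyn ToolHandler>>,
    name: &str,
    arguments: &str,
) -> Option<Result<String, String>> {
    handlers.get(name).map(|handler| handler.execute(arguments))
}
```

An MCP handler developed separately would implement the same trait and be registered in place of the mock at integration time.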

Proposed: cassettes → types/registry/normalize → dispatch + MCP in parallel → integration.

ashwing added a commit to ashwing/agentic-api that referenced this pull request Jun 23, 2026
Record 8 realistic multi-turn scenarios from a "data pipeline debug" story
against both Qwen3-30B-A3B-FP8 and gpt-oss-20b. These replace the toy
weather/stock cassettes with conversations that exercise the tool types
from PR vllm-project#67 (function, mcp, web_search, code_interpreter equivalents).

Scenarios cover: full investigation (5 turns), investigate-and-restart,
quick triage, parallel compare, deep runbook analysis, web+internal search,
mixed gateway+client tools, and streaming multi-turn.

Also fixes FunctionToolCall deserialization for gpt-oss which emits
status:null — a custom serde deserializer defaults it to "completed".

Includes the Python recording script used to generate the cassettes.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>

ashwing added a commit to ashwing/agentic-api that referenced this pull request Jun 24, 2026
Record 8 realistic multi-turn scenarios from a "data pipeline debug" story
against both Qwen3-30B-A3B-FP8 and gpt-oss-20b. These replace the toy
weather/stock cassettes with conversations that exercise the tool types
from PR vllm-project#67 (function, mcp, web_search, code_interpreter equivalents).

Scenarios cover: full investigation (5 turns), investigate-and-restart,
quick triage, parallel compare, deep runbook analysis, web+internal search,
mixed gateway+client tools, and streaming multi-turn.

Also fixes FunctionToolCall.status to use MessageStatus enum instead of a
raw String, with a custom serde deserializer that defaults null (emitted by
gpt-oss) to MessageStatus::Completed.

Cassettes recorded using tests/cassettes/record_cassette.py against live
vLLM instances.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
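The null-defaulting behaviour described in this commit can be sketched without serde as a plain mapping function, where `None` stands in for a JSON `null`. Only `MessageStatus::Completed` is named in the commit message; the other variants, their string forms, and the function name `status_or_completed` are assumptions. In serde, logic like this would typically be attached to the field with the `deserialize_with` attribute.

```rust
/// Status of a tool call item. Only `Completed` appears in the commit
/// message; the other variants are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageStatus {
    InProgress,
    Completed,
    Incomplete,
}

/// Maps the raw `status` field to `MessageStatus`.
/// `None` models a JSON `null` and defaults to `Completed`.
pub fn status_or_completed(raw: Option<&str>) -> Result<MessageStatus, String> {
    match raw {
        None => Ok(MessageStatus::Completed),
        Some("in_progress") => Ok(MessageStatus::InProgress),
        Some("completed") => Ok(MessageStatus::Completed),
        Some("incomplete") => Ok(MessageStatus::Incomplete),
        Some(other) => Err(format!("unknown status: {}", other)),
    }
}
```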

ashwing commented Jun 30, 2026

Collaborator Author

@maralbahari @franciscojavierarceo Can we merge this if there are no further comments?

ashwing self-assigned this Jun 30, 2026

@maralbahari

Collaborator

> @maralbahari @franciscojavierarceo Can we merge this if there are no further comments?

Looks good to me.
@franciscojavierarceo, do you have any comments on this?

ashwing added a commit to ashwing/agentic-api that referenced this pull request Jul 2, 2026
Implements the gateway-side agentic tool loop on top of the ToolRegistry
and GatewayExecutor traits landed in PR A (vllm-project#80):

- executor/dispatch.rs: LoopDecision enum (#[non_exhaustive]) +
  dispatch_tools() — classifies FunctionToolCall items via
  ToolRegistry::gateway_owned(), executes in parallel with 30s
  per-call timeout, maps failures to error-JSON FunctionCallOutput
  items (never aborts the loop on tool error).

- executor/agentic_loop.rs: execute_loop() — multi-turn orchestrator
  that clears all three persistence triggers before looping and
  restores original IDs on the final payload. Rejects stream=true
  (StreamTee is a future PR). Hard guard of 128 iterations, soft cap
  via max_iterations param (default: 10).

Client-owned function tools (ToolType::Function) return Done for now;
RequiresAction and ContinuePartial are deferred per staging agreement
in PR vllm-project#67 — LoopDecision is #[non_exhaustive] to make the addition safe.

MCP tool names are absent from the registry until PR C adds discovery;
any function_call for an MCP tool name is treated as client-owned.

244 tests pass; cargo clippy --workspace --all-targets -- -D warnings clean.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
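The iteration caps and the never-abort-on-tool-error rule from this commit message can be sketched as follows. This is a simplified stand-in, not the code in executor/agentic_loop.rs: the `Turn` enum, the closure-based signature, and the error string format (Rust debug quoting, which only approximates JSON escaping) are all assumptions. The constants 128 and 10 come from the commit message.

```rust
/// Hard guard on loop iterations, regardless of what the caller asks for.
pub const HARD_ITERATION_GUARD: usize = 128;
/// Soft cap used when the caller does not set `max_iterations`.
pub const DEFAULT_MAX_ITERATIONS: usize = 10;

/// What the model produced in one turn (hypothetical simplification).
pub enum Turn {
    ToolCalls(Vec<String>),
    Final(String),
}

/// Runs the model until it returns a final answer or the cap is reached.
/// `model` sees the tool outputs collected so far; `run_tool` executes one tool by name.
pub fn execute_loop<M, T>(
    mut model: M,
    mut run_tool: T,
    max_iterations: Option<usize>,
) -> Result<String, String>
where
    M: FnMut(&[String]) -> Turn,
    T: FnMut(&str) -> Result<String, String>,
{
    let cap = max_iterations
        .unwrap_or(DEFAULT_MAX_ITERATIONS)
        .min(HARD_ITERATION_GUARD);
    let mut outputs: Vec<String> = Vec::new();

    for _ in 0..cap {
        match model(&outputs) {
            Turn::Final(text) => return Ok(text),
            Turn::ToolCalls(names) => {
                for name in names {
                    // A failing tool becomes an error payload; the loop keeps going.
                    let output = match run_tool(&name) {
                        Ok(out) => out,
                        Err(msg) => format!("{{\"error\": {:?}}}", msg),
                    };
                    outputs.push(output);
                }
            }
        }
    }
    Err(format!("iteration cap of {} reached", cap))
}
```

Feeding the error payload back to the model, rather than returning early, gives the model a chance to recover or explain the failure in its final answer.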

2 participants