Skip to content

feat: add dispatch_tools + execute_loop (PR B — tool dispatch layer)#83

Open
ashwing wants to merge 9 commits into
vllm-project:mainfrom
ashwing:feat/tool-dispatch-loop
Open

feat: add dispatch_tools + execute_loop (PR B — tool dispatch layer)#83
ashwing wants to merge 9 commits into
vllm-project:mainfrom
ashwing:feat/tool-dispatch-loop

Conversation

@ashwing

@ashwing ashwing commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds the gateway-side agentic tool loop on top of the ToolRegistry and GatewayExecutor traits from PR A (#80). The executor can now run multi-turn inference with gateway-owned tool calls (MCP, web_search, file_search) dispatched between turns.

This is PR B in the tool framework sequence from the design doc (#67):

Changes

executor/dispatch.rs (new, 143 lines)

  • LoopDecision enum — #[non_exhaustive]: Continue(Vec<InputItem>) / Done / Incomplete(String)
  • dispatch_tools() — classifies FunctionToolCall items via ToolRegistry::gateway_owned(), executes gateway-owned calls in parallel with 30s per-call timeout, isolates failures as error-JSON FunctionCallOutput items (never aborts the loop on tool error)
  • Client-owned type: "function" calls return Done for MVP; RequiresAction is deferred per staging recommendation in docs: tool framework design — multi-type tool support #67

executor/agentic_loop.rs (new, 173 lines)

  • execute_loop() — multi-turn orchestrator: clears all three persistence triggers before looping, restores original IDs on the final payload, hard guard of 128 iterations (inclusive 0..=effective_max range so Incomplete fires correctly), rejects stream=true
  • Fixes orphaned FC history: only injects FunctionCall input items whose call_id has a matching FunctionCallOutput — unmatched pairs cause vLLM to reject the conversation history

executor/mod.rs / lib.rs — wired and re-exported

Test Plan

  • cargo test --workspace — 248 tests pass (0 failures)
  • cargo test --workspace -- --include-ignored — 252 tests pass (vLLM cassettes included)
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --check — clean

Test files:

tests/dispatch_test.rs (new, 18 unit tests) — moved from dispatch.rs inline block:

  • All dispatch_tools edge cases: Done/Incomplete/Continue paths, parallel execution, error JSON isolation, call_id preservation, mixed gateway+client batches

executor/agentic_loop.rs (8 unit tests inline):

  • Streaming rejection, persistence trigger clearing, loop guard boundary, constant assertions

tests/dispatch_loop_test.rs (new, 12 integration tests via MockServer):

  • 7 behavioral tests: no-tools baseline, 1+2 parallel gateway tools, max_iterations → incomplete, tool error recovery, prev_id/conv_id restoration
  • 5 gpt-4o cassette tests (required, always run): parallel 2-FC, 3-turn FC-only, tool-output-only Items input, 5-turn, branch
  • 4 vLLM cassette tests (#[ignore] — run with --include-ignored)

Shared test infrastructureMockGatewayExecutor and FailingExecutor added to tests/support/mod.rs for reuse by future integration tests.

Notes

Relation to PR #82 (Maral's MCP design): PR #82 proposes embedding handler inside ToolEntry and adding ToolRegistry::dispatch(). This PR uses a separate HashMap<ToolType, Arc<dyn GatewayExecutor>> passed to dispatch_tools. These two dispatch models need alignment before either the MCP loop PR or PR C can proceed. Raised this directly in PR #82 comments.

MCP discovery: ToolRegistry::build() runs without tools/list discovery; MCP tool names are absent until PR C. Any function_call for an MCP tool name is treated as client-owned (skipped) — documented in execute_loop doc comment.

Persistence: execute_loop never writes to the DB. Caller is responsible for persist_response using the RequestContext from rehydrate_conversation.

RequiresAction / ContinuePartial: Deferred per Maral's feedback in #67. LoopDecision is #[non_exhaustive] so adding these variants later won't break existing match arms.

AI assistance was used in drafting this PR.

ashwing added 4 commits July 2, 2026 15:31
Implements the gateway-side agentic tool loop on top of the ToolRegistry
and GatewayExecutor traits landed in PR A (vllm-project#80):

- executor/dispatch.rs: LoopDecision enum (#[non_exhaustive]) +
  dispatch_tools() — classifies FunctionToolCall items via
  ToolRegistry::gateway_owned(), executes in parallel with 30s
  per-call timeout, maps failures to error-JSON FunctionCallOutput
  items (never aborts the loop on tool error).

- executor/agentic_loop.rs: execute_loop() — multi-turn orchestrator
  that clears all three persistence triggers before looping and
  restores original IDs on the final payload. Rejects stream=true
  (StreamTee is a future PR). Hard guard of 128 iterations, soft cap
  via max_iterations param (default: 10).

Client-owned function tools (ToolType::Function) return Done for now;
RequiresAction and ContinuePartial are deferred per staging agreement
in PR vllm-project#67 — LoopDecision is #[non_exhaustive] to make the addition safe.

MCP tool names are absent from the registry until PR C adds discovery;
any function_call for an MCP tool name is treated as client-owned.

244 tests pass; cargo clippy --workspace --all-targets -- -D warnings clean.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
- Fix loop guard bug: change `0..MAX_LOOP_GUARD` to `0..=effective_max`
  so dispatch_tools is called at iteration == effective_max and returns
  Incomplete rather than silently exiting with status "completed"
- Add #[must_use] to dispatch_tools and execute_loop (api-must-use rule)
- Fix incorrect # Errors doc on dispatch_tools (iteration limit returns
  Ok(Incomplete), not Err)
- Correct persistence contract doc: RequestContext is not returned, so
  describe the caller's responsibility accurately
- Fix mixed gateway+client tool history: only include FunctionCall input
  items whose call_id has a matching tool result, preventing orphaned
  function_call entries that most LLM backends reject
- Remove dead `let _ = original_store` (Concern-4)
- Fix line exceeding 120-char rustfmt limit in dispatch.rs

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
… + execute_loop

Adds 28 new tests across three files:

dispatch.rs (inline, +7 tests):
- zero_max_iterations_returns_incomplete
- mixed_batch_gateway_and_client_owned
- incomplete_message_contains_iteration_counts
- multiple_tool_types_dispatch_to_correct_executor
- error_json_output_is_valid_json
- call_id_preserved_in_output
- output_with_only_non_fc_items_returns_done

agentic_loop.rs (inline, +3 tests):
- streaming_request_returns_err_without_llm_call
- loop_guard_boundary_formula_is_correct
- default_and_guard_constants_are_sane (const assert)

tests/dispatch_loop_test.rs (integration, 10 tests):
- No-tool baseline, one + two parallel gateway tools
- max_iterations → Incomplete with 3 LLM calls verified
- Tool error feeds error JSON to model; loop continues
- prev_id and conv_id restored on final payload
- 3 cassette tests: responses_tool_calls_{3turn,parallel,5turn}
  verify all function-type tools → Done in 1 LLM call, no executor
  invocations

Also adds inline code comments to dispatch.rs and agentic_loop.rs
explaining the clone rationale, inclusive loop bound, orphaned FC
filtering, and payload stub initialisation.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
…esolution)

PR vllm-project#75 split engine.rs into inference.rs, persist.rs, rehydrate.rs and
removed client_auth from ExecutionContext::new(). Update executor/mod.rs
re-exports and test helpers accordingly.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
@ashwing ashwing force-pushed the feat/tool-dispatch-loop branch from f66ae62 to fb075f3 Compare July 2, 2026 22:34
ashwing added 5 commits July 2, 2026 15:43
Two new integration tests in dispatch_loop_test.rs:

- test_cassette_openai_parallel_two_fcs_in_output: verifies the
  openai_responses_tool_calls_parallel cassette produces exactly 2
  FunctionCall items (get_job_status + web_search) — the only cassette
  with confirmed parallel FCs. Proves the accumulator handles genuine
  parallel output.

- test_cassette_tool_output_only_items_input_path: exercises the
  ResponsesInput::Items input path. Turn 1 uses Text input, extracts
  the resulting FC call_id, then turn 2 starts with function_call_output
  as Items input. Validates the accumulator handles item-list input
  correctly end-to-end — a path all other cassette tests skip.

Also adapts ExecutionContext::new() calls in unit tests to the 4-arg
signature introduced by PR vllm-project#75 (client_auth removed from constructor).

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Two new cassette tests covering the remaining OpenAI-recorded scenarios:

- test_cassette_openai_3turn_no_reasoning_fc_only: gpt-4o produces no
  reasoning items — FC is at output[0] directly. Confirms the accumulator
  handles FC-only output (no preceding Reasoning item) and terminates in
  1 LLM call with the correct name.

- test_cassette_openai_tool_output_only_items_input: gpt-4o variant of
  the tool-output-only scenario. Two-turn test: t1 Text input -> FC
  (no reasoning), t2 Items(function_call_output) input -> message.
  Validates Items input path against gpt-4o-recorded data, distinct from
  the vLLM variant which has reasoning in t1.

All 6 OpenAI cassettes are now covered (parallel 2-FC, 3turn, tool-output-
only). Streaming cassettes remain untested (blocked by StreamTee PR).

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
…red baseline

The 4 vLLM-recorded cassette tests are now skipped by default with
#[ignore]. The 3 gpt-4o cassette tests (openai_parallel_two_fcs,
openai_3turn_no_reasoning, openai_tool_output_only) run on every
`cargo test` as the required baseline.

vLLM cassettes still pass — run with `--include-ignored` for full
coverage (e.g. when validating against a live vLLM deployment).

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
All 5 non-streaming gpt-4o cassettes are now covered:
- openai_parallel    (2 FCs)              ✓ existing
- openai_3turn       (1 FC, no reasoning) ✓ existing
- openai_tool_output (Items input path)   ✓ existing
- openai_5turn       (1 FC, t1 only)      ✓ new
- openai_branch      (1 FC, t1 only)      ✓ new

openai_3turn_streaming is blocked (execute_loop rejects stream=true).
vLLM cassettes remain present but #[ignore]d — run with --include-ignored.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Three changes:

1. Move dispatch.rs inline tests → tests/dispatch_test.rs
   dispatch.rs was 624 lines (77% tests). Production module is now
   144 lines of pure logic. 18 unit tests live in the dedicated file,
   consistent with engine.rs, registry.rs, inference.rs (all zero
   inline tests).

2. Add MockGatewayExecutor + FailingExecutor to tests/support/mod.rs
   Both were defined locally in dispatch_loop_test.rs. Moving them to
   the shared support module makes them available to any future
   integration test without redefinition.

3. Extract assert_single_fc_no_reasoning helper in dispatch_loop_test.rs
   The 3-turn, 5-turn, and branch gpt-4o cassette tests had identical
   assertion bodies (4 assertions each). One helper call replaces ~20
   lines in each test.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
@ashwing ashwing marked this pull request as ready for review July 2, 2026 23:36
@ashwing

ashwing commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

cc @maralbahari @franciscojavierarceo — PR B from the tool framework design (#67). Builds on the types/registry from #80. The main open question is the dispatch model conflict with #82 (raised there already).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant