feat: add dispatch_tools + execute_loop (PR B — tool dispatch layer)#83
Open
ashwing wants to merge 9 commits into
Open
feat: add dispatch_tools + execute_loop (PR B — tool dispatch layer)#83ashwing wants to merge 9 commits into
ashwing wants to merge 9 commits into
Conversation
Implements the gateway-side agentic tool loop on top of the ToolRegistry and GatewayExecutor traits landed in PR A (vllm-project#80): - executor/dispatch.rs: LoopDecision enum (#[non_exhaustive]) + dispatch_tools() — classifies FunctionToolCall items via ToolRegistry::gateway_owned(), executes in parallel with 30s per-call timeout, maps failures to error-JSON FunctionCallOutput items (never aborts the loop on tool error). - executor/agentic_loop.rs: execute_loop() — multi-turn orchestrator that clears all three persistence triggers before looping and restores original IDs on the final payload. Rejects stream=true (StreamTee is a future PR). Hard guard of 128 iterations, soft cap via max_iterations param (default: 10). Client-owned function tools (ToolType::Function) return Done for now; RequiresAction and ContinuePartial are deferred per staging agreement in PR vllm-project#67 — LoopDecision is #[non_exhaustive] to make the addition safe. MCP tool names are absent from the registry until PR C adds discovery; any function_call for an MCP tool name is treated as client-owned. 244 tests pass; cargo clippy --workspace --all-targets -- -D warnings clean. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
- Fix loop guard bug: change `0..MAX_LOOP_GUARD` to `0..=effective_max` so dispatch_tools is called at iteration == effective_max and returns Incomplete rather than silently exiting with status "completed" - Add #[must_use] to dispatch_tools and execute_loop (api-must-use rule) - Fix incorrect # Errors doc on dispatch_tools (iteration limit returns Ok(Incomplete), not Err) - Correct persistence contract doc: RequestContext is not returned, so describe the caller's responsibility accurately - Fix mixed gateway+client tool history: only include FunctionCall input items whose call_id has a matching tool result, preventing orphaned function_call entries that most LLM backends reject - Remove dead `let _ = original_store` (Concern-4) - Fix line exceeding 120-char rustfmt limit in dispatch.rs Signed-off-by: Ashwin Giridharan <girida@amazon.com>
… + execute_loop
Adds 28 new tests across three files:
dispatch.rs (inline, +7 tests):
- zero_max_iterations_returns_incomplete
- mixed_batch_gateway_and_client_owned
- incomplete_message_contains_iteration_counts
- multiple_tool_types_dispatch_to_correct_executor
- error_json_output_is_valid_json
- call_id_preserved_in_output
- output_with_only_non_fc_items_returns_done
agentic_loop.rs (inline, +3 tests):
- streaming_request_returns_err_without_llm_call
- loop_guard_boundary_formula_is_correct
- default_and_guard_constants_are_sane (const assert)
tests/dispatch_loop_test.rs (integration, 10 tests):
- No-tool baseline, one + two parallel gateway tools
- max_iterations → Incomplete with 3 LLM calls verified
- Tool error feeds error JSON to model; loop continues
- prev_id and conv_id restored on final payload
- 3 cassette tests: responses_tool_calls_{3turn,parallel,5turn}
verify all function-type tools → Done in 1 LLM call, no executor
invocations
Also adds inline code comments to dispatch.rs and agentic_loop.rs
explaining the clone rationale, inclusive loop bound, orphaned FC
filtering, and payload stub initialisation.
Signed-off-by: Ashwin Giridharan <girida@amazon.com>
…esolution) PR vllm-project#75 split engine.rs into inference.rs, persist.rs, rehydrate.rs and removed client_auth from ExecutionContext::new(). Update executor/mod.rs re-exports and test helpers accordingly. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
f66ae62 to
fb075f3
Compare
Two new integration tests in dispatch_loop_test.rs: - test_cassette_openai_parallel_two_fcs_in_output: verifies the openai_responses_tool_calls_parallel cassette produces exactly 2 FunctionCall items (get_job_status + web_search) — the only cassette with confirmed parallel FCs. Proves the accumulator handles genuine parallel output. - test_cassette_tool_output_only_items_input_path: exercises the ResponsesInput::Items input path. Turn 1 uses Text input, extracts the resulting FC call_id, then turn 2 starts with function_call_output as Items input. Validates the accumulator handles item-list input correctly end-to-end — a path all other cassette tests skip. Also adapts ExecutionContext::new() calls in unit tests to the 4-arg signature introduced by PR vllm-project#75 (client_auth removed from constructor). Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Two new cassette tests covering the remaining OpenAI-recorded scenarios: - test_cassette_openai_3turn_no_reasoning_fc_only: gpt-4o produces no reasoning items — FC is at output[0] directly. Confirms the accumulator handles FC-only output (no preceding Reasoning item) and terminates in 1 LLM call with the correct name. - test_cassette_openai_tool_output_only_items_input: gpt-4o variant of the tool-output-only scenario. Two-turn test: t1 Text input -> FC (no reasoning), t2 Items(function_call_output) input -> message. Validates Items input path against gpt-4o-recorded data, distinct from the vLLM variant which has reasoning in t1. All 6 OpenAI cassettes are now covered (parallel 2-FC, 3turn, tool-output- only). Streaming cassettes remain untested (blocked by StreamTee PR). Signed-off-by: Ashwin Giridharan <girida@amazon.com>
…red baseline The 4 vLLM-recorded cassette tests are now skipped by default with #[ignore]. The 3 gpt-4o cassette tests (openai_parallel_two_fcs, openai_3turn_no_reasoning, openai_tool_output_only) run on every `cargo test` as the required baseline. vLLM cassettes still pass — run with `--include-ignored` for full coverage (e.g. when validating against a live vLLM deployment). Signed-off-by: Ashwin Giridharan <girida@amazon.com>
All 5 non-streaming gpt-4o cassettes are now covered: - openai_parallel (2 FCs) ✓ existing - openai_3turn (1 FC, no reasoning) ✓ existing - openai_tool_output (Items input path) ✓ existing - openai_5turn (1 FC, t1 only) ✓ new - openai_branch (1 FC, t1 only) ✓ new openai_3turn_streaming is blocked (execute_loop rejects stream=true). vLLM cassettes remain present but #[ignore]d — run with --include-ignored. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Three changes: 1. Move dispatch.rs inline tests → tests/dispatch_test.rs dispatch.rs was 624 lines (77% tests). Production module is now 144 lines of pure logic. 18 unit tests live in the dedicated file, consistent with engine.rs, registry.rs, inference.rs (all zero inline tests). 2. Add MockGatewayExecutor + FailingExecutor to tests/support/mod.rs Both were defined locally in dispatch_loop_test.rs. Moving them to the shared support module makes them available to any future integration test without redefinition. 3. Extract assert_single_fc_no_reasoning helper in dispatch_loop_test.rs The 3-turn, 5-turn, and branch gpt-4o cassette tests had identical assertion bodies (4 assertions each). One helper call replaces ~20 lines in each test. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Collaborator
Author
|
cc @maralbahari @franciscojavierarceo — PR B from the tool framework design (#67). Builds on the types/registry from #80. The main open question is the dispatch model conflict with #82 (raised there already). |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds the gateway-side agentic tool loop on top of the
ToolRegistryandGatewayExecutortraits from PR A (#80). The executor can now run multi-turn inference with gateway-owned tool calls (MCP, web_search, file_search) dispatched between turns.This is PR B in the tool framework sequence from the design doc (#67):
ToolType,ResponsesTool,ToolRegistry,ToolHandler,GatewayExecutordispatch_tools,LoopDecision,execute_loopHttpMcpHandler: first realGatewayExecutorimpl + MCP discoveryChanges
executor/dispatch.rs(new, 143 lines)LoopDecisionenum —#[non_exhaustive]:Continue(Vec<InputItem>)/Done/Incomplete(String)dispatch_tools()— classifiesFunctionToolCallitems viaToolRegistry::gateway_owned(), executes gateway-owned calls in parallel with 30s per-call timeout, isolates failures as error-JSONFunctionCallOutputitems (never aborts the loop on tool error)type: "function"calls returnDonefor MVP;RequiresActionis deferred per staging recommendation in docs: tool framework design — multi-type tool support #67executor/agentic_loop.rs(new, 173 lines)execute_loop()— multi-turn orchestrator: clears all three persistence triggers before looping, restores original IDs on the final payload, hard guard of 128 iterations (inclusive0..=effective_maxrange soIncompletefires correctly), rejectsstream=trueFunctionCallinput items whosecall_idhas a matchingFunctionCallOutput— unmatched pairs cause vLLM to reject the conversation historyexecutor/mod.rs/lib.rs— wired and re-exportedTest Plan
cargo test --workspace— 248 tests pass (0 failures)cargo test --workspace -- --include-ignored— 252 tests pass (vLLM cassettes included)cargo clippy --workspace --all-targets -- -D warnings— cleancargo fmt --check— cleanTest files:
tests/dispatch_test.rs(new, 18 unit tests) — moved fromdispatch.rsinline block:dispatch_toolsedge cases: Done/Incomplete/Continue paths, parallel execution, error JSON isolation, call_id preservation, mixed gateway+client batchesexecutor/agentic_loop.rs(8 unit tests inline):tests/dispatch_loop_test.rs(new, 12 integration tests viaMockServer):#[ignore]— run with--include-ignored)Shared test infrastructure —
MockGatewayExecutorandFailingExecutoradded totests/support/mod.rsfor reuse by future integration tests.Notes
Relation to PR #82 (Maral's MCP design): PR #82 proposes embedding
handlerinsideToolEntryand addingToolRegistry::dispatch(). This PR uses a separateHashMap<ToolType, Arc<dyn GatewayExecutor>>passed todispatch_tools. These two dispatch models need alignment before either the MCP loop PR or PR C can proceed. Raised this directly in PR #82 comments.MCP discovery:
ToolRegistry::build()runs withouttools/listdiscovery; MCP tool names are absent until PR C. Anyfunction_callfor an MCP tool name is treated as client-owned (skipped) — documented inexecute_loopdoc comment.Persistence:
execute_loopnever writes to the DB. Caller is responsible forpersist_responseusing theRequestContextfromrehydrate_conversation.RequiresAction/ContinuePartial: Deferred per Maral's feedback in #67.LoopDecisionis#[non_exhaustive]so adding these variants later won't break existing match arms.AI assistance was used in drafting this PR.