refactor: split executor/engine.rs and eliminate ExecutionContext clone per request#75
Conversation
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: maral <maralbahari.98@gmail.com>
ashwing
left a comment
There was a problem hiding this comment.
run_blocking and run_stream are private which is fine — rehydrate_conversation, call_inference, and persist_response are pub so Praxis filters can still compose steps individually. One thing worth thinking about before tool dispatch lands: ExecuteRequest::run() is currently a straight pipeline. Tool calls need to re-enter inference after execution, so run() will eventually need to loop or accept some kind of decision callback. Might be worth leaving that door open in the shape of ExecuteRequest now rather than having to reshape a stable API later.
…cutor Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: maral <maralbahari.98@gmail.com>
@ashwing Thank you for the feedback. I think we should modify APIs as needed when the feature actually lands. Right now we have no tool calls, so any shape we add to |
|
Makes sense — no point adding shape we'll likely have to undo. Will revisit when tool dispatch is real. |
…cutor Signed-off-by: maral <maralbahari.98@gmail.com>
|
@franciscojavierarceo @leseb ready for review. |
…esolution) PR vllm-project#75 split engine.rs into inference.rs, persist.rs, rehydrate.rs and removed client_auth from ExecutionContext::new(). Update executor/mod.rs re-exports and test helpers accordingly. Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Two new integration tests in dispatch_loop_test.rs: - test_cassette_openai_parallel_two_fcs_in_output: verifies the openai_responses_tool_calls_parallel cassette produces exactly 2 FunctionCall items (get_job_status + web_search) — the only cassette with confirmed parallel FCs. Proves the accumulator handles genuine parallel output. - test_cassette_tool_output_only_items_input_path: exercises the ResponsesInput::Items input path. Turn 1 uses Text input, extracts the resulting FC call_id, then turn 2 starts with function_call_output as Items input. Validates the accumulator handles item-list input correctly end-to-end — a path all other cassette tests skip. Also adapts ExecutionContext::new() calls in unit tests to the 4-arg signature introduced by PR vllm-project#75 (client_auth removed from constructor). Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Summary
ExecutionContextowns long-lived shared state: DB connection pool, HTTP client, and server config. Cloning it per request means each clone holds independent references to those resources, making it easy for future fields to silently diverge between the shared instance and the per-request copy. This refactor removesclient_auth(the only per-request field) fromExecutionContext, making the struct purely shared and immutable for the server lifetime.client_auth: Option<String>removed fromExecutionContext. Supplied per-request viaExecuteRequest::with_auth()instead, threaded as a parameter throughrun_blockingandrun_stream.ExecuteRequestbuilder added:ExecuteRequest::new(payload, exec_ctx).with_auth(token).run().await. The boundary between shared config and per-request state is now explicit in the type.resolve_exec_ctxinagebtic-server/handlers/common.rsdeleted. Replaced byextract_bearer(headers, config_key)and a directExecuteRequestconstruction, zero clones per request.extract_bearerreturns the request-level bearer token if present, and falls back toopenai_api_keyfrom `AppStateexecute()kept as a one-liner shim overExecuteRequestso existing call sites compile without changes.engine.rssplit into four focused modules:inference.rs(HTTP transport),rehydrate.rs(history rehydration),persist.rs(response persistence),engine.rs(orchestration only). All public names re-exported fromexecutor/mod.rs, no import paths changed.Test Plan
cargo clippy --all-targets -- -D warningscleancargo test -- --test-threads=$(nproc): 201 tests pass, 0 failed