Skip to content

refactor: split executor/engine.rs and eliminate ExecutionContext clone per request#75

Merged
leseb merged 7 commits into
vllm-project:mainfrom
EmbeddedLLM:refactor-stateful-executor
Jul 2, 2026
Merged

refactor: split executor/engine.rs and eliminate ExecutionContext clone per request#75
leseb merged 7 commits into
vllm-project:mainfrom
EmbeddedLLM:refactor-stateful-executor

Conversation

@maralbahari

@maralbahari maralbahari commented Jun 23, 2026

Copy link
Copy Markdown
Collaborator

Summary

ExecutionContext owns long-lived shared state: DB connection pool, HTTP client, and server config. Cloning it per request means each clone holds independent references to those resources, making it easy for future fields to silently diverge between the shared instance and the per-request copy. This refactor removes client_auth (the only per-request field) from ExecutionContext, making the struct purely shared and immutable for the server lifetime.

  • client_auth: Option<String> removed from ExecutionContext. Supplied per-request via ExecuteRequest::with_auth() instead, threaded as a parameter through run_blocking and run_stream.
  • ExecuteRequest builder added: ExecuteRequest::new(payload, exec_ctx).with_auth(token).run().await. The boundary between shared config and per-request state is now explicit in the type.
  • resolve_exec_ctx in agebtic-server/handlers/common.rs deleted. Replaced by extract_bearer(headers, config_key) and a direct ExecuteRequest construction, zero clones per request. extract_bearer returns the request-level bearer token if present, and falls back to openai_api_key from `AppState
  • execute() kept as a one-liner shim over ExecuteRequest so existing call sites compile without changes.
  • engine.rs split into four focused modules: inference.rs (HTTP transport), rehydrate.rs (history rehydration), persist.rs (response persistence), engine.rs (orchestration only). All public names re-exported from executor/mod.rs, no import paths changed.

Test Plan

  • cargo clippy --all-targets -- -D warnings clean
  • cargo test -- --test-threads=$(nproc): 201 tests pass, 0 failed

maralbahari and others added 3 commits June 23, 2026 14:20
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: maral <maralbahari.98@gmail.com>
Comment thread crates/agentic-server/src/handler.rs Outdated

@ashwing ashwing left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_blocking and run_stream are private which is fine — rehydrate_conversation, call_inference, and persist_response are pub so Praxis filters can still compose steps individually. One thing worth thinking about before tool dispatch lands: ExecuteRequest::run() is currently a straight pipeline. Tool calls need to re-enter inference after execution, so run() will eventually need to loop or accept some kind of decision callback. Might be worth leaving that door open in the shape of ExecuteRequest now rather than having to reshape a stable API later.

Comment thread crates/agentic-core/src/executor/request.rs
Comment thread crates/agentic-server/tests/common/mod.rs
…cutor

Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: maral <maralbahari.98@gmail.com>
@maralbahari

maralbahari commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator Author

run_blocking and run_stream are private which is fine — rehydrate_conversation, call_inference, and persist_response are pub so Praxis filters can still compose steps individually. One thing worth thinking about before tool dispatch lands: ExecuteRequest::run() is currently a straight pipeline. Tool calls need to re-enter inference after execution, so run() will eventually need to loop or accept some kind of decision callback. Might be worth leaving that door open in the shape of ExecuteRequest now rather than having to reshape a stable API later.

@ashwing Thank you for the feedback. I think we should modify APIs as needed when the feature actually lands. Right now we have no tool calls, so any shape we add to ExecuteRequest would be based on guesses about how tool dispatch will work in practice. Those guesses will almost certainly be wrong once we see the real requirements.
Adding a callback or loop shape today means future implementors have to work around our speculation rather than design from real constraints. The "open door" would end up being dead weight or worse, the wrong door entirely.
When tool execution is actually implemented, reshaping run() is cheap. ExecuteRequest is not a public stable API with many external callers. We can make any structural changes the real implementation needs at that point.

@ashwing

ashwing commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Makes sense — no point adding shape we'll likely have to undo. Will revisit when tool dispatch is real.

@maralbahari

Copy link
Copy Markdown
Collaborator Author

@franciscojavierarceo @leseb ready for review.

@leseb leseb merged commit a15e762 into vllm-project:main Jul 2, 2026
3 checks passed
ashwing added a commit to ashwing/agentic-api that referenced this pull request Jul 2, 2026
…esolution)

PR vllm-project#75 split engine.rs into inference.rs, persist.rs, rehydrate.rs and
removed client_auth from ExecutionContext::new(). Update executor/mod.rs
re-exports and test helpers accordingly.

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
ashwing added a commit to ashwing/agentic-api that referenced this pull request Jul 2, 2026
Two new integration tests in dispatch_loop_test.rs:

- test_cassette_openai_parallel_two_fcs_in_output: verifies the
  openai_responses_tool_calls_parallel cassette produces exactly 2
  FunctionCall items (get_job_status + web_search) — the only cassette
  with confirmed parallel FCs. Proves the accumulator handles genuine
  parallel output.

- test_cassette_tool_output_only_items_input_path: exercises the
  ResponsesInput::Items input path. Turn 1 uses Text input, extracts
  the resulting FC call_id, then turn 2 starts with function_call_output
  as Items input. Validates the accumulator handles item-list input
  correctly end-to-end — a path all other cassette tests skip.

Also adapts ExecutionContext::new() calls in unit tests to the 4-arg
signature introduced by PR vllm-project#75 (client_auth removed from constructor).

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants