feat: add Codex-compatible GET v1/models and WebSocket session bug fixes#79

Merged
franciscojavierarceo merged 2 commits into
vllm-project:mainfrom
EmbeddedLLM:support-v1-models-router
Jun 30, 2026
@maralbahari maralbahari commented Jun 29, 2026

Collaborator

Summary

handler.rs had grown to 630 lines, mixing WebSocket logic, HTTP handlers, proxy utilities, and model transformation code in a single file. This PR splits it into a focused module tree, adds a Codex CLI-compatible GET /v1/models endpoint, and fixes two WebSocket session bugs.

GET /v1/models: Codex CLI compatibility

  • Added proxy_get(path, headers, state) to agentic-core/proxy.rs: a GET variant of proxy_request that applies the same header filtering and auth injection.
  • Codex CLI is detected via the ?client_version=<ver> query parameter, parsed with a typed ModelsParams extractor.
  • Non-Codex clients: the upstream vLLM response is streamed back unchanged via proxy_get.
  • Codex clients: the vLLM { "object": "list", "data": [...] } response is transformed to { "models": [...] } with the full ModelInfo shape Codex expects (slug, context_window, auto_review_model_override, apply_patch_tool_type, capabilities, etc.).
  • Static ModelInfo fields built once via OnceLock and cloned per model; only the five per-model fields are patched at response time.
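
The detection and transform steps above can be sketched roughly as follows. This is a minimal std-only illustration, not the code from this PR: the ModelInfo subset, the UpstreamModel stand-in, the helper names is_codex_client and to_codex_models, and the placeholder values "freeform" and "streaming" are all assumptions. Only two of the five per-model fields are shown, and the real handler uses a typed extractor and JSON rather than manual query parsing.

```rust
use std::sync::OnceLock;

/// Hypothetical subset of the ModelInfo shape. Field names follow the PR
/// description; the types and the reduced field set are assumptions.
#[derive(Clone, Debug, PartialEq)]
pub struct ModelInfo {
    pub slug: String,
    pub context_window: u64,
    pub apply_patch_tool_type: &'static str,
    pub capabilities: Vec<&'static str>,
}

/// Hypothetical stand-in for one entry of the upstream `data` array.
pub struct UpstreamModel {
    pub id: String,
    pub max_model_len: u64,
}

/// Static fields are built once and shared; values here are placeholders.
fn template() -> &'static ModelInfo {
    static TEMPLATE: OnceLock<ModelInfo> = OnceLock::new();
    TEMPLATE.get_or_init(|| ModelInfo {
        slug: String::new(),
        context_window: 0,
        apply_patch_tool_type: "freeform",
        capabilities: vec!["streaming"],
    })
}

/// Treats any request whose query string has a `client_version` key as Codex.
pub fn is_codex_client(query: &str) -> bool {
    query
        .split('&')
        .any(|pair| pair.split('=').next() == Some("client_version"))
}

/// Clones the shared template per model and patches the per-model fields.
pub fn to_codex_models(upstream: &[UpstreamModel]) -> Vec<ModelInfo> {
    upstream
        .iter()
        .map(|m| {
            let mut info = template().clone();
            info.slug = m.id.clone();
            info.context_window = m.max_model_len;
            info
        })
        .collect()
}
```

With this sketch, is_codex_client("client_version=0.1.0") selects the Codex branch, and to_codex_models returns one ModelInfo per upstream entry with the static fields copied from the shared template.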

WebSocket session bug fixes

Identified and fixed two bugs in websocket/responses.rs that prevented Codex CLI from persisting history to the database:

  • Pipelined requests caused connection resets: Codex sends the next response.create on the same WebSocket connection while the current stream is still active. The old handler returned a ConcurrentMessage error and closed the connection, forcing Codex to reconnect on every turn. Fixed by introducing a VecDeque queue: incoming requests that arrive mid-stream are enqueued and processed in order after the current stream completes. The ConcurrentMessage error variant was removed as it is no longer reachable.

  • store: false bypassed the database: Codex CLI explicitly sends "store": false in every request body, which caused the gateway to skip persistence entirely and proxy straight to vLLM. Fixed by forcing payload.store = true on the WebSocket path; the gateway is the stateful layer and should always persist regardless of what the client sends.

Together these fixes ensure every completed Codex turn is written to the DB with its full SSE history, and that multi-turn conversations chain correctly via previous_response_id.
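
A rough sketch of the two fixes, assuming a simplified per-connection state machine. The Session type, its method names, and the reduced RequestPayload struct are hypothetical stand-ins for illustration; the actual handler in websocket/responses.rs is asynchronous and drives a real stream.

```rust
use std::collections::VecDeque;

/// Hypothetical request payload with only the fields relevant here.
#[derive(Debug, Clone, PartialEq)]
pub struct RequestPayload {
    pub input: String,
    pub store: bool,
    pub previous_response_id: Option<String>,
}

/// Per-connection state: at most one active stream plus a FIFO backlog.
#[derive(Default)]
pub struct Session {
    active: Option<RequestPayload>,
    pending: VecDeque<RequestPayload>,
}

impl Session {
    /// Handles an incoming `response.create` request.
    /// Returns the request to start streaming now, or None if it was queued.
    pub fn on_request(&mut self, mut payload: RequestPayload) -> Option<RequestPayload> {
        // The gateway is the stateful layer, so persistence is always on,
        // whatever the client sent.
        payload.store = true;
        if self.active.is_some() {
            // A stream is still running: enqueue instead of erroring out.
            self.pending.push_back(payload);
            None
        } else {
            self.active = Some(payload.clone());
            Some(payload)
        }
    }

    /// Called when the current stream finishes.
    /// Returns the next queued request, in arrival order, if any.
    pub fn on_stream_complete(&mut self) -> Option<RequestPayload> {
        self.active = self.pending.pop_front();
        self.active.clone()
    }
}
```

In this sketch a request that arrives mid-stream is never rejected; it waits in the queue and is handed back by on_stream_complete once the earlier stream is done, so the connection stays open across turns.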

Handler reorganization

  • handler.rs deleted. Replaced by handler/ module tree:
    • common.rs: shared utilities used across handlers (convert_response, executor_error_response, read_bytes, resolve_exec_ctx, sse_response)
    • http/conversations.rs: POST /v1/conversations
    • http/models.rs: GET /health, GET /ready, GET /v1/models + Codex model transform logic
    • http/responses.rs: POST /v1/responses (proxy and stateful paths)
    • websocket/responses.rs: WebSocket /v1/responses handler and streaming loop
    • websocket/error.rs: WsError enum with status/code/frame helpers
  • handler/mod.rs re-exports all public handlers; no import paths in app.rs changed.
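
The re-export pattern that keeps import paths stable can be sketched with inline modules. The module names mirror the tree above, but the function bodies, the WsError variants, and the status and code values are hypothetical placeholders, not the contents of this PR.

```rust
mod handler {
    // Stands in for handler/http/models.rs.
    pub mod http {
        pub mod models {
            pub fn health() -> &'static str {
                "ok"
            }
        }
    }

    // Stands in for handler/websocket/error.rs.
    pub mod websocket {
        pub mod error {
            /// Hypothetical variants; the real WsError enum may differ.
            #[derive(Debug, Clone, Copy, PartialEq)]
            pub enum WsError {
                InvalidJson,
                Upstream,
            }

            impl WsError {
                /// HTTP-style status associated with the error.
                pub fn status(self) -> u16 {
                    match self {
                        WsError::InvalidJson => 400,
                        WsError::Upstream => 502,
                    }
                }

                /// Machine-readable error code for the error frame.
                pub fn code(self) -> &'static str {
                    match self {
                        WsError::InvalidJson => "invalid_json",
                        WsError::Upstream => "upstream_error",
                    }
                }
            }
        }
    }

    // Stands in for handler/mod.rs: callers keep using `handler::health`
    // and `handler::WsError` regardless of where the items now live.
    pub use self::http::models::health;
    pub use self::websocket::error::WsError;
}
```

Because the re-exports sit at the top of the handler module, a caller such as app.rs can keep referring to handler::health without knowing about the http or websocket submodules.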

Codex CLI setup

Create $CODEX_HOME/config.toml (e.g. ~/.codex/config.toml) pointing at agentic-api:

model_provider = "agentic-api"

[model_providers.agentic-api]
name = "agentic-api"
base_url = "http://localhost:9000/v1"
wire_api = "responses"
requires_openai_auth = false
supports_websockets = true

Run with:

codex --disable image_generation -c model_provider=agentic-api -m "model_name"

Test Plan

  • cargo clippy --all-targets -- -D warnings: clean, no warnings
  • cargo test: all tests pass, 0 failed

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: maral <maralbahari.98@gmail.com>
…t requests instead of closing connection

Signed-off-by: maral <maralbahari.98@gmail.com>
@maralbahari maralbahari changed the title from "feat: add Codex-compatible GET v1/models and split handler into modules" to "feat: add Codex-compatible GET v1/models and WebSocket session bug fixes" Jun 30, 2026
@franciscojavierarceo franciscojavierarceo merged commit f9e2fc5 into vllm-project:main Jun 30, 2026
3 checks passed
ashwing added a commit to ashwing/agentic-api that referenced this pull request Jun 30, 2026
Rebased on main after vllm-project#79 and vllm-project#81 merged.

Adds src/tool/ — the behavioral layer that complements the wire types
already in types/tools/ (merged via PR vllm-project#79):

- tool/handler.rs  — ToolHandler trait, ToolOutput, ToolError
- tool/registry.rs — ToolType, ToolEntry, ToolRegistry::build/lookup/etc
- tool/function.rs — FunctionHandler + From<&FunctionToolParam> for FunctionTool
- tool/normalize.rs — ResponsesTool::to_function_tool(), From<ToolOutput>

Also adds types/tools/ wire types (params.rs with ResponsesTool enum,
param structs, NonEmptyToolName), EmptyToolNameError, and wires normalize
into RequestPayload::to_upstream_request() so vLLM always receives
Vec<FunctionTool>.

12 cassette-based tests in tool_normalization_test.rs validate the full
pipeline against real multi-turn tool-call cassettes.

Addresses all PR vllm-project#80 review feedback:
- types/ → wire shapes only; tool/ → behaviors
- From<&FunctionToolParam> for FunctionTool (typed conversion)
- MCP registry entries deferred to PR C (discovery not yet wired)
- EmptyToolNameError in types/ (no cross-layer import)
- ToolOutput derives Debug + Clone

Signed-off-by: Ashwin Giridharan <girida@amazon.com>