feat: codex integration#84

Draft
haoshan98 wants to merge 8 commits into
vllm-project:mainfrom
EmbeddedLLM:codex-integration

@haoshan98 haoshan98 commented Jul 2, 2026

Contributor

Codex CLI Tool Call Responses Compatibility

Summary

This PR adds Codex CLI compatibility to agentic-api as a stateful Responses API gateway in front of an
OpenAI-compatible vLLM endpoint.

The primary runtime path is:

Codex CLI -> agentic-api /v1 -> vLLM /v1 -> served model

A typical local validation path is:

Codex CLI -> agentic-api http://127.0.0.1:3018/v1 -> vLLM http://<vllm-host>:<port> -> served model

The gateway keeps Codex-facing request and response shapes stable while adapting tool declarations and tool calls to
the upstream model server's stricter function-tool surface. This lets Codex use agentic-api with both HTTP/SSE and
WebSocket Responses transports, while preserving stateful continuation through previous_response_id and stored
conversation history.

What Changed

Codex Responses Gateway

  • Merged Codex-compatible /v1/responses behavior into the HTTP and WebSocket server paths from the latest main.
  • Preserved latest main's more general tool framework and routed Codex-specific namespace flattening through that
    framework instead of bypassing it.
  • Kept HTTP store=false requests without continuation IDs on the proxy path, with compatibility normalization for model
    aliases, instructions, namespace tools, tool choice, and returned tool calls.
  • Kept HTTP store=true and WebSocket sessions on the stateful executor path with response storage, continuation, and
    hydration.
  • Added optional model alias rewriting, for example:
codex-compatible=Qwen/Qwen3.6-35B-A3B
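
The routing split and alias rewrite described above can be sketched roughly as follows. This is a minimal illustration, not the gateway's code: `Transport`, `Route`, `RoutingInput`, `select_route`, `parse_alias`, and `rewrite_model` are hypothetical names. Sending HTTP store=false requests that do carry a continuation ID to the stateful path is an assumption, since the list above only covers the case without continuation IDs.

```rust
/// Transport a request arrived on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transport {
    Http,
    WebSocket,
}

/// Gateway path that handles the request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Route {
    Proxy,
    StatefulExecutor,
}

/// Hypothetical view of the request fields that drive routing.
pub struct RoutingInput<'a> {
    pub transport: Transport,
    pub store: bool,
    pub previous_response_id: Option<&'a str>,
    pub conversation_id: Option<&'a str>,
}

/// Picks the path for a request.
pub fn select_route(req: &RoutingInput<'_>) -> Route {
    let has_continuation =
        req.previous_response_id.is_some() || req.conversation_id.is_some();
    match req.transport {
        // WebSocket sessions always use the stateful executor.
        Transport::WebSocket => Route::StatefulExecutor,
        // HTTP store=false without continuation IDs stays on the proxy path.
        Transport::Http if !req.store && !has_continuation => Route::Proxy,
        // HTTP store=true is stateful. Treating store=false plus a
        // continuation ID as stateful too is an assumption of this sketch.
        Transport::Http => Route::StatefulExecutor,
    }
}

/// Parses one `alias=target` pair, the syntax shown in the example above.
pub fn parse_alias(spec: &str) -> Option<(&str, &str)> {
    let (alias, target) = spec.split_once('=')?;
    let (alias, target) = (alias.trim(), target.trim());
    if alias.is_empty() || target.is_empty() {
        None
    } else {
        Some((alias, target))
    }
}

/// Rewrites a requested model name if it matches a configured alias.
pub fn rewrite_model<'a>(requested: &'a str, aliases: &[(&'a str, &'a str)]) -> &'a str {
    aliases
        .iter()
        .find(|(alias, _)| *alias == requested)
        .map(|(_, target)| *target)
        .unwrap_or(requested)
}
```

With the alias from the example, `rewrite_model("codex-compatible", ...)` yields `Qwen/Qwen3.6-35B-A3B`, and any other model name passes through unchanged.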

Tool Shape Support

The typed Responses request path now accepts the Codex tool shapes emitted by Codex CLI:

  • function
  • namespace
  • tool_search
  • custom
  • unknown/raw tool objects

Known tool types preserve extra fields, and unknown tool types remain raw JSON rather than being discarded. On the typed
stateful executor path, namespace members are flattened into upstream-safe function tools for vLLM. Non-function Codex
declarations remain accepted and preserved in metadata for continuations.
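
One way to picture "known types stay typed, unknown types stay raw" is an enum with a raw fallback variant. The sketch below is illustrative only: `ToolKind` and `classify_tool` are hypothetical names, and the real typed request path presumably carries full payloads and extra fields rather than a bare JSON string.

```rust
/// Codex tool declaration kinds named in this PR, plus a raw fallback.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ToolKind {
    Function,
    Namespace,
    ToolSearch,
    Custom,
    /// Unknown types keep their original JSON text instead of being dropped.
    Unknown { type_name: String, raw_json: String },
}

/// Classifies a declaration by its `type` field.
/// `raw_json` is the untouched declaration text (an assumption of this sketch).
pub fn classify_tool(type_name: &str, raw_json: &str) -> ToolKind {
    match type_name {
        "function" => ToolKind::Function,
        "namespace" => ToolKind::Namespace,
        "tool_search" => ToolKind::ToolSearch,
        "custom" => ToolKind::Custom,
        other => ToolKind::Unknown {
            type_name: other.to_string(),
            raw_json: raw_json.to_string(),
        },
    }
}
```

Because the fallback variant owns the raw text, a declaration with an unrecognized type can be stored and replayed on a continuation without loss.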

Namespace Flattening And Restoration

Codex namespace tools, such as MCP tools, are represented as grouped namespace/member declarations:

{
  "type": "namespace",
  "name": "mcp__agentic_fixture",
  "tools": [
    { "type": "function", "name": "add_numbers" }
  ]
}

Some upstreams reserve or reject function names that begin with mcp__. To stay compatible with them, namespace members
are flattened into an upstream-visible name with an agentic-specific prefix:

agentic_ns__mcp__agentic_fixture__add_numbers
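
The flat name above is consistent with the scheme prefix + namespace + "__" + member. A minimal sketch of that scheme follows; `FLAT_PREFIX`, `SEPARATOR`, `flatten_name`, and `flatten_namespace` are hypothetical names chosen for illustration, and the exact scheme is inferred from the single example rather than from the implementation.

```rust
/// Prefix seen in the example flat name above.
pub const FLAT_PREFIX: &str = "agentic_ns__";
/// Separator between namespace and member, inferred from the same example.
pub const SEPARATOR: &str = "__";

/// Builds the upstream-visible function name for one namespace member.
pub fn flatten_name(namespace: &str, member: &str) -> String {
    format!("{}{}{}{}", FLAT_PREFIX, namespace, SEPARATOR, member)
}

/// Flattens every member of one namespace declaration.
pub fn flatten_namespace(namespace: &str, members: &[&str]) -> Vec<String> {
    members
        .iter()
        .map(|member| flatten_name(namespace, member))
        .collect()
}
```

Calling `flatten_name("mcp__agentic_fixture", "add_numbers")` reproduces the name shown above.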

When the upstream model returns that flat tool call, agentic-api restores the Codex-facing shape:

{
  "type": "function_call",
  "namespace": "mcp__agentic_fixture",
  "name": "add_numbers"
}
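
Because namespace names such as mcp__agentic_fixture already contain "__", splitting a flat name on the separator is ambiguous. One robust option, sketched here as an assumption rather than a description of the actual code, is to remember each flat name when it is generated and restore by lookup. `NamespaceIndex`, `register`, and `restore` are hypothetical names.

```rust
use std::collections::HashMap;

/// Maps upstream-visible flat names back to (namespace, member) pairs.
#[derive(Debug, Default)]
pub struct NamespaceIndex {
    by_flat: HashMap<String, (String, String)>,
}

impl NamespaceIndex {
    /// Records a namespace member and returns its flat upstream name.
    pub fn register(&mut self, namespace: &str, member: &str) -> String {
        let flat = format!("agentic_ns__{}__{}", namespace, member);
        self.by_flat
            .insert(flat.clone(), (namespace.to_string(), member.to_string()));
        flat
    }

    /// Restores the Codex-facing pair for a name returned by the upstream.
    /// Returns None for plain top-level functions that were never flattened.
    pub fn restore(&self, upstream_name: &str) -> Option<(&str, &str)> {
        self.by_flat
            .get(upstream_name)
            .map(|(namespace, member)| (namespace.as_str(), member.as_str()))
    }
}
```

A name that was never registered returns None, so ordinary function calls pass through without a namespace field.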

The normalization layer also accepts these observed variants of a returned tool name, provided the mapping back to a
namespace member is unambiguous:

  • legacy dotted flat names such as mcp__agentic_fixture.add_numbers
  • underscore aliases such as mcp__agentic_fixture_add_numbers
  • unambiguous bare member names
  • namespace container calls when the intended member is unambiguous

Collision checks prevent unsafe flattening when a namespaced member would conflict with a top-level function name.
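
The variant handling and collision check can be illustrated with a small resolver. This is a sketch under stated assumptions: `Namespace`, `accepted_names`, `check_collisions`, and `resolve` are hypothetical names, the rule that a top-level function with the same name always wins is an assumption, and the actual precedence rules in the PR may differ.

```rust
/// A Codex namespace declaration, reduced to the names this sketch needs.
#[derive(Debug, Clone)]
pub struct Namespace {
    pub name: String,
    pub members: Vec<String>,
}

/// Names under which one member may come back from the upstream model.
fn accepted_names(ns: &str, member: &str) -> [String; 4] {
    [
        format!("agentic_ns__{}__{}", ns, member), // canonical flat name
        format!("{}.{}", ns, member),              // legacy dotted form
        format!("{}_{}", ns, member),              // underscore alias
        member.to_string(),                        // bare member name
    ]
}

/// Rejects flattening when a flat name would shadow a top-level function.
pub fn check_collisions(namespaces: &[Namespace], top_level: &[&str]) -> Result<(), String> {
    for ns in namespaces {
        for member in &ns.members {
            let flat = format!("agentic_ns__{}__{}", ns.name, member);
            if top_level.contains(&flat.as_str()) {
                return Err(format!("flat name {} collides with a top-level function", flat));
            }
        }
    }
    Ok(())
}

/// Resolves a returned tool name to (namespace, member) only when exactly one
/// member matches. Ambiguous or top-level names resolve to None.
pub fn resolve(
    returned: &str,
    namespaces: &[Namespace],
    top_level: &[&str],
) -> Option<(String, String)> {
    // Assumption: an exact top-level function name is never reinterpreted.
    if top_level.contains(&returned) {
        return None;
    }
    let mut hits: Vec<(String, String)> = Vec::new();
    for ns in namespaces {
        for member in &ns.members {
            if accepted_names(&ns.name, member).iter().any(|n| n == returned) {
                hits.push((ns.name.clone(), member.clone()));
            }
        }
        // Namespace container call: only safe when there is a single member.
        if ns.name == returned && ns.members.len() == 1 {
            hits.push((ns.name.clone(), ns.members[0].clone()));
        }
    }
    hits.dedup();
    if hits.len() == 1 {
        hits.pop()
    } else {
        None
    }
}
```

If two namespaces both declare a member with the same bare name, `resolve` returns None for that bare name, which matches the "unambiguous" condition in the list above.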

Tool Choice Preservation

tool_choice now preserves optional namespace information. Explicit namespaced choices are flattened only for upstream
requests and restored for Codex-facing response/storage paths.
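
A compact way to express this is to keep the namespaced choice as the stored value and derive the flat name only when building the upstream request. The sketch below assumes a hypothetical `FunctionChoice` type and `upstream_name` method; the actual tool_choice representation in the PR is not shown here.

```rust
/// Hypothetical Codex-facing function choice with an optional namespace.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FunctionChoice {
    pub namespace: Option<String>,
    pub name: String,
}

impl FunctionChoice {
    /// Name sent upstream. The struct itself is left untouched, so the
    /// response and storage paths still see the namespaced form.
    pub fn upstream_name(&self) -> String {
        match &self.namespace {
            Some(ns) => format!("agentic_ns__{}__{}", ns, self.name),
            None => self.name.clone(),
        }
    }
}
```

Deriving the flat name on demand means nothing needs to be "un-flattened" in storage, since the stored choice never loses its namespace.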

Stateful Continuation

The executor stores the effective request metadata needed for continuations:

  • tools
  • tool choice
  • instructions
  • previous response linkage
  • conversation linkage

Later requests using previous_response_id or conversation_id hydrate the prior context and reuse the effective
Codex-compatible tool metadata unless the client explicitly overrides it.
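
The "reuse unless overridden" rule amounts to a field-by-field fallback from the incoming request to the stored metadata. A minimal sketch, assuming a hypothetical `RequestMeta` struct with simplified string fields and a hypothetical `hydrate` function:

```rust
/// Hypothetical subset of the effective request metadata stored per response.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct RequestMeta {
    pub tools: Option<Vec<String>>,
    pub tool_choice: Option<String>,
    pub instructions: Option<String>,
}

/// Builds the effective metadata for a continuation request: any field the
/// client sets explicitly wins, otherwise the stored value is reused.
pub fn hydrate(stored: &RequestMeta, incoming: &RequestMeta) -> RequestMeta {
    RequestMeta {
        tools: incoming.tools.clone().or_else(|| stored.tools.clone()),
        tool_choice: incoming
            .tool_choice
            .clone()
            .or_else(|| stored.tool_choice.clone()),
        instructions: incoming
            .instructions
            .clone()
            .or_else(|| stored.instructions.clone()),
    }
}
```

A continuation that only sets new instructions would therefore keep the tools and tool choice recorded for the previous response.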

Storage Rehydration Cleanup

Stored input and output items carry internal markers that disambiguate item kinds. Rehydration strips these markers,
such as _agentic_item_kind, before data is returned to the client or hydrated history is forwarded upstream.
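
The cleanup step can be pictured as a filter over an item's keys. In this sketch an item is modelled as a flat string map for brevity, which is a simplification of real JSON items. `INTERNAL_MARKERS`, `Item`, and `strip_internal_markers` are hypothetical names, and only the _agentic_item_kind marker is taken from the description.

```rust
use std::collections::BTreeMap;

/// Internal keys removed on rehydration. Only this one is named in the PR.
pub const INTERNAL_MARKERS: &[&str] = &["_agentic_item_kind"];

/// Simplified stored item: a flat key/value map (real items are nested JSON).
pub type Item = BTreeMap<String, String>;

/// Returns a copy of the item without internal marker keys.
/// The stored item itself is left unchanged.
pub fn strip_internal_markers(item: &Item) -> Item {
    item.iter()
        .filter(|(key, _)| !INTERNAL_MARKERS.contains(&key.as_str()))
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```

Returning a cleaned copy keeps the marker available in storage while ensuring neither the client nor the upstream ever sees it.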

Model Listing And Readiness

  • Supports the latest main model-list route by adapting /v1/models behavior for Codex clients while preserving
    upstream proxy behavior where applicable.
  • Keeps skip_llm_ready_check support for the startup LLM readiness check used by hosted/OpenAI-compatible vLLM
    endpoints where /health may not exist. The /ready route still keeps its existing upstream health probe semantics.

WebSocket Support

Codex can now be configured with:

[model_providers.agentic-local]
name = "agentic-api local"
base_url = "http://127.0.0.1:3018/v1"
wire_api = "responses"
supports_websockets = true

With supports_websockets = true, Codex uses the Responses WebSocket path when available. The gateway preserves the same
tool normalization, model aliasing, and stateful continuation semantics across HTTP/SSE and WebSocket transports.

Running A Codex Session

The PR includes scripts/codex-start-gateway.sh to start agentic-server against a configured vLLM endpoint and register
the Codex-facing model alias.

Start the gateway:

GATEWAY_PORT=3018 \
DATABASE_URL="sqlite:///tmp/agentic_api_codex_3018.db" \
V_API_BASE="http://<vllm-host>:<port>" \
V_API_KEY="" \
V_MODEL="Qwen/Qwen3.6-35B-A3B" \
./scripts/codex-start-gateway.sh

From the repo root, run a Codex HTTP/SSE session:

codex \
  --disable image_generation \
  -C "$PWD" \
  -m codex-compatible \
  -c model_reasoning_effort=low \
  -c model_provider=agentic-local \
  -c 'model_providers.agentic-local.name="agentic-api local"' \
  -c 'model_providers.agentic-local.base_url="http://127.0.0.1:3018/v1"' \
  -c 'model_providers.agentic-local.wire_api="responses"'

From the repo root, run a Codex WebSocket session:

codex \
  --disable image_generation \
  -C "$PWD" \
  -m codex-compatible \
  -c model_reasoning_effort=low \
  -c model_provider=agentic-local \
  -c 'model_providers.agentic-local.name="agentic-api local"' \
  -c 'model_providers.agentic-local.base_url="http://127.0.0.1:3018/v1"' \
  -c 'model_providers.agentic-local.wire_api="responses"' \
  -c model_providers.agentic-local.supports_websockets=true

Cassette Recording

This PR adds Codex cassette recording support under:

crates/agentic-core/tests/cassettes/

The main recorder wrapper is:

./crates/agentic-core/tests/cassettes/record_codex_cli_tool_call_cassettes.sh

It records YAML replay cassettes for the matrix covered by tests:

  • gateway HTTP/SSE function tool
  • gateway HTTP/SSE namespace tool
  • gateway WebSocket function tool
  • gateway WebSocket namespace tool
  • direct vLLM HTTP/SSE function tool
  • direct vLLM HTTP/SSE flattened namespace tool
  • OpenAI HTTPS/SSE function tool baseline
  • OpenAI HTTPS/SSE namespace tool baseline
  • OpenAI WebSocket function tool baseline
  • OpenAI WebSocket namespace tool baseline

The all target now regenerates every cassette used by the Codex cassette tests. Direct vLLM recording requires an
explicit VLLM_URL or V_API_BASE instead of defaulting to a private LAN endpoint.

Example:

OPENAI_API_KEY="$OPENAI_API_KEY" \
VLLM_URL="http://<vllm-host>:<port>" \
V_MODEL="Qwen/Qwen3.6-35B-A3B" \
GATEWAY_URL="http://127.0.0.1:3018" \
GATEWAY_CASSETTE_MODEL="Qwen/Qwen3.6-35B-A3B" \
./crates/agentic-core/tests/cassettes/record_codex_cli_tool_call_cassettes.sh all

Cassette Replay Tests

Replay and normalization tests were added for the recorded Codex cassettes:

  • crates/agentic-core/tests/accumulator_cassette_test.rs
    • verifies gateway HTTP and WebSocket function/namespace responses
    • verifies direct vLLM function and flat namespace upstream behavior
    • verifies OpenAI HTTP and WebSocket function/namespace baselines
    • verifies second-turn previous_response_id plus function_call_output continuation shape
  • crates/agentic-core/tests/tool_normalization_test.rs
    • parses all recorded Codex request payloads as typed RequestPayload
    • verifies WebSocket request bodies do not carry the HTTP stream field
    • verifies namespace tools flatten to agentic_ns__mcp__agentic_fixture__add_numbers
    • verifies direct vLLM flat namespace cassettes remain plain function tools

These tests make the Codex integration behavior replayable without requiring live OpenAI, vLLM, or Codex CLI access in
normal CI-style test runs.

Scope Boundaries

This PR does not implement:

  • gateway-side MCP execution
  • a full Codex runtime
  • hosted tools such as file search, web search, or code interpreter
  • a complete automatic tool-dispatch loop inside agentic-api
  • chat completions compatibility

Codex still owns MCP tool discovery and execution. agentic-api preserves, normalizes, forwards, stores, and restores
Responses API data so Codex can continue its loop correctly.

Test Plan

Formatting and diff checks:

cargo fmt -- --check
git diff --check
git diff --staged --check

Core Rust tests:

cargo test -p agentic-core
cargo test -p agentic-core --test accumulator_cassette_test --test tool_normalization_test

Additional checks run during review:

cargo clippy -p agentic-core --tests -- -D warnings
bash -n crates/agentic-core/tests/cassettes/record_codex_cli_tool_call_cassettes.sh
bash -n scripts/codex-start-gateway.sh
python3 -m py_compile crates/agentic-core/tests/cassettes/record_cassette.py

Cassette regeneration was run successfully with:

./crates/agentic-core/tests/cassettes/record_codex_cli_tool_call_cassettes.sh all

and cargo test passed afterward.

Manual live validation during development covered:

  • HTTP/SSE text path through agentic-api
  • WebSocket path through agentic-api with supports_websockets = true
  • MCP namespace tool round trip for mcp__agentic_fixture.add_numbers
  • direct vLLM baseline cassettes for upstream-visible function shapes
  • OpenAI baseline cassettes for original Responses API function and namespace behavior

Signed-off-by: haoshan98 <haoshanw@gmail.com>
@haoshan98 haoshan98 changed the title Codex integration feat: codex integration Jul 2, 2026
@haoshan98 haoshan98 marked this pull request as draft July 2, 2026 11:05
@haoshan98 haoshan98 marked this pull request as ready for review July 2, 2026 12:50
@haoshan98 haoshan98 marked this pull request as draft July 2, 2026 13:20
@franciscojavierarceo

Collaborator

awesome @haoshan98 let me know when this is ready for review
