Skip to content

docs: Codex integration design#68

Merged
maralbahari merged 3 commits into
vllm-project:mainfrom
EmbeddedLLM:designdocs-codex-integration
Jun 29, 2026
Merged

docs: Codex integration design#68
maralbahari merged 3 commits into
vllm-project:mainfrom
EmbeddedLLM:designdocs-codex-integration

Conversation

@haoshan98

Copy link
Copy Markdown
Contributor

No description provided.

@ashwing ashwing left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Codex compatibility concepts here — tool registry, client vs gateway ownership, normalize for upstream, loop decision with client action — overlap significantly with #67 which proposes a generic tool framework covering all tool types. The registry, normalize pipeline, and RequiresAction loop variant are the same infrastructure. Worth coordinating so we don't end up with two parallel type systems solving the same routing problem.

Separately, this reads better as a standalone docs/design/codex-integration.md. The core-public-api.md doc is specifically about the executor loop phases and its ownership is already defined in the header.

Comment thread docs/design/core-public-api.md Outdated

```rust
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(untagged)]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

untagged with a Known/Unknown wrapper is fragile — serde tries variants in declaration order and takes the first successful parse. Debugging deserialization mismatches becomes painful. #[serde(tag = "type")] with a catch-all variant is more explicit and gives clearer errors when a new type appears. See #67 for that approach.

Comment thread docs/design/core-public-api.md Outdated
pub defer_loading: bool,
}

pub struct CodexNamespaceTool {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

namespace, tool_search, and custom are Codex CLI internals — they are not part of the OpenAI Responses API spec. Baking them as first-class types in agentic-core couples us to one client's implementation details. Alternative: accept via a catch-all variant, normalize to function tools for inference, and preserve the raw shape in the registry for passthrough on response. That way we don't break if Codex changes their internal wire format.

Comment thread docs/design/core-public-api.md Outdated
The current `ResponsesTool = FunctionTool` alias is too narrow for Codex. Replace it with a tagged tool enum that
preserves unknown shapes while giving Codex-used variants first-class names.

Do not implement this as `#[serde(tag = "type")]` wrapping the existing `FunctionTool` struct directly, because

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue described here — "FunctionTool already stores the wire type field" — is a fixable implementation detail, not a reason to avoid tagged enums. The fix is to remove the redundant type_: String from the payload struct (serde consumes the tag during deserialization, so it shouldn't also be a field). #67 takes this approach and it works cleanly with #[serde(tag = "type")].

Comment thread docs/design/core-public-api.md Outdated
pub registry: ToolRegistry,
}

pub struct ToolName {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namespace splitting adds complexity the gateway likely doesn't need. vLLM emits flat function names — the model doesn't know about namespaces. If Codex organizes tools into namespaces client-side, the gateway can preserve the original declaration verbatim in the registry and return it on requires_action without parsing or reconstructing namespace structure. Simpler and less coupled to Codex's naming convention.

Comment thread docs/design/core-public-api.md Outdated
```rust
pub enum LoopDecision {
Continue(Vec<InputItem>),
RequiresClientAction(Vec<OutputItem>),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vec<OutputItem> is too broad here. The only items that require client action are function calls — not messages, not reasoning. Using Vec<FunctionToolCall> makes the type system enforce this constraint rather than relying on callers to only put the right variants in.

Comment thread docs/design/core-public-api.md Outdated
pub parameters: Option<Value>,
pub strict: Option<bool>,
#[serde(default)]
pub defer_loading: bool,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defer_loading appears as a named field but there's no description of what the gateway does with it. If it's a client hint the gateway ignores, it should live in the catch-all raw Value rather than as a typed field — otherwise it implies the gateway acts on it.

Comment thread docs/design/core-public-api.md Outdated
configuration:

```toml
[model_aliases]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Model aliases are practical, but the issue #54 context suggests this also changes approval behavior (auto-review = skip human confirmation). Conflating model routing with approval semantics in a single alias is surprising — a request to codex-auto-review silently changes safety behavior. If both are needed, they should be separate config knobs.

@maralbahari

maralbahari commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

The Codex compatibility concepts here — tool registry, client vs gateway ownership, normalize for upstream, loop decision with client action — overlap significantly with #67 which proposes a generic tool framework covering all tool types. The registry, normalize pipeline, and RequiresAction loop variant are the same infrastructure. Worth coordinating so we don't end up with two parallel type systems solving the same routing problem.

Separately, this reads better as a standalone docs/design/codex-integration.md. The core-public-api.md doc is specifically about the executor loop phases and its ownership is already defined in the header.

@haoshan98 it would be good to have separate docs/design/codex-integration.md for codex integration.

@ashwing note that the codex integration is from tool angle using codex for tool calls. regarding the overlap what is shared between any tools are execution in the execution_loop. however at this stage for tool implementation we can have a generic interface Tool where each would expose common trait like normalize and execute this way each contributors/developer they can implement their tools without interfering and overlapping then after all components are in we add them into execution_loop.
for instance currently we are handling the stream events by categorizing the items into reasoning, function_call and message since they're being added into a buffer we can use the interface I just mentioned above to identify the tool type then call tool.execute().
and we would need tool.normalize() after receiving the RequestPayload to get user selected tools filter the tool options normalize before call_inference().
while the execution_loop and LoopDecision involves orchestrating the all tools on flight and live streaming. while currently we are buffering the inflight item responses for us to add support each tool types at a time and verify their execution and correctness in execution() (in agentic-core/executor/engine.rs) and ResponsesAcumulator.process_event() one all that done then we implement the whole live orchestration with execution_loop and LoopDecision.
codex as tool call would be mostly use to bridge the gap of agentic-api tool execution and we do not own it is considered as custom tool. since we do not have any support of any function_call execution for fastest prototype we use use codex to demo agentic-api with tool call until we start implementing agentic-api built-in tools for file_search, code_interpreter etc.

@haoshan98 @ashwing let me know your thoughts and feedback on this approach. then @haoshan98 in a separate docs/design/codex-integration.md would write a more specific implementation that would be implement by her for CustomCodexTool and what components would be shared with design in #67

@ashwing

ashwing commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

@maralbahari @haoshan98 Agree with this layered approach — it maps cleanly to what #67 proposes. A few thoughts from the tool framework design side:

On the Tool trait with normalize() and execute(): This is exactly the ToolHandler trait in #67. The interface is validate → discover → normalize → execute → emit. Each tool type (Codex custom tool, MCP, web_search, etc.) implements that trait independently without touching the executor. Haoshan's CustomCodexTool would be one ToolHandler impl — discovery returns Codex's tool manifest, normalize flattens to function defs for vLLM, execute dispatches to Codex's sandbox.

On phased implementation: Agree we don't need ContinuePartial or live orchestration in phase 1. The order I'd propose (aligns with what you described):

  1. Tool types + registry (PR A from docs: tool framework design — multi-type tool support #67's shipping plan) — ResponsesTool enum, ToolRegistry, ToolHandler trait skeleton, FunctionHandler as the trivial passthrough
  2. Simple execution() dispatch — registry lookup determines gateway vs client routing, returns requires_action for function tools
  3. Codex handler (haoshan) + MCP handler — first real execute() impls against the trait
  4. Full execution_loop + LoopDecision with streaming — the live orchestration layer

On Codex as "custom tool" / bridge: Makes sense as a stopgap. From #67's perspective, Codex is just another ToolHandler that happens to route execute() to an external agent runtime. It slots in at step 3 above without any special-casing in the executor.

On the shared interface: The pieces both PRs need are identical — ToolRegistry for routing, normalize() for the vLLM boundary, execute() for dispatch. Happy to have #67 land the trait + registry skeleton, then haoshan builds CustomCodexTool against it. Avoids two parallel type systems.

The multi-turn tool call cassettes are already recorded (PR #77, now merged) — 12 cassettes covering linear, streaming, branching, parallel, and tool-output-only patterns against both vLLM and OpenAI. Those can feed directly into the simple execution_loop implementation at step 2.

@haoshan98

Copy link
Copy Markdown
Contributor Author

Thanks for the thoughtful feedbacks @ashwing @maralbahari and for clarifying the overlap with #67.

I agree that the generic tool framework should own the formal normalization, registry, ownership classification, execution, and loop-decision abstractions. I do not want this PR to create a parallel tool framework.

I updated the Codex design to frame this PR as a small MVP compatibility slice: accept and preserve Codex-used Responses tool shapes, preserve namespaced/function/custom/tool-search call items through request/response/storage paths, support previous_response_id continuation, and add model alias routing for vLLM-backed models.

The helper types in this PR are intentionally lightweight. They document the Codex requirements and let us verify the wire-shape/continuation behavior now, but #67 should formalize the canonical ToolHandler, registry, normalization lifecycle, ownership model, and requires-action loop decision. Once #67 lands, this slice can plug into or be refactored onto those shared abstractions.

This PR should do only the minimum needed for Codex compatibility:

- Add this standalone design doc.
- Accept Codex-used tool declarations without rejecting requests.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably log the tool call so we can create an issue for it

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@franciscojavierarceo by log do you mean like record a cassettes with some request, response payloads using codex cli?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no i meant we can emit a log on the server so the inference request can re read and reused for debugging.

@haoshan98 haoshan98 force-pushed the designdocs-codex-integration branch from 4ec3d54 to 2a6f2ab Compare June 26, 2026 03:25
Signed-off-by: haoshan98 <haoshanw@gmail.com>
This reverts commit e1852fc.

Signed-off-by: haoshan98 <haoshanw@gmail.com>
Signed-off-by: haoshan98 <haoshanw@gmail.com>
@haoshan98 haoshan98 force-pushed the designdocs-codex-integration branch from 2a6f2ab to 80fe882 Compare June 26, 2026 03:30
@maralbahari maralbahari merged commit 647e49d into vllm-project:main Jun 29, 2026
1 check passed
maralbahari pushed a commit that referenced this pull request Jul 1, 2026
## Summary

Implements PR A from the [tool framework
design](docs/design/tool-framework.md):

- **`ResponsesTool`** — replaces `pub type ResponsesTool = FunctionTool`
alias with a `#[serde(tag = "type")]` tagged enum covering all five
Responses API tool types (`function`, `mcp`, `web_search_preview`,
`file_search`, `code_interpreter`). Wire-compatible: existing
`{"type":"function",...}` requests deserialize unchanged. Marked
`#[non_exhaustive]` for future API additions.
- **`ToolRegistry`** — request-scoped `HashMap<name → ToolEntry>` built
from `RequestPayload.tools` + any server-discovered tools. Provides
`gateway_owned` / `client_owned` split for post-inference dispatch
routing (PR B).
- **`ToolHandler` trait** — `validate / discover / normalize / execute`
lifecycle. `Send + Sync` for `Arc<dyn ToolHandler>`. Default
`discover()` is a no-op; `execute()` must be implemented per handler
type.
- **`FunctionHandler`** — trivial `ToolHandler` impl for `type:
"function"` tools. Normalizes to `FunctionTool`, never gateway-executes
(client-owned).
- **normalize pipeline** — `ResponsesTool::to_function_tool()` wired
into `RequestPayload::to_upstream_request()`. vLLM always receives
`Vec<FunctionTool>`, never raw `ResponsesTool` variants. Function tools
pass through unchanged; gateway-managed types (`mcp`,
`web_search_preview`, `file_search`, `code_interpreter`) are dropped
with a `tracing::debug!` until their handlers land in PR B/C.
- **`From<ToolOutput> for FunctionToolResultMessage`** — zero-friction
conversion for PR B's dispatch loop.
- `McpToolParam` includes `headers: Option<HashMap<String,String>>` for
per-server auth forwarding.

### What vLLM sees after this PR

| Client sends | vLLM receives |
|---|---|
| `{"type":"function","name":"foo",...}` |
`{"type":"function","name":"foo",...}` ✅ unchanged |
| `{"type":"mcp","server_label":"db",...}` | nothing — dropped until PR
C |
| `{"type":"web_search_preview"}` | nothing — dropped until PR C |

Gateway-type tools are parsed, validated, and stored in
`ResponseMetadata.effective_tools` for rehydration across turns — they
just don't reach vLLM until real `normalize()` implementations land.

No execution logic in this PR: no `dispatch_tools`, no `LoopDecision`,
no changes to `engine.rs` / `rehydrate.rs` / `persist.rs`.

Depends on: #64 (io types refactor — merged). Unblocks: PR B (dispatch),
PR C (MCP handler), CustomCodexTool (#68).

## Test Plan

```bash
cargo build --workspace     # 0 errors
cargo clippy --workspace --all-targets -- -D warnings  # 0 warnings
cargo test --workspace      # 246 tests pass
```

Key new tests in `types::tools::tests`:
- All 5 `ResponsesTool` variants round-trip through serde
- `ToolRegistry` lookup, `gateway_owned` / `client_owned` split,
`allowed_tools` filtering, duplicate-name warning, empty-name skip
- `FunctionHandler::validate()` rejects empty names
- `to_upstream_request()` with mixed tools — only function tool reaches
vLLM
- All-gateway tools → `tools: null` (empty array never sent to vLLM)

---------

Signed-off-by: Ashwin Giridharan <girida@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants