# Design: Tool Framework

> Status: Proposal
> References: [ADR-01 D7](../adr/ADR-01_core.md), [ADR-03 D3](../adr/ADR-03_gateway_integration.md)

---

## Problem

Clients send heterogeneous tool types (`function`, `mcp`, `web_search`, `file_search`, `code_interpreter`). vLLM only speaks function calling — it produces `function_call` output items regardless of tool origin. The gateway must bridge both directions: normalize inbound tools for inference, and route outbound calls to their correct executors.

Today `ResponsesTool` is a plain alias (`pub type ResponsesTool = FunctionTool`), so the request type can only express function tools. This design replaces that with a type-aware framework that handles the full tool lifecycle for any tool type through a single pipeline.

---

## Principles

1. **One pipeline, many types.** The tool lifecycle is the same for all types. What varies is the behavior at each stage.
2. **vLLM is function-only.** Every tool type normalizes to `type: "function"` before inference. Permanent constraint.
3. **Routing by registry, not heuristics.** After inference, `function_call` items are looked up in a request-scoped registry that maps names back to origin type and config.
4. **Function tools are client-owned.** `type: "function"` is never gateway-executed. The response returns `status: "requires_action"` and the client resolves it. All other types are gateway-executed.
5. **Additive.** New tool types implement a trait and register. The executor loop doesn't change.

---

## Architecture

```mermaid
graph TD
    subgraph "Request Phase (once per request)"
        REQ["Client Request<br>tools: mixed types"]
        PARSE["Parse + Validate<br>per-type schemas"]
        DISC["Discover<br>MCP: tools/list"]
        NORM["Normalize<br>all → type: function"]
        REG["Build Registry<br>name → type + config"]
    end

    subgraph "Inference"
        VLLM["vLLM<br>sees only function tools"]
    end

    subgraph "Execution Phase (per iteration)"
        ROUTE["Route<br>registry lookup per call"]
        EXEC_GW["Gateway Execute<br>mcp / web / file / code"]
        PASS["Passthrough<br>function → requires_action"]
        LOOP["Inject Results<br>re-enter inference"]
    end

    REQ --> PARSE --> DISC --> NORM --> REG
    REG --> VLLM
    VLLM --> ROUTE
    ROUTE -->|gateway-owned| EXEC_GW
    ROUTE -->|client-owned| PASS
    EXEC_GW --> LOOP --> VLLM

    style REQ fill:#1a5c2a,color:#e0e0e0
    style VLLM fill:#1a5c2a,color:#e0e0e0
    style PARSE fill:#2a4a8a,color:#e0e0e0
    style DISC fill:#2a4a8a,color:#e0e0e0
    style NORM fill:#2a4a8a,color:#e0e0e0
    style REG fill:#2a4a8a,color:#e0e0e0
    style ROUTE fill:#2a4a8a,color:#e0e0e0
    style EXEC_GW fill:#2a4a8a,color:#e0e0e0
    style PASS fill:#2a4a8a,color:#e0e0e0
    style LOOP fill:#2a4a8a,color:#e0e0e0
```

---

## Pipeline Stages

Every request with tools passes through 7 stages. Stages 1–4 run once at request start. Stages 5–7 repeat per inference iteration.

| # | Stage | Generic (framework) | Type-Specific (handler) |
|---|-------|---------------------|-------------------------|
| 1 | **Parse** | Deserialize `tools[]`, classify by `type` | Validate required fields per type |
| 2 | **Discover** | Iterate handlers, collect discovered tools | MCP: `tools/list`. Others: no-op |
| 3 | **Normalize** | Flatten all into `Vec<FunctionTool>` for vLLM | MCP: schema → parameters. WebSearch: synthetic def |
| 4 | **Register** | Build `HashMap<name, ToolEntry>` | Each handler declares ownership of its tool names |
| 5 | **Route** | Lookup `function_call.name` in registry | Determine: gateway-execute or client-passthrough |
| 6 | **Execute** | Parallel execution with timeout + error isolation | MCP: JSON-RPC. WebSearch: HTTP API. Function: skip |
| 7 | **Emit** | Forward type-specific SSE events to client | MCP: 7 events. WebSearch: 2 events. Function: 0 |

Stages 1–4 produce two artifacts:
- **Normalized tools** — `Vec<FunctionTool>` forwarded to vLLM
- **Tool registry** — `ToolRegistry` consumed by dispatch for routing
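
To make the two artifacts concrete, here is a minimal std-only sketch of stages 3 and 4. It assumes stages 1 and 2 have already produced a list of parsed tools, with MCP entries already carrying their discovered definitions. `ParsedTool`, `prepare_tools`, and the string-typed `parameters` field are hypothetical simplifications; the proposed pipeline works on `ResponsesTool`, `serde_json::Value`, and the `ToolHandler` trait instead.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the real types (hypothetical; std-only for the sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ToolType {
    Function,
    Mcp,
    WebSearch,
}

#[derive(Debug, Clone, PartialEq)]
pub struct FunctionTool {
    pub name: String,
    pub parameters: String, // JSON Schema as text instead of serde_json::Value
}

#[derive(Debug, Clone)]
pub struct ToolEntry {
    pub tool_type: ToolType,
    pub server_label: Option<String>,
}

/// Output of stages 1-2: validated tools, MCP entries already carrying discovered defs.
pub enum ParsedTool {
    Function(FunctionTool),
    Mcp { server_label: String, discovered: Vec<FunctionTool> },
    WebSearch,
}

/// Stages 3-4: flatten every tool into a function tool for vLLM and record
/// where each name came from. Name collisions simply overwrite in this sketch.
pub fn prepare_tools(tools: Vec<ParsedTool>) -> (Vec<FunctionTool>, HashMap<String, ToolEntry>) {
    let mut normalized = Vec::new();
    let mut registry = HashMap::new();
    for tool in tools {
        // Each parsed tool expands to one or more (function tool, origin) pairs.
        let expanded: Vec<(FunctionTool, ToolType, Option<String>)> = match tool {
            ParsedTool::Function(f) => vec![(f, ToolType::Function, None)],
            ParsedTool::Mcp { server_label, discovered } => discovered
                .into_iter()
                .map(|f| (f, ToolType::Mcp, Some(server_label.clone())))
                .collect(),
            ParsedTool::WebSearch => vec![(
                FunctionTool {
                    name: "web_search".to_string(),
                    parameters: r#"{"type":"object","properties":{"query":{"type":"string"}}}"#.to_string(),
                },
                ToolType::WebSearch,
                None,
            )],
        };
        for (f, tool_type, server_label) in expanded {
            registry.insert(f.name.clone(), ToolEntry { tool_type, server_label });
            normalized.push(f);
        }
    }
    (normalized, registry)
}
```

Fed the three tools from the mixed-tool walkthrough (one `function`, one MCP server exposing two tools, one web search), this yields four function tools for vLLM and a four-entry registry.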

---

## Core Types

### Tool Classification

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ToolType {
    Function,
    Mcp,
    WebSearch,
    FileSearch,
    CodeInterpreter,
}
```

### Request-Side Tool Param

Replaces `pub type ResponsesTool = FunctionTool`:

```rust
use serde::{Deserialize, Serialize};

// Internally tagged: the `type` field selects the variant, and each
// *ToolParam struct holds the remaining fields for that tool type.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum ResponsesTool {
    #[serde(rename = "function")]
    Function(FunctionToolParam),

    #[serde(rename = "mcp")]
    Mcp(McpToolParam),

    #[serde(rename = "web_search_preview")]
    WebSearch(WebSearchToolParam),

    #[serde(rename = "file_search")]
    FileSearch(FileSearchToolParam),

    #[serde(rename = "code_interpreter")]
    CodeInterpreter(CodeInterpreterToolParam),
}
```

`#[serde(tag = "type")]` makes this wire-compatible with existing `{"type":"function",...}` requests.

### Tool Registry

```rust
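use std::collections::HashMap;
use serde_json::Value;

pub struct ToolEntry {
    pub tool_type: ToolType,
    pub config: Value,
    pub server_label: Option<String>, // set for MCP-discovered tools
}

pub struct ToolRegistry {
    entries: HashMap<String, ToolEntry>,
}

// Assumes `FunctionToolCall` exposes the called tool's `name`.
// Names missing from the registry match neither partition.
impl ToolRegistry {
    pub fn lookup(&self, tool_name: &str) -> Option<&ToolEntry> {
        self.entries.get(tool_name)
    }

    /// Calls the gateway executes: any origin other than `function`.
    pub fn gateway_owned_calls<'a>(&self, calls: &'a [FunctionToolCall]) -> Vec<&'a FunctionToolCall> {
        calls.iter().filter(|c| self.lookup(&c.name).is_some_and(|e| e.tool_type != ToolType::Function)).collect()
    }

    /// Calls returned to the client via `requires_action`.
    pub fn client_owned_calls<'a>(&self, calls: &'a [FunctionToolCall]) -> Vec<&'a FunctionToolCall> {
        calls.iter().filter(|c| self.lookup(&c.name).is_some_and(|e| e.tool_type == ToolType::Function)).collect()
    }
}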
```

### Loop Decision

```rust
#[derive(Debug)]
#[non_exhaustive]
pub enum LoopDecision {
    /// Gateway tools executed — inject results and re-infer.
    Continue(Vec<InputItem>),

    /// No tool calls — return response as completed.
    Done,

    /// Only client-owned function calls — return requires_action.
    RequiresAction(Vec<FunctionToolCall>),

    /// Mixed: gateway tools executed AND client calls pending.
    /// Loop back; on next pass if only client calls remain → RequiresAction.
    ContinuePartial {
        results: Vec<InputItem>,
        pending_client_calls: Vec<FunctionToolCall>,
    },

    /// Safety cap reached.
    Incomplete(String),
}
```
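
One way the executor could pick a `LoopDecision` after each inference pass is sketched below. It assumes the pass has already been partitioned into executed gateway results and pending client-owned calls, uses simplified stand-ins for `FunctionToolCall` and `InputItem`, and treats `max_iterations` as a hypothetical configuration value for the safety cap.

```rust
// Simplified stand-ins for the real request/response types (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionToolCall {
    pub name: String,
    pub arguments: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct InputItem {
    pub call_name: String,
    pub output: String,
}

#[derive(Debug, PartialEq)]
pub enum LoopDecision {
    Continue(Vec<InputItem>),
    Done,
    RequiresAction(Vec<FunctionToolCall>),
    ContinuePartial {
        results: Vec<InputItem>,
        pending_client_calls: Vec<FunctionToolCall>,
    },
    Incomplete(String),
}

/// Choose the next step after inference pass `iteration` (0-based).
/// `gateway_results` are outputs of gateway tools executed this pass;
/// `client_calls` are client-owned calls the model emitted this pass.
pub fn decide(
    iteration: usize,
    max_iterations: usize,
    gateway_results: Vec<InputItem>,
    client_calls: Vec<FunctionToolCall>,
) -> LoopDecision {
    match (gateway_results.is_empty(), client_calls.is_empty()) {
        // Nothing to run: the response is final.
        (true, true) => LoopDecision::Done,
        // Only client work left: hand control back to the client.
        (true, false) => LoopDecision::RequiresAction(client_calls),
        // Gateway work would loop again, but the safety cap is reached.
        _ if iteration + 1 >= max_iterations => {
            LoopDecision::Incomplete(format!("reached max_iterations = {max_iterations}"))
        }
        (false, true) => LoopDecision::Continue(gateway_results),
        (false, false) => LoopDecision::ContinuePartial {
            results: gateway_results,
            pending_client_calls: client_calls,
        },
    }
}
```

The cap is only checked when the loop would actually re-enter inference, so a pass that ends in `Done` or `RequiresAction` is never reported as incomplete. With `ContinuePartial`, the caller is responsible for carrying `pending_client_calls` into the next pass.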

---

## The ToolHandler Trait

Each tool type implements this:

```rust
use async_trait::async_trait;
use serde_json::Value;

#[async_trait]
pub trait ToolHandler: Send + Sync {
    fn tool_type(&self) -> ToolType;

    fn validate(&self, param: &Value) -> Result<(), ToolError>;

    async fn discover(&self, _param: &Value) -> Result<Vec<DiscoveredTool>, ToolError> {
        Ok(vec![]) // default: no discovery needed
    }

    fn normalize(&self, param: &Value, discovered: &[DiscoveredTool]) -> Vec<FunctionTool>;

    async fn execute(
        &self,
        tool_name: &str,
        arguments: &str,
        config: &Value,
    ) -> Result<ToolOutput, ToolError>;

    fn event_prefix(&self) -> Option<&'static str> {
        None // default: no special SSE events
    }

    fn output_item_type(&self) -> &'static str;
}
```

Adding a new tool type = implementing this trait + registering it. No changes to the executor loop, accumulator, or streaming path.
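
The register-and-dispatch idea can be sketched without the async machinery. This is a synchronous, std-only reduction of `ToolHandler` (only `tool_type` and `execute`) plus a hypothetical `HandlerRegistry` keyed by `ToolType`; `FakeWebSearch` stands in for a real search backend. The shipping plan names a `HandlerRegistry` for PR B but does not specify its API, so the shape here is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ToolType {
    Function,
    WebSearch,
}

#[derive(Debug)]
pub struct ToolError(pub String);

/// Synchronous reduction of `ToolHandler` (the proposed trait is async).
pub trait ToolHandler: Send + Sync {
    fn tool_type(&self) -> ToolType;
    fn execute(&self, tool_name: &str, arguments: &str) -> Result<String, ToolError>;
}

/// Hypothetical stand-in for a real web search backend.
pub struct FakeWebSearch;

impl ToolHandler for FakeWebSearch {
    fn tool_type(&self) -> ToolType {
        ToolType::WebSearch
    }

    fn execute(&self, _tool_name: &str, arguments: &str) -> Result<String, ToolError> {
        Ok(format!("results for {arguments}"))
    }
}

/// Hypothetical handler registry: at most one handler per tool type.
#[derive(Default)]
pub struct HandlerRegistry {
    handlers: HashMap<ToolType, Box<dyn ToolHandler>>,
}

impl HandlerRegistry {
    /// Registering is the only step a new tool type needs; dispatch is unchanged.
    pub fn register(&mut self, handler: Box<dyn ToolHandler>) {
        self.handlers.insert(handler.tool_type(), handler);
    }

    pub fn execute(&self, tool_type: ToolType, tool_name: &str, arguments: &str) -> Result<String, ToolError> {
        match self.handlers.get(&tool_type) {
            Some(handler) => handler.execute(tool_name, arguments),
            // Unregistered type: surface an error rather than panic.
            None => Err(ToolError(format!("no handler registered for {tool_type:?}"))),
        }
    }
}
```

Because dispatch only looks up `ToolType`, adding a `FileSearch` handler would mean one new `impl` and one `register` call; `HandlerRegistry::execute` itself would not change.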

---

## Per-Type Behavior

| Stage | `function` | `mcp` | `web_search` | `file_search` | `code_interpreter` |
|-------|-----------|-------|-------------|--------------|-------------------|
| Validate | name required | server_url required | (none) | vector_store_ids required | (none) |
| Discover | no-op | `tools/list` on server | no-op | no-op | no-op |
| Normalize | passthrough | McpToolDef → FunctionTool | synthetic `web_search(query)` | synthetic `file_search(query)` | synthetic `code_interpreter(code)` |
| Route | → client | → gateway | → gateway | → gateway | → gateway |
| Execute | N/A | JSON-RPC `tools/call` | HTTP search API | vector store query | sandboxed container |
| SSE events | `function_call_arguments.*` | `mcp_call.*` (7 events) | `web_search_call.*` (2) | `file_search_call.*` (2) | `code_interpreter_call.*` |
| Response status | `requires_action` | `completed` | `completed` | `completed` | `completed` |
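
The Normalize row reduces to two patterns: MCP definitions map field-for-field onto function tools (an MCP `tools/list` entry carries `name`, `description`, and `inputSchema`), while gateway-native types get a synthetic one-argument definition. Below is a std-only sketch with schemas held as JSON text; `McpToolDef`, `normalize_mcp`, `synthetic_tool`, and the exact synthetic schema are assumptions, not the proposed definitions.

```rust
/// Function tool as forwarded to vLLM (simplified: schema kept as JSON text).
#[derive(Debug, Clone, PartialEq)]
pub struct FunctionTool {
    pub name: String,
    pub description: Option<String>,
    pub parameters: String,
}

/// Minimal view of one entry in an MCP `tools/list` result.
pub struct McpToolDef {
    pub name: String,
    pub description: Option<String>,
    pub input_schema: String, // `inputSchema` on the wire
}

/// MCP: copy name and description, and use the tool's input schema as `parameters`.
pub fn normalize_mcp(defs: &[McpToolDef]) -> Vec<FunctionTool> {
    defs.iter()
        .map(|d| FunctionTool {
            name: d.name.clone(),
            description: d.description.clone(),
            parameters: d.input_schema.clone(),
        })
        .collect()
}

/// Gateway-native types: a synthetic tool taking one required string argument,
/// e.g. web_search(query), file_search(query), code_interpreter(code).
pub fn synthetic_tool(name: &str, arg: &str) -> FunctionTool {
    FunctionTool {
        name: name.to_string(),
        description: None,
        parameters: format!(
            r#"{{"type":"object","properties":{{"{arg}":{{"type":"string"}}}},"required":["{arg}"]}}"#
        ),
    }
}
```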

---

## Mixed-Tool Request Walkthrough

Request:
```json
{
  "tools": [
    {"type": "function", "name": "run_shell", "parameters": {...}},
    {"type": "mcp", "server_label": "db", "server_url": "http://db-mcp:8080"},
    {"type": "web_search_preview"}
  ],
  "input": "Find papers on RLHF, check our DB, then run the import script"
}
```

**Preparation:**
- Discover: MCP server returns `[query_papers, insert_paper]`
- Registry: `run_shell → Function`, `query_papers → Mcp`, `insert_paper → Mcp`, `web_search → WebSearch`
- vLLM sees 4 function tools

**Iteration 1:** Model calls `web_search("RLHF papers")` → gateway executes → loop back

**Iteration 2:** Model calls `query_papers("topic=RLHF")` → gateway executes via JSON-RPC → loop back

**Iteration 3:** Model calls `run_shell("python import.py")` → registry lookup → `Function` → **client-owned** → response returns `status: "requires_action"`

The client runs the command locally and submits its result as a `function_call_output` item, and inference continues.
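
The walkthrough can be replayed as a small routing trace. This sketch hard-codes the registry from the preparation step and one model call per iteration; `replay` and `Status` are hypothetical, and execution is simulated by recording where each call would be sent.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolType {
    Function,
    Mcp,
    WebSearch,
}

#[derive(Debug, PartialEq)]
pub enum Status {
    RequiresAction(String),
    Completed,
}

/// Replay one scripted model call per iteration against the registry.
/// Gateway-owned calls are "executed" (recorded) and the loop continues;
/// the first client-owned call stops the loop with requires_action.
pub fn replay(registry: &HashMap<&str, ToolType>, calls: &[&str]) -> (Vec<String>, Status) {
    let mut trace = Vec::new();
    for (i, name) in calls.iter().enumerate() {
        let step = i + 1;
        match registry.get(name) {
            Some(ToolType::Function) => {
                trace.push(format!("iter {step}: {name} -> client"));
                return (trace, Status::RequiresAction(name.to_string()));
            }
            Some(origin) => trace.push(format!("iter {step}: {name} -> gateway ({origin:?})")),
            None => trace.push(format!("iter {step}: {name} -> unregistered")),
        }
    }
    // No client-owned call was made, so the final pass completes normally.
    (trace, Status::Completed)
}
```

Replaying `["web_search", "query_papers", "run_shell"]` produces two gateway steps followed by a client step, ending in `Status::RequiresAction("run_shell")`, which matches iterations 1 to 3 above.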

---

## Shipping Plan

| PR | Scope | Depends on |
|----|-------|------------|
| **A: Tool Types + Registry** | `ToolType` enum, `ResponsesTool` enum, `ToolRegistry`, `ToolHandler` trait, `FunctionHandler`, normalize pipeline. No execution logic. | io types refactor |
| **B: Type-Aware Dispatch** | Registry-based routing in `dispatch_tools`, `LoopDecision::RequiresAction` + `ContinuePartial`, `HandlerRegistry`. | PR A |
| **C: MCP Handler** | First real `ToolHandler` impl — `tools/list` + `tools/call` via JSON-RPC. Stateless HTTP client. | PR A |
| **D: Tool SSE Events** | Type-specific event emission during execution. Extends `SSEEventType`. | PR B + streaming |
| **E: Output Item Types** | `OutputItem::McpCall`, `OutputItem::WebSearchCall`, etc. Storage + serialization. | PR B |

PR A lands independently. PR C can proceed in parallel with PR B, since both depend only on PR A. Future handlers (`web_search`, `file_search`, `code_interpreter`) implement the same trait.

---

## Design Decisions

| # | Decision | Rationale |
|---|----------|-----------|
| D1 | Registry-based routing | Name prefixes leak implementation into the model's tool namespace. Registry is invisible to inference. |
| D2 | Request-scoped registry | Different requests may target different MCP servers. Global state would require sync and conflict resolution. |
| D3 | `function` never gateway-executed | Matches OpenAI spec. Enables agent clients (Codex, etc.) that own their tool implementations. "No client delegation" means the gateway doesn't punt *its* work — not that function tools can't exist. |
| D4 | `ContinuePartial` in LoopDecision | Mixed requests need to execute gateway tools and loop, while tracking that client tools also exist. Without this, we'd skip gateway tools or lose client tools. |
| D5 | MCP client is stateless | Each request opens fresh connections. Connection pooling per `server_url` is a follow-up optimization. |
| D6 | `ResponsesTool` uses `#[serde(tag = "type")]` | Wire-compatible with existing `{"type":"function",...}` — no client migration needed. |

---

## Alternatives Considered for `function` Tool Handling

Decision D3 (`function` is never gateway-executed, returns `requires_action`) is the most debatable choice. Here are the alternatives we evaluated:

| # | Alternative | Behavior | Why rejected |
|---|-------------|----------|--------------|
| A | **Reject function tools entirely** | Validate at parse time — if `type: "function"` is present, return 400. Force clients to back all tools with MCP servers. | Breaks OpenAI spec compatibility. Prevents agent clients (Codex, Claude Code) from using their natural pattern. Unnecessarily opinionated. |
| B | **Ignore + warn** | Accept `function` tools, normalize to vLLM, but if model calls one: drop the call silently, log a warning, and continue inference without it. | Silent data loss. Model asked for a tool result and gets nothing — produces hallucinated or degraded responses. Violates least-surprise. |
| C | **Search MCP servers for matching name** | When model calls a `function` tool, check if any registered MCP server happens to expose a tool with that name. If found, execute via MCP. If not, fall back to `requires_action`. | Spooky action at a distance. Client declares `type: "function"` expecting to own execution, but gateway silently intercepts it if an MCP server has a name collision. Also adds latency (extra `tools/list` queries). |
| D | **Gateway-execute all (require registered executor)** | Every `function` tool must have a backing executor configured in gateway config. No `requires_action` at all. | Requires operators to pre-configure every tool. Impossible for dynamic agent clients that generate tool definitions at runtime. Breaks the most common agentic pattern. |
| E | **Configurable per-request** | Add a field like `function_execution: "client" \| "gateway"` to let the client choose. | Over-engineering for MVP. Adds complexity to every code path. If a real use case emerges, we can add it later without breaking the default. |

**Chosen: passthrough with `requires_action`** — matches OpenAI spec exactly, zero surprise for clients, and cleanly separates "tools the gateway owns" from "tools the client owns" based solely on the `type` field the client already provides.

---

## Open Questions

| # | Question | Proposed Answer |
|---|----------|-----------------|
| Q1 | What if MCP `tools/list` returns a name colliding with a `function` tool? | Function wins (client-defined takes precedence). Emit warning log. |
| Q2 | How does `ContinuePartial` look to the streaming client? | Gateway tool events stream in real-time. Final status is `requires_action`. Client already handles incremental events. |
| Q3 | Should `tool_choice: {function: {name: "x"}}` work for MCP-discovered tools? | Yes. vLLM sees all normalized functions. If the forced name is MCP-originated, the call routes through MCP naturally. |
| Q4 | Should `prepare_tools` be a Praxis filter or part of `execute_loop`? | Part of `execute_loop` in core. Praxis wraps the whole loop, not individual tool stages. |
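
The proposed answer to Q1 amounts to a precedence rule applied at registration time. Below is a minimal sketch, assuming stage 4 registers one name at a time and returns warnings to the caller for logging. `register` is hypothetical, and keeping the existing entry for every other collision (for example, two MCP servers exposing the same name) is an assumption that the table does not settle.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToolType {
    Function,
    Mcp,
}

/// Register `name` under Q1's proposed rule: a client-declared `function` tool
/// always keeps the name. Any other collision keeps the existing entry
/// (an assumption). Returns a warning message to log, if any.
pub fn register(registry: &mut HashMap<String, ToolType>, name: &str, tool_type: ToolType) -> Option<String> {
    let Some(existing) = registry.get(name).copied() else {
        registry.insert(name.to_string(), tool_type);
        return None;
    };
    if tool_type == ToolType::Function && existing != ToolType::Function {
        // Function wins even if the MCP tool was registered first.
        registry.insert(name.to_string(), tool_type);
        Some(format!("function tool `{name}` replaces {existing:?} tool"))
    } else {
        Some(format!("ignoring {tool_type:?} tool `{name}`: already registered as {existing:?}"))
    }
}
```

Because the `function` entry wins in either order, the outcome does not depend on whether the function tool or the MCP server appears first in `tools[]`.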