Scope: How tool schemas are built, how function_call items are executed, and how Open WebUI tool sources (registry + Direct Tool Servers) are attached.
This pipe supports OpenRouter tool calling either via an internal execution pipeline (pipe-run tools) or via Open WebUI pass-through (OWUI-run tools). Tool sources and integrations:
- Open WebUI tool registry tools (server-side Python tools).
- Open WebUI Direct Tool Servers (client-side OpenAPI tools executed in the browser via Socket.IO).
- OpenRouter web-search (as a `plugins` entry, not a function tool).
This pipe supports two tool execution backends. Choose based on whether you want the pipe to run tools itself, or you want Open WebUI to run them.
The pipe runs the tool loop itself:
- Provider returns `function_call` items.
- The pipe executes those tools (Open WebUI registry tools + Direct Tool Servers where available).
- The pipe appends `function_call_output` items and re-calls the provider until the model stops requesting tools or `MAX_FUNCTION_CALL_LOOPS` is reached (at which point the model gets a synthesis turn).
You gain:
- Pipe-level concurrency controls, batching, retries/timeouts, and breaker protections around tool execution.
- Optional persistence/replay of tool results via the pipe artifact store (`PERSIST_TOOL_RESULTS`, `TOOL_OUTPUT_RETENTION_TURNS`), which can reduce repeated tool calls and help with long chats.
- Optional strictification of tool schemas (`ENABLE_STRICT_TOOL_CALLING`) for more predictable function calling.
You lose / trade off:
- Tool execution behavior is “owned” by the pipe rather than Open WebUI’s native tool runner (so Open WebUI UX/logs may not exactly match the built-in tool flow).
The pipe does not execute tools. Instead, it returns tool calls in an OpenAI-compatible tool_calls shape and expects Open WebUI to:
- execute tools locally (registry tools and/or Direct Tool Servers), and then
- replay tool outputs back through the pipe as `role: "tool"` messages on the next request.
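As a sketch, that round trip looks like the following (the id, tool name, and payloads are illustrative, not taken from the pipe's source):

```python
# Assistant turn returned by the pipe in the OpenAI-compatible tool_calls shape:
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "search", "arguments": '{"q": "weather"}'},
    }],
}

# After Open WebUI executes the tool, the next request replays the result:
tool_reply = {
    "role": "tool",
    "tool_call_id": "call_abc123",  # must match the tool_calls entry's id
    "content": '{"temp_c": 21}',
}
```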
You gain:
- Open WebUI-native tool execution behavior and UI (tool boxes, retries, and tool server flows are handled by OWUI).
- A simpler “adapter-only” path: the pipe focuses on transport translation between Open WebUI and OpenRouter.
- Better compatibility with OpenRouter streaming quirks: OpenRouter `/responses` can emit tool calls with `arguments: ""` early; in this mode the pipe never emits `arguments: ""` to Open WebUI (it waits for complete arguments or normalizes to `{}`).
You lose / trade off:
- The pipe does not run tool batching/retries/breakers; Open WebUI’s behavior governs execution.
- Tool result persistence in the pipe artifact store is disabled (even if `PERSIST_TOOL_RESULTS=True`). Tool outputs still exist in chat history, but large tool outputs may increase context size/cost versus persistence-based replay.
- In pass-through, the pipe does not strictify or mutate tool schemas; Open WebUI's schemas are forwarded as-is.
Tool schemas are assembled by `build_tools(...)` and attached to the outgoing Responses request as `tools`.
- Tools are only attached when the selected model is recognized as supporting `function_calling`.
- In `TOOL_EXECUTION_MODE="Open-WebUI"`, the pipe does not block tools based on its model capability registry (it forwards tools as Open WebUI provided them).
- Open WebUI tool registry (`__tools__` dict)
  - Converted to OpenAI tool specs (`{"type": "function", "name", ...}`) via `ResponsesBody.transform_owui_tools(...)`.
  - When `TOOL_EXECUTION_MODE="Pipeline"` and `ENABLE_STRICT_TOOL_CALLING=true`, each tool schema is strictified:
    - Object nodes get `additionalProperties: false`.
    - All declared properties are marked required; properties that were not explicitly required become nullable (their type gains `"null"`).
    - Missing property `type` values are inferred defensively (object/array) so schemas remain valid.
    - A small LRU cache (size 128) avoids repeated strictification work for identical schemas.
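A minimal sketch of that strictification (the helper name is hypothetical and the LRU cache is omitted; the pipe's actual implementation may differ):

```python
def strictify(schema):
    """Recursively enforce strict function-calling rules on a JSON schema."""
    if not isinstance(schema, dict):
        return schema
    node = dict(schema)
    if node.get("type") == "object" or "properties" in node:
        node.setdefault("type", "object")       # infer a missing type defensively
        node["additionalProperties"] = False
        props = {k: strictify(v) for k, v in node.get("properties", {}).items()}
        originally_required = set(node.get("required", []))
        for name, prop in props.items():
            if name not in originally_required:
                # Optional properties become nullable instead of omittable.
                t = prop.get("type", "object")
                types = t if isinstance(t, list) else [t]
                if "null" not in types:
                    props[name] = {**prop, "type": types + ["null"]}
        node["properties"] = props
        node["required"] = list(props)          # every declared property is required
    elif node.get("type") == "array" or "items" in node:
        node.setdefault("type", "array")
        if "items" in node:
            node["items"] = strictify(node["items"])
    return node
```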
- Open WebUI Direct Tool Servers (`__metadata__["tool_servers"]`)
  - These are user-configured OpenAPI tool servers that Open WebUI executes client-side.
  - Open WebUI includes the selected servers in the request body as `tool_servers`; for pipes this arrives under `__metadata__["tool_servers"]`.
  - This pipe:
    - advertises the tools to the model using OpenAPI `operationId` values as tool names (no namespacing, so collisions overwrite; OWUI-compatible), and
    - executes tool calls via the Socket.IO bridge (`__event_call__`) by emitting `execute:tool` so the browser performs the request.
  - Direct tools are only advertised when `__event_call__` is available; without an active Socket.IO session there is no safe execution path, so the pipe skips them.
- Extra tools (`extra_tools`)
  - A caller-provided list of already OpenAI-format tool specs is appended as-is (non-dict entries are ignored).
After assembly, tools are deduplicated by `(type, name)` identity. If duplicates exist, the later entry wins.
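That deduplication step can be sketched as (hypothetical helper name):

```python
def dedupe_tools(tools):
    """Deduplicate tool specs by (type, name); later entries win."""
    merged = {}
    for tool in tools:
        if isinstance(tool, dict):
            # Re-inserting an existing key overwrites the earlier spec.
            merged[(tool.get("type"), tool.get("name"))] = tool
    return list(merged.values())
```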
Tool execution happens in the request loop that follows each Responses API call:
- The pipe calls the provider (streaming mode for normal chats).
- When a `response.completed` event arrives, the pipe inspects the response `output` list.
- Any `output` items with `type == "function_call"` are treated as tool calls to execute locally.
- The pipe executes the tools and converts each result into `function_call_output` items.
- The `function_call` items (normalized) and their outputs are appended to the next request's `input[]`, and the loop continues until either:
  - no more `function_call` items are returned, or
  - `MAX_FUNCTION_CALL_LOOPS` is reached, at which point pending tool calls receive stub responses and the model gets one additional turn to synthesize a final answer.
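Assuming the Responses API item shapes described above, the per-iteration bookkeeping might look like this (helper names are hypothetical):

```python
def extract_function_calls(response):
    """Collect pending tool calls from a completed response's output list."""
    return [item for item in response.get("output", [])
            if item.get("type") == "function_call"]

def make_function_call_output(call, result_text):
    """Build the matching function_call_output item for the next input[]."""
    return {
        "type": "function_call_output",
        "call_id": call.get("call_id"),
        "output": result_text,
    }
```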
Notes:
- If a tool name is missing or not present in the tool registry, the pipe returns a structured `function_call_output` indicating the failure.
- The pipe does not "stream" tool outputs mid-request. Tools are executed between Responses calls.
- `MAX_FUNCTION_CALL_LOOPS` only applies when `TOOL_EXECUTION_MODE="Pipeline"`. In Open-WebUI mode, loop control is managed by Open WebUI.
This section documents the dynamic context-budget guard used when `TOOL_EXECUTION_MODE="Pipeline"`.
In long tool loops, the request can become context-saturated (large replayed artifacts + new tool outputs + reasoning state). A common symptom is:
- tool loops continue, but the model eventually returns no useful assistant text (or an incomplete response) because the prompt budget is exhausted.
The pipe now applies adaptive, model-aware budgeting instead of fixed output caps:
- It derives prompt limits from model metadata (`max_prompt_tokens`, then `context_length`/`max_completion_tokens`, with safe fallbacks).
- It estimates request/input size and omits oversized `function_call_output` payloads by replacing them with a short model-visible stub that advises the model to retry with a narrower query.
- The model retains full tool access throughout the conversation and can recover from oversized results by retrying with tighter parameters.
This keeps the loop alive, informs the model in-band, and lets the model decide whether to summarize, stop tools, or ask for narrower tool queries.
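A simplified sketch of the omission behavior (the real pipe derives the budget from model metadata; the helper name, stub wording, and 4-characters-per-token estimate below are illustrative assumptions):

```python
def apply_budget(outputs, budget_tokens, chars_per_token=4):
    """Replace outputs whose estimated token cost exceeds the remaining budget."""
    stub = ("[tool output omitted: projected size exceeded the remaining "
            "context budget; retry with a narrower query]")
    kept = []
    for item in outputs:
        estimate = len(item.get("output", "")) // chars_per_token
        if estimate > budget_tokens:
            # Keep the item so the model can react, but swap in a short stub.
            kept.append({**item, "output": stub})
        else:
            budget_tokens -= estimate
            kept.append(item)
    return kept
```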
- Some tool outputs may be replaced by an omission stub when they would likely exceed remaining context budget.
- Failed or omitted tool outputs are still provided to the model for continuity, but they are not persisted and not shown as tool cards.
- If tool loops complete without any assistant content growth and no actionable continuation remains, the pipe emits a fallback assistant message instead of staying silent.
To reduce omissions and improve reliability:
- Prefer tools that support tight server-side limits (`limit`, `top_k`, date ranges, filters).
- Have tools return concise summaries plus references/IDs instead of full raw blobs.
- For bulky outputs (search results, logs, traces), expose pagination/continuation parameters so the model can request smaller chunks.
- Keep `PERSIST_TOOL_RESULTS` enabled where possible; replay plus adaptive omission is safer than repeatedly re-fetching large payloads.
When `SHOW_TOOL_CARDS` is enabled, the pipe displays collapsible cards in the chat UI showing tool execution status:
- In-progress cards: Appear when a tool starts executing, showing the tool name and arguments.
- Completed cards: Replace in-progress cards when execution finishes, showing tool name, arguments, and results.
- Failed/omitted outputs: Not rendered as tool cards (they are model-visible only for in-loop recovery).
By default, `SHOW_TOOL_CARDS` is disabled for a cleaner chat experience. Tools execute silently without visual indicators.
Enable this valve when you want:
- Debugging visibility into tool execution
- Users to see what tools are running and their outputs
- Transparency about tool arguments and results
This setting is available as both an admin valve and a user valve (users can override the admin default).
Note: This feature only applies when `TOOL_EXECUTION_MODE="Pipeline"`. In Open-WebUI mode, the pipe doesn't execute tools itself, so it cannot display execution cards.
Tools are executed via a per-request worker pool backed by a bounded queue:
- Queue size: 50 tool calls per request (bounded).
- Worker count: `MAX_PARALLEL_TOOLS_PER_REQUEST`.
- Per-request semaphore: limits concurrent tool executions per request.
- Global semaphore: `MAX_PARALLEL_TOOLS_GLOBAL` limits tool executions across all requests.
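The two-level limit can be sketched with `asyncio` semaphores (the semaphore sizes and helper names are illustrative, not the pipe's defaults):

```python
import asyncio

async def run_tool(fn, request_sem, global_sem):
    # A tool runs only after acquiring both the global and per-request limits.
    async with global_sem:
        async with request_sem:
            return await fn()

async def main():
    global_sem = asyncio.Semaphore(8)    # cf. MAX_PARALLEL_TOOLS_GLOBAL
    request_sem = asyncio.Semaphore(4)   # cf. MAX_PARALLEL_TOOLS_PER_REQUEST

    async def fake_tool():
        return 42

    return await run_tool(fake_tool, request_sem, global_sem)
```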
Batching behavior:
- Tool calls may be batched when they share the same tool name and do not declare dependency/ordering blockers in arguments.
- If tool arguments include any of `depends_on`, `_depends_on`, `sequential`, or `no_batch`, the call is treated as non-batchable.
- Batching does not require identical arguments; it is a concurrency optimization, not a deduplication mechanism.
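The batch-eligibility rule can be sketched as (hypothetical helper name):

```python
NON_BATCHABLE_KEYS = {"depends_on", "_depends_on", "sequential", "no_batch"}

def batch_key(call):
    """Return a grouping key (the tool name), or None if the call must run alone."""
    args = call.get("arguments") or {}
    if isinstance(args, dict) and NON_BATCHABLE_KEYS & args.keys():
        return None  # declared ordering/dependency blocker: not batchable
    return call.get("name")
```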
Timeouts and retries:
- Each tool call is run with a per-call timeout (`TOOL_TIMEOUT_SECONDS`).
- Tool calls are retried up to 2 attempts (per call) when they raise exceptions.
- Tool batches are guarded by a batch timeout (derived from `TOOL_BATCH_TIMEOUT_SECONDS` and the per-call timeout).
- If the tool queue stays idle for `TOOL_IDLE_TIMEOUT_SECONDS`, the worker loop cancels pending work and surfaces an error.
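The per-call timeout and two-attempt retry can be sketched as follows (a simplified illustration, not the pipe's actual worker code):

```python
import asyncio

async def run_with_timeout_and_retry(tool, timeout_s, attempts=2):
    """Run an async tool with a per-call timeout, retrying on any exception."""
    last_exc = None
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(tool(), timeout_s)
        except Exception as exc:
            last_exc = exc
    raise last_exc
```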
The pipe applies a shared breaker window (`BREAKER_MAX_FAILURES` within `BREAKER_WINDOW_SECONDS`) across different subsystems:
- Per-user request breaker: prevents repeated failing requests from thrashing the system.
- Per-user, per-tool-type breaker: temporarily disables executing tool calls of a given type (for example, `function`) for a user after repeated tool failures.
- Per-user DB breaker: can temporarily suppress persistence-related work after repeated database failures.
When a tool breaker is open, tool calls are skipped and a status message is emitted to the UI (best effort).
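The sliding-window breaker logic can be sketched as (class name and API are illustrative):

```python
import time
from collections import deque

class Breaker:
    """Sliding-window failure breaker (cf. BREAKER_MAX_FAILURES / BREAKER_WINDOW_SECONDS)."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()

    def record_failure(self, now=None):
        self._failures.append(time.monotonic() if now is None else now)

    def is_open(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop failures that fell out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures
```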
The web-search integration is attached as a `plugins` entry (not as a `tools` function):
- If the selected model supports `web_search_tool` and the OpenRouter Search toggle is enabled for the request (per chat, or enabled by default via the model's Default Filters), the pipe appends `{ "id": "web" }` to `plugins`.
- If `WEB_SEARCH_MAX_RESULTS` is set, it is included as `max_results`.
- If reasoning effort is `minimal`, the pipe skips adding the web-search plugin.
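Putting those rules together (hypothetical helper; toggle and capability detection in the real pipe are more involved):

```python
def build_web_plugin(supports_search, toggle_enabled, effort, max_results=None):
    """Return the OpenRouter web-search plugins entry, or None to skip it."""
    if not (supports_search and toggle_enabled) or effort == "minimal":
        return None
    plugin = {"id": "web"}
    if max_results is not None:
        plugin["max_results"] = max_results  # cf. WEB_SEARCH_MAX_RESULTS
    return plugin
```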
Important: Open WebUI also has a separate built-in Web Search toggle (Open WebUI-native). OpenRouter Search and Open WebUI Web Search are different systems. See: Web Search (Open WebUI) vs OpenRouter Search.
OpenRouter offers a response-healing plugin that can attempt to repair malformed outputs. This pipe does not expose that plugin on purpose:
- We prefer failing fast when a model returns malformed JSON or invalid structured output.
- Silent repairs can hide real model issues (bad prompts, low token budgets, provider quirks) and make debugging harder.
If you want auto-healing, integrate it explicitly in your own request layer so it is visible and auditable.
Direct Tool Servers are configured and executed by Open WebUI, but advertised/executed through this pipe:
- Configure servers in User Settings → External Tools → Manage Tool Servers (and ensure the server is enabled/toggled).
- Select tool servers for a chat in the tool picker (Open WebUI sends the selected servers in `tool_servers`).
- When the model calls a direct tool, the pipe emits `execute:tool` via `__event_call__` and the browser performs the OpenAPI request.
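The browser-side execution hop can be sketched as follows; the exact `execute:tool` payload shape below is an assumption for illustration, not the pipe's actual schema:

```python
async def call_direct_tool(event_call, server_url, operation_id, arguments):
    # Ask the browser (via the Socket.IO __event_call__ bridge) to perform
    # the OpenAPI request client-side. Payload fields here are illustrative.
    return await event_call({
        "type": "execute:tool",
        "data": {"server": server_url, "name": operation_id, "params": arguments},
    })
```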
Failure handling:
- Direct tool execution is wrapped in `try/except`; tool crashes never crash the pipe/session.
- On failure the tool returns an error payload to the model (and the pipe may emit an OWUI notification, best effort).
This pipe no longer implements "remote MCP server connectivity" (previously surfaced as `REMOTE_MCP_SERVERS_JSON`) because that approach bypassed Open WebUI's tool server configuration surface and RBAC/permissions model.
If you want MCP tools in Open WebUI, use an MCP→OpenAPI proxy/aggregator (for example MCPO or MetaMCP) and add the resulting OpenAPI server through Open WebUI’s tool server UI so access control and future tool server changes remain centralized in OWUI.
For persistence behavior and replay rules of tool artifacts, see: