Tools, plugins, and integrations

Scope: How tool schemas are built, how function_call items are executed, and how Open WebUI tool sources (registry + Direct Tool Servers) are attached.

Quick Navigation: 📘 Docs Home | ⚙️ Configuration | 🏗️ Architecture | 🔒 Security

This pipe supports OpenRouter tool calling either via an internal execution pipeline (pipe-run tools) or via Open WebUI pass-through (OWUI-run tools). Tool sources and integrations:

  • Open WebUI tool registry tools (server-side Python tools).
  • Open WebUI Direct Tool Servers (client-side OpenAPI tools executed in the browser via Socket.IO).
  • OpenRouter web-search (as a plugins entry, not a function tool).

Tool backends (TOOL_EXECUTION_MODE)

This pipe supports two tool execution backends. Choose based on whether you want the pipe to run tools itself, or you want Open WebUI to run them.

Pipeline (default)

The pipe runs the tool loop itself:

  • Provider returns function_call items.
  • The pipe executes those tools (Open WebUI registry tools + Direct Tool Servers where available).
  • The pipe appends function_call_output items and re-calls the provider until the model stops requesting tools or MAX_FUNCTION_CALL_LOOPS is reached (at which point the model gets a synthesis turn).

You gain:

  • Pipe-level concurrency controls, batching, retries/timeouts, and breaker protections around tool execution.
  • Optional persistence/replay of tool results via the pipe artifact store (PERSIST_TOOL_RESULTS, TOOL_OUTPUT_RETENTION_TURNS), which can reduce repeated tool calls and help with long chats.
  • Optional strictification of tool schemas (ENABLE_STRICT_TOOL_CALLING) for more predictable function calling.

You lose / trade off:

  • Tool execution behavior is “owned” by the pipe rather than Open WebUI’s native tool runner (so Open WebUI UX/logs may not exactly match the built-in tool flow).

Open-WebUI (tool bypass / pass-through)

The pipe does not execute tools. Instead, it returns tool calls in an OpenAI-compatible tool_calls shape and expects Open WebUI to:

  • execute tools locally (registry tools and/or Direct Tool Servers), and then
  • replay tool outputs back through the pipe as role:"tool" messages on the next request.
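The two halves of this round trip can be illustrated with OpenAI-compatible message shapes. The field names follow the chat-completions convention; the call id and tool name here are hypothetical:

```python
# 1) The pipe surfaces the model's tool request as an OpenAI-style
#    tool_calls entry (arguments is always complete JSON, never "").
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",            # hypothetical call id
            "type": "function",
            "function": {
                "name": "get_weather",      # hypothetical tool
                "arguments": '{"city": "Berlin"}',
            },
        }
    ],
}

# 2) Open WebUI executes the tool and replays the output on the next
#    request as a role:"tool" message referencing the same call id.
tool_turn = {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": '{"temp_c": 18}',
}

next_request_messages = [assistant_turn, tool_turn]
```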

You gain:

  • Open WebUI-native tool execution behavior and UI (tool boxes, retries, and tool server flows are handled by OWUI).
  • A simpler “adapter-only” path: the pipe focuses on transport translation between Open WebUI and OpenRouter.
  • Better compatibility with OpenRouter streaming quirks: OpenRouter /responses can emit tool calls with arguments:"" early; in this mode the pipe will never emit arguments:"" to Open WebUI (it waits for complete args or normalizes to {}).

You lose / trade off:

  • The pipe does not run tool batching/retries/breakers; Open WebUI’s behavior governs execution.
  • Tool result persistence in the pipe artifact store is disabled (even if PERSIST_TOOL_RESULTS=True). Tool outputs still exist in chat history, but large tool outputs may increase context size/cost versus persistence-based replay.
  • In pass-through, the pipe does not strictify or mutate tool schemas; Open WebUI’s schemas are forwarded as-is.

Tool schema assembly (build_tools)

Tool schemas are assembled by build_tools(...) and attached to the outgoing Responses request as tools.

Preconditions

  • Tools are only attached when the selected model is recognized as supporting function_calling.
  • In TOOL_EXECUTION_MODE="Open-WebUI", the pipe does not block tools based on its model capability registry (it forwards tools as Open WebUI provided them).

Tool sources (in order)

  1. Open WebUI tool registry (__tools__ dict)

    • Converted to OpenAI tool specs ({"type": "function", "name": ..., "parameters": ...}) via ResponsesBody.transform_owui_tools(...).
    • When TOOL_EXECUTION_MODE="Pipeline" and ENABLE_STRICT_TOOL_CALLING=true, each tool schema is strictified:
      • Object nodes get additionalProperties: false.
      • All declared properties are marked required; properties that were not explicitly required become nullable (their type gains "null").
      • Missing property type values are inferred defensively (object/array) so schemas remain valid.
      • A small LRU cache (size 128) avoids repeated strictification work for identical schemas.
  2. Open WebUI Direct Tool Servers (__metadata__["tool_servers"])

    • These are user-configured OpenAPI tool servers that Open WebUI executes client-side.
    • Open WebUI includes the selected servers in the request body as tool_servers; for pipes this arrives under __metadata__["tool_servers"].
    • This pipe:
      • advertises the tools to the model using OpenAPI operationId values as tool names (no namespacing, collisions overwrite; OWUI-compatible), and
      • executes tool calls via the Socket.IO bridge (__event_call__) by emitting execute:tool so the browser performs the request.
    • Direct tools are only advertised when __event_call__ is available; without an active Socket.IO session there is no safe execution path, so the pipe skips them.
  3. Extra tools (extra_tools)

    • A caller-provided list of already OpenAI-format tool specs is appended as-is (non-dict entries are ignored).
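The strictification rules applied to registry tool schemas (source 1 above) can be sketched as follows. `strictify` is an illustrative helper, not the pipe's actual function, and it omits the LRU cache:

```python
import copy

def strictify(schema: dict) -> dict:
    """Sketch of the strictification rules: additionalProperties: false on
    object nodes, all properties required, optional properties made nullable,
    missing types inferred defensively."""
    node = copy.deepcopy(schema)
    if node.get("type") == "object" or "properties" in node:
        node.setdefault("type", "object")          # infer missing type
        node["additionalProperties"] = False
        props = node.get("properties", {})
        originally_required = set(node.get("required", []))
        node["required"] = list(props)             # all properties required
        for name, sub in props.items():
            props[name] = strictify(sub)
            if name not in originally_required:
                t = props[name].get("type", "object")
                if isinstance(t, str):
                    props[name]["type"] = [t, "null"]      # optional -> nullable
                elif isinstance(t, list) and "null" not in t:
                    props[name]["type"] = t + ["null"]
    elif node.get("type") == "array" or "items" in node:
        node.setdefault("type", "array")
        if "items" in node:
            node["items"] = strictify(node["items"])
    return node
```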

Deduplication

After assembly, tools are deduplicated by (type, name) identity. If duplicates exist, the later entry wins.
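The deduplication rule can be sketched in a few lines (`dedupe_tools` is an illustrative helper, not the pipe's own function):

```python
def dedupe_tools(tools: list) -> list:
    """Deduplicate by (type, name) identity; later entries win."""
    by_key = {}
    for tool in tools:
        by_key[(tool.get("type"), tool.get("name"))] = tool
    return list(by_key.values())
```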


Tool execution lifecycle (Responses API loop)

Tool execution happens in the request loop that follows each Responses API call:

  1. The pipe calls the provider (streaming mode for normal chats).
  2. When a response.completed event arrives, the pipe inspects the response output list.
  3. Any output items with type == "function_call" are treated as tool calls to execute locally.
  4. The pipe executes the tools and converts each result into function_call_output items.
  5. The function_call items (normalized) and their outputs are appended to the next request’s input[], and the loop continues until either:
    • no more function_call items are returned, or
    • MAX_FUNCTION_CALL_LOOPS is reached — pending tool calls receive stub responses and the model gets one additional turn to synthesize a final answer.
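The loop above can be condensed into a sketch. This is a simplification of the real pipe, which streams, normalizes items, and does far more bookkeeping; the loop-cap value is illustrative:

```python
import json

MAX_FUNCTION_CALL_LOOPS = 5  # valve; value here is illustrative

def run_tool_loop(call_provider, execute_tool, input_items: list) -> dict:
    """Condensed sketch of the Responses-API tool loop described above."""
    for _ in range(MAX_FUNCTION_CALL_LOOPS):
        response = call_provider(input_items)
        calls = [o for o in response["output"] if o.get("type") == "function_call"]
        if not calls:
            return response                      # model produced a final answer
        for call in calls:
            result = execute_tool(call["name"], json.loads(call["arguments"]))
            input_items.append(call)             # normalized function_call item
            input_items.append({
                "type": "function_call_output",
                "call_id": call["call_id"],
                "output": json.dumps(result),
            })
    # Loop cap reached: in the real pipe, pending calls get stub responses
    # and the model receives one final synthesis turn.
    return call_provider(input_items)
```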

Notes:

  • If a tool name is missing or not present in the tool registry, the pipe returns a structured function_call_output indicating the failure.
  • The pipe does not “stream” tool outputs mid-request. Tools are executed between Responses calls.
  • MAX_FUNCTION_CALL_LOOPS only applies when TOOL_EXECUTION_MODE="Pipeline". In Open-WebUI mode, loop control is managed by Open WebUI.

Adaptive tool output budgeting (Pipeline mode)

This section documents the dynamic context-budget guard used when TOOL_EXECUTION_MODE="Pipeline".

Problem users observe

In long tool loops, the request can become context-saturated (large replayed artifacts + new tool outputs + reasoning state). A common symptom is:

  • tool loops continue, but the model eventually returns no useful assistant text (or an incomplete response) because the prompt budget is exhausted.

Conceptual fix

The pipe now applies adaptive, model-aware budgeting instead of fixed output caps:

  • It derives prompt limits from model metadata (max_prompt_tokens, then context_length/max_completion_tokens, with safe fallbacks).
  • It estimates request/input size and omits oversized function_call_output payloads by replacing them with a short model-visible stub that advises the model to retry with a narrower query.
  • The model retains full tool access throughout the conversation and can recover from oversized results by retrying with tighter parameters.

This keeps the loop alive, informs the model in-band, and lets the model decide whether to summarize, stop tools, or ask for narrower tool queries.
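The omission step can be sketched as a simple guard. The function name, the characters-per-token heuristic, and the stub wording are illustrative assumptions, not the pipe's actual code:

```python
def budget_tool_output(output: str, remaining_tokens: int,
                       chars_per_token: int = 4) -> str:
    """Sketch of adaptive omission: pass small results through, replace
    oversized payloads with a short model-visible stub."""
    estimated_tokens = len(output) // chars_per_token   # crude size estimate
    if estimated_tokens <= remaining_tokens:
        return output
    return ("[tool output omitted: result was too large for the remaining "
            "context budget; retry with a narrower query or tighter limits]")
```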

User-visible behavior changes

  • Some tool outputs may be replaced by an omission stub when they would likely exceed remaining context budget.
  • Failed or omitted tool outputs are still provided to the model for continuity, but they are not persisted and not shown as tool cards.
  • If tool loops complete without any assistant content growth and no actionable continuation remains, the pipe emits a fallback assistant message instead of staying silent.

Operator guidance

To reduce omissions and improve reliability:

  • Prefer tools that support tight server-side limits (limit, top_k, date ranges, filters).
  • Have tools return concise summaries plus references/IDs instead of full raw blobs.
  • For bulky outputs (search results, logs, traces), expose pagination/continuation parameters so the model can request smaller chunks.
  • Keep PERSIST_TOOL_RESULTS enabled where possible; replay + adaptive omission is safer than repeatedly re-fetching large payloads.

Tool execution cards (SHOW_TOOL_CARDS)

When SHOW_TOOL_CARDS is enabled, the pipe displays collapsible cards in the chat UI showing tool execution status:

  • In-progress cards: Appear when a tool starts executing, showing the tool name and arguments.
  • Completed cards: Replace in-progress cards when execution finishes, showing tool name, arguments, and results.
  • Failed/omitted outputs: Not rendered as tool cards (they are model-visible only for in-loop recovery).

By default, SHOW_TOOL_CARDS is disabled for a cleaner chat experience. Tools execute silently without visual indicators.

Enable this valve when you want:

  • Debugging visibility into tool execution
  • Users to see what tools are running and their outputs
  • Transparency about tool arguments and results

This setting is available as both an admin valve and a user valve (users can override the admin default).

Note: This feature only applies when TOOL_EXECUTION_MODE="Pipeline". In Open-WebUI mode, the pipe doesn't execute tools itself, so it cannot display execution cards.


Concurrency, batching, and timeouts (per request)

Tools are executed via a per-request worker pool backed by a bounded queue:

  • Queue size: 50 tool calls per request (bounded).
  • Worker count: MAX_PARALLEL_TOOLS_PER_REQUEST.
  • Per-request semaphore: limits concurrent tool executions per request.
  • Global semaphore: MAX_PARALLEL_TOOLS_GLOBAL limits tool executions across all requests.

Batching behavior:

  • Tool calls may be batched when they share the same tool name and do not declare dependency/ordering blockers in arguments.
  • If tool arguments include any of: depends_on, _depends_on, sequential, no_batch, the call is treated as non-batchable.
  • Batching does not require identical arguments; it is a concurrency optimization, not a deduplication mechanism.
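The batchability predicate implied by these rules is small enough to sketch directly:

```python
NON_BATCHABLE_KEYS = {"depends_on", "_depends_on", "sequential", "no_batch"}

def is_batchable(arguments: dict) -> bool:
    """A call is non-batchable if its arguments declare any
    dependency/ordering blocker key."""
    return NON_BATCHABLE_KEYS.isdisjoint(arguments)
```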

Timeouts and retries:

  • Each tool call is run with a per-call timeout (TOOL_TIMEOUT_SECONDS).
  • Tool calls are retried up to 2 attempts (per call) when they raise exceptions.
  • Tool batches are guarded by a batch timeout (derived from TOOL_BATCH_TIMEOUT_SECONDS and the per-call timeout).
  • If the tool queue stays idle for TOOL_IDLE_TIMEOUT_SECONDS, the worker loop cancels pending work and surfaces an error.
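The per-call timeout and retry policy can be sketched as follows. The valve value and error shape are illustrative; the real pipe layers batch timeouts, breakers, and idle detection on top of this:

```python
import asyncio

TOOL_TIMEOUT_SECONDS = 30      # valve; value here is illustrative
MAX_ATTEMPTS = 2               # tool calls are retried up to 2 attempts

async def run_tool_with_guards(tool_fn, *args):
    """Sketch of the per-call timeout + retry policy described above."""
    last_error = None
    for _attempt in range(MAX_ATTEMPTS):
        try:
            return await asyncio.wait_for(
                tool_fn(*args), timeout=TOOL_TIMEOUT_SECONDS)
        except Exception as exc:   # timeouts raise TimeoutError, also caught here
            last_error = exc
    return {"error": f"tool failed after {MAX_ATTEMPTS} attempts: {last_error}"}
```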

Breakers (stability controls)

The pipe applies a shared breaker window (BREAKER_MAX_FAILURES within BREAKER_WINDOW_SECONDS) across different subsystems:

  • Per-user request breaker: prevents repeated failing requests from thrashing the system.
  • Per-user, per-tool-type breaker: temporarily disables executing tool calls of a given type (for example, function) for a user after repeated tool failures.
  • Per-user DB breaker: can temporarily suppress persistence-related work after repeated database failures.

When a tool breaker is open, tool calls are skipped and a status message is emitted to the UI (best effort).
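A minimal sliding-window breaker mirroring the BREAKER_MAX_FAILURES / BREAKER_WINDOW_SECONDS semantics might look like this (an illustrative sketch, not the pipe's implementation):

```python
import time
from collections import deque
from typing import Optional

class Breaker:
    """Opens when max_failures occur within the trailing window."""
    def __init__(self, max_failures: int = 5, window_seconds: float = 60.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = deque()            # timestamps of recent failures

    def record_failure(self, now: Optional[float] = None) -> None:
        self.failures.append(time.monotonic() if now is None else now)

    def is_open(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()        # drop failures outside the window
        return len(self.failures) >= self.max_failures
```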


OpenRouter web-search plugin

The web-search integration is attached as a plugins entry (not as a tools function):

  • If the selected model supports web_search_tool and the OpenRouter Search toggle is enabled for the request (per chat, or enabled by default via the model’s Default Filters), the pipe appends { "id": "web" } to plugins.
  • If WEB_SEARCH_MAX_RESULTS is set, it is included as max_results.
  • If reasoning effort is minimal, the pipe skips adding the web-search plugin.
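The attachment rules above reduce to a small decision function (`build_web_search_plugin` is an illustrative helper; only the `{"id": "web"}` / `max_results` payload shape comes from the source):

```python
from typing import Optional

def build_web_search_plugin(model_supports_web_search: bool,
                            search_enabled: bool,
                            reasoning_effort: str,
                            max_results: Optional[int]):
    """Return the plugins entry to append, or None if web search is skipped."""
    if not (model_supports_web_search and search_enabled):
        return None
    if reasoning_effort == "minimal":
        return None                      # skip web search at minimal effort
    plugin = {"id": "web"}
    if max_results is not None:
        plugin["max_results"] = max_results
    return plugin
```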

Important: Open WebUI also has a separate built-in Web Search toggle (Open WebUI-native). OpenRouter Search and Open WebUI Web Search are different systems. See: Web Search (Open WebUI) vs OpenRouter Search.


OpenRouter response-healing plugin (intentionally not exposed)

OpenRouter offers a response-healing plugin that can attempt to repair malformed outputs. This pipe does not expose that plugin on purpose:

  • We prefer failing fast when a model returns malformed JSON or invalid structured output.
  • Silent repairs can hide real model issues (bad prompts, low token budgets, provider quirks) and make debugging harder.

If you want auto-healing, integrate it explicitly in your own request layer so it is visible and auditable.


Open WebUI Direct Tool Servers

Direct Tool Servers are configured and executed by Open WebUI, but advertised/executed through this pipe:

  • Configure servers in User Settings → External Tools → Manage Tool Servers (and ensure the server is enabled/toggled).
  • Select tool servers for a chat in the tool picker (Open WebUI sends the selected servers in tool_servers).
  • When the model calls a direct tool, the pipe emits execute:tool via __event_call__ and the browser performs the OpenAPI request.
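The browser-side call path can be sketched as below. Treat the payload field names (`data`, `name`, `params`) as assumptions about the bridge shape; only the `execute:tool` event name and the `__event_call__` mechanism come from the source:

```python
async def call_direct_tool(event_call, tool_name: str, arguments: dict):
    """Sketch of the Direct Tool Server call path: emit execute:tool over
    the Socket.IO bridge so the browser performs the OpenAPI request."""
    if event_call is None:
        # No Socket.IO session: direct tools were never advertised, so
        # this path should not be reached in practice.
        raise RuntimeError("no __event_call__ bridge available")
    return await event_call({
        "type": "execute:tool",
        "data": {"name": tool_name, "params": arguments},  # assumed shape
    })
```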

Failure handling:

  • Direct tool execution is wrapped in try/except; tool crashes never crash the pipe/session.
  • On failure the tool returns an error payload to the model (and the pipe may emit an OWUI notification best-effort).

MCP note (removed)

This pipe no longer implements “remote MCP server connectivity” (previously surfaced as REMOTE_MCP_SERVERS_JSON) because it bypasses Open WebUI’s tool server configuration surface and RBAC/permissions model.

If you want MCP tools in Open WebUI, use an MCP→OpenAPI proxy/aggregator (for example MCPO or MetaMCP) and add the resulting OpenAPI server through Open WebUI’s tool server UI so access control and future tool server changes remain centralized in OWUI.

For persistence behavior and replay rules of tool artifacts, see: