
L7 inference proxy silently drops tool_calls chunks on large streaming responses #829

@MitchFuchs


Agent Diagnostic

The OpenShell L7 inference proxy passes reasoning tokens through correctly but silently drops tool_calls delta fields from SSE streaming responses when the tool call payload is large (>~5KB). Small tool calls (~1-2KB) pass through. The model generates valid responses — confirmed by bypassing the proxy and connecting directly to the inference backend.

Description

When an LLM streams a response with tool_calls via /v1/chat/completions (OpenAI-compatible, stream: true, tools parameter), the OpenShell inference routing proxy (inference.local) strips the tool_calls fields from the SSE chunks. The stream completes with reasoning content only — no tool call, no content, no finish_reason. The sandbox OCSF log shows NET:FAIL [LOW] inference.local:443 after the stream ends.
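For comparison, a healthy stream interleaves `delta.tool_calls` chunks with any reasoning chunks and ends with a `finish_reason`. A minimal sketch of the difference, with chunk shapes following the OpenAI-compatible streaming format and all values illustrative (the `reasoning` delta field is the one reported above, not part of the upstream OpenAI spec):

```python
import json

# Illustrative SSE chunk bodies in the OpenAI-compatible streaming format.
# Direct to the backend, tool call fragments arrive in delta.tool_calls:
healthy_chunk = json.loads(
    '{"choices": [{"delta": {"tool_calls": [{"index": 0, "function":'
    ' {"name": "write", "arguments": "{\\"file_path\\":"}}]},'
    ' "finish_reason": null}]}'
)
final_chunk = json.loads(
    '{"choices": [{"delta": {}, "finish_reason": "tool_calls"}]}'
)

# Through the proxy, chunks carry only reasoning and the stream ends
# with [DONE] and no finish_reason:
proxied_chunk = json.loads(
    '{"choices": [{"delta": {"reasoning": "First, outline..."},'
    ' "finish_reason": null}]}'
)

print("tool_calls" in healthy_chunk["choices"][0]["delta"])   # True
print("tool_calls" in proxied_chunk["choices"][0]["delta"])   # False
```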

This affects all models — tested with both nemotron-3-super (120B MoE) and gemma4:e4b via Ollama. Both produce valid tool call JSON when accessed directly, but fail identically through the proxy.

Reproduction Steps

  1. Configure an OpenAI-compatible inference provider (e.g., Ollama) with tool-calling-capable models
  2. Set up inference routing: openshell inference set --provider <name> --model <model> --timeout 1800
  3. From inside the sandbox, send a streaming request with tools that requires a large response:
# Through proxy (FAILS - tool_calls dropped):
#   URL: https://inference.local/v1/chat/completions
#
# Direct to backend (WORKS - 23KB valid JSON tool call):
#   URL: http://<backend-host>:<port>/v1/chat/completions

payload = {
    "model": "<model>",
    "messages": [{"role": "user", "content": "Write a comprehensive 2000-word project plan. Save it to project_plan.md"}],
    "tools": [{"type": "function", "function": {
        "name": "write",
        "description": "Write content to a file.",
        "parameters": {
            "type": "object",
            "properties": {"file_path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["file_path", "content"],
        },
    }}],
    "stream": True,
    "max_tokens": 16384,
}

  4. Expected: SSE stream includes delta.tool_calls chunks, finishes with finish_reason: "tool_calls"
  5. Actual: SSE stream includes only delta.reasoning chunks, then [DONE] with no tool call data. OCSF log shows NET:FAIL [LOW] inference.local:443

Test results summary:

| Test | Direct to backend | Through proxy |
| --- | --- | --- |
| nemotron-3-super short tool call (~1.8KB) | works | works |
| nemotron-3-super long tool call (~23KB) | 324s, valid JSON | reasoning only, tool call dropped |
| gemma4:e4b long tool call (~13KB) | 101s, valid JSON | reasoning only, tool call dropped |
| Plain chat (no tools) | works | works |

Environment

OpenShell: 0.0.25 (CLI and gateway)
NemoClaw: 0.0.10
Host: Raspberry Pi 5 (8GB), Ubuntu Server 24.04, aarch64
Inference backend: Ollama (remote via SSH tunnel), models: nemotron-3-super, gemma4:e4b
Inference route configured with protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
Proxy path: sandbox → inference.local:443 → OpenShell L7 proxy → backend endpoint

Logs

Sandbox OCSF log around failure:


[timestamp] NET:OPEN [INFO] ALLOWED inference.local:443
[timestamp] [openshell_router] routing proxy inference request (streaming) endpoint=http://<backend>:11435/v1 method=POST path=/v1/chat/completions protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
# ... sendChatAction calls every ~3s (typing indicator) ...
[timestamp] NET:FAIL [LOW] inference.local:443
[timestamp] HTTP:POST [INFO] ALLOWED POST http://api.telegram.org/bot[CREDENTIAL]/deleteMessage [policy:telegram]
The deleteMessage confirms OpenClaw received no usable response and cleaned up the partial reasoning message from the chat channel.

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why

Metadata

Labels

  • state:agent-ready (Approved for agent implementation)
  • state:pr-opened (PR has been opened for this issue)
