Agent Diagnostic
The OpenShell L7 inference proxy passes reasoning tokens through correctly but silently drops tool_calls delta fields from SSE streaming responses when the tool call payload is large (>~5KB). Small tool calls (~1-2KB) pass through. The model generates valid responses — confirmed by bypassing the proxy and connecting directly to the inference backend.
Description
When an LLM streams a response with tool_calls via /v1/chat/completions (OpenAI-compatible, stream: true, tools parameter), the OpenShell inference routing proxy (inference.local) strips the tool_calls fields from the SSE chunks. The stream completes with reasoning content only — no tool call, no content, no finish_reason. The sandbox OCSF log shows NET:FAIL [LOW] inference.local:443 after the stream ends.
The failure is not model-specific: both nemotron-3-super (120B MoE) and gemma4:e4b via Ollama produce valid tool call JSON when accessed directly, but fail identically through the proxy.
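For reference, the difference between the two streams can be sketched as the two chunk shapes below. This assumes the standard OpenAI chat-completions streaming delta layout; the id and text values are illustrative, not captured from the failing stream:

```python
# What the backend emits directly (tool_calls delta survives):
expected_chunk = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "id": "call_abc123",  # illustrative id
                "type": "function",
                "function": {
                    "name": "write",
                    "arguments": '{"file_path": "project_plan.md", "content": "...',
                },
            }]
        },
        "finish_reason": None,
    }]
}

# What arrives through the proxy (reasoning only, tool_calls stripped):
observed_chunk = {
    "choices": [{
        "index": 0,
        "delta": {"reasoning": "Planning the document structure..."},
        "finish_reason": None,
    }]
}
```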
Reproduction Steps
- Configure an OpenAI-compatible inference provider (e.g., Ollama) with tool-calling-capable models
- Set up inference routing:
  openshell inference set --provider <name> --model <model> --timeout 1800
- From inside the sandbox, send a streaming request with tools that requires a large response:
# Through proxy (FAILS - tool_calls dropped):
# URL: https://inference.local/v1/chat/completions
#
# Direct to backend (WORKS - 23KB valid JSON tool call):
# URL: http://<backend-host>:<port>/v1/chat/completions
payload = {
    "model": "<model>",
    "messages": [{"role": "user", "content": "Write a comprehensive 2000-word project plan. Save it to project_plan.md"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "write",
            "description": "Write content to a file.",
            "parameters": {
                "type": "object",
                "properties": {"file_path": {"type": "string"}, "content": {"type": "string"}},
                "required": ["file_path", "content"],
            },
        },
    }],
    "stream": True,
    "max_tokens": 16384,
}
- Expected: SSE stream includes delta.tool_calls chunks and finishes with finish_reason: "tool_calls"
- Actual: SSE stream includes only delta.reasoning chunks, then [DONE] with no tool call data; the OCSF log shows NET:FAIL [LOW] inference.local:443
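To make the failure easy to spot, the SSE stream can be tallied chunk by chunk. A minimal sketch (a hypothetical helper, assuming standard `data: {...}` SSE framing and the OpenAI delta layout):

```python
import json

def classify_sse(lines):
    """Tally delta types in an OpenAI-compatible SSE stream.

    `lines` is an iterable of decoded SSE lines, e.g. the decoded output of
    requests.post(url, json=payload, stream=True).iter_lines().
    """
    counts = {"reasoning": 0, "tool_calls": 0, "content": 0}
    finish_reason = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and SSE comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        choice = json.loads(data)["choices"][0]
        delta = choice.get("delta", {})
        for key in counts:
            if delta.get(key):
                counts[key] += 1
        finish_reason = choice.get("finish_reason") or finish_reason
    return counts, finish_reason
```

Run against both the proxy URL and the direct backend URL (decoding bytes to str first): through the proxy, `counts["tool_calls"]` stays 0 and `finish_reason` remains `None`, while the direct connection reports tool_calls chunks and `finish_reason == "tool_calls"`.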
Test results summary:

| Test | Direct to backend | Through proxy |
| --- | --- | --- |
| nemotron-3-super short tool call (~1.8KB) | works | works |
| nemotron-3-super long tool call (~23KB) | 324s, valid JSON | reasoning only, tool call dropped |
| gemma4:e4b long tool call (~13KB) | 101s, valid JSON | reasoning only, tool call dropped |
| Plain chat (no tools) | works | works |
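Since a ~1.8KB tool call passes and a ~13KB one fails, the cutoff could be bracketed by binary search over the payload size. A sketch with an injectable probe (the probe itself would send a streaming request through the proxy and report whether any delta.tool_calls chunks arrived; this harness is hypothetical, not an OpenShell API):

```python
def bracket_threshold(probe, lo=2 * 1024, hi=13 * 1024, tol=256):
    """Narrow down the payload size at which tool_calls start being dropped.

    probe(size) -> True if a tool call of roughly `size` bytes survives
    the proxy. Requires probe(lo) to pass and probe(hi) to fail.
    """
    assert probe(lo), "lower bound should pass through the proxy"
    assert not probe(hi), "upper bound should be dropped"
    while hi - lo > tol:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid  # still passes: cutoff is above mid
        else:
            hi = mid  # dropped: cutoff is at or below mid
    return lo, hi  # (largest passing size, smallest failing size observed)
```

Bracketing the cutoff this way would confirm (or refute) the ~5KB estimate in the diagnostic above and might point at a specific proxy buffer size.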
Environment
OpenShell: 0.0.25 (CLI and gateway)
NemoClaw: 0.0.10
Host: Raspberry Pi 5 (8GB), Ubuntu Server 24.04, aarch64
Inference backend: Ollama (remote via SSH tunnel), models: nemotron-3-super, gemma4:e4b
Inference route configured with protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
Proxy path: sandbox → inference.local:443 → OpenShell L7 proxy → backend endpoint
Logs
Sandbox OCSF log around failure:
[timestamp] NET:OPEN [INFO] ALLOWED inference.local:443
[timestamp] [openshell_router] routing proxy inference request (streaming) endpoint=http://<backend>:11435/v1 method=POST path=/v1/chat/completions protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
# ... sendChatAction calls every ~3s (typing indicator) ...
[timestamp] NET:FAIL [LOW] inference.local:443
[timestamp] HTTP:POST [INFO] ALLOWED POST http://api.telegram.org/bot[CREDENTIAL]/deleteMessage [policy:telegram]
The deleteMessage call confirms NemoClaw received no usable response and cleaned up the partial reasoning message in the chat channel.
Agent-First Checklist
- debug-openshell-cluster, debug-inference, openshell-cli