Agent Diagnostic
The OpenShell L7 inference proxy passes reasoning tokens through correctly but silently drops tool_calls delta fields from SSE streaming responses when the tool call payload is large (>~5KB). Small tool calls (~1-2KB) pass through. The model generates valid responses — confirmed by bypassing the proxy and connecting directly to the inference backend.
Description
When an LLM streams a response with tool_calls via /v1/chat/completions (OpenAI-compatible, stream: true, tools parameter), the OpenShell inference routing proxy (inference.local) strips the tool_calls fields from the SSE chunks. The stream completes with reasoning content only — no tool call, no content, no finish_reason. The sandbox OCSF log shows NET:FAIL [LOW] inference.local:443 after the stream ends.
The failure is not model-specific: both nemotron-3-super (120B MoE) and gemma4:e4b via Ollama produce valid tool call JSON when accessed directly, but fail identically through the proxy.
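For reference, the difference between the two streams can be sketched as the two chunk shapes below. This assumes the standard OpenAI chat-completions streaming delta layout; the id and text values are illustrative, not captured from the failing stream:

```python
# What the backend emits directly (tool_calls delta survives):
expected_chunk = {
    "choices": [{
        "index": 0,
        "delta": {
            "tool_calls": [{
                "index": 0,
                "id": "call_abc123",  # illustrative id
                "type": "function",
                "function": {
                    "name": "write",
                    "arguments": '{"file_path": "project_plan.md", "content": "...',
                },
            }]
        },
        "finish_reason": None,
    }]
}

# What arrives through the proxy (reasoning only, tool_calls stripped):
observed_chunk = {
    "choices": [{
        "index": 0,
        "delta": {"reasoning": "Planning the document structure..."},
        "finish_reason": None,
    }]
}
```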
Reproduction Steps
- Configure an OpenAI-compatible inference provider (e.g., Ollama) with tool-calling-capable models
- Set up inference routing:
  openshell inference set --provider <name> --model <model> --timeout 1800
- From inside the sandbox, send a streaming request with tools that requires a large response:
# Through proxy (FAILS - tool_calls dropped):
# URL: https://inference.local/v1/chat/completions
#
# Direct to backend (WORKS - 23KB valid JSON tool call):
# URL: http://<backend-host>:<port>/v1/chat/completions
payload = {
    "model": "<model>",
    "messages": [{"role": "user", "content": "Write a comprehensive 2000-word project plan. Save it to project_plan.md"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "write",
            "description": "Write content to a file.",
            "parameters": {
                "type": "object",
                "properties": {"file_path": {"type": "string"}, "content": {"type": "string"}},
                "required": ["file_path", "content"],
            },
        },
    }],
    "stream": True,
    "max_tokens": 16384,
}
- Expected: SSE stream includes delta.tool_calls chunks and finishes with finish_reason: "tool_calls"
- Actual: SSE stream includes only delta.reasoning chunks, then [DONE] with no tool call data; the OCSF log shows NET:FAIL [LOW] inference.local:443
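To make the failure easy to spot, the SSE stream can be tallied chunk by chunk. A minimal sketch (a hypothetical helper, assuming standard `data: {...}` SSE framing and the OpenAI delta layout):

```python
import json

def classify_sse(lines):
    """Tally delta types in an OpenAI-compatible SSE stream.

    `lines` is an iterable of decoded SSE lines, e.g. the decoded output of
    requests.post(url, json=payload, stream=True).iter_lines().
    """
    counts = {"reasoning": 0, "tool_calls": 0, "content": 0}
    finish_reason = None
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and SSE comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        choice = json.loads(data)["choices"][0]
        delta = choice.get("delta", {})
        for key in counts:
            if delta.get(key):
                counts[key] += 1
        finish_reason = choice.get("finish_reason") or finish_reason
    return counts, finish_reason
```

Run against both the proxy URL and the direct backend URL (decoding bytes to str first): through the proxy, `counts["tool_calls"]` stays 0 and `finish_reason` remains `None`, while the direct connection reports tool_calls chunks and `finish_reason == "tool_calls"`.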
Test results summary:

| Test | Direct to backend | Through proxy |
| --- | --- | --- |
| nemotron-3-super short tool call (~1.8KB) | works | works |
| nemotron-3-super long tool call (~23KB) | 324s, valid JSON | reasoning only, tool call dropped |
| gemma4:e4b long tool call (~13KB) | 101s, valid JSON | reasoning only, tool call dropped |
| Plain chat (no tools) | works | works |
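Since a ~1.8KB tool call passes and a ~13KB one fails, the cutoff could be bracketed by binary search over the payload size. A sketch with an injectable probe (the probe itself would send a streaming request through the proxy and report whether any delta.tool_calls chunks arrived; this harness is hypothetical, not an OpenShell API):

```python
def bracket_threshold(probe, lo=2 * 1024, hi=13 * 1024, tol=256):
    """Narrow down the payload size at which tool_calls start being dropped.

    probe(size) -> True if a tool call of roughly `size` bytes survives
    the proxy. Requires probe(lo) to pass and probe(hi) to fail.
    """
    assert probe(lo), "lower bound should pass through the proxy"
    assert not probe(hi), "upper bound should be dropped"
    while hi - lo > tol:
        mid = (lo + hi) // 2
        if probe(mid):
            lo = mid  # still passes: cutoff is above mid
        else:
            hi = mid  # dropped: cutoff is at or below mid
    return lo, hi  # (largest passing size, smallest failing size observed)
```

Bracketing the cutoff this way would confirm (or refute) the ~5KB estimate in the diagnostic above and might point at a specific proxy buffer size.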
Environment
OpenShell: 0.0.25 (CLI and gateway)
NemoClaw: 0.0.10
Host: Raspberry Pi 5 (8GB), Ubuntu Server 24.04, aarch64
Inference backend: Ollama (remote via SSH tunnel), models: nemotron-3-super, gemma4:e4b
Inference route configured with protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
Proxy path: sandbox → inference.local:443 → OpenShell L7 proxy → backend endpoint
Logs
Sandbox OCSF log around failure:
[timestamp] NET:OPEN [INFO] ALLOWED inference.local:443
[timestamp] [openshell_router] routing proxy inference request (streaming) endpoint=http://<backend>:11435/v1 method=POST path=/v1/chat/completions protocols=openai_chat_completions,openai_completions,openai_responses,model_discovery
# ... sendChatAction calls every ~3s (typing indicator) ...
[timestamp] NET:FAIL [LOW] inference.local:443
[timestamp] HTTP:POST [INFO] ALLOWED POST http://api.telegram.org/bot[CREDENTIAL]/deleteMessage [policy:telegram]
The deleteMessage call confirms NemoClaw received no usable response and cleaned up the partial reasoning message in the chat channel.
Agent-First Checklist
- debug-openshell-cluster, debug-inference, openshell-cli