Bug Description
When a `transfer_task` to a sub-agent fails mid-stream (e.g., unexpected EOF from the model), the TUI gets permanently stuck in the `Working…` state. The user cannot interact with the agent: messages are queued instead of processed, and only Esc (cancel) can break out.
Root Cause
In runSubSessionForwarding (pkg/runtime/agent_delegation.go), when an ErrorEvent is received from the child stream, the function returns immediately, breaking out of the for event := range loop:
```go
for event := range r.RunStream(ctx, child) {
	evts <- event
	if errEvent, ok := event.(*ErrorEvent); ok {
		// returns immediately, so events left in the child channel are never consumed
		return nil, fmt.Errorf("%s", errEvent.Error)
	}
}
```

The child's `RunStream` goroutine is still running and will emit `StreamStoppedEvent` via `finalizeEventChannel` → `close(events)`. But since `runSubSessionForwarding` has already stopped reading, the child's `StreamStoppedEvent` is never forwarded to the parent's event channel.
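A minimal, self-contained simulation of this loss follows. The event types, `childStream`, and `forwardEarlyReturn` are hypothetical stand-ins written for illustration, not the real `pkg/runtime` code. The simulated child emits `StreamStarted`, an error, then `StreamStopped`, and the forwarder returns on the error exactly as the loop above does.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the runtime's event types.
type Event interface{ name() string }

type StreamStartedEvent struct{}
type StreamStoppedEvent struct{}
type ErrorEvent struct{ Error string }

func (*StreamStartedEvent) name() string { return "StreamStarted" }
func (*StreamStoppedEvent) name() string { return "StreamStopped" }
func (*ErrorEvent) name() string         { return "Error" }

// childStream mimics a child stream that fails mid-way: StreamStarted,
// then an ErrorEvent, then StreamStopped, then close.
// The channel is buffered so this demo's producer never blocks.
func childStream() <-chan Event {
	events := make(chan Event, 3)
	go func() {
		defer close(events)
		events <- &StreamStartedEvent{}
		events <- &ErrorEvent{Error: "unexpected EOF"}
		events <- &StreamStoppedEvent{}
	}()
	return events
}

// forwardEarlyReturn mirrors the current loop: forward each event and
// return on the first ErrorEvent without reading the rest of the channel.
func forwardEarlyReturn(child <-chan Event, evts chan<- Event) error {
	for event := range child {
		evts <- event
		if errEvent, ok := event.(*ErrorEvent); ok {
			return errors.New(errEvent.Error)
		}
	}
	return nil
}

func main() {
	parent := make(chan Event, 10)
	err := forwardEarlyReturn(childStream(), parent)
	close(parent)

	var seen []string
	for e := range parent {
		seen = append(seen, e.name())
	}
	fmt.Println(err, seen) // prints "unexpected EOF [StreamStarted Error]"
}
```

The parent receives the child's `StreamStarted` but never its `StreamStopped`.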
The TUI's `streamDepth` counter is incremented when it receives the child's `StreamStartedEvent`, but it is never decremented because the matching `StreamStoppedEvent` is lost. This causes:
- When the parent stream eventually finishes and emits its own `StreamStoppedEvent`, the TUI still sees `streamDepth > 0` after decrementing.
- The sub-agent guard in `handleStreamStopped` kicks in: it only updates the sidebar and never calls `setWorking(false)`.
- `working` remains `true` forever, so all subsequent user messages are queued instead of processed.
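The counter arithmetic can be sketched with a stripped-down model. `tuiState` and its two handlers are hypothetical simplifications of the TUI fields and guard described above; in particular, setting `working` on every start and treating "depth still positive" as "a sub-agent stopped" are assumptions made for illustration.

```go
package main

import "fmt"

// tuiState is a hypothetical, minimal model of the TUI fields named in
// this issue; the real handlers also update the sidebar and more.
type tuiState struct {
	streamDepth int
	working     bool
}

func (s *tuiState) handleStreamStarted() {
	s.streamDepth++
	s.working = true
}

// handleStreamStopped models the sub-agent guard: if depth is still
// positive after decrementing, the stop is assumed to belong to a
// sub-agent, so working is left untouched.
func (s *tuiState) handleStreamStopped() {
	s.streamDepth--
	if s.streamDepth > 0 {
		return // sidebar-only update in the real TUI
	}
	s.working = false // stands in for setWorking(false)
}

func main() {
	s := &tuiState{}
	s.handleStreamStarted() // parent stream starts
	s.handleStreamStarted() // child StreamStarted is forwarded
	// child fails; its StreamStopped is never forwarded
	s.handleStreamStopped()               // parent StreamStopped
	fmt.Println(s.streamDepth, s.working) // prints "1 true"
}
```

One lost stop leaves the depth at 1, so the parent's own stop is mistaken for a sub-agent's and `working` never returns to `false`.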
Steps to Reproduce
- Configure a multi-agent setup with `transfer_task` delegation.
- Trigger a task that delegates to a sub-agent via `transfer_task`.
- Have the sub-agent's model stream fail mid-way (e.g., network error, unexpected EOF).
- The parent agent recovers and retries (or continues with other work).
- Eventually the parent completes its work and stops.
- Observed: TUI shows `Working… (N queued)` permanently; user messages are queued, not processed.
- Expected: TUI returns to idle state; user can interact normally.
Evidence from Session Data
Session 4a000595-1ef3-4fbe-88b4-5f0e80420378 demonstrates this:
- Position 28: Parent calls `transfer_task` to `code-reviewer`.
- Position 29: Tool response: `"Error calling tool: all models failed: error receiving from stream: unexpected EOF"`; the first sub-session's stream failed.
- Position 32: Parent retries `transfer_task` successfully (sub-session `190e24e1-...` completes normally).
- Position 50: Parent outputs final message with no tool calls (model stops).
- TUI state: Still shows `Working… (1 queued)` because `streamDepth` is off by 1 from the failed first transfer.
Proposed Fix
In runSubSessionForwarding, when an ErrorEvent is received, drain the remaining events from the child channel before returning so the StreamStoppedEvent is forwarded to the parent:
```go
for event := range r.RunStream(ctx, child) {
	evts <- event
	if errEvent, ok := event.(*ErrorEvent); ok {
		// Drain remaining events so StreamStopped reaches the parent TUI
		for remaining := range ??? { // ???: the ranged channel was never named, so there is nothing to drain
			evts <- remaining
		}
		span.RecordError(fmt.Errorf("%s", errEvent.Error))
		span.SetStatus(codes.Error, "sub-session error")
		return nil, fmt.Errorf("%s", errEvent.Error)
	}
}
```

Note: the current code uses `for event := range r.RunStream(ctx, child)`, so there is no channel variable left to drain once the error is seen. The fix likely requires capturing the channel:
```go
childEvents := r.RunStream(ctx, child)
for event := range childEvents {
	evts <- event
	if errEvent, ok := event.(*ErrorEvent); ok {
		// Drain remaining events (including StreamStopped)
		for remaining := range childEvents {
			evts <- remaining
		}
		span.RecordError(fmt.Errorf("%s", errEvent.Error))
		span.SetStatus(codes.Error, "sub-session error")
		return nil, fmt.Errorf("%s", errEvent.Error)
	}
}
```

Environment
- docker agent v1.37.0
- macOS (arm64)
- Multi-agent config with `transfer_task` delegation
- Model: `anthropic/claude-opus-4-6`
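The drain-before-return fix from the Proposed Fix section can be checked in isolation with a small, self-contained simulation. The event types, `childStream`, and `forwardAndDrain` are hypothetical stand-ins, not the real `pkg/runtime` code; only the capture-then-drain loop shape is taken from the proposal, and the tracing calls are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the runtime's event types.
type Event interface{ name() string }

type StreamStartedEvent struct{}
type StreamStoppedEvent struct{}
type ErrorEvent struct{ Error string }

func (*StreamStartedEvent) name() string { return "StreamStarted" }
func (*StreamStoppedEvent) name() string { return "StreamStopped" }
func (*ErrorEvent) name() string         { return "Error" }

// childStream mimics a child stream that fails mid-way and then still
// emits StreamStopped before closing (buffered so the demo never blocks).
func childStream() <-chan Event {
	events := make(chan Event, 3)
	go func() {
		defer close(events)
		events <- &StreamStartedEvent{}
		events <- &ErrorEvent{Error: "unexpected EOF"}
		events <- &StreamStoppedEvent{}
	}()
	return events
}

// forwardAndDrain captures the child channel so that, after an ErrorEvent,
// it keeps forwarding until the child closes the channel.
func forwardAndDrain(child <-chan Event, evts chan<- Event) error {
	for event := range child {
		evts <- event
		if errEvent, ok := event.(*ErrorEvent); ok {
			for remaining := range child {
				evts <- remaining
			}
			return errors.New(errEvent.Error)
		}
	}
	return nil
}

func main() {
	parent := make(chan Event, 10)
	err := forwardAndDrain(childStream(), parent)
	close(parent)

	var seen []string
	for e := range parent {
		seen = append(seen, e.name())
	}
	fmt.Println(err, seen) // prints "unexpected EOF [StreamStarted Error StreamStopped]"
}
```

The error is still returned to the caller, but the child's `StreamStopped` now reaches the parent channel, so a depth counter like the TUI's would be balanced.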