TUI stuck in 'Working…' state after failed sub-agent transfer_task (streamDepth leak) #2255

@aheritier

Bug Description

When a transfer_task to a sub-agent fails mid-stream (e.g., unexpected EOF from the model), the TUI gets permanently stuck in the Working… state. The user cannot interact with the agent: messages are queued instead of processed, and only Esc (cancel) can break out.

Root Cause

In runSubSessionForwarding (pkg/runtime/agent_delegation.go), when an ErrorEvent is received from the child stream, the function returns immediately, breaking out of the for event := range loop:

for event := range r.RunStream(ctx, child) {
    evts <- event
    if errEvent, ok := event.(*ErrorEvent); ok {
        // returns immediately — stops consuming events from child channel
        return nil, fmt.Errorf("%s", errEvent.Error)
    }
}

The child's RunStream goroutine is still running and will emit StreamStoppedEvent via finalizeEventChannel → close(events). But since runSubSessionForwarding has already stopped reading, the child's StreamStoppedEvent is never forwarded to the parent's event channel.
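To illustrate the dropped event in isolation, here is a minimal sketch that does not use the real runtime types. Event, childStream and forwardUntilError are hypothetical stand-ins, and the child channel is buffered only so the producer goroutine can finish in this toy program.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for the runtime's event type.
type Event string

const (
	StreamStarted Event = "StreamStarted"
	ErrorEvt      Event = "Error"
	StreamStopped Event = "StreamStopped"
)

// childStream mimics a failing child stream: it reports an error and
// then still emits StreamStopped before closing its channel.
func childStream() <-chan Event {
	events := make(chan Event, 8) // buffered so the producer never blocks here
	go func() {
		defer close(events)
		events <- StreamStarted
		events <- ErrorEvt
		events <- StreamStopped
	}()
	return events
}

// forwardUntilError records every event it forwards, but returns at the
// first error, like the loop quoted above.
func forwardUntilError(child <-chan Event) []Event {
	var forwarded []Event
	for ev := range child {
		forwarded = append(forwarded, ev)
		if ev == ErrorEvt {
			return forwarded // StreamStopped is still in the channel, unread
		}
	}
	return forwarded
}

func main() {
	fmt.Println(forwardUntilError(childStream())) // prints [StreamStarted Error]
}
```

With an unbuffered channel the producer would additionally block on its final send until something unblocks it, so the early return may also leave the child goroutine parked.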

The TUI's streamDepth counter was incremented when it received the child's StreamStartedEvent, but it is never decremented because the matching StreamStoppedEvent is lost. This causes:

  1. When the parent stream eventually finishes and emits its own StreamStoppedEvent, the TUI sees streamDepth > 0 after decrementing
  2. The sub-agent guard in handleStreamStopped kicks in: only updates the sidebar, never calls setWorking(false)
  3. working remains true forever — all subsequent user messages are queued instead of processed
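The three steps above can be modelled with a small counter. This is a simplified sketch of the behaviour described in this issue, not the actual TUI code: the tui struct, handle and replay are hypothetical names, and events are plain strings.

```go
package main

import "fmt"

// tui is a hypothetical model of the two pieces of state discussed above.
type tui struct {
	streamDepth int
	working     bool
}

// handle applies one event to the model.
func (t *tui) handle(ev string) {
	switch ev {
	case "StreamStarted":
		t.streamDepth++
		t.working = true
	case "StreamStopped":
		t.streamDepth--
		if t.streamDepth > 0 {
			return // sub-agent guard: treated as a nested stream ending
		}
		t.working = false
	}
}

// replay feeds a sequence of events into a fresh model and returns it.
func replay(events []string) tui {
	var t tui
	for _, ev := range events {
		t.handle(ev)
	}
	return t
}

func main() {
	// Child StreamStopped is forwarded: the parent's stop brings depth to 0.
	ok := replay([]string{"StreamStarted", "StreamStarted", "Error", "StreamStopped", "StreamStopped"})
	fmt.Printf("depth=%d working=%v\n", ok.streamDepth, ok.working) // depth=0 working=false

	// Child StreamStopped is lost: the parent's stop leaves depth at 1.
	leak := replay([]string{"StreamStarted", "StreamStarted", "Error", "StreamStopped"})
	fmt.Printf("depth=%d working=%v\n", leak.streamDepth, leak.working) // depth=1 working=true
}
```

In the second sequence the only StreamStopped is the parent's own, yet the guard interprets it as a nested stop, which matches the stuck Working… state.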

Steps to Reproduce

  1. Configure a multi-agent setup with transfer_task delegation
  2. Trigger a task that delegates to a sub-agent via transfer_task
  3. Have the sub-agent's model stream fail mid-way (e.g., network error, unexpected EOF)
  4. The parent agent recovers and retries (or continues with other work)
  5. Eventually the parent completes its work and stops
  6. Observed: TUI shows Working… (N queued) permanently; user messages are queued, not processed
  7. Expected: TUI returns to idle state; user can interact normally

Evidence from Session Data

Session 4a000595-1ef3-4fbe-88b4-5f0e80420378 demonstrates this:

  • Position 28: Parent calls transfer_task to code-reviewer
  • Position 29: Tool response: "Error calling tool: all models failed: error receiving from stream: unexpected EOF" — the first sub-session's stream failed
  • Position 32: Parent retries transfer_task successfully (sub-session 190e24e1-... completes normally)
  • Position 50: Parent outputs final message with no tool calls (model stops)
  • TUI state: Still shows Working… (1 queued) because streamDepth is off by 1 from the failed first transfer

Proposed Fix

In runSubSessionForwarding, when an ErrorEvent is received, drain the remaining events from the child channel before returning so the StreamStoppedEvent is forwarded to the parent:

for event := range r.RunStream(ctx, child) {
    evts <- event
    if errEvent, ok := event.(*ErrorEvent); ok {
        // Drain remaining events so StreamStopped reaches the parent TUI
        for remaining := range ??? {
            evts <- remaining
        }
        span.RecordError(fmt.Errorf("%s", errEvent.Error))
        span.SetStatus(codes.Error, "sub-session error")
        return nil, fmt.Errorf("%s", errEvent.Error)
    }
}

Note: the current code uses for event := range r.RunStream(ctx, child) so there is no separate channel variable to drain after breaking. The fix likely requires capturing the channel:

childEvents := r.RunStream(ctx, child)
for event := range childEvents {
    evts <- event
    if errEvent, ok := event.(*ErrorEvent); ok {
        // Drain remaining events (including StreamStopped)
        for remaining := range childEvents {
            evts <- remaining
        }
        span.RecordError(fmt.Errorf("%s", errEvent.Error))
        span.SetStatus(codes.Error, "sub-session error")
        return nil, fmt.Errorf("%s", errEvent.Error)
    }
}
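For anyone who wants to try the drain idea outside the runtime, here is a self-contained sketch. It is a variation on the snippet above, not the actual patch: instead of a nested drain loop it keeps one loop, remembers the first error, and returns only after the child closes its channel. Event, failingChild and forwardAndDrain are hypothetical names, and it assumes the child always closes its channel.

```go
package main

import (
	"errors"
	"fmt"
)

// Event is a hypothetical stand-in for the runtime's event type.
type Event struct {
	Kind string // "started", "error" or "stopped"
	Err  string // set only for error events
}

// failingChild mimics a child stream that fails mid-way but still emits
// its stopped event before closing the channel.
func failingChild() <-chan Event {
	ch := make(chan Event) // unbuffered: each send waits for a reader
	go func() {
		defer close(ch)
		ch <- Event{Kind: "started"}
		ch <- Event{Kind: "error", Err: "unexpected EOF"}
		ch <- Event{Kind: "stopped"}
	}()
	return ch
}

// forwardAndDrain forwards every child event to out. On an error event it
// remembers the error and keeps reading until the child channel is closed.
func forwardAndDrain(child <-chan Event, out chan<- Event) error {
	var firstErr error
	for ev := range child {
		out <- ev
		if ev.Kind == "error" && firstErr == nil {
			firstErr = errors.New(ev.Err)
		}
	}
	return firstErr
}

func main() {
	out := make(chan Event, 8) // buffered so this sketch needs no reader goroutine
	err := forwardAndDrain(failingChild(), out)
	close(out)

	var kinds []string
	for ev := range out {
		kinds = append(kinds, ev.Kind)
	}
	fmt.Println(kinds, err) // prints [started error stopped] unexpected EOF
}
```

One trade-off to keep in mind: if a child never closes its channel, or the parent stops reading from the output channel, the drain blocks. A real fix may want to guard the sends with the context.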

Environment

  • docker agent v1.37.0
  • macOS (arm64)
  • Multi-agent config with transfer_task delegation
  • Model: anthropic/claude-opus-4-6

Labels

  • area/agent: For work that has to do with the general agent loop/agentic features of the app
  • area/tui: For features/issues/fixes related to the TUI
  • kind/bug: Something isn't working
