Per-agent in-memory task-list tool to survive context trimming

## Status: idea / not committed

Captured from a design discussion. Not scheduled. Posting so we can debate scope before any implementation.

## Idea

Give each agent invocation (primary and subagent) an in-memory task list, exposed as tool calls similar to Claude Code's `TodoWrite`/`TaskUpdate`. The list is scoped to a single `AgentLoopRunner.RunAsync` call, not persisted, not shared across agents.

Tools (rough sketch):

- `task_create(items: string[])` — replace or append to the current list.
- `task_update(id, status)` — `pending` / `in_progress` / `completed`.
- The current list is re-rendered into the system/context message **fresh each iteration** so it always reflects current state.

When the model plans a multi-step sequence at the start of a run, it writes the plan to the task list. As it works, it ticks items off. The completion-evaluation pass at the end of `AgentLoopRunner` can then check the list for unfinished items rather than re-deriving intent from chat history.

## Motivation

Subagent runs commonly hit 50–100 tool calls. Every call's request and result get replayed in the prompt on each subsequent iteration, so context pressure is real. `AgentLoopRunner` already handles overflow:

- `TrimLargeToolResults` (`src/RockBot.Host/AgentLoopRunner.cs:1113`) picks the largest `FunctionResultContent` and truncates it with a `[truncated to fit context window]` marker, repeating until under ~90% of the token budget.
- It runs reactively on a 400 overflow (line 712-720) and pre-emptively at the top of every iteration once `_knownContextLimit` is cached (line 701-702).
- It only touches tool results; user/assistant messages are safe.
- Per the comment at line 1111, this is **text-based path only** — the native `FunctionInvokingChatClient` path doesn't get this trimming today.

So the user task framing survives, but:

1. On the native path, nothing trims, so a long-horizon run can simply hit the model's context limit and fail.
2. On the text path, plans that were derived from a *tool result* (e.g. a subagent that read a planning doc, or a search result that listed work to do) lose fidelity as those results get squeezed.
3. Plans that live only as free-form prose in early assistant turns are never directly truncated, but they also age — by iteration 80 they are far from the model's attention.

A task list whose state is re-rendered fresh each iteration dodges all three: it's always current, it's compact, and it's independent of which path is in use.

## Why this complements existing scaffolding

`AgentLoopRunner` already injects iteration budget, datetime, and a completion-evaluation pass. A task list slots in alongside those: it's another piece of structured state that helps the loop stay coherent across many iterations, and it gives the post-loop completion check something concrete to grade against (\"these three items still say `pending`\").

## Open questions

- **Primary vs subagent**: subagents are usually shorter-horizon. Worth prototyping on the primary first and only extending if subagents demonstrably benefit.
- **Failure mode**: a half-maintained task list is worse than none — looks on-track when it isn't. Need prompt guidance and possibly a nudge when the loop ends with `in_progress` items still open.
- **Surface**: just `create`/`update`, or also `add_item`/`remove_item`? Start minimal.
- **Rendering**: inject as a system message refreshed each iteration, or as a synthetic tool result? System message is simpler and avoids polluting the tool-call history.
- **Interaction with completion evaluation**: should the eval pass *require* the list to be all-completed, or just surface unfinished items as a hint? Probably the latter at first.

## Out of scope

- Persistence across runs. This is in-memory only. Cross-run planning belongs in working-memory keys, not here.
- Sharing task state between primary and subagents. Subagents get their own list.
- Replacing free-form planning. The model can still think out loud; the task list is for committed steps.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Per-agent in-memory task-list tool to survive context trimming #336

Status: idea / not committed

Idea

Motivation

Why this complements existing scaffolding

Open questions

Out of scope

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Per-agent in-memory task-list tool to survive context trimming #336

Description

Status: idea / not committed

Idea

Motivation

Why this complements existing scaffolding

Open questions

Out of scope

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions