Status: idea / not committed
Captured from a design discussion. Not scheduled. Posting so we can debate scope before any implementation.
Idea
Give each agent invocation (primary and subagent) an in-memory task list, exposed as tool calls similar to Claude Code's TodoWrite/TaskUpdate. The list is scoped to a single AgentLoopRunner.RunAsync call, not persisted, not shared across agents.
Tools (rough sketch):
task_create(items: string[]) — replace or append to the current list.
task_update(id, status) — pending / in_progress / completed.
- The current list is re-rendered into the system/context message fresh each iteration so it always reflects current state.
When the model plans a multi-step sequence at the start of a run, it writes the plan to the task list. As it works, it ticks items off. The completion-evaluation pass at the end of AgentLoopRunner can then check the list for unfinished items rather than re-deriving intent from chat history.
Motivation
Subagent runs commonly hit 50–100 tool calls. Every call's request and result get replayed in the prompt on each subsequent iteration, so context pressure is real. AgentLoopRunner already handles overflow:
TrimLargeToolResults (src/RockBot.Host/AgentLoopRunner.cs:1113) picks the largest FunctionResultContent and truncates it with a [truncated to fit context window] marker, repeating until under ~90% of the token budget.
- It runs reactively on a 400 overflow (line 712-720) and pre-emptively at the top of every iteration once
_knownContextLimit is cached (line 701-702).
- It only touches tool results; user/assistant messages are safe.
- Per the comment at line 1111, this is text-based path only — the native
FunctionInvokingChatClient path doesn't get this trimming today.
So the user task framing survives, but:
- On the native path, nothing trims, so a long-horizon run can simply hit the model's context limit and fail.
- On the text path, plans that were derived from a tool result (e.g. a subagent that read a planning doc, or a search result that listed work to do) lose fidelity as those results get squeezed.
- Plans that live only as free-form prose in early assistant turns are never directly truncated, but they also age — by iteration 80 they are far from the model's attention.
A task list whose state is re-rendered fresh each iteration dodges all three: it's always current, it's compact, and it's independent of which path is in use.
Why this complements existing scaffolding
AgentLoopRunner already injects iteration budget, datetime, and a completion-evaluation pass. A task list slots in alongside those: it's another piece of structured state that helps the loop stay coherent across many iterations, and it gives the post-loop completion check something concrete to grade against ("these three items still say pending").
Open questions
- Primary vs subagent: subagents are usually shorter-horizon. Worth prototyping on the primary first and only extending if subagents demonstrably benefit.
- Failure mode: a half-maintained task list is worse than none — looks on-track when it isn't. Need prompt guidance and possibly a nudge when the loop ends with
in_progress items still open.
- Surface: just
create/update, or also add_item/remove_item? Start minimal.
- Rendering: inject as a system message refreshed each iteration, or as a synthetic tool result? System message is simpler and avoids polluting the tool-call history.
- Interaction with completion evaluation: should the eval pass require the list to be all-completed, or just surface unfinished items as a hint? Probably the latter at first.
Out of scope
- Persistence across runs. This is in-memory only. Cross-run planning belongs in working-memory keys, not here.
- Sharing task state between primary and subagents. Subagents get their own list.
- Replacing free-form planning. The model can still think out loud; the task list is for committed steps.
Status: idea / not committed
Captured from a design discussion. Not scheduled. Posting so we can debate scope before any implementation.
Idea
Give each agent invocation (primary and subagent) an in-memory task list, exposed as tool calls similar to Claude Code's
TodoWrite/TaskUpdate. The list is scoped to a singleAgentLoopRunner.RunAsynccall, not persisted, not shared across agents.Tools (rough sketch):
task_create(items: string[])— replace or append to the current list.task_update(id, status)—pending/in_progress/completed.When the model plans a multi-step sequence at the start of a run, it writes the plan to the task list. As it works, it ticks items off. The completion-evaluation pass at the end of
AgentLoopRunnercan then check the list for unfinished items rather than re-deriving intent from chat history.Motivation
Subagent runs commonly hit 50–100 tool calls. Every call's request and result get replayed in the prompt on each subsequent iteration, so context pressure is real.
AgentLoopRunneralready handles overflow:TrimLargeToolResults(src/RockBot.Host/AgentLoopRunner.cs:1113) picks the largestFunctionResultContentand truncates it with a[truncated to fit context window]marker, repeating until under ~90% of the token budget._knownContextLimitis cached (line 701-702).FunctionInvokingChatClientpath doesn't get this trimming today.So the user task framing survives, but:
A task list whose state is re-rendered fresh each iteration dodges all three: it's always current, it's compact, and it's independent of which path is in use.
Why this complements existing scaffolding
AgentLoopRunneralready injects iteration budget, datetime, and a completion-evaluation pass. A task list slots in alongside those: it's another piece of structured state that helps the loop stay coherent across many iterations, and it gives the post-loop completion check something concrete to grade against ("these three items still saypending").Open questions
in_progressitems still open.create/update, or alsoadd_item/remove_item? Start minimal.Out of scope