Phase 2 (conditional): restrict consolidated-synthesis tool surface for background tasks #334

@rockfordlhotka

Description

Status: conditional follow-up to #332

This is not a committed work item. It is a candidate change whose value depends on what we observe after the Phase 1 fix in #332 has run for a while in production.

If Phase 1 alone produces a clean single-summary experience for scheduled tasks — no surprise tool calls, no overwrites of subagent-authored shared keys — we may close this issue without doing it. The point of waiting is to find out whether the synthesis pass's full tool access is actually doing harm or whether it's a legitimate gap-fill mechanism.

What Phase 2 would do

For non-interactive primary sessions (those whose PrimarySessionId does not start with session/, e.g. patrol/heartbeat-patrol), restrict the Phase 2 synthesis pass in SubagentResultHandler to a reduced tool surface that excludes every registry tool:

  • Allowed: memory tools (SearchMemory, SaveMemory, etc.), working-memory tools (GetFromWorkingMemory, ListWorkingMemory, etc.), skill tools (GetSkill, ListSkills).
  • Disallowed: MCP services (mcp_invoke_tool, mcp_get_service_details), scheduler tools (list_scheduled_tasks, schedule_task, cancel_scheduled_task), spawn_subagent, spawn_wisps, file/web tools.

Synthesis becomes purely "read what the subagents wrote, format into a single summary bubble, surface gaps the subagents flagged." It cannot issue fresh registry tool calls (MCP, scheduler, spawn, file/web) that compete with or overwrite the work the subagents already did.

Interactive chat (session/...) is unchanged — the user might legitimately follow up and the synthesis pass might need to gap-fill in real time.
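
The Allowed and Disallowed lists above can be read as a small policy function. The following is a minimal sketch only: ToolCategory and SynthesisToolPolicy are hypothetical names invented for illustration, not types from the RockBot codebase, and the sketch assumes C# 9 or later for the pattern syntax.

```csharp
using System;

// Hypothetical categories mirroring the Allowed / Disallowed lists in this issue.
public enum ToolCategory { Memory, WorkingMemory, Skill, Mcp, Scheduler, Spawn, FileOrWeb }

// Hypothetical helper; not part of the existing codebase.
public static class SynthesisToolPolicy
{
    // A session is interactive when its id starts with "session/".
    public static bool IsInteractive(string primarySessionId) =>
        primarySessionId.StartsWith("session/", StringComparison.OrdinalIgnoreCase);

    // Memory, working-memory, and skill tools are always available to synthesis.
    // Every other category is available only to interactive sessions.
    public static bool IsAllowed(string primarySessionId, ToolCategory category) =>
        category switch
        {
            ToolCategory.Memory or ToolCategory.WorkingMemory or ToolCategory.Skill => true,
            _ => IsInteractive(primarySessionId),
        };
}
```

Under these assumptions, SynthesisToolPolicy.IsAllowed("patrol/heartbeat-patrol", ToolCategory.Scheduler) is false, while the same call with "session/abc" is true. The policy stays binary per session type, which matches the all-or-nothing registry approach described under Out of scope.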

Motivation (the symptom Phase 2 would address)

Observed in the heartbeat-patrol run on 2026-05-05 around 00:14 America/Chicago. The subagent reported it could not access list_scheduled_tasks and saved its findings to shared/patrol/errors-latest with a 00:14 timestamp. The Phase 2 synthesis pass then:

  1. Re-read all the working-memory keys the subagent wrote.
  2. Called list_scheduled_tasks itself (the subagent's namespace didn't have it; the primary's does).
  3. Overwrote shared/patrol/errors-latest with a new 00:25 timestamp and a "scheduler health verified" message.

That's not a duplicate bubble — Phase 1 fixed that — but it is the synthesis doing fresh work that competes with the subagent and rewrites what the subagent just authored. It's why the patrol "felt like it ran twice" from the user's perspective.

Decision criteria

Make the change if, after at least a week of observation post-Phase-1:

  • Synthesis passes are still making substantive tool calls that overlap with or overwrite subagent output, AND
  • The user reports the patrol experience still feels duplicative.

Skip the change if:

  • Phase 1's gate fix alone produces a clean experience.
  • The synthesis's gap-filling tool calls turn out to be useful (e.g. the scheduler-health verification in the example above is genuinely improving the report).
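
One way to make the first criterion measurable during the observation window is to scan a log of working-memory writes for keys that a subagent authored and the synthesis pass later rewrote. This is a rough sketch under stated assumptions: the MemoryWrite record, the "subagent" and "synthesis" author labels, and the existence of such a write log are all hypothetical, and records require C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical log entry: which key was written, by which pass, and when.
public record MemoryWrite(string Key, string Author, DateTimeOffset At);

// Hypothetical audit helper; not part of the existing codebase.
public static class OverwriteAudit
{
    // Returns keys that a subagent wrote and the synthesis pass wrote again afterwards.
    public static IReadOnlyList<string> KeysOverwrittenBySynthesis(IEnumerable<MemoryWrite> writes)
    {
        return writes
            .GroupBy(w => w.Key)
            .Where(g =>
            {
                var subagentWrites = g.Where(w => w.Author == "subagent").ToList();
                if (subagentWrites.Count == 0) return false;

                // Any synthesis write after the first subagent write counts as an overwrite.
                var firstSubagentWrite = subagentWrites.Min(w => w.At);
                return g.Any(w => w.Author == "synthesis" && w.At > firstSubagentWrite);
            })
            .Select(g => g.Key)
            .ToList();
    }
}
```

Fed the two writes from the Motivation section (a subagent write to shared/patrol/errors-latest at 00:14 and a synthesis write to the same key at 00:25), this helper would report that key. A week of empty results would support skipping the change.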

Implementation sketch (if we do it)

In src/RockBot.Subagent/SubagentResultHandler.cs:

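// Interactive sessions live under the "session/" namespace; anything else (e.g. patrol/...) is background.
var isInteractive = sessionNamespace.StartsWith("session/", StringComparison.OrdinalIgnoreCase);

// Interactive sessions keep the full registry; background sessions get an empty set.
var registryTools = isInteractive
    ? toolRegistry.GetTools()
        .Select(r => (AIFunction)new SubagentRegistryToolFunction(r, toolRegistry.GetExecutor(r.Name)!, sessionNamespace))
        .ToArray()
    : Array.Empty<AIFunction>();   // background: no registry tools at all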

var chatOptions = new ChatOptions
{
    Tools = [..memoryTools.Tools, ..sessionWorkingMemoryTools.Tools, ..sessionSkillTools.Tools,
             ..rulesTools.Tools, ..toolGuideTools.Tools, ..registryTools]
};

Plus a system-message hint telling the synthesis pass that for background tasks, its job is to summarize, not to gap-fill.
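
The hint could be kept as a constant and returned only for background sessions. The sketch below is illustrative: SynthesisPrompts, HintFor, and the hint wording itself are assumptions, and how the text would actually be attached to the synthesis prompt is not specified in this issue.

```csharp
using System;

// Hypothetical holder for the background-only hint; names and wording are illustrative.
public static class SynthesisPrompts
{
    public const string BackgroundSummarizeOnlyHint =
        "This is a background task. Summarize what the subagents already wrote " +
        "and surface any gaps they flagged. Do not attempt to fill gaps yourself.";

    // Returns the hint for background sessions and an empty string for interactive ones.
    public static string HintFor(string primarySessionId) =>
        primarySessionId.StartsWith("session/", StringComparison.OrdinalIgnoreCase)
            ? string.Empty
            : BackgroundSummarizeOnlyHint;
}
```

Returning an empty string for interactive sessions keeps the caller free of null checks; it can append the result only when it is non-empty.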

Tests to add:

  • Synthesis for PrimarySessionId = patrol/foo does not have any registry tools in chatOptions.Tools.
  • Synthesis for PrimarySessionId = session/foo retains all registry tools.
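
The two tests might take roughly the following shape. This sketch assumes xUnit as the test framework, which the issue does not confirm, and it exercises a hypothetical stand-in (SynthesisToolSelector, working on tool names) rather than the real SubagentResultHandler, whose constructor and dependencies are not shown here.

```csharp
using System;
using Xunit;

// Hypothetical stand-in for the selection logic in SubagentResultHandler.
public static class SynthesisToolSelector
{
    public static string[] SelectRegistryTools(string primarySessionId, string[] registryTools) =>
        primarySessionId.StartsWith("session/", StringComparison.OrdinalIgnoreCase)
            ? registryTools
            : Array.Empty<string>();
}

public class SynthesisToolSurfaceTests
{
    // Tool names taken from the Disallowed list in this issue.
    private static readonly string[] Registry =
        { "mcp_invoke_tool", "list_scheduled_tasks", "spawn_subagent" };

    [Fact]
    public void BackgroundSession_GetsNoRegistryTools()
    {
        var tools = SynthesisToolSelector.SelectRegistryTools("patrol/foo", Registry);
        Assert.Empty(tools);
    }

    [Fact]
    public void InteractiveSession_RetainsAllRegistryTools()
    {
        var tools = SynthesisToolSelector.SelectRegistryTools("session/foo", Registry);
        Assert.Equal(Registry, tools);
    }
}
```

The real tests would instead build the handler and inspect chatOptions.Tools, but the assertions would follow the same pattern.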

Out of scope

  • The "thin primary" architecture idea (drop skill bodies from primary, keep skill index). Separate token-efficiency experiment.
  • Per-tool allowlists. We either give synthesis the full registry (interactive) or none of it (background). Half-measures are not worth the maintenance.
