enhancement: supersede resolved drift findings as task groups complete #623

Description

@mickume

Summary

Drift findings are frozen at the initial drift-review phase and never updated as task groups resolve them. By the time later groups run, the coder is told about problems that no longer exist — wasting prompt space and potentially causing confusion.

Evidence

Session 20260624_165149_e74eca, spec 07 (nightshift_standalone_cli):

The drift review (group 0) produced 12 findings describing the pre-implementation state. These same 12 findings were injected unchanged into every coder prompt from group 2 through group 7:

07_nightshift_standalone_cli:2 prompt — 12 drift findings (3,356 chars)
07_nightshift_standalone_cli:3 prompt — 12 drift findings (3,356 chars, identical)
07_nightshift_standalone_cli:4 prompt — 12 drift findings (3,356 chars, identical)
07_nightshift_standalone_cli:5 prompt — 12 drift findings (3,356 chars, identical)
07_nightshift_standalone_cli:6 prompt — 12 drift findings (3,356 chars, identical)
07_nightshift_standalone_cli:7 prompt — 12 drift findings (3,356 chars, identical)

Examples of stale findings injected into the group 6 coder prompt (docs update):

  • "The entire packages/nightshift/ directory does not exist." (resolved by group 3, which created the scaffold)
  • "packages/af/af/nightshift.py still exists in the repository." (resolved by group 2, which deleted it)
  • "packages/af/af/app.py imports and registers night-shift." (resolved by group 2, which removed the registration)
  • "Night-shift tests have not been migrated." (resolved by group 5, which migrated them)
  • "Root pyproject.toml is missing nightshift workspace entries." (resolved by group 3, which added them)

By group 6, only 2–3 of the 12 drift findings were still relevant (docs references to af night-shift). The other 9–10 described state that earlier groups had already fixed.

Impact

  • Prompt waste: 3,356 chars of drift findings per session, ~70% of which is stale by later groups. Across 6 groups, that's ~14k chars of stale context.
  • Potential confusion: A coder seeing "packages/nightshift/ does not exist" in its prompt while looking at a directory that clearly exists could second-guess its own codebase exploration, or worse, try to "fix" something that's already done.
  • Missed opportunity: The prompt space consumed by stale drift findings could carry genuinely useful information instead.

How it currently works

  1. The drift reviewer runs once (group 0) and produces N findings
  2. Findings are stored in review_findings table via review_store.insert_drift_findings()
  3. fox_provider._query_drift_findings() retrieves all active (non-superseded) drift findings for the spec
  4. Every subsequent coder session gets the full set injected into its system prompt
  5. No mechanism exists to mark drift findings as resolved
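
The flow above can be reproduced in a few lines. This is a minimal sketch, not the agentfox code: it assumes a SQLite-style store, and every column except superseded_by (as well as both function bodies) is a hypothetical stand-in for what review_store and fox_provider actually do.

```python
import sqlite3

# Hypothetical, simplified schema. Only the superseded_by column is
# confirmed by this issue; the other columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE review_findings (
        id INTEGER PRIMARY KEY,
        spec TEXT NOT NULL,
        kind TEXT NOT NULL,          -- assumed discriminator, e.g. 'drift'
        description TEXT NOT NULL,
        superseded_by TEXT           -- NULL while the finding is active
    )
    """
)


def insert_drift_findings(conn, spec, descriptions):
    """Stand-in for review_store.insert_drift_findings() (real signature unknown)."""
    conn.executemany(
        "INSERT INTO review_findings (spec, kind, description) VALUES (?, 'drift', ?)",
        [(spec, d) for d in descriptions],
    )


def query_drift_findings(conn, spec):
    """Stand-in for fox_provider._query_drift_findings(): active findings only."""
    rows = conn.execute(
        "SELECT description FROM review_findings "
        "WHERE spec = ? AND kind = 'drift' AND superseded_by IS NULL ORDER BY id",
        (spec,),
    )
    return [row[0] for row in rows]


SPEC = "07_nightshift_standalone_cli"

# Group 0: the drift review runs once and stores its findings.
insert_drift_findings(conn, SPEC, [
    "The entire packages/nightshift/ directory does not exist.",
    "packages/af/af/nightshift.py still exists in the repository.",
    "Night-shift tests have not been migrated.",
])

# Groups 2..7: nothing ever sets superseded_by, so every prompt
# receives the same full set.
prompts = {group: query_drift_findings(conn, SPEC) for group in range(2, 8)}
```

Because the filter only looks at superseded_by and nothing writes to that column for drift findings, each entry in prompts is identical, which matches the six identical 3,356-char blocks in the evidence.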

Suggested Fix

After each task group completes and merges successfully, run a lightweight check to supersede drift findings that the completed group's changes have resolved. Three approaches:

Option A: Automatic supersession based on task-to-drift mapping (recommended)

When the coder session completes and its changes are merged:

  1. Check which files were touched by the merge commit
  2. For each drift finding that references a file or path that was touched, mark it as superseded in review_findings (set superseded_by to the completing session's node_id)
  3. The superseded_by IS NULL filter in _query_drift_findings() will automatically exclude resolved findings from subsequent prompts

This is similar to how review findings are already superseded (review_store.supersede_injected_findings()). The drift findings just need the same treatment.
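
A rough sketch of Option A follows. It assumes a SQLite-style store, a hypothetical kind column, a node_id format borrowed from the prompt names above, and a simple regex heuristic for spotting paths inside finding text; none of these come from the agentfox codebase. The touched paths are passed in as a plain list; how they are obtained (for example from git diff --name-only) is left out.

```python
import re
import sqlite3

# Heuristic (an assumption): a path is any token containing at least one slash.
PATH_TOKEN = re.compile(r"[\w.\-]+(?:/[\w.\-]+)+/?")


def referenced_paths(description):
    """Extract path-like tokens from a finding, without trailing '.' or '/'."""
    return [m.rstrip("./") for m in PATH_TOKEN.findall(description)]


def is_touched(ref, touched_paths):
    """True if ref is a touched file or a directory containing one."""
    return any(t == ref or t.startswith(ref + "/") for t in touched_paths)


def supersede_resolved_drift_findings(conn, spec, touched_paths, node_id):
    """Mark active drift findings superseded when they reference a touched path."""
    rows = conn.execute(
        "SELECT id, description FROM review_findings "
        "WHERE spec = ? AND kind = 'drift' AND superseded_by IS NULL ORDER BY id",
        (spec,),
    ).fetchall()
    resolved = [
        fid for fid, desc in rows
        if any(is_touched(ref, touched_paths) for ref in referenced_paths(desc))
    ]
    conn.executemany(
        "UPDATE review_findings SET superseded_by = ? WHERE id = ?",
        [(node_id, fid) for fid in resolved],
    )
    return resolved


# Demo with a hypothetical schema (only superseded_by is confirmed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE review_findings (id INTEGER PRIMARY KEY, spec TEXT, "
    "kind TEXT, description TEXT, superseded_by TEXT)"
)
SPEC = "07_nightshift_standalone_cli"
conn.executemany(
    "INSERT INTO review_findings (spec, kind, description) VALUES (?, 'drift', ?)",
    [(SPEC, d) for d in [
        "packages/af/af/nightshift.py still exists in the repository.",
        "packages/af/af/app.py imports and registers night-shift.",
        "The entire packages/nightshift/ directory does not exist.",
        "Root pyproject.toml is missing nightshift workspace entries.",
    ]],
)

# Group 2 deleted nightshift.py and edited app.py.
resolved_g2 = supersede_resolved_drift_findings(
    conn, SPEC,
    ["packages/af/af/nightshift.py", "packages/af/af/app.py"],
    node_id=f"{SPEC}:2",  # assumed node_id format
)

# Group 3 created the scaffold; the file name below is hypothetical.
resolved_g3 = supersede_resolved_drift_findings(
    conn, SPEC,
    ["packages/nightshift/pyproject.toml", "pyproject.toml"],
    node_id=f"{SPEC}:3",
)

active = [row[0] for row in conn.execute(
    "SELECT description FROM review_findings WHERE superseded_by IS NULL ORDER BY id"
)]
```

Two limits of this heuristic are visible in the demo. The root pyproject.toml finding stays active even though group 3 touched that file, because a bare file name contains no slash and is never extracted. Conversely, touching a file does not prove the finding is fixed, so a real implementation may want a stricter match or a confirmation step.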

Option B: Re-run drift check after each group

After each task group merges, re-run a lightweight drift check that evaluates the current codebase state against the spec. Replace the stored drift findings with the fresh set. This is more accurate but more expensive (requires an LLM call per group completion).
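
Option B could look roughly like the sketch below. The drift check is passed in as a callable and stubbed out, since the real check would be an LLM call; the schema, the kind column, and the function name refresh_drift_findings are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable, List


def refresh_drift_findings(
    conn: sqlite3.Connection,
    spec: str,
    node_id: str,
    run_drift_check: Callable[[str], List[str]],
) -> List[str]:
    """Supersede all active drift findings and store a freshly generated set."""
    fresh = run_drift_check(spec)  # in practice an LLM call; stubbed in the demo
    with conn:  # commit both statements together, roll back on error
        conn.execute(
            "UPDATE review_findings SET superseded_by = ? "
            "WHERE spec = ? AND kind = 'drift' AND superseded_by IS NULL",
            (node_id, spec),
        )
        conn.executemany(
            "INSERT INTO review_findings (spec, kind, description) "
            "VALUES (?, 'drift', ?)",
            [(spec, d) for d in fresh],
        )
    return fresh


# Demo with a hypothetical schema (only superseded_by is confirmed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE review_findings (id INTEGER PRIMARY KEY, spec TEXT, "
    "kind TEXT, description TEXT, superseded_by TEXT)"
)
SPEC = "07_nightshift_standalone_cli"
conn.executemany(
    "INSERT INTO review_findings (spec, kind, description) VALUES (?, 'drift', ?)",
    [
        (SPEC, "packages/af/af/nightshift.py still exists in the repository."),
        (SPEC, "Night-shift tests have not been migrated."),
    ],
)


def stub_drift_check(spec: str) -> List[str]:
    """Pretend re-check after group 2: only the test migration is still open."""
    return ["Night-shift tests have not been migrated."]


refresh_drift_findings(conn, SPEC, f"{SPEC}:2", stub_drift_check)

active = [row[0] for row in conn.execute(
    "SELECT description FROM review_findings WHERE superseded_by IS NULL"
)]
superseded_count = conn.execute(
    "SELECT COUNT(*) FROM review_findings WHERE superseded_by IS NOT NULL"
).fetchone()[0]
```

Superseding the old rows instead of deleting them keeps the history of what each review reported, and reuses the same superseded_by IS NULL filter on the read side.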

Option C: Time-based decay

Mark drift findings with the group they were generated for. Only inject drift findings into coder prompts if they were generated within the last N groups. Simple but imprecise — a drift finding could still be relevant after N groups if no one addressed it.
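
The imprecision of Option C is easy to see in a small sketch. The DriftFinding type, its field names, and the max_age parameter are hypothetical; the second finding paraphrases the docs issue noted in the evidence.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DriftFinding:
    description: str
    generated_for_group: int  # group whose review produced the finding


def findings_within_window(
    findings: List[DriftFinding], current_group: int, max_age: int
) -> List[DriftFinding]:
    """Keep findings generated at most max_age groups before current_group."""
    return [
        f for f in findings
        if current_group - f.generated_for_group <= max_age
    ]


findings = [
    DriftFinding("packages/af/af/nightshift.py still exists in the repository.", 0),
    DriftFinding("Docs still reference af night-shift.", 0),
]

# With a window of 3 groups, group 2 still sees both findings...
group_2 = findings_within_window(findings, current_group=2, max_age=3)

# ...but group 6 sees none, including the docs finding it is meant to fix.
group_6 = findings_within_window(findings, current_group=6, max_age=3)
```

The filter cannot tell a resolved finding from an unresolved one: it drops the stale nightshift.py finding and the still-relevant docs finding at the same moment.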

Files

  • packages/agentfox/agentfox/knowledge/review_store.py: insert_drift_findings(), supersession logic
  • packages/agentfox/agentfox/knowledge/fox_provider.py: _query_drift_findings() retrieval
  • packages/agentfox/agentfox/engine/result_handler.py: post-merge handling where supersession should be triggered
  • packages/agentfox/agentfox/knowledge/migrations.py: review_findings table schema (already has superseded_by column)

Labels: af:implemented (Spec implementation complete, awaiting manual verification), enhancement (New feature or request)
