Skip to content

fix(cache-bust): correct marker placement across context resets, seeded resumes, and FORK children#6

Merged
cquil11 merged 9 commits into
SemiAnalysisAI:cjq/agentx-v0.3from
ajcasagrande:acasagrande/cache-bust-reset-context
Jun 11, 2026
Merged

fix(cache-bust): correct marker placement across context resets, seeded resumes, and FORK children#6
cquil11 merged 9 commits into
SemiAnalysisAI:cjq/agentx-v0.3from
ajcasagrande:acasagrande/cache-bust-reset-context

Conversation

@ajcasagrande

@ajcasagrande ajcasagrande commented Jun 11, 2026

Copy link
Copy Markdown

Summary

Fixes how the cache-bust marker is placed by the worker (_apply_cache_bust) so it lands on the effective wire prefix in every case, and adds extensive unit + real-subprocess integration coverage. Four correctness fixes, then tests.

Fixes

  1. reset_context re-application + sub-path-2 stacking — a reset_context turn makes build_messages discard the accumulated prefix and restart from that turn, so the turn-0 marker was lost from the new prefix; recycled plays then replayed a byte-identical post-reset prefix. Also fixed a latent stacking bug where _inject_marker_into_raw_messages re-mutated the shared turn_list[0] system message every credit under delta modes. Injection is now idempotent.

  2. Seeded mid-trajectory resume — agentic_replay warmup resumes at k_i > 0; advance_turn back-fills turns 0..k_i, so turn 0 (the real prefix) was present but the turn_index == 0 gate skipped it, leaving the AgentX-locked FIRST_TURN_PREFIX path's initial trajectories unmarked. Injection now runs every credit (idempotent) at the effective prefix.

  3. Last/buried reset turnbuild_messages restarts at every reset turn, so the effective prefix begins at the last reset turn in turn_list, which can be buried mid-history (seeded on a resume, never the current turn). Replaced the turn_list[-1]-only check with _effective_prefix_turns, which scans back to the last reset turn and unifies the FIRST_TURN_* and SYSTEM_* paths.

  4. FORK children inherit, never re-bust — FORK children seed turn_list = list(parent.turn_list) (the same Turn objects, same worker) and share the parent's KV cache by design. The every-credit injection would otherwise mutate the parent's shared turn and stack markers. Cache-bust is now a no-op for FORK children (they inherit the parent's already-play-distinct prefix); SPAWN children and roots are still busted.

Tests

  • 81 unit tests over the full session-type × target × prefix-scenario matrix, including a realistic FORK lifecycle through UserSessionManager.create_and_store (proves shared-object identity + the guard).
  • Component-integration: strengthened the linear-weka marker test (every session marked; guards the seeded-resume fix), a reset-path test, and a de-flaked collision-free test (gated on zero-collisions, not machine-speed throughput).
  • Real tests/integration/ (subprocess over real ZMQ + workers + HTTP): per-target marker matrix, NONE, SPAWN subagent fan-out (children independently busted with their own distinct marker), and the reset path.

Reachability note

FORK + cache-bust is not reachable via supported config (cache-bust requires agentic_replay → requires a weka loader → weka emits only SPAWN). The FORK guard is therefore defense-in-depth, validated by the unit matrix; the reachable fan-out path (weka SPAWN subagents) has real end-to-end coverage.

🤖 Generated with Claude Code


Note

Medium Risk
Changes load-generator wire payloads and shared turn mutation paths used in agentic replay; mistakes could skew prefix-cache behavior or break FORK/SPAWN semantics, though coverage is extensive.

Overview
Fixes worker cache-bust injection so per-session [rid:HEX] markers land on the effective wire prefix (what build_messages actually sends), not on discarded or wrong turns.

Worker (_apply_cache_bust): Adds idempotent edge checks (_content_has_marker_at_edge) so repeated credits do not stack markers on shared turns. Introduces _effective_prefix_turns (last reset_context slice) and _apply_system_target_cache_bust; FIRST_TURN_* and SYSTEM fallbacks now mark every credit on that prefix (fixes seeded mid-trajectory resumes at k_i > 0). FORK children skip re-busting so they keep the parent’s shared prefix; SPAWN/roots still bust. Silent-drop warnings are limited to empty turn_list.

Tests: Large expansion of unit coverage; component tests require markers on all profiling sessions, add reset-context E2E, and relax collision test throughput floor (20 sessions, longer duration). New real-subprocess integration tests for per-target markers, NONE, SPAWN subagents, and reset semantics.

Reviewed by Cursor Bugbot for commit 1f161a7. Bugbot is set up for automated code reviews on this repo. Configure here.

ajcasagrande and others added 9 commits June 10, 2026 15:35
…-2 stacking

The cache-bust marker for FIRST_TURN_* / SYSTEM_* targets was applied only at
turn_index==0, on the assumption that later turns inherit it via the server's
prefix cache. But a turn with reset_context=True makes base_endpoint.build_messages
discard all accumulated prior turns and restart the wire payload from that turn's
raw_messages, so the new effective prefix carried no marker. Every recycled play
of the trace then replayed a byte-identical post-reset prefix, warming the server's
prefix cache across plays - exactly what cache-bust exists to prevent.

_apply_cache_bust now detects the current reset turn (_current_reset_turn) and
re-injects the marker into the reset turn's own prefix:
- FIRST_TURN_*: inject into the reset turn's first user message.
- SYSTEM_* sub-paths 2 (raw system role) and 3 (first-user fallback): scope the
  lookup to the reset turn instead of the discarded turn 0.
- SYSTEM_* sub-path 1 (Conversation-level system_message) is unchanged - it rides
  on RequestInfo and is re-emitted every turn independent of the reset.

The SYSTEM_* dispatch is extracted into _apply_system_target_cache_bust to keep
_apply_cache_bust within the function-size budget.

Also fixes a latent stacking bug surfaced while verifying the above: sub-path 2
mutates turn_list[0]'s system message in place every credit, and under delta modes
turn_list[0] is a single shared object across the session's turns, so the marker
stacked ([rid][rid]...S0). _inject_marker_into_raw_messages is now idempotent -
it bails when the (per-session constant) marker is already at the target position.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Agentic replay can resume a trajectory at turn k_i > 0 (warmup starts mid-
conversation). The worker's advance_turn back-fills turns 0..k_i into turn_list
under DELTAS_WITH_RESPONSES, so turn 0 - the conversation's opening user message
and the true wire prefix - is present even though credit.turn_index > 0. The
FIRST_TURN_* injection was gated on credit.turn_index == 0, so the seeded turn 0
never received the marker; recycled plays then shared a byte-identical opening
prefix and warmed the server's prefix cache.

Drop the turn_index == 0 gate: FIRST_TURN_* (and the SYSTEM_* no-system
first-user fallback) now inject into the conversation's opening user turn on
every credit, made safe by idempotency. The reset_context branch still takes
precedence (the reset turn, not the discarded turn 0, is the effective prefix).

Idempotency is extended from _inject_marker_into_raw_messages to all injection
helpers via a shared _content_has_marker_at_edge check, since unconditional
every-credit injection would otherwise stack the marker on the shared turn-0
object across a session's turns.

This composes with the prior reset_context fix: between them FIRST_TURN_* /
SYSTEM_* now cover turn 0, seeded mid-trajectory resume, and reset_context cuts.
The SYSTEM_* "silent drop on turn>0 with no system" warning is removed - that
case now injects every credit instead of being a no-op.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
The reset_context detection only inspected turn_list[-1], so it missed a
reset turn buried mid-history. build_messages restarts the wire array at
every reset_context turn, so the effective prefix begins at the LAST such
turn in turn_list. That turn is not always the current turn: a mid-trajectory
resume seeds turns 0..k_i into turn_list via advance_turn without dispatching
them, so a reset at some j < k_i is never the current turn. The marker then
landed on the discarded turn 0 while the real wire prefix (turn j) stayed
unmarked - recycled plays warmed the server cache on identical post-reset bytes.

Replace _current_reset_turn (turn_list[-1] only) with _effective_prefix_turns,
which scans turn_list backward for the last reset_context turn and returns the
slice from there to the end (or the whole list when there is no reset). Both
FIRST_TURN_* and SYSTEM_* now inject into the first system/user turn of that
slice, unifying the previously separate reset and non-reset branches.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
FORK children seed turn_list = list(parent.turn_list) (the SAME Turn objects,
same worker via sticky routing) and share the parent's KV cache by design.
The parent already injected its marker into those shared turns, so the child
inherits it for free. The every-credit FIRST_TURN/SYSTEM injection (added for
seeded resume) was re-busting the child's inherited prefix: it injected the
child's distinct marker into the parent's shared turn 0, which (a) diverged the
child's prefix from the parent's -> prefix-cache MISS (defeating the whole point
of FORK), (b) mutated the parent's Turn objects (session_manager documents them
as read-only post-construction), and (c) stacked markers since the idempotency
check only compares the current credit's marker. With wide fan-out, N children
stacked N markers onto the shared turn.

Make cache-bust a no-op for FORK children (parent_correlation_id set and
branch_mode == FORK): they inherit the parent's already-play-distinct prefix
via the shared object. SPAWN children start fresh (no shared turns) and root
sessions own their prefix, so both are still busted normally. FORK is the only
path that shares Turn objects across sessions, so this also removes the only
cross-session mutation hazard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…ctions

Add deterministic unit coverage of the full cache-bust injection surface that
the recent fixes touched, plus strengthen the component test:

- FORK no-op across all four targets; FORK no-op even with a Conversation-level
  system_message; multi-turn FORK never stacks onto the shared turn; FORK child
  with its own reset is still a no-op (documents current behavior).
- Realistic FORK lifecycle through UserSessionManager.create_and_store: proves
  the child shares the parent's marked Turn object by identity and the guard
  leaves it untouched (byte-identical inherited prefix -> cache-share).
- SPAWN children and root sessions are busted across all targets.
- Idempotency stress: one session, many credits -> single marker (prefix+suffix).
- Seeded-resume x buried-reset suffix coverage.

Component test (test_agentic_replay_cache_bust): the FIRST_TURN_* path now marks
every session (seeded resume included), so the assertion is tightened from
">= 1 marked session" to "no session is unmarked" for all targets — this now
actively guards the seeded-resume fix. Stale comment describing the old
turn_index==0-only behavior is corrected.

Note: FORK + cache-bust is not reachable via supported config (cache-bust
requires agentic_replay, which requires a weka loader, which emits only SPAWN
children; FORK comes only from dag_jsonl, which forbids cache-bust). The FORK
guard is therefore defense-in-depth, validated by the unit matrix above.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Add a component-integration test that validates actual cache-bust markers AND
reset semantics together on the real wire, through the full agentic_replay
pipeline.

A weka trace fixture is crafted so turn 1 is a non-monotonic LCP cut: turn 0 is
a single 5-block user segment [1,2,3,4,5]; turn 1 shares only [1,2] (LCP=2),
landing inside the previously-emitted segment, so the reconstructor flags
reset_context=True. Two-turn traces => agentic_replay picks k_i=0 and resumes
profiling at turn 1, making the reset turn the profiling turn.

A deterministic loader-level guard first asserts the fixture genuinely produces
a reset_context turn (so the end-to-end assertions can't pass vacuously). The
benchmark run then asserts:
- ACTUAL MARKERS: every profiling session carries exactly one [rid:HEX] in its
  wire payload, distinct across sessions, with no message ever carrying >1 (no
  stacking).
- RESET SEMANTICS: the reset-turn (turn_index==1) requests carry the marker on
  their first user message -- the post-reset prefix. Before the reset fix the
  marker landed on the discarded turn 0, leaving the reset turn's wire prefix
  unmarked, so this is a real regression guard for that fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Add a tests/integration/ counterpart to the component_integration reset test.
This one runs the full aiperf subprocess against the shared mock server over
real ZMQ, real workers, and real HTTP (cli.run + aiperf_mock_server) rather than
in-process FakeCommunication, so it also exercises the serialization and
multiprocess session paths that the component test bypasses.

Same crafted reset fixture (turn 1 is a non-monotonic LCP cut -> reset_context),
same loader-level guard against vacuity, and the same assertions: every
profiling session carries exactly one distinct rid, no message stacks markers,
and the reset-turn (turn_index==1) requests carry the marker on their post-reset
prefix. Uses the offline-cached openai/gpt-oss-120b tokenizer per the
integration conftest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…PAWN fan-out)

Add genuine tests/integration coverage (real aiperf subprocess over real ZMQ +
workers + HTTP against the mock server) for cache-bust, complementing the
in-process component_integration tests:

- Per-target marker matrix (FIRST_TURN_PREFIX/SUFFIX, SYSTEM_PREFIX/SUFFIX):
  every profiling session carries exactly one marker in the target's carrier
  (first user turn for FIRST_TURN_*, system message for SYSTEM_*), markers are
  distinct across sessions (collision-free minting validated end-to-end), and
  no wire message ever stacks markers.
- NONE target: no rid marker appears anywhere on the wire.
- SPAWN subagent fan-out: a weka fixture with type:subagent entries produces
  SPAWN children, and each child is independently busted with its OWN marker,
  distinct from the parent root's. This is the reachable production fan-out +
  cache-bust path and exercises the non-FORK branch of the worker guard through
  the real subprocess. (FORK + cache-bust remains unreachable by config, so it
  stays unit-only; weka emits only SPAWN.)

Uses the offline-cached openai/gpt-oss-120b tokenizer per the integration
conftest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
The collision-free test gated on ">=50 distinct sessions" produced within a
6s benchmark window. Session count scales with machine speed, so on a loaded
machine the run produced ~22 sessions and the smoke-check floor failed even
though the actual zero-collision contract held.

The floor only guards against a vacuous run; the real regression bar is the
zero-duplicate assertion, which catches the pre-fix 33% collision rate with
~99.9% probability even at 20 sessions. Raise the window to 10s for headroom
and lower the floor to 20 -- well under what even a loaded machine produces --
so the test gates on collision-freeness, not throughput. The zero-collision
assertion is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
@github-actions

Copy link
Copy Markdown

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@1f161a7281a7bc4b9a3f5d86da034579aae593b0

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@1f161a7281a7bc4b9a3f5d86da034579aae593b0

Last updated for commit: 1f161a7Browse code

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 1f161a7. Configure here.

return
existing = first.contents[0]
if _content_has_marker_at_edge(existing, marker, is_prefix=is_prefix):
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text seed idempotency mismatch

Medium Severity

_content_has_marker_at_edge treats a full prefix/suffix marker on strings, but _inject_marker_into_first_user_text seeds marker-only synthetic turns with marker.strip() (no surrounding newlines). Because injection now runs every credit, a second call does not recognize the edge marker and prepends/appends the full marker again, stacking [rid:…] tokens on the shared turn.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 1f161a7. Configure here.

@cquil11 cquil11 merged commit 1dffd77 into SemiAnalysisAI:cjq/agentx-v0.3 Jun 11, 2026
2 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants