fix(cache-bust): correct marker placement across context resets, seeded resumes, and FORK children#6
Conversation
…-2 stacking The cache-bust marker for FIRST_TURN_* / SYSTEM_* targets was applied only at turn_index==0, on the assumption that later turns inherit it via the server's prefix cache. But a turn with reset_context=True makes base_endpoint.build_messages discard all accumulated prior turns and restart the wire payload from that turn's raw_messages, so the new effective prefix carried no marker. Every recycled play of the trace then replayed a byte-identical post-reset prefix, warming the server's prefix cache across plays - exactly what cache-bust exists to prevent. _apply_cache_bust now detects the current reset turn (_current_reset_turn) and re-injects the marker into the reset turn's own prefix: - FIRST_TURN_*: inject into the reset turn's first user message. - SYSTEM_* sub-paths 2 (raw system role) and 3 (first-user fallback): scope the lookup to the reset turn instead of the discarded turn 0. - SYSTEM_* sub-path 1 (Conversation-level system_message) is unchanged - it rides on RequestInfo and is re-emitted every turn independent of the reset. The SYSTEM_* dispatch is extracted into _apply_system_target_cache_bust to keep _apply_cache_bust within the function-size budget. Also fixes a latent stacking bug surfaced while verifying the above: sub-path 2 mutates turn_list[0]'s system message in place every credit, and under delta modes turn_list[0] is a single shared object across the session's turns, so the marker stacked ([rid][rid]...S0). _inject_marker_into_raw_messages is now idempotent - it bails when the (per-session constant) marker is already at the target position. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Agentic replay can resume a trajectory at turn k_i > 0 (warmup starts mid- conversation). The worker's advance_turn back-fills turns 0..k_i into turn_list under DELTAS_WITH_RESPONSES, so turn 0 - the conversation's opening user message and the true wire prefix - is present even though credit.turn_index > 0. The FIRST_TURN_* injection was gated on credit.turn_index == 0, so the seeded turn 0 never received the marker; recycled plays then shared a byte-identical opening prefix and warmed the server's prefix cache. Drop the turn_index == 0 gate: FIRST_TURN_* (and the SYSTEM_* no-system first-user fallback) now inject into the conversation's opening user turn on every credit, made safe by idempotency. The reset_context branch still takes precedence (the reset turn, not the discarded turn 0, is the effective prefix). Idempotency is extended from _inject_marker_into_raw_messages to all injection helpers via a shared _content_has_marker_at_edge check, since unconditional every-credit injection would otherwise stack the marker on the shared turn-0 object across a session's turns. This composes with the prior reset_context fix: between them FIRST_TURN_* / SYSTEM_* now cover turn 0, seeded mid-trajectory resume, and reset_context cuts. The SYSTEM_* "silent drop on turn>0 with no system" warning is removed - that case now injects every credit instead of being a no-op. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
The reset_context detection only inspected turn_list[-1], so it missed a reset turn buried mid-history. build_messages restarts the wire array at every reset_context turn, so the effective prefix begins at the LAST such turn in turn_list. That turn is not always the current turn: a mid-trajectory resume seeds turns 0..k_i into turn_list via advance_turn without dispatching them, so a reset at some j < k_i is never the current turn. The marker then landed on the discarded turn 0 while the real wire prefix (turn j) stayed unmarked - recycled plays warmed the server cache on identical post-reset bytes. Replace _current_reset_turn (turn_list[-1] only) with _effective_prefix_turns, which scans turn_list backward for the last reset_context turn and returns the slice from there to the end (or the whole list when there is no reset). Both FIRST_TURN_* and SYSTEM_* now inject into the first system/user turn of that slice, unifying the previously separate reset and non-reset branches. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
FORK children seed turn_list = list(parent.turn_list) (the SAME Turn objects, same worker via sticky routing) and share the parent's KV cache by design. The parent already injected its marker into those shared turns, so the child inherits it for free. The every-credit FIRST_TURN/SYSTEM injection (added for seeded resume) was re-busting the child's inherited prefix: it injected the child's distinct marker into the parent's shared turn 0, which (a) diverged the child's prefix from the parent's -> prefix-cache MISS (defeating the whole point of FORK), (b) mutated the parent's Turn objects (session_manager documents them as read-only post-construction), and (c) stacked markers since the idempotency check only compares the current credit's marker. With wide fan-out, N children stacked N markers onto the shared turn. Make cache-bust a no-op for FORK children (parent_correlation_id set and branch_mode == FORK): they inherit the parent's already-play-distinct prefix via the shared object. SPAWN children start fresh (no shared turns) and root sessions own their prefix, so both are still busted normally. FORK is the only path that shares Turn objects across sessions, so this also removes the only cross-session mutation hazard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…ctions Add deterministic unit coverage of the full cache-bust injection surface that the recent fixes touched, plus strengthen the component test: - FORK no-op across all four targets; FORK no-op even with a Conversation-level system_message; multi-turn FORK never stacks onto the shared turn; FORK child with its own reset is still a no-op (documents current behavior). - Realistic FORK lifecycle through UserSessionManager.create_and_store: proves the child shares the parent's marked Turn object by identity and the guard leaves it untouched (byte-identical inherited prefix -> cache-share). - SPAWN children and root sessions are busted across all targets. - Idempotency stress: one session, many credits -> single marker (prefix+suffix). - Seeded-resume x buried-reset suffix coverage. Component test (test_agentic_replay_cache_bust): the FIRST_TURN_* path now marks every session (seeded resume included), so the assertion is tightened from ">= 1 marked session" to "no session is unmarked" for all targets — this now actively guards the seeded-resume fix. Stale comment describing the old turn_index==0-only behavior is corrected. Note: FORK + cache-bust is not reachable via supported config (cache-bust requires agentic_replay, which requires a weka loader, which emits only SPAWN children; FORK comes only from dag_jsonl, which forbids cache-bust). The FORK guard is therefore defense-in-depth, validated by the unit matrix above. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Add a component-integration test that validates actual cache-bust markers AND reset semantics together on the real wire, through the full agentic_replay pipeline. A weka trace fixture is crafted so turn 1 is a non-monotonic LCP cut: turn 0 is a single 5-block user segment [1,2,3,4,5]; turn 1 shares only [1,2] (LCP=2), landing inside the previously-emitted segment, so the reconstructor flags reset_context=True. Two-turn traces => agentic_replay picks k_i=0 and resumes profiling at turn 1, making the reset turn the profiling turn. A deterministic loader-level guard first asserts the fixture genuinely produces a reset_context turn (so the end-to-end assertions can't pass vacuously). The benchmark run then asserts: - ACTUAL MARKERS: every profiling session carries exactly one [rid:HEX] in its wire payload, distinct across sessions, with no message ever carrying >1 (no stacking). - RESET SEMANTICS: the reset-turn (turn_index==1) requests carry the marker on their first user message -- the post-reset prefix. Before the reset fix the marker landed on the discarded turn 0, leaving the reset turn's wire prefix unmarked, so this is a real regression guard for that fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Add a tests/integration/ counterpart to the component_integration reset test. This one runs the full aiperf subprocess against the shared mock server over real ZMQ, real workers, and real HTTP (cli.run + aiperf_mock_server) rather than in-process FakeCommunication, so it also exercises the serialization and multiprocess session paths that the component test bypasses. Same crafted reset fixture (turn 1 is a non-monotonic LCP cut -> reset_context), same loader-level guard against vacuity, and the same assertions: every profiling session carries exactly one distinct rid, no message stacks markers, and the reset-turn (turn_index==1) requests carry the marker on their post-reset prefix. Uses the offline-cached openai/gpt-oss-120b tokenizer per the integration conftest. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
…PAWN fan-out) Add genuine tests/integration coverage (real aiperf subprocess over real ZMQ + workers + HTTP against the mock server) for cache-bust, complementing the in-process component_integration tests: - Per-target marker matrix (FIRST_TURN_PREFIX/SUFFIX, SYSTEM_PREFIX/SUFFIX): every profiling session carries exactly one marker in the target's carrier (first user turn for FIRST_TURN_*, system message for SYSTEM_*), markers are distinct across sessions (collision-free minting validated end-to-end), and no wire message ever stacks markers. - NONE target: no rid marker appears anywhere on the wire. - SPAWN subagent fan-out: a weka fixture with type:subagent entries produces SPAWN children, and each child is independently busted with its OWN marker, distinct from the parent root's. This is the reachable production fan-out + cache-bust path and exercises the non-FORK branch of the worker guard through the real subprocess. (FORK + cache-bust remains unreachable by config, so it stays unit-only; weka emits only SPAWN.) Uses the offline-cached openai/gpt-oss-120b tokenizer per the integration conftest. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
The collision-free test gated on ">=50 distinct sessions" produced within a 6s benchmark window. Session count scales with machine speed, so on a loaded machine the run produced ~22 sessions and the smoke-check floor failed even though the actual zero-collision contract held. The floor only guards against a vacuous run; the real regression bar is the zero-duplicate assertion, which catches the pre-fix 33% collision rate with ~99.9% probability even at 20 sessions. Raise the window to 10s for headroom and lower the floor to 20 -- well under what even a loaded machine produces -- so the test gates on collision-freeness, not throughput. The zero-collision assertion is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Anthony Casagrande <acasagrande@nvidia.com>
Try out this PRQuick install: pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@1f161a7281a7bc4b9a3f5d86da034579aae593b0Recommended with virtual environment (using uv): uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@1f161a7281a7bc4b9a3f5d86da034579aae593b0Last updated for commit: |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 1f161a7. Configure here.
| return | ||
| existing = first.contents[0] | ||
| if _content_has_marker_at_edge(existing, marker, is_prefix=is_prefix): | ||
| return |
There was a problem hiding this comment.
Text seed idempotency mismatch
Medium Severity
_content_has_marker_at_edge treats a full prefix/suffix marker on strings, but _inject_marker_into_first_user_text seeds marker-only synthetic turns with marker.strip() (no surrounding newlines). Because injection now runs every credit, a second call does not recognize the edge marker and prepends/appends the full marker again, stacking [rid:…] tokens on the shared turn.
Reviewed by Cursor Bugbot for commit 1f161a7. Configure here.


Summary
Fixes how the cache-bust marker is placed by the worker (
_apply_cache_bust) so it lands on the effective wire prefix in every case, and adds extensive unit + real-subprocess integration coverage. Four correctness fixes, then tests.Fixes
reset_contextre-application + sub-path-2 stacking — areset_contextturn makesbuild_messagesdiscard the accumulated prefix and restart from that turn, so the turn-0 marker was lost from the new prefix; recycled plays then replayed a byte-identical post-reset prefix. Also fixed a latent stacking bug where_inject_marker_into_raw_messagesre-mutated the sharedturn_list[0]system message every credit under delta modes. Injection is now idempotent.Seeded mid-trajectory resume — agentic_replay warmup resumes at
k_i > 0;advance_turnback-fills turns0..k_i, so turn 0 (the real prefix) was present but theturn_index == 0gate skipped it, leaving the AgentX-lockedFIRST_TURN_PREFIXpath's initial trajectories unmarked. Injection now runs every credit (idempotent) at the effective prefix.Last/buried reset turn —
build_messagesrestarts at every reset turn, so the effective prefix begins at the last reset turn inturn_list, which can be buried mid-history (seeded on a resume, never the current turn). Replaced theturn_list[-1]-only check with_effective_prefix_turns, which scans back to the last reset turn and unifies the FIRST_TURN_* and SYSTEM_* paths.FORK children inherit, never re-bust — FORK children seed
turn_list = list(parent.turn_list)(the same Turn objects, same worker) and share the parent's KV cache by design. The every-credit injection would otherwise mutate the parent's shared turn and stack markers. Cache-bust is now a no-op for FORK children (they inherit the parent's already-play-distinct prefix); SPAWN children and roots are still busted.Tests
UserSessionManager.create_and_store(proves shared-object identity + the guard).tests/integration/(subprocess over real ZMQ + workers + HTTP): per-target marker matrix, NONE, SPAWN subagent fan-out (children independently busted with their own distinct marker), and the reset path.Reachability note
FORK + cache-bust is not reachable via supported config (cache-bust requires
agentic_replay→ requires a weka loader → weka emits only SPAWN). The FORK guard is therefore defense-in-depth, validated by the unit matrix; the reachable fan-out path (weka SPAWN subagents) has real end-to-end coverage.🤖 Generated with Claude Code
Note
Medium Risk
Changes load-generator wire payloads and shared turn mutation paths used in agentic replay; mistakes could skew prefix-cache behavior or break FORK/SPAWN semantics, though coverage is extensive.
Overview
Fixes worker cache-bust injection so per-session
[rid:HEX]markers land on the effective wire prefix (whatbuild_messagesactually sends), not on discarded or wrong turns.Worker (
_apply_cache_bust): Adds idempotent edge checks (_content_has_marker_at_edge) so repeated credits do not stack markers on shared turns. Introduces_effective_prefix_turns(lastreset_contextslice) and_apply_system_target_cache_bust; FIRST_TURN_* and SYSTEM fallbacks now mark every credit on that prefix (fixes seeded mid-trajectory resumes atk_i > 0). FORK children skip re-busting so they keep the parent’s shared prefix; SPAWN/roots still bust. Silent-drop warnings are limited to emptyturn_list.Tests: Large expansion of unit coverage; component tests require markers on all profiling sessions, add reset-context E2E, and relax collision test throughput floor (20 sessions, longer duration). New real-subprocess integration tests for per-target markers, NONE, SPAWN subagents, and reset semantics.
Reviewed by Cursor Bugbot for commit 1f161a7. Bugbot is set up for automated code reviews on this repo. Configure here.