Issue #1138 - un-peg sync-responder CPU: A1–A4 + B0–B2 (§7 checkpoint PASS) #1164
Merged
A1 validated server-backed (oxigraph-server/RocksDB): durable-data warm <500ms / durable-meta 102ms / SWM warm pages cheap — all perf phases under budget; D-SEC admission-parity 12/12; full agent suite 1283 green. CPU peg resolved. Merging to integration branch.
A2 validated: asAbortError fix verified (core 1060/0-errors), p2p-messenger delegation tests updated (3/3), A2 suites (72) + storage (207) green, server-backed perf 3/3 (caps don't regress A1 snapshot perf). Full-suite e2e reds (e2e-sub-graphs, e2e-publish-protocol) are pre-existing parallel-load flakes — pass 100% in isolation, not A2-caused. Merging to integration.
…1138 Brings the data-volume half (#1155) under the A1+A2 access-pattern work. Single conflict: sync-handler.ts durable-data else branch — took A1's structure (inline delta query was re-homed to graph-plan.readDurableDataPage); #1155's read-both collapsed-KA arm is already present in A1's durableDeltaWhereClauseForGraphs (legacy partOf arm + collapsed rootEntity+batchId arm), so the coordination re-port is satisfied. #1155 preserves the sync _meta predicates (memoryLayer/assertionGraph), so A1 durable-meta admission + D-SEC are unaffected. Builds green (tsc 8/8).
…hygiene (#1143)

Squash-merge of codex/issue-1138-a3-sync-scheduler-progress (15 commits) onto integration/issue-1138 (A1+A2+#1155), plus one stale-test fix.

A3 makes the sync requester/scheduler honest about failure:
- separates peer-reachability failures (failedPeers) from phase failures (failedPhases); the lifecycle success-gate requires both to be zero, so a failed/timed-out/denied round no longer stamps success → backoff engages
- per-(peer, contextGraph) progress accounting (defines a 'clean round')
- freshness-aware durable checkpoints; metadata-only freshness is not counted as data progress
- flap hygiene: cooldowns/backoff survive connection:close

Validation (worktree dkg-a1-verify, integration is not CI-gated):
- build + tsc clean (agent/cli/node-ui)
- agent suite 1361 passed / 0 failed; node-ui 1431/0
- cli 82 fails are pre-existing Windows-host env (identical on base); A3's own new cli tests pass
- server-backed responder perf guard 3/3 (A3 doesn't touch the responder)
- 4-lens adversarial review: 0 confirmed issues

Stale-test fix: swm-snapshot-sync 'remote snapshots unavailable' is a phase failure, not a peer failure → assert failedPeers=0 + failedPhases=1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
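The success gate described in this commit can be sketched roughly as follows. This is a minimal illustration, not the A3 implementation: only the counter names failedPeers and failedPhases come from the commit message, while the types, the function names (roundSucceeded, settleRound) and the delay constants are hypothetical.

```typescript
// Hypothetical result shape; only the two counter names come from the commit message.
interface SyncRoundResult {
  failedPeers: number;  // peers that could not be reached
  failedPhases: number; // phases that failed, timed out, or were denied
}

// Hypothetical scheduler state kept between rounds.
interface BackoffState {
  consecutiveFailures: number;
  nextAttemptAt: number; // epoch milliseconds
}

const BASE_DELAY_MS = 1_000; // assumed value, for illustration only
const MAX_DELAY_MS = 60_000; // assumed value, for illustration only

// The gate: a round counts as a success only when both counters are zero.
function roundSucceeded(result: SyncRoundResult): boolean {
  return result.failedPeers === 0 && result.failedPhases === 0;
}

// Reset the backoff on success; otherwise grow the delay exponentially, capped.
function settleRound(prev: BackoffState, result: SyncRoundResult, now: number): BackoffState {
  if (roundSucceeded(result)) {
    return { consecutiveFailures: 0, nextAttemptAt: now };
  }
  const failures = prev.consecutiveFailures + 1;
  const delay = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, failures - 1));
  return { consecutiveFailures: failures, nextAttemptAt: now + delay };
}
```

With a gate of this shape, a round that reaches every peer but fails one phase (failedPeers=0, failedPhases=1) still schedules a delayed retry instead of being recorded as a success.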
)

* fix(agent): damp VM reconcile retries
* fix(sync): harden VM reconcile cache keys
* fix(sync): harden VM reconcile negative cache
* fix(sync): refresh VM reconcile cache inputs
* fix(sync): preserve VM reconcile peer context
* fix(sync): bound VM reconcile replay state
* fix(sync): stabilize VM reconcile state cleanup
* fix(sync): keep VM reconcile probes cheap
* fix(sync): reprime VM reconcile peers before defer
* fix(sync): recheck VM reconcile fallbacks
* fix(agent): fold reconcile robustness into A4
* fix(agent): address A4 review edge cases
* fix(agent): harden VM reconcile damping retries
* fix(agent): scope VM reconcile cache by context graph
* fix(agent): avoid unsafe reconcile and warm-core trims
* fix(agent): retry VM reconcile after SWM fetch progress
* fix(agent): clear VM reconcile state on CG resets
* fix(agent): fully reset VM reconcile state on CG rebind
* fix(agent): drain core host recordings on stop
* fix(agent): harden VM reconcile SWM generation cache
* fix(agent): complete VM reconcile peer rotation pass
* fix(agent): bound core host ACK recording paths
* fix(agent): isolate warm-core unpin retries
* fix(agent): finish A4 review edge cases
* fix(agent): settle A4 recording and peer metrics
* fix(agent): close A4 stale recording races
* fix(agent): settle A4 liveness and peer metrics
* fix(agent): settle A4 liveness follow-ups

---------

Co-authored-by: Jurij Skornik <jurij.skornik@gmail.com>
…#1145)

* fix(cli): use cheap context graph metrics count
* fix(cli): preserve known context graph metrics
* fix(cli): refine context graph metric candidates
* fix(cli): bound context graph metric metadata reads
* fix(cli): narrow context graph declaration metrics
* fix(cli): accept nested wallet context graph metrics
* fix(cli): exclude system graphs from CG metrics
* fix(cli): prefer canonical wallet metric candidates
* fix(cli): canonicalize context graph metric aliases
* fix(cli): suppress proven metric shadow aliases
* fix(cli): prefer wire identity for CG metrics
* fix(cli): constrain CG declaration metric source
* fix(cli): skip graph-derived metric shadows
* fix(cli): require backing for slash CG metrics
* fix(cli): allow slash CG declaration candidates
* fix(cli): keep context graph metric cheap

---------

Co-authored-by: Jurij Skornik <jurij.skornik@gmail.com>
* fix(cli): add exact context graph write preflight
* fix: tighten B1 preflight authorization paths
* fix: constrain B1 exact preflight fast path
* fix(cli): require public policy for tokenless preflight
* fix(cli): preserve preflight failure context

---------

Co-authored-by: Jurij Skornik <jurij.skornik@gmail.com>
* fix(chain): bound context graph registry scans
* fix: surface partial B2 registry scans
* fix: finalize B2 partial scan pages before watermark
* fix: preserve B2 registry scan resume bounds
* fix: keep B2 chain discovery opt-in incremental
* fix: reuse shared B2 partial scan guard
* fix: allow degraded B2 registry scans past budget
* fix: keep B2 daemon chain full resync path
* fix(chain): keep full registry scans unbounded
* fix(cli): seed registry scan watermark after full scans
* fix(cli): retry full registry scans until seeded
* fix(cli): surface failed registry seed scans
* fix(cli): derive registry scan mode from adapter watermark
* fix(agent): rethrow strict partial registry scans

---------

Co-authored-by: Jurij Skornik <jurij.skornik@gmail.com>
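The B2 commits above describe bounded registry scans that surface a partial result and advance a watermark only past finalized pages. A minimal sketch of that general pattern follows. It is not the B2 code: ScanResult, PageFetcher, boundedScan and every parameter are hypothetical names, and the synchronous page fetcher is a simplification.

```typescript
// Hypothetical result of one bounded scan pass.
interface ScanResult<T> {
  items: T[];
  partial: boolean;  // true when the page budget ran out before reaching `head`
  watermark: number; // last fully processed position; resume from watermark + 1
}

// Hypothetical page source: returns every item in the inclusive range [from, to].
type PageFetcher<T> = (from: number, to: number) => T[];

function boundedScan<T>(
  fetchPage: PageFetcher<T>,
  resumeFrom: number, // first position not yet scanned
  head: number,       // last position that exists right now
  pageSize: number,
  maxPages: number,   // per-pass budget
): ScanResult<T> {
  const items: T[] = [];
  let next = resumeFrom;
  let pages = 0;
  while (next <= head && pages < maxPages) {
    const to = Math.min(head, next + pageSize - 1);
    items.push(...fetchPage(next, to));
    // The cursor moves only after the whole page has been collected,
    // so the reported watermark never covers a half-read page.
    next = to + 1;
    pages++;
  }
  return { items, partial: next <= head, watermark: next - 1 };
}
```

A caller would persist watermark and, when partial is true, either schedule another pass from watermark + 1 or report the scan as incomplete rather than treating the result as a full view of the registry.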
This was referenced Jun 14, 2026
Open
Summary
Lands the validated un-peg gate for the sync-responder / oxigraph CPU peg (#1127 / #1136), tracked in #1138 — the set of fixes that takes a data-rich node from a machine-pegging, non-converging sync storm to a converging, sub-core steady state.
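For orientation, #1138 attributes the peg to responder page queries with an unbound GRAPH ?g, which cost O(store) per page. The sketch below illustrates the general idea of binding the graph variable to an explicit list before paging. It is not the query shipped in this PR: buildBoundedPageQuery, its parameters and the IRI check are hypothetical, and the SPARQL is a generic SPARQL 1.1 form.

```typescript
// Rejects characters that are not allowed inside a SPARQL IRI reference (<...>).
const IRI_PATTERN = /^[^\s<>"{}|\\^`]+$/;

// Hypothetical helper: builds a page query restricted to the named graphs given,
// instead of letting ?g range over every graph in the store.
function buildBoundedPageQuery(graphIris: string[], limit: number, offset: number): string {
  if (graphIris.length === 0) {
    throw new Error("at least one graph IRI is required");
  }
  for (const iri of graphIris) {
    if (!IRI_PATTERN.test(iri)) {
      throw new Error(`invalid graph IRI: ${iri}`);
    }
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  if (!Number.isInteger(offset) || offset < 0) {
    throw new Error("offset must be a non-negative integer");
  }
  const values = graphIris.map((iri) => `<${iri}>`).join(" ");
  return [
    "SELECT ?g ?s ?p ?o WHERE {",
    `  VALUES ?g { ${values} }`, // ?g is bound before the graph pattern is matched
    "  GRAPH ?g { ?s ?p ?o }",
    "}",
    "ORDER BY ?g ?s ?p ?o", // stable order so LIMIT/OFFSET pages do not overlap
    `LIMIT ${limit} OFFSET ${offset}`,
  ].join("\n");
}
```

Whether a given store turns the VALUES binding into a per-graph lookup is engine-dependent, so a query of this shape would still need to be measured against the real server.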
Included (waves — each independently reviewed + validated)
- … GRAPH ?g O(store) scans + per-row EXISTS. Includes the §4.4 cross-CG D-SEC fix (disclosed in "Sync responder's unbound GRAPH ?g page queries (O(store) per page) + retry/scheduler loops cause the ~9-core oxigraph peg (#1127) and the write-path failures (#1136): analysis, staged fix plan, and an independent unauthenticated cross-CG read bypass in the same responder", #1138; must land before mainnet).
- … (failedPeers vs failedPhases).
- … main; reconciled here.

§7 field-repro checkpoint: PASS
Controlled rig (real oxigraph-server 0.5.8 / RocksDB, field-shaped ~150k-quad CG, N concurrent peers), same-rig before/after:
~40× reduction, ≤0.38 cores ≪ the < 1 core target (spec predicted ~9.6 → "well under 0.5"). Mechanisms confirmed live: A1 bounded+cached, A2 cap sheds floods (peer queue full), A1 one-session-per-CG (session superseded), A3 converges.

Validation
Each wave: Codex review + multi-lens adversarial review + server-backed (oxigraph-server, not embedded) perf validation. The integration branch was not CI-gated (workflows pinned to main/rc), so this PR is the first CI run on the stack. The known pre-existing #1161 e2e-publish flake may surface (it passes in isolation; not introduced here).

Not included (follow-ups, direct to main)
Merge note
Please use a merge commit (not squash) to preserve the per-wave history (A1…B2).
🤖 Generated with Claude Code