feat(agent): curator-leader convergence, member auto-converges from curator on reconnect #1193
…iscovery re-track
A reconnecting private-CG member auto-converges its shared memory to the curator's current state (full per-(graph,subject) REPLACE), with the curator never reverse-polluted. Curatorship + curator-peer are resolved by the STRUCTURAL curator (wallet-scoped id prefix), robust to the rfc38 pattern where a member pre-creates the CG and self-stamps a dkg:curator triple. Plus discovery re-track of an already-subscribed private CG into the SWM-sync scope.
Stacks on #1173 (recovery).
Devnet: scripts/devnet-test-curator-converge.sh Gate A (auto-converge, no union, curator unpolluted) + Gate B (member-local root survives) PASS.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
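A minimal sketch of what resolving the structural curator from the wallet-scoped id prefix could look like, assuming context graph ids have the shape `0x<addr>/<name>` as described in this PR. The helper names `structuralCuratorOf` and `isStructuralCurator` are hypothetical and are not taken from the codebase.

```typescript
// Hypothetical helpers: the names and the exact id validation are assumptions.
// The structural curator is read from the id itself, so a self-stamped
// dkg:curator triple written by a member cannot change the answer.
function structuralCuratorOf(contextGraphId: string): string | null {
  const slash = contextGraphId.indexOf("/");
  // Require a non-empty prefix and a non-empty name after the slash.
  if (slash <= 0 || slash === contextGraphId.length - 1) return null;
  const prefix = contextGraphId.slice(0, slash);
  // Assumed wallet shape: 0x followed by 40 hex characters.
  return /^0x[0-9a-fA-F]{40}$/.test(prefix) ? prefix.toLowerCase() : null;
}

function isStructuralCurator(contextGraphId: string, selfAddress: string): boolean {
  const curator = structuralCuratorOf(contextGraphId);
  return curator !== null && curator === selfAddress.toLowerCase();
}
```

Ids without a wallet prefix yield `null`, which a caller could treat as "no structural curator" rather than falling back to the triples silently.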
// `trackSyncContextGraph` is idempotent, so public/already-scoped CGs are
// unaffected.
if (await this.isPrivateContextGraph(id)) {
  this.trackSyncContextGraph(id);
🔴 Bug: this re-adds any existing private CG to `syncContextGraphs`, but `existing` also covers entries that were explicitly unsubscribed or left host-only. `unsubscribeFromContextGraph()` keeps the record and removes only the sync scope, so the next discovery scan will silently opt that CG back into SWM sync and can pull private data back onto the node. Gate this on `existing.subscribed` (or another explicit opt-in flag) before calling `trackSyncContextGraph`.
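The suggested gate could be sketched roughly as follows. `ContextGraphRecord`, `SyncScope`, and `retrackOnDiscovery` are hypothetical stand-ins for illustration; only the `subscribed` flag and the idempotent tracking behaviour come from the discussion above.

```typescript
// Hypothetical record shape: only `subscribed` is named in the review comment.
interface ContextGraphRecord {
  id: string;
  subscribed: boolean;
}

// Stand-in for the SWM sync scope; adding to a Set is idempotent.
class SyncScope {
  private readonly scoped = new Set<string>();
  track(id: string): void {
    this.scoped.add(id);
  }
  has(id: string): boolean {
    return this.scoped.has(id);
  }
}

// Re-track an already-known CG only if it is private AND still opted in.
function retrackOnDiscovery(
  existing: ContextGraphRecord,
  isPrivate: boolean,
  scope: SyncScope,
): boolean {
  if (!isPrivate) return false; // public CGs are not re-tracked here
  if (!existing.subscribed) return false; // unsubscribed or host-only: stay out
  scope.track(existing.id);
  return true;
}
```

With this shape, a record left behind by an unsubscribe stays out of the sync scope across discovery scans.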
}
// Resolve the structural curator's peer via the agent registry. On a
// reconnect the registry may not be populated yet, so refresh meta once.
let curatorPeerId = await resolveAgentPeer(structuralAgent);
🔴 Bug: `findAgents()` is not a unique wallet-to-peer mapping; `resolveCuratorPeerId()` already treats same-wallet matches as ambiguous. Taking the first registry hit here can REPLACE-recover from the wrong peer, or skip the real curator, when multiple agents advertise the same `agentAddress`. Please require a unique match here, otherwise fall back to a deterministic creator/owner tie-breaker or skip recovery with a warning.
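One way to require a unique match is sketched below. The `AgentEntry` shape and the `resolveUniquePeer` function are assumptions made for illustration; the real registry API may look different.

```typescript
// Hypothetical registry entry: `agentAddress` is named in the review comment,
// `peerId` is an assumed field name.
interface AgentEntry {
  agentAddress: string;
  peerId: string;
}

type PeerResolution =
  | { kind: "unique"; peerId: string }
  | { kind: "none" }
  | { kind: "ambiguous"; candidates: string[] };

// Collect every distinct peer advertising the curator's wallet and only
// accept the result when exactly one peer remains.
function resolveUniquePeer(agents: AgentEntry[], curatorAddress: string): PeerResolution {
  const wanted = curatorAddress.toLowerCase();
  const peers = Array.from(
    new Set(
      agents
        .filter((a) => a.agentAddress.toLowerCase() === wanted)
        .map((a) => a.peerId),
    ),
  ).sort();
  const first = peers[0];
  if (first === undefined) return { kind: "none" };
  if (peers.length === 1) return { kind: "unique", peerId: first };
  return { kind: "ambiguous", candidates: peers };
}
```

A caller could then recover only on `unique`, and on `ambiguous` either apply a deterministic tie-breaker or skip recovery with a warning that lists the candidates.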
// gated above to be the curator and not self). Reuses the all-or-nothing recovery.
for (const contextGraphId of privateRecoverFromCurator) {
  try {
    const r = await this.recoverContextGraphSwmFromPeer(remotePeerId, contextGraphId);
🔴 Bug: `recoverContextGraphSwmFromPeer()` signals partial/time-limited fetches with `completed: false` without throwing, but this loop ignores that flag and leaves `failedPhases`/`timedOutPhases` at 0. sync-on-connect and the catch-up runner use those counters as the success signal, so an incomplete private recovery can be reported as clean and stop further retries while the member stays stale. Treat `completed === false` as a failed phase (or throw) before returning the summary.
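The accounting the comment asks for might look like the sketch below. `RecoveryResult`, `SyncSummary`, `recordRecovery`, and `isClean` are hypothetical shapes; only `completed`, `failedPhases`, and `timedOutPhases` are named in the review. The sketch counts every incomplete recovery as a failed phase, which is one of the two options suggested.

```typescript
// Hypothetical shapes: only the field names come from the review comment.
interface RecoveryResult {
  completed: boolean;
}

interface SyncSummary {
  failedPhases: number;
  timedOutPhases: number;
}

// Fold one recovery outcome into the summary. An incomplete recovery is
// counted as a failed phase even though nothing was thrown.
function recordRecovery(summary: SyncSummary, result: RecoveryResult): SyncSummary {
  if (result.completed) return summary;
  return { ...summary, failedPhases: summary.failedPhases + 1 };
}

// Assumed success signal: callers retry unless both counters are zero.
function isClean(summary: SyncSummary): boolean {
  return summary.failedPhases === 0 && summary.timedOutPhases === 0;
}
```

Inside the recovery loop, the result `r` would be passed through `recordRecovery` before the summary is returned, so a partial fetch keeps the retry path alive.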
…everse-pollution)
readDurableDataPage built its candidate graph set from everything under the CG
prefix, excluding only the top _meta and _private graphs — but NOT the
_shared_memory* graphs. So the durable DATA phase served SWM data, which the
requester blind-UNIONs (storeInsert). Two consequences, both devnet-proven via a
stack-trace on the curator's _shared_memory write (curator-converge Gate A):
1. CORRUPTION: a curator's own SWM gets reverse-synced from a member on
reconnect (the durable sync has no curator-skip), polluting the curator's
single-valued roots into {v1,v3} — the exact union the #1193 curator-leader
REPLACE exists to prevent. The member then faithfully replicates it.
2. LEAK: gated/private SWM was served to any durable-data requester.
SWM is the EXCLUSIVE domain of the dedicated SWM phase (readSwmDataPage), which
applies per-(graph,subject) REPLACE + the structural-curator-skip + member
recovery. Exclude /_shared_memory* from the durable data candidate graphs.
Pre-existing (graph-plan.ts byte-identical to the stack; NOT introduced by the
Track-C reconciliation). Devnet curator-converge now: GATE A PASS + GATE B PASS
(member converges to [v3], curator stays [v3], no union, no reverse pollution).
Unit: 187 sync/SWM/catalog/durable-responder tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
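The candidate-graph filter described in this commit message can be sketched as below. The graph URI layout (`<cgPrefix>/_meta`, `<cgPrefix>/_private`, `<cgPrefix>/_shared_memory*`) and the function name `durableCandidateGraphs` are assumptions for illustration and are not copied from graph-plan.ts.

```typescript
// Hypothetical filter: assumes every graph of a CG is named `<cgPrefix>/...`.
function durableCandidateGraphs(cgPrefix: string, graphs: string[]): string[] {
  const base = `${cgPrefix}/`;
  return graphs.filter((g) => {
    if (!g.startsWith(base)) return false; // not under this CG
    const rest = g.slice(cgPrefix.length); // keeps the leading "/"
    if (rest === "/_meta" || rest === "/_private") return false; // already excluded before the fix
    // The fix: SWM graphs belong to the dedicated SWM phase only, so the
    // durable data phase must never serve them.
    if (rest.includes("/_shared_memory")) return false;
    return true;
  });
}
```

With the SWM graphs filtered out here, the durable phase has nothing to blind-UNION into a curator's shared memory, and gated SWM data is no longer visible to durable-data requesters.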
Closing as redundant: superseded by #1203 ( Verified against current No unique unmerged work. Reopen if I've missed something.
What
A private-CG member that reconnects after missing updates now automatically converges its shared memory to the curator's current state, via full per-(graph,subject) REPLACE recovery — with the curator never reverse-polluted by a CG it owns. Stacks on #1173 (the recovery apply + endpoint).
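To make the difference between a blind union and a per-(graph,subject) REPLACE concrete, here is a small sketch under simplified assumptions. The `Triple` type and both merge functions are hypothetical and operate on plain arrays rather than a real triple store.

```typescript
// Simplified triple shape for illustration only.
interface Triple {
  graph: string;
  subject: string;
  predicate: string;
  object: string;
}

const subjectKey = (t: Triple): string => `${t.graph}\u0000${t.subject}`;
const tripleKey = (t: Triple): string =>
  [t.graph, t.subject, t.predicate, t.object].join("\u0000");

// Blind union: keeps every distinct triple from both sides, so a
// single-valued root ends up holding both the stale and the new value.
function unionMerge(local: Triple[], remote: Triple[]): Triple[] {
  const seen = new Set<string>();
  return [...local, ...remote].filter((t) => {
    const k = tripleKey(t);
    if (seen.has(k)) return false;
    seen.add(k);
    return true;
  });
}

// Per-(graph,subject) REPLACE: every (graph, subject) the curator sends
// overwrites the member's triples for that pair; subjects the curator
// lacks are left untouched on the member.
function replaceBySubject(local: Triple[], remote: Triple[]): Triple[] {
  const replaced = new Set(remote.map(subjectKey));
  return [...local.filter((t) => !replaced.has(subjectKey(t))), ...remote];
}
```

For a member holding `v1` and a curator holding `v3` on the same root, `unionMerge` yields both values while `replaceBySubject` yields only `v3`, and a member-local subject absent from the curator survives either way.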
How
…`0x<addr>/<name>`), not the `dkg:curator`/`dkg:creator` triples. This is robust to the rfc38 onboarding pattern where a member pre-creates the CG and self-stamps a `dkg:curator` triple, which otherwise makes global `isCuratorOf` resolve the member as the curator and silently skip its recovery.
…`refreshMetaFromCurator` + re-resolve); explicit per-branch logs (skipped/deferred/enqueued) replace the silent skip.
Validation
Devnet regression
`scripts/devnet-test-curator-converge.sh`: Gate A (auto-converge `[v1]`→`[v3]` on reconnect, no `{v1,v3}` union, curator unpolluted) and Gate B (a member-local root the curator lacks survives the sync). Both PASS on the realistic pre-create onboarding pattern. Gate C is a diagnostic for offline member→curator push (the seqno-watermark follow-up), not a gate.
Notes
Pairs with the union-corruption fix (separate PR) and is the back-fill the host-mode participant strip relies on. The global `isCuratorOf` defect is worked around inside the SWM gate here; a root fix (prefer the structural curator everywhere) matters for the members-only auth path and is tracked separately.
🤖 Generated with Claude Code