
feat(agent): curator-leader convergence — member auto-converges from curator on reconnect#1193

Closed
branarakic wants to merge 1 commit into feat/shared-memory-recovery from feat/curator-leader-convergence

Conversation

@branarakic

Contributor

What

A private-CG member that reconnects after missing updates now automatically converges its shared memory to the curator's current state, via full per-(graph,subject) REPLACE recovery — with the curator never reverse-polluted by a CG it owns. Stacks on #1173 (the recovery apply + endpoint).
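
To illustrate why REPLACE rather than a blind union is needed, here is a minimal sketch of the two apply modes. The `Quad` shape and the function names are hypothetical and are not taken from this PR; the sketch only shows the per-(graph,subject) semantics described above.

```typescript
// Hypothetical quad shape; the real store types are not shown in this PR.
type Quad = { graph: string; subject: string; predicate: string; object: string };

// One "slot" is a (graph, subject) pair.
const slotKey = (q: Quad): string => `${q.graph}\u0000${q.subject}`;
const quadId = (q: Quad): string => `${slotKey(q)}\u0000${q.predicate}\u0000${q.object}`;

// UNION: keep everything local and append any incoming quad not already present.
function applyUnion(local: Quad[], incoming: Quad[]): Quad[] {
  const seen = new Set(local.map(quadId));
  const out = [...local];
  for (const q of incoming) {
    const id = quadId(q);
    if (!seen.has(id)) {
      seen.add(id);
      out.push(q);
    }
  }
  return out;
}

// REPLACE: for every (graph, subject) the sender provides, drop all local quads
// in that slot, then insert the sender's quads. Slots the sender does not
// mention are left untouched.
function applyReplace(local: Quad[], incoming: Quad[]): Quad[] {
  const replaced = new Set(incoming.map(slotKey));
  return [...local.filter((q) => !replaced.has(slotKey(q))), ...incoming];
}
```

With a local `[v1]` and an incoming `[v3]` for the same subject, `applyUnion` yields `{v1,v3}` while `applyReplace` yields `[v3]`. A subject that exists only locally is kept by both, which is the behavior Gate B checks.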

How

  • Curatorship by the STRUCTURAL curator. The SWM gate resolves "who is the curator" (and the curator's peer) from the wallet-scoped CG-id prefix (0x<addr>/<name>), not the dkg:curator/dkg:creator triples. This is robust to the rfc38 onboarding pattern where a member pre-creates the CG and self-stamps a dkg:curator triple — which otherwise makes global isCuratorOf resolve the member as the curator and silently skip its recovery.
  • Reconnect recovery. On connect, a private CG REPLACE-recovers from the curator (refreshMetaFromCurator + re-resolve); explicit per-branch logs (skipped/deferred/enqueued) replace the silent skip.
  • Discovery re-track. An already-subscribed private CG is re-tracked into the SWM-sync scope on discovery (a restart re-seeds subscriptions but not the sync scope, so the on-connect SWM pass skipped it).
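
The structural-curator resolution in the first bullet can be sketched as below. This is an illustration only: the function names are hypothetical, and the 40-hex-character address length is an assumption (the description only gives the shape `0x<addr>/<name>`).

```typescript
// Assumption: a wallet-scoped CG id looks like "0x<40 hex chars>/<name>".
const WALLET_SCOPED_ID = /^(0x[0-9a-fA-F]{40})\/(.+)$/;

// Returns the lower-cased curator address encoded in the id prefix,
// or null when the id is not wallet-scoped.
function structuralCuratorOf(contextGraphId: string): string | null {
  const match = WALLET_SCOPED_ID.exec(contextGraphId);
  const address = match?.[1];
  return address ? address.toLowerCase() : null;
}

// True only when the id prefix names `selfAddress`. Triples such as
// dkg:curator are deliberately not consulted, so a member that self-stamped
// one on a pre-created CG is still treated as a member.
function isStructuralCurator(contextGraphId: string, selfAddress: string): boolean {
  const curator = structuralCuratorOf(contextGraphId);
  return curator !== null && curator === selfAddress.toLowerCase();
}
```

Because the answer comes from the id alone, it does not change with whatever metadata has or has not synced at reconnect time.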

Validation

Devnet regression scripts/devnet-test-curator-converge.sh: Gate A (auto-converge [v1]→[v3] on reconnect, no {v1,v3} union, curator unpolluted) and Gate B (a member-local root the curator lacks survives the sync) — both PASS on the realistic pre-create onboarding pattern. Gate C is a diagnostic for offline member→curator push (the seqno-watermark follow-up), not a gate.

Notes

Pairs with the union-corruption fix (separate PR) and is the back-fill the host-mode participant strip relies on. The global isCuratorOf defect is worked around inside the SWM gate here; a root fix (prefer the structural curator everywhere) matters for the members-only auth path and is tracked separately.

🤖 Generated with Claude Code

…iscovery re-track

A reconnecting private-CG member auto-converges its shared memory to the
curator's current state (full per-(graph,subject) REPLACE), with the curator
never reverse-polluted. Curatorship + curator-peer are resolved by the
STRUCTURAL curator (wallet-scoped id prefix), robust to the rfc38 pattern where
a member pre-creates the CG and self-stamps a dkg:curator triple. Plus discovery
re-track of an already-subscribed private CG into the SWM-sync scope.

Stacks on #1173 (recovery). Devnet: scripts/devnet-test-curator-converge.sh
Gate A (auto-converge, no union, curator unpolluted) + Gate B (member-local root
survives) PASS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
// `trackSyncContextGraph` is idempotent, so public/already-scoped CGs are
// unaffected.
if (await this.isPrivateContextGraph(id)) {
  this.trackSyncContextGraph(id);

🔴 Bug: this re-adds any existing private CG to syncContextGraphs, but existing also covers entries that were explicitly unsubscribed or left host-only. unsubscribeFromContextGraph() keeps the record and removes only the sync scope, so the next discovery scan will silently opt that CG back into SWM sync and can pull private data back onto the node. Gate this on existing.subscribed (or another explicit opt-in flag) before calling trackSyncContextGraph.
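
A minimal sketch of the suggested gate, assuming a record with a `subscribed` flag as named in the comment above. The `ContextGraphRecord` shape, the `retrackOnDiscovery` helper, and the `Set`-based sync scope are hypothetical stand-ins for the agent's real state.

```typescript
// Hypothetical record shape; only `subscribed` is named in the review comment.
interface ContextGraphRecord {
  id: string;
  subscribed: boolean;
}

// Re-track an already-known private CG into the sync scope only when the node
// is still explicitly subscribed. Returns true when the CG is in scope afterwards
// because of this call's checks.
function retrackOnDiscovery(
  existing: ContextGraphRecord | undefined,
  isPrivate: boolean,
  syncScope: Set<string>,
): boolean {
  if (!existing || !existing.subscribed || !isPrivate) {
    // Unknown, unsubscribed, host-only, or public: leave the scope untouched.
    return false;
  }
  syncScope.add(existing.id); // Set.add is idempotent, like trackSyncContextGraph.
  return true;
}
```

With this check an unsubscribed record stays out of the scope across discovery scans, while a subscribed CG is restored after a restart.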

}
// Resolve the structural curator's peer via the agent registry. On a
// reconnect the registry may not be populated yet, so refresh meta once.
let curatorPeerId = await resolveAgentPeer(structuralAgent);

🔴 Bug: findAgents() is not a unique wallet-to-peer mapping; resolveCuratorPeerId() already treats same-wallet matches as ambiguous. Taking the first registry hit here can REPLACE-recover from the wrong peer, or skip the real curator, when multiple agents advertise the same agentAddress. Please require a unique match here, otherwise fall back to a deterministic creator/owner tie-breaker or skip recovery with a warning.
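
One way to express the unique-match requirement is sketched below. The `AgentEntry` shape and `resolveUniquePeer` are hypothetical; the actual return type of `findAgents()` is not shown in this thread.

```typescript
// Hypothetical registry entry; the real findAgents() result shape is not shown here.
interface AgentEntry {
  agentAddress: string;
  peerId: string;
}

type PeerResolution =
  | { kind: "unique"; peerId: string }
  | { kind: "none" }
  | { kind: "ambiguous"; peerIds: string[] };

// Resolve a wallet address to a peer only when exactly one distinct peer
// advertises it; otherwise report why no peer was chosen.
function resolveUniquePeer(agents: AgentEntry[], address: string): PeerResolution {
  const wanted = address.toLowerCase();
  const peerIds = Array.from(
    new Set(
      agents
        .filter((a) => a.agentAddress.toLowerCase() === wanted)
        .map((a) => a.peerId),
    ),
  );
  const [first, ...rest] = peerIds;
  if (first === undefined) return { kind: "none" };
  if (rest.length === 0) return { kind: "unique", peerId: first };
  return { kind: "ambiguous", peerIds };
}
```

The caller can then recover only on `unique`, and on `ambiguous` either apply a deterministic tie-breaker or skip recovery with a warning, as suggested above.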

// gated above to be the curator and not self). Reuses the all-or-nothing recovery.
for (const contextGraphId of privateRecoverFromCurator) {
  try {
    const r = await this.recoverContextGraphSwmFromPeer(remotePeerId, contextGraphId);

🔴 Bug: recoverContextGraphSwmFromPeer() signals partial/time-limited fetches with completed: false without throwing, but this loop ignores that flag and leaves failedPhases/timedOutPhases at 0. sync-on-connect and the catch-up runner use those counters as the success signal, so an incomplete private recovery can be reported as clean and stop further retries while the member stays stale. Treat completed === false as a failed phase (or throw) before returning the summary.
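
A sketch of the accounting this comment asks for, kept synchronous so it is easy to test. `RecoveryResult`, `SyncSummary`, the `recovered` counter, and `recordRecovery` are hypothetical; only `completed` and `failedPhases` are named in the comment.

```typescript
// Hypothetical shapes; only `completed` and `failedPhases` appear in the review comment.
interface RecoveryResult {
  completed: boolean;
}

interface SyncSummary {
  recovered: number;
  failedPhases: number;
}

// Fold one recovery outcome into the summary. Pass `null` when the recovery
// call threw. An incomplete result counts as a failed phase, exactly like a throw.
function recordRecovery(summary: SyncSummary, result: RecoveryResult | null): void {
  if (result !== null && result.completed) {
    summary.recovered += 1;
  } else {
    summary.failedPhases += 1;
  }
}

// Intended use inside the loop (sketch, not the PR's code):
//   try { recordRecovery(summary, await recover(id)); }
//   catch { recordRecovery(summary, null); }
```

Since callers read `failedPhases` as the success signal, a partial fetch now keeps the retry path alive instead of being reported as clean.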

branarakic pushed a commit that referenced this pull request Jun 16, 2026
…everse-pollution)

readDurableDataPage built its candidate graph set from everything under the CG
prefix, excluding only the top _meta and _private graphs — but NOT the
_shared_memory* graphs. So the durable DATA phase served SWM data, which the
requester blind-UNIONs (storeInsert). Two consequences, both devnet-proven via a
stack-trace on the curator's _shared_memory write (curator-converge Gate A):

  1. CORRUPTION: a curator's own SWM gets reverse-synced from a member on
     reconnect (the durable sync has no curator-skip), polluting the curator's
     single-valued roots into {v1,v3} — the exact union the #1193 curator-leader
     REPLACE exists to prevent. The member then faithfully replicates it.
  2. LEAK: gated/private SWM was served to any durable-data requester.

SWM is the EXCLUSIVE domain of the dedicated SWM phase (readSwmDataPage), which
applies per-(graph,subject) REPLACE + the structural-curator-skip + member
recovery. Exclude /_shared_memory* from the durable data candidate graphs.

Pre-existing (graph-plan.ts byte-identical to the stack; NOT introduced by the
Track-C reconciliation). Devnet curator-converge now: GATE A PASS + GATE B PASS
(member converges to [v3], curator stays [v3], no union, no reverse pollution).
Unit: 187 sync/SWM/catalog/durable-responder tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
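
The exclusion described in this commit message can be sketched as a candidate filter. This is an illustration under assumptions: graph names are taken to have the form `<cgPrefix>/<suffix>`, and `isDurableCandidate` is a hypothetical name, not the code in graph-plan.ts.

```typescript
// Assumption: every graph of a CG is named "<cgPrefix>" or "<cgPrefix>/<suffix>".
function isDurableCandidate(cgPrefix: string, graph: string): boolean {
  // Only graphs under this CG's prefix are candidates at all.
  if (graph !== cgPrefix && !graph.startsWith(`${cgPrefix}/`)) return false;

  const rest = graph.slice(cgPrefix.length); // "" or "/<suffix>"

  // Pre-existing exclusions: the top-level _meta and _private graphs.
  if (rest === "/_meta" || rest === "/_private") return false;

  // The fix: anything under /_shared_memory* belongs to the SWM phase only.
  if (rest.startsWith("/_shared_memory")) return false;

  return true;
}

// Select the graphs the durable DATA phase may serve.
function durableCandidateGraphs(cgPrefix: string, graphs: string[]): string[] {
  return graphs.filter((g) => isDurableCandidate(cgPrefix, g));
}
```

Filtering at candidate selection means the durable responder never reads SWM graphs, so both the reverse-sync path and the leak path described above are closed at the same point.
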
@branarakic

Contributor Author

Closing as redundant — superseded by #1203 (integration/rfc49-full) + #1201, which landed the RFC-49 SWM/agent work on main.

Verified against current main: the stack tip #1193 has 0 residual (every file it touches is byte-identical in main), and this branch adds nothing not already in main. Any per-file delta is a superseded intermediate — e.g. a relocated contextGraphCatalogUri, or a pre-curator-ack dkg-publisher that main has since advanced past (main carries the confirmBeforeCommit curator-ack gate this branch lacks).

No unique unmerged work. Reopen if I've missed something.

@branarakic branarakic closed this Jun 17, 2026