feat(agent): shared-memory recovery — REPLACE apply + ACL gate + endpoint by branarakic · Pull Request #1173 · OriginTrail/dkg

branarakic · 2026-06-14T23:27:25Z

Part 5/5 · base: feat/facet-open-serve.

Shared-memory recovery driver — a member repairs a stale or corrupt private shared-memory copy by pulling the current state from one chosen authoritative peer.

REPLACE apply (applySwmRecovery) — per-KA-graph clear + reinsert, fixing the blind-union corruption where re-syncing status=v2 into a store holding v1 left both ({v1, v2}) permanently.
All-or-nothing — never REPLACE over an incomplete fetch (row-based pagination can cut a root mid-stream); on a partial fetch it mutates nothing and drops the checkpoint so the retry re-fetches the complete state.
Own checkpoint namespace — recovery can't clobber the background incremental-sync cursor. (The members-only auth gate it triggers lives in the facet PR this is based on.)
Explicit endpoint POST /api/context-graph/recover-shared-memory {contextGraphId, remotePeerId} — pulls from one chosen authoritative peer, never a blanket on-connect sweep.
scripts/devnet-test-swm-recovery.sh — multi-node regression proving REPLACE (v1 → v2, no {v1,v2} union).

recovery tests green. Please do not merge before review / out of order.

🤖 Generated with Claude Code

Depends on #1175 — please merge that first.

github-actions · 2026-06-14T23:30:04Z

+      remotePeerId,
+      contextGraphId,
+      deadline: this.createContextGraphSyncDeadline(1),
+      fetchSyncPages: this.fetchSyncPages.bind(this),


🔴 Bug: this recovery path still goes through the normal sync transport, but nothing in the request envelope/responder differentiates it from ordinary SWM sync. The new isMemberRecoveryAuthorized helper is never consulted, so recovery is still authorized by authorizePrivateSyncRequest's participant/peer fallback instead of the intended hard-deny members-only gate. Please plumb an explicit recovery auth mode through the request/response path and enforce the fresh agent gate on the responder before serving plaintext SWM.

github-actions · 2026-06-14T23:30:04Z

+        const graphManager = new GraphManager(this.store);
+        await graphManager.ensureContextGraph(cgId);
+      },
+      setCheckpoint: (key, offset) => this.syncCheckpoints.set(key, offset),


🔴 Bug: recovery is reusing the same syncCheckpoints namespace as background SWM catch-up (remotePeerId|contextGraphId|swm|phase). If a normal sync already has saved progress for this peer/context graph, running recovery here can overwrite or delete that cursor and force the incremental sync to restart from offset 0. Give recovery its own checkpoint/session scope instead of mutating the shared incremental-sync state.

github-actions

Codex review produced 4 comment(s) but all targeted lines outside the diff and were dropped. Check the workflow logs for details.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-15T08:05:32Z

+  // has no cross-invocation accumulator, so the retry must restart from
+  // offset 0 (which makes the responder re-read from the start of its row
+  // list) and rebuild the COMPLETE state before the apply gate can pass.
+  if (lastCheckpointKey !== undefined) deps.deleteCheckpoint(lastCheckpointKey);


🔴 Bug: Dropping only the persisted checkpoint here does not actually force a clean retry. fetchSyncPages also keeps the unfinished responder session in its module-local cache keyed by the same checkpointKey, so a retry before that TTL expires will reuse the old syncSessionId with offset=0 and the responder will keep serving the cached pre-timeout row list instead of re-reading current state. Recovery can then converge to a stale snapshot. When abandoning a partial recovery, clear/rotate the responder-session state as well, not just the checkpoint offset.

github-actions · 2026-06-15T08:05:32Z

+  const applied = await applySwmRecovery({
+    store: deps.store,
+    verifiedData: processed.verifiedData,
+    roots: processed.entityCreators,


🔴 Bug: processed.entityCreators is not the full set of roots that need REPLACE semantics here. The verifier only adds an entry when it can recover a creator from meta, but verifiedData can still contain valid rows for roots whose op is missing dkg:publisherPeerId / literal prov:wasAttributedTo. Those roots will be inserted without first deleting the stale local rows, which reintroduces the {v1,v2} union corruption this recovery path is meant to fix. Build the roots list from all validated rootEntity subjects (or return that separately from the verifier), and use creators only for ownership-cache hydration.

github-actions · 2026-06-15T08:05:32Z

+      const ownershipKey = sharedMemoryOwnershipKeyFromGraph(deps.contextGraphId, dataGraph);
+      if (!ownershipKey) continue;
+      const ownedMap = deps.ensureOwnedMap(ownershipKey);
+      if (!ownedMap.has(entity)) ownedMap.set(entity, creator);


🔴 Bug: Recovery is supposed to repair stale local state, but this only hydrates missing ownership-cache entries. If the local cache already contains the wrong owner for a recovered root, the stale mapping survives and the next contended write is still arbitrated incorrectly even though the triples were repaired. Overwrite the cached owner for recovered roots (or clear the key before hydration) so the cache matches the authoritative source you just applied.

) A partial recovery fetch deletes its checkpoint to restart from offset 0, but the requester remembered the responder session under the recovery checkpointKey — so the retry reused the old syncSessionId and the responder re-served its cached pre-timeout row list, converging to a STALE snapshot (up to the session TTL old) instead of current state. R10 already scopes recovery's responder session separately; this completes the intent: a recovery fetch never persists a responder session (timeout + error paths), so a retry mints a fresh id and the responder re-reads. Bounded before (self-heals after TTL); now correct on the retry. Adds the contrasting unit test (non-recovery reuses; recovery mints fresh). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-15T21:43:05Z

+    verifiedData: processed.verifiedData,
+    roots: processed.entityCreators,
+  });
+  if (processed.verifiedMeta.length > 0) {


🔴 Bug: Unlike runSharedMemorySync, this path never calls syncPublicSnapshotsForMeta before returning completed: true. Any recovered meta rows that reference dkg:publicSnapshotRef will point at snapshot payloads that were never fetched into publicSnapshotStore, so recovery can report success while leaving snapshot-backed SWM data unreadable/incomplete. Mirror the normal snapshot fetch/validation step here before treating recovery as complete.

github-actions · 2026-06-15T21:43:05Z

+      const ownershipKey = sharedMemoryOwnershipKeyFromGraph(deps.contextGraphId, dataGraph);
+      if (!ownershipKey) continue;
+      const ownedMap = deps.ensureOwnedMap(ownershipKey);
+      if (!ownedMap.has(entity)) ownedMap.set(entity, creator);


🟡 Issue: Recovery only hydrates ownership entries when they are missing. If the local ownership cache already contains a stale creator for one of these roots, the recovered triples are correct but the next contended write is still arbitrated with the wrong owner. Since this endpoint is explicitly repairing stale/corrupt local SWM state, it should overwrite recovered ownership entries (or clear them first) rather than leaving existing values in place.

branarakic · 2026-06-17T09:16:59Z

Closing as redundant — superseded by #1203 (integration/rfc49-full) + #1201, which landed the RFC-49 SWM/agent work on main.

Verified against current main: the stack tip #1193 has 0 residual (every file it touches is byte-identical in main), and this branch adds nothing not already in main. Any per-file delta is a superseded intermediate — e.g. a relocated contextGraphCatalogUri, or a pre-curator-ack dkg-publisher that main has since advanced past (main carries the confirmBeforeCommit curator-ack gate this branch lacks).

No unique unmerged work. Reopen if I've missed something.

branarakic requested review from Jurij89 and zsculac as code owners June 14, 2026 23:27

github-actions Bot reviewed Jun 14, 2026

View reviewed changes

branarakic force-pushed the feat/public-context-graph-projection branch from 49e0c6e to 3423171 Compare June 15, 2026 01:02

branarakic force-pushed the feat/shared-memory-recovery branch from 3927262 to 70f45ad Compare June 15, 2026 01:02

branarakic changed the base branch from feat/public-context-graph-projection to feat/facet-open-serve June 15, 2026 01:03

github-actions Bot reviewed Jun 15, 2026

View reviewed changes

feat(agent): shared-memory recovery — per-KA-graph REPLACE + endpoint

479a34b

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

branarakic force-pushed the feat/facet-open-serve branch from 76fcd31 to ccd5c2f Compare June 15, 2026 08:02

branarakic force-pushed the feat/shared-memory-recovery branch from 70f45ad to 479a34b Compare June 15, 2026 08:02

github-actions Bot reviewed Jun 15, 2026

View reviewed changes

branarakic mentioned this pull request Jun 15, 2026

feat(agent): curator-leader convergence — member auto-converges from curator on reconnect #1193

Closed

github-actions Bot reviewed Jun 15, 2026

View reviewed changes

branarakic mentioned this pull request Jun 16, 2026

RFC-49 agent half: catalog model + SWM recovery + curator-leader convergence + strict curator-ack gate #1201

Merged

branarakic closed this Jun 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agent): shared-memory recovery — REPLACE apply + ACL gate + endpoint#1173

feat(agent): shared-memory recovery — REPLACE apply + ACL gate + endpoint#1173
branarakic wants to merge 2 commits into
feat/facet-open-servefrom
feat/shared-memory-recovery

branarakic commented Jun 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot Jun 14, 2026

Uh oh!

github-actions Bot Jun 14, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot Jun 15, 2026

Uh oh!

github-actions Bot Jun 15, 2026

Uh oh!

github-actions Bot Jun 15, 2026

Uh oh!

github-actions Bot Jun 15, 2026

Uh oh!

github-actions Bot Jun 15, 2026

Uh oh!

branarakic commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

branarakic commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 15, 2026

Choose a reason for hiding this comment

Uh oh!

branarakic commented Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

branarakic commented Jun 14, 2026 •

edited

Loading