RFC-49 agent half: catalog model + SWM recovery + curator-leader convergence + strict curator-ack gate #1201
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…iscovery re-track
A reconnecting private-CG member auto-converges its shared memory to the curator's current state (full per-(graph,subject) REPLACE), and the curator is never reverse-polluted. Curatorship and the curator peer are resolved from the STRUCTURAL curator (wallet-scoped id prefix). This is robust to the rfc38 pattern, where a member pre-creates the CG and self-stamps a dkg:curator triple. Also adds discovery re-track of an already-subscribed private CG into the SWM-sync scope.
Stacks on #1173 (recovery). Devnet: scripts/devnet-test-curator-converge.sh Gate A (auto-converge, no union, curator unpolluted) + Gate B (member-local root survives) PASS.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
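A minimal sketch of the two rules above follows. It rests on two assumptions: a context-graph id of the form `<curatorWallet>/<name>`, and a simplified in-memory quad list. Neither is necessarily the real dkg-core representation, and `resolveStructuralCurator` and `applyCuratorReplace` are hypothetical names. The sketch shows why a self-stamped `dkg:curator` triple cannot change curatorship. It also shows why per-(graph,subject) REPLACE converges to the curator's value instead of a union.

```typescript
// Simplified quad shape for illustration (not the dkg-core type).
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

// Assumption: the curator wallet is the id prefix before the first "/".
// A dkg:curator triple stamped by a member is never consulted, so a member
// who pre-creates the CG cannot make itself the curator.
function resolveStructuralCurator(cgId: string): string | undefined {
  const slash = cgId.indexOf("/");
  if (slash <= 0) return undefined;
  return cgId.slice(0, slash).toLowerCase(); // normalize hex wallet case
}

// Per-(graph, subject) REPLACE: for every (graph, subject) pair the curator
// sends, the member's local quads for that pair are dropped before the
// curator's are added. A member at v1 and a curator at v3 therefore converge
// to v3, never the union {v1, v3}. Pairs the curator did not send are left
// alone, so member-local roots survive.
function applyCuratorReplace(local: Quad[], fromCurator: Quad[]): Quad[] {
  const pairKey = (q: Quad): string => `${q.graph}\u0000${q.subject}`;
  const replacedPairs = new Set(fromCurator.map(pairKey));
  const kept = local.filter((q) => !replacedPairs.has(pairKey(q)));
  return [...kept, ...fromCurator];
}

// Usage: a member reconnects holding v1 while the curator holds v3.
const memberState: Quad[] = [
  { subject: "urn:root", predicate: "urn:value", object: "v1", graph: "urn:cg/_shared_memory" },
  { subject: "urn:member-only", predicate: "urn:value", object: "m1", graph: "urn:cg/_shared_memory" },
];
const curatorState: Quad[] = [
  { subject: "urn:root", predicate: "urn:value", object: "v3", graph: "urn:cg/_shared_memory" },
];
const converged = applyCuratorReplace(memberState, curatorState);
```

In this sketch, the REPLACE is applied only on the member side. The curator never applies member data this way, which mirrors the "curator never reverse-polluted" property.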
…ng (Codex #1171)
Consider a curated publish whose every quad is a public catalog entry on the context-graph DID. This is reachable, since that DID namespace is not reserved. Such a publish partitions to empty otherQuads and leaves encryptableNquadsStr empty. That used to throw an opaque 'rejects empty plaintext' error deep in the chunked encryptor. Guard early, curated-only (public CGs never enter the encrypt branch), with an actionable error: the catalog entry alone cannot satisfy the required ciphertext commitment.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 73eb710)
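The early guard can be sketched as below. Several details are assumptions: the catalog-entry test ("subject equals the CG DID"), the function name `encryptablePayload`, and the naive IRI-only serialization. The exact error text is also hypothetical. It borrows the "has no private payload" wording that the later cli CI fix refers to. What the sketch does show is the ordering: public CGs skip the branch, and curated CGs fail before the encryptor ever sees empty plaintext.

```typescript
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

// Hypothetical rule: a quad is a public catalog entry if its subject is the CG DID.
function isCatalogEntry(q: Quad, cgDid: string): boolean {
  return q.subject === cgDid;
}

// Returns the plaintext to encrypt, or null when the encrypt branch is skipped.
function encryptablePayload(quads: Quad[], cgDid: string, isCurated: boolean): string | null {
  if (!isCurated) return null; // public CGs never enter the encrypt branch
  const otherQuads = quads.filter((q) => !isCatalogEntry(q, cgDid));
  if (otherQuads.length === 0) {
    // Fail early with an actionable message instead of an opaque
    // "rejects empty plaintext" from deep inside the chunked encryptor.
    throw new Error(
      `curated publish to ${cgDid} has no private payload: ` +
        "the public catalog entry alone cannot satisfy the required ciphertext commitment",
    );
  }
  // Naive N-Quads serialization; assumes every term is an IRI.
  return otherQuads
    .map((q) => `<${q.subject}> <${q.predicate}> <${q.object}> <${q.graph}> .`)
    .join("\n");
}
```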
…1171, security)
isContextGraphRegistered used a cross-graph ASK { GRAPH ?g { <cgUri> a dkg:ContextGraph } }. A publisher could therefore author that type triple in their own content graph (the rdf:type object isn't a reserved IRI) and SPOOF registration. That would make reconstructSharedMemoryOwnership treat a sub-graph as a registered root and derive the wrong ownership key after restart. The triple is only ever written into one of two graphs:
- the system ONTOLOGY data graph (public CGs + every publish)
- the CG's own _meta graph (curated; createContextGraph defGraph = isCurated ? _meta : ontology)

Scope the ASK to exactly those two graphs. Adds a focused security test (legit in either authoritative graph -> true; spoof in any user content graph -> false).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
(cherry picked from commit 9c350d2)
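A sketch of the scoped check, in SPARQL form and as an equivalent in-memory predicate. Three names here are placeholders, not the project's real values: the `dkg:ContextGraph` IRI, the ontology graph IRI, and the `${cgUri}/_meta` naming. In the SPARQL version, `VALUES` pins `?g` to the two authoritative graphs instead of letting it range over every graph in the store.

```typescript
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
// Placeholder IRI: the real dkg:ContextGraph expansion is not shown here.
const DKG_CONTEXT_GRAPH = "urn:example:dkg#ContextGraph";

// The only two graphs allowed to attest registration (hypothetical naming).
function authoritativeGraphs(cgUri: string, ontologyGraph: string): string[] {
  return [ontologyGraph, `${cgUri}/_meta`];
}

// SPARQL form of the scoped check.
function registrationAsk(cgUri: string, ontologyGraph: string): string {
  const graphs = authoritativeGraphs(cgUri, ontologyGraph).map((g) => `<${g}>`).join(" ");
  return `ASK { VALUES ?g { ${graphs} } GRAPH ?g { <${cgUri}> <${RDF_TYPE}> <${DKG_CONTEXT_GRAPH}> } }`;
}

// In-memory equivalent: a type triple in any other graph (e.g. user content) is ignored.
function isContextGraphRegistered(store: Quad[], cgUri: string, ontologyGraph: string): boolean {
  const allowed = new Set(authoritativeGraphs(cgUri, ontologyGraph));
  return store.some(
    (q) =>
      allowed.has(q.graph) &&
      q.subject === cgUri &&
      q.predicate === RDF_TYPE &&
      q.object === DKG_CONTEXT_GRAPH,
  );
}
```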
) A partial recovery fetch deletes its checkpoint to restart from offset 0, but the requester remembered the responder session under the recovery checkpointKey — so the retry reused the old syncSessionId and the responder re-served its cached pre-timeout row list, converging to a STALE snapshot (up to the session TTL old) instead of current state. R10 already scopes recovery's responder session separately; this completes the intent: a recovery fetch never persists a responder session (timeout + error paths), so a retry mints a fresh id and the responder re-reads. Bounded before (self-heals after TTL); now correct on the retry. Adds the contrasting unit test (non-recovery reuses; recovery mints fresh). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> (cherry picked from commit 32829ff)
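The requester-side rule can be sketched with a small session cache. `ResponderSessionCache`, its methods, and the `sync-session-N` id format are all hypothetical. The point is the asymmetry: a non-recovery fetch reuses the remembered session after a timeout. A recovery fetch never stores one, so every retry mints a fresh id and the responder re-reads current state instead of serving a cached row list.

```typescript
type FetchKind = "durable" | "recovery";

// Hypothetical requester-side cache of responder sessions, keyed by checkpointKey.
class ResponderSessionCache {
  private readonly byCheckpoint = new Map<string, string>();
  private minted = 0;

  private mint(): string {
    this.minted += 1;
    return `sync-session-${this.minted}`;
  }

  // The syncSessionId to send with the next page request.
  sessionFor(checkpointKey: string, kind: FetchKind): string {
    if (kind === "recovery") return this.mint(); // always fresh: responder re-reads current state
    return this.byCheckpoint.get(checkpointKey) ?? this.mint();
  }

  // Timeout and error paths: only non-recovery fetches remember the session,
  // so a resumed durable fetch can continue from the responder's cached rows.
  rememberAfterFailure(checkpointKey: string, kind: FetchKind, sessionId: string): void {
    if (kind === "recovery") return;
    this.byCheckpoint.set(checkpointKey, sessionId);
  }
}
```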
Three #1193 curator-leader-convergence accounting/scope fixes + one stale config doc, all verified non-critical (no data loss/privacy/crash), surfaced by the stack triage of Codex #1193/#1175 findings:
- dkg-agent discovery re-track now gates on existing.subscribed, so an explicitly unsubscribed (or host-only) private CG is no longer silently re-added to the SWM-sync scope on every discovery scan.
- on-connect curator-peer resolution checks the connecting peer against ALL registry peers for the curator wallet instead of a first-match pick, so a stale/duplicate wallet->peer entry can no longer wrongly defer/mis-gate recovery (findAgents is not a unique wallet->peer map).
- a partial (completed:false) curator REPLACE recovery now counts as a failed phase (forces a prompt retry) and a complete one as a completed phase, so an incomplete private recovery is no longer reported clean and a successful idempotent 0-insert REPLACE still registers as progress.
- publicProjectionContextGraphId doc corrected: the projection is written into the SOURCE CG's own _catalog graph (open-serve surface), and the value is only an enable flag + self-projection guard, not a write target (B7/B8).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
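The curator-peer fix and the phase-accounting fix can be sketched as below. `AgentRecord`, `isCuratorPeer`, `firstMatchCuratorPeer`, and `classifyReplacePhase` are hypothetical names, and the registry shape only loosely models what `findAgents` returns. A first-match pick can return a stale peer for the curator wallet. Checking every entry cannot. Phase outcome is driven by completeness, not by how many quads were inserted.

```typescript
interface AgentRecord {
  wallet: string;
  peerId: string;
}

// Registry results can hold several (possibly stale) peers per wallet,
// so match the connecting peer against every entry for the curator wallet.
function isCuratorPeer(connectingPeerId: string, curatorWallet: string, registry: AgentRecord[]): boolean {
  const wallet = curatorWallet.toLowerCase();
  return registry.some((r) => r.wallet.toLowerCase() === wallet && r.peerId === connectingPeerId);
}

// The first-match pick being replaced, kept only for contrast.
function firstMatchCuratorPeer(curatorWallet: string, registry: AgentRecord[]): string | undefined {
  const wallet = curatorWallet.toLowerCase();
  return registry.find((r) => r.wallet.toLowerCase() === wallet)?.peerId;
}

interface ReplaceRecoveryResult {
  completed: boolean;
  inserted: number;
}
type PhaseOutcome = "completed" | "failed";

// A partial REPLACE is a failed phase (prompt retry); an idempotent
// 0-insert REPLACE that completed is still progress.
function classifyReplacePhase(result: ReplaceRecoveryResult): PhaseOutcome {
  return result.completed ? "completed" : "failed";
}
```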
…in (Track-C)
Re-homes the stack's unique work onto main's hardened base, per the per-subsystem reconciliation plan + 5 settled design decisions:
- projection (context-graph-meta-projection.ts): COMBINED. Main's hardened Track-C base (abort/signal threading, defensive copies, chunked enumeration, case-sensitivity split) + grafted the OT-RFC-49 catalog read-path (floor-filtered _catalog source, catalog discovery via the chunked idiom, catalog dirty-tracking). Access policy = STICKY-PRIVATE one-way ratchet (decision #2; main test #94 inverted). Catalog source is predicate-filtered to the disclosure floor (defense-in-depth vs untrusted peer-fetched catalogs).
- publisher: kept the stack's CRITICAL registration-ASK ownership-takeover fix + catalog partition/persist; carried main's NO_DATA_IN_SWM retry + ciphertextChunks + skipContextGraphEnsure.
- workspace-handler: take-stack (hardened per-field gate resolution).
- agent cg-resolve/crypto/ownership/context-graph/cg-registry/gossip + lifecycle: take-main call-sites (signal-threaded getCgMeta + list-cache invalidation superset), preserving the stack's catalog-responder/recovery/#1193 code; recovery store: wrapper folds in main's invalidateListContextGraphsCache.
- storage: take-main (strict superset: abort-detach, telemetry, options-threaded).
- CLI/agent tests: take-main (real-server oxigraph suite; part-13 aligns w/ combined projection's meta-first mechanism).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 3-way merge duplicated module-level helpers where main RELOCATED/RENAMED code the stack also had (git added both copies, no conflict markers): the list-privacy helpers + ContextGraphListRow interface in dkg-agent-cg-resolve.ts (kept main's ListContextGraphsRow + fail-closed applyContextGraphListPrivacy per decision #5), and listGraphFamily/listGraphsByPrefix in dkg-agent-lifecycle.ts (identical copies). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…th sides) Covers the OT-RFC-49 catalog read-path grafted in the Track-C merge: a catalog-only CG resolves declared+private; catalog-only CGs appear in listDeclaredContextGraphIds; the one-way-ratchet holds across sources (meta=public + catalog floor=private ⇒ private); and — the security case — an untrusted/peer-fetched _catalog graph can NEVER feed authz fields (creator/allowlist/participant), only the disclosure floor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
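The resolution rules these tests cover can be sketched as a merge of two sources. Some parts of the sketch are assumptions: `resolveContextGraph`, `ratchet`, the field names, and modelling the disclosure floor as the single `accessPolicy` field. The sketch reads `meta` as authoritative. It reads `catalog` as possibly peer-fetched, so only its floor field is consulted. Authz fields (creator, allowlist, participants) come from `meta` alone, and any "private" from either source wins.

```typescript
type AccessPolicy = "public" | "private";

interface CgFields {
  accessPolicy?: AccessPolicy;
  creator?: string;
  allowlist?: string[];
  participants?: string[];
}

interface ResolvedCg {
  declared: boolean;
  accessPolicy?: AccessPolicy;
  creator?: string;
  allowlist: string[];
  participants: string[];
}

// One-way ratchet: once any source says private, the result is private.
function ratchet(a?: AccessPolicy, b?: AccessPolicy): AccessPolicy | undefined {
  if (a === "private" || b === "private") return "private";
  return a ?? b;
}

// `meta` comes from the CG's authoritative graph; `catalog` may come from an
// untrusted peer, so only its disclosure-floor field is read.
function resolveContextGraph(meta?: CgFields, catalog?: CgFields): ResolvedCg {
  const catalogFloor: AccessPolicy | undefined = catalog?.accessPolicy;
  return {
    declared: meta !== undefined || catalog !== undefined,
    accessPolicy: ratchet(meta?.accessPolicy, catalogFloor),
    // Authz fields: meta only, never the catalog.
    creator: meta?.creator,
    allowlist: meta?.allowlist ?? [],
    participants: meta?.participants ?? [],
  };
}
```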
…mock _publish now unconditionally refreshes the public catalog projection after a confirmed publish (OT-RFC-49, no-op unless configured). This main-origin test's hand-built agentLike mock predates that method, so the full-publish case threw 'emitPublicProjectionAfterPublish is not a function' once the stack's _publish merged in. Stub it as a no-op. (Pre-existing latent gap on the stack; the access-policy behavior the test asserts is unchanged.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
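A sketch of why the mock broke and what the stub looks like. Only `emitPublicProjectionAfterPublish` comes from the commit. `AgentLike`, `publishToChain`, and `publishFlow` are hypothetical stand-ins for the real agent surface and `_publish`. Any mock passed into a flow that now calls the projection method must define it, even as a no-op.

```typescript
interface AgentLike {
  publishToChain(quads: string[]): Promise<{ confirmed: boolean }>;
  emitPublicProjectionAfterPublish(cgId: string): Promise<void>;
}

// Simplified stand-in for the merged _publish flow: a confirmed publish
// always refreshes the public catalog projection.
async function publishFlow(agent: AgentLike, cgId: string, quads: string[]): Promise<boolean> {
  const { confirmed } = await agent.publishToChain(quads);
  if (confirmed) await agent.emitPublicProjectionAfterPublish(cgId);
  return confirmed;
}

// Hand-built test mock: the no-op stub keeps the test focused on access policy.
const agentLike: AgentLike = {
  publishToChain: async () => ({ confirmed: true }),
  emitPublicProjectionAfterPublish: async () => {},
};
```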
…everse-pollution)
readDurableDataPage built its candidate graph set from everything under the CG
prefix, excluding only the top _meta and _private graphs — but NOT the
_shared_memory* graphs. So the durable DATA phase served SWM data, which the
requester blind-UNIONs (storeInsert). Two consequences, both devnet-proven via a
stack-trace on the curator's _shared_memory write (curator-converge Gate A):
1. CORRUPTION: a curator's own SWM gets reverse-synced from a member on
reconnect (the durable sync has no curator-skip), polluting the curator's
single-valued roots into {v1,v3} — the exact union the #1193 curator-leader
REPLACE exists to prevent. The member then faithfully replicates it.
2. LEAK: gated/private SWM was served to any durable-data requester.
SWM is the EXCLUSIVE domain of the dedicated SWM phase (readSwmDataPage), which
applies per-(graph,subject) REPLACE + the structural-curator-skip + member
recovery. Exclude /_shared_memory* from the durable data candidate graphs.
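The candidate-graph filter can be sketched as follows, assuming an illustrative layout where every graph of a CG is named `${cgUri}` or `${cgUri}/...`. `durableDataCandidateGraphs` is a hypothetical name, not the function in graph-plan.ts. The sketch treats any graph containing a `/_shared_memory` segment as SWM. That is one reading of the `/_shared_memory*` pattern above.

```typescript
// Graphs that belong to the CG's durable data phase (illustrative layout).
function durableDataCandidateGraphs(cgUri: string, allGraphs: string[]): string[] {
  const prefix = `${cgUri}/`;
  return allGraphs.filter((g) => {
    if (g !== cgUri && !g.startsWith(prefix)) return false; // other CGs
    if (g === `${cgUri}/_meta` || g === `${cgUri}/_private`) return false; // pre-existing exclusions
    // New: shared-memory graphs are served only by the dedicated SWM phase,
    // never blind-UNIONed by the durable data sync.
    if (g.includes("/_shared_memory")) return false;
    return true;
  });
}
```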
Pre-existing (graph-plan.ts byte-identical to the stack; NOT introduced by the
Track-C reconciliation). Devnet curator-converge now: GATE A PASS + GATE B PASS
(member converges to [v3], curator stays [v3], no union, no reverse pollution).
Unit: 187 sync/SWM/catalog/durable-responder tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… writes (close M2-b silent-loss)
A private-CG shared-memory write is now durable IFF the curator (the authoritative replica under OT-RFC-49 curator-leader) applied it. This closes the silent same-root-update loss that §8/§9 flagged as unfinished. Today, a member UPDATE to a root the curator already holds:
- returns HTTP 200 on the LOCAL commit alone;
- can miss the always-on curator (best-effort gossip, errors swallowed, no durable outbox);
- is then silently REPLACE-reverted on the member's next reconnect, with no error ever surfaced.

Devnet-proven, beyond what §7 accepts.
Mechanism: a `confirmBeforeCommit` hook in the publisher's write path runs AFTER the signed wire message is built and BEFORE the first store mutation. The agent injects a confirmer that reliably delivers the message to the curator and requires an applied-ack. The SWM_UPDATE receiver returns an empty reply ONLY in the `outcome.applied` branch, so delivered ⇔ persisted. On non-confirmation the write ABORTS with zero local persistence:
- curator unreachable / ack timeout / unresolved → CuratorUnconfirmedError → 503
- curator permanent refusal (0x01 sentinel) → CuratorRejectedError → 409

For share()/conditionalShare() the gate runs under the existing per-CG write lock, so the lock is held across the curator round-trip (writes serialize through the curator). Gossip is demoted to the post-commit cross-version safety-net + propagation to other members; it never gates success.
Scope (all three SWM-entry paths):
- share() and conditionalShare() (`/api/shared-memory/write`, via `_shareImpl`)
- promote/assertionPromote (`/knowledge-assets/:name/swm/share`, the WM→SWM path)

WM drafts, `localOnly` writes, public CGs, and a node that IS the curator are exempt. Default-OFF (phase-1): config `swmAwaitCuratorAck` + per-request `awaitCuratorAck` override.
Deliberate tradeoff (the user's "don't accept state"): with the gate ON a member cannot write while the curator is offline — including brand-new roots — even though new roots were never silently lost. Mitigated by default-off; a future nuance could gate only updates to curator-known roots. VM is already sound (it throws→500 on no-ACKs, surfaces tentative/confirmed, and cleans SWM only on confirmed) — no VM change. Validation: publisher unit (workspace.test.ts +3: gate aborts with ZERO persist on applied:false, CuratorRejectedError on rejected, commits on applied:true; 185 share/promote/conditional tests byte-identical) + devnet (scripts/devnet-test-curator-ack-gate.sh: share AND promote — curator-up→200 +lands on curator, curator-down→503 no-persist, recovery on restart) + curator-converge GATE A+B no-regression at default-off. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
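The gate's control flow and HTTP mapping can be sketched as below. The error class names and the 503/409 mapping come from the commit. The rest is hypothetical: `CuratorVerdict`, `GateContext`, `gateApplies`, `commitWithCuratorGate`, and `httpStatusFor`. The per-CG write lock and the real wire transport are omitted. The invariant shown is that `commitLocally` is reached only when the gate passed or does not apply.

```typescript
class CuratorUnconfirmedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CuratorUnconfirmedError";
    Object.setPrototypeOf(this, CuratorUnconfirmedError.prototype);
  }
}

class CuratorRejectedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CuratorRejectedError";
    Object.setPrototypeOf(this, CuratorRejectedError.prototype);
  }
}

// "applied": curator persisted it; "rejected": permanent refusal;
// "unconfirmed": unreachable, ack timeout, or curator not resolved.
type CuratorVerdict = "applied" | "rejected" | "unconfirmed";

interface GateContext {
  awaitCuratorAck: boolean; // config default, possibly overridden per request
  isPrivateCg: boolean;
  localOnly: boolean;
  selfIsCurator: boolean;
}

function gateApplies(ctx: GateContext): boolean {
  return ctx.awaitCuratorAck && ctx.isPrivateCg && !ctx.localOnly && !ctx.selfIsCurator;
}

// Runs after the signed message is built and before any store mutation.
async function commitWithCuratorGate(
  ctx: GateContext,
  signedMessage: Uint8Array,
  confirmBeforeCommit: (msg: Uint8Array) => Promise<CuratorVerdict>,
  commitLocally: (msg: Uint8Array) => Promise<void>,
): Promise<void> {
  if (gateApplies(ctx)) {
    const verdict = await confirmBeforeCommit(signedMessage);
    if (verdict === "rejected") throw new CuratorRejectedError("curator refused the write");
    if (verdict !== "applied") throw new CuratorUnconfirmedError("curator did not confirm the write");
  }
  await commitLocally(signedMessage); // reached only when the gate passed or does not apply
}

function httpStatusFor(err: unknown): number {
  if (err instanceof CuratorRejectedError) return 409;
  if (err instanceof CuratorUnconfirmedError) return 503;
  return 500;
}
```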
otReviewAgent
left a comment
Operational Notice: Review Agent could not complete this review.
Business logic reviewer failed: [Errno 7] Argument list too long: '/var/lib/review-agent/toolchain/codex/node_modules/.bin/codex'
branarakic
left a comment
Codex review findings.
```typescript
  roots: processed.entityCreators,
});
if (processed.verifiedMeta.length > 0) {
  await deps.store.insert([...processed.verifiedMeta]);
```
[high] Bug: Recovery replaces the root data but only appends the recovered SWM metadata, so any older WorkspaceOperation/rootEntity rows for the same root remain in _shared_memory_meta. The existing TTL cleanup deletes data for every expired op root, so a stale op can later delete the freshly recovered root even though the data was replaced correctly. Please mirror the gossip apply path here by deleting/replacing meta for each recovered root, and refresh the ownership entry, before inserting verifiedMeta.
```typescript
  type OperationContext,
} from '@origintrail-official/dkg-core';
import type { SyncPageResult } from './page-fetch.js';
import { applySwmRecovery, type SwmRecoveryStore } from './swm-recovery-apply.js';
```
[medium] Issue: The imported recovery helper currently contains a literal NUL byte in its template-string separator, which makes GitHub classify the .ts file as binary with no patch. That hides future diffs and review for recovery logic, and can trip text tooling. Please replace the literal NUL with an escaped separator such as \0, or with a printable delimiter, so the file stays normal UTF-8 text.
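A sketch of the suggested fix. The four-term key shape and the names `dedupKey`/`dedupQuads` are assumptions; the review only concerns the separator. Writing `"\u0000"` keeps the source file plain text while the runtime string still holds the single NUL code unit. A character that essentially never appears inside the terms also avoids collisions that a printable delimiter like `|` would need escaping to prevent.

```typescript
// Escape sequence in source, single NUL code unit at runtime.
const DEDUP_SEP = "\u0000";

interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

function dedupKey(q: Quad): string {
  return [q.graph, q.subject, q.predicate, q.object].join(DEDUP_SEP);
}

// Keeps the first occurrence of each quad.
function dedupQuads(quads: Quad[]): Quad[] {
  const seen = new Set<string>();
  return quads.filter((q) => {
    const key = dedupKey(q);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```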
Codex [high]: SWM recovery replaced root DATA but only APPENDED meta, so a stale WorkspaceOperation/rootEntity row for a recovered root lingered in _shared_memory_meta, and the TTL sweep could then delete the freshly-recovered root. Recovery now REPLACEs the meta per recovered root before inserting the curator's fresh meta. This is done by a new replaceMetaForRoots recovery dep, implemented in dkg-agent-lifecycle to mirror the publisher's deleteMetaForRoot (drop the op->root-entity links, then delete any op left with no remaining roots), scoped to the curator's fresh-meta graphs.
Codex [medium]: swm-recovery-apply.ts used a literal NUL byte as the dedup-key separator, which made git classify the .ts as binary (no patch / no review). Replaced with the string escape (identical runtime, plain-text source).
cli CI (knowledge-assets-route.test: "409 VM_PUBLISH_PRECONDITION when finalized but nothing shared into SWM"): REG is curated (accessPolicy:1). With nothing private shared, only the public catalog entry exists. The catalog model therefore surfaces the curated "has no private payload" precondition instead of the public-path "No quads in shared memory". Both are caller preconditions thrown BEFORE any chain interaction, so the vm/publish route now down-classifies "has no private payload" to 409 VM_PUBLISH_PRECONDITION (parity with the public path); the test accepts either precondition message. Verified locally against a real chain.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
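A sketch of the meta-replace semantics, using an illustrative model of `_shared_memory_meta` as a map from operation id to the root entities it touched. `replaceMetaForRoots` is named in the commit. Its body here only follows the described behaviour, and `applyRecoveredMeta` and `rootsDeletedBySweep` are hypothetical helpers. The per-graph scoping is omitted. The check at the end is the original bug: after recovery, an expired stale op must no longer cause the recovered root's data to be deleted.

```typescript
// Illustrative model of _shared_memory_meta: operation id -> root entities it touched.
type MetaIndex = Map<string, Set<string>>;

// Drop every op->root link for a recovered root, then delete any op left
// with no remaining roots.
function replaceMetaForRoots(meta: MetaIndex, recoveredRoots: Iterable<string>): void {
  const roots = new Set(recoveredRoots);
  for (const [opId, opRoots] of meta) {
    for (const root of roots) opRoots.delete(root);
    if (opRoots.size === 0) meta.delete(opId); // deleting the current Map entry during iteration is safe
  }
}

// Recovery: replace meta for the recovered roots, then insert the curator's fresh meta.
function applyRecoveredMeta(meta: MetaIndex, freshMeta: MetaIndex): void {
  const recovered = new Set<string>();
  for (const roots of freshMeta.values()) for (const r of roots) recovered.add(r);
  replaceMetaForRoots(meta, recovered);
  for (const [opId, roots] of freshMeta) meta.set(opId, new Set(roots));
}

// Roots whose data a TTL sweep would delete, given the set of expired ops.
function rootsDeletedBySweep(meta: MetaIndex, expiredOps: Set<string>): string[] {
  const doomed = new Set<string>();
  for (const [opId, roots] of meta) if (expiredOps.has(opId)) roots.forEach((r) => doomed.add(r));
  return [...doomed].sort();
}
```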
…ild-green) Strip 68b80e1 reconciled via 3-way merge (base #1164) keeping main's additions + the curator-ack gate + Gate-A + recovery meta-replace; surgical cutover applied to dkg-publisher.ts; ack-collector/storage-ack-handler taken wholesale (cutover rewrites that collapsed the V2 ciphertext path). All 10 packages build green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…k-handler) where ours==main
…llapsed by the strip)
…ampling + remove obsolete V2 self-ACK test dkg-agent-publish.ts auto-merged so the strip's catalog publish-routing drifted; applied the strip's 3-hunk cutover onto ours (curator-ack preserved). Built the random-sampling package (catalog-extractor; was missing from the build sweep). rfc49-catalog-parity.e2e now PASSES (rebuilt catalog root == on-chain getCatalogRoot).
…0.3 (pre-existing drift on main, unrelated to cutover)
…ublish is valid post-cutover)
…xt:false) OT-RFC-49 WS-A defaults stripCiphertext ON (cores hold zero ciphertext), so handleGetCiphertextChunk declines before reaching the keying/auth/replay logic these tests verify. The serving handler still exists for nodes that opt into custody — disable the strip in the test responder so the logic is exercised. NOTE: publish-jsonld.test async failures are pre-existing environmental Hardhat nonce flakiness (automining/shared-chain), not a cutover regression — the only async-lift change here is the chunkedCommitment->catalogCommitment field rename.
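A sketch of the handler ordering that forces the test change. Only `stripCiphertext` and the idea of keying/auth/replay checks come from the commit. `makeCiphertextChunkHandler` (a hypothetical stand-in for `handleGetCiphertextChunk`), the request/response shapes, and the nonce-based replay check are all assumptions. With the strip ON, the handler returns before any auth or replay logic runs, so tests of that logic must build the responder with the strip OFF.

```typescript
interface ChunkRequest {
  ual: string;
  chunkIndex: number;
  requesterPeerId: string;
  nonce: string;
}

type ChunkResponse =
  | { status: "declined" }
  | { status: "unauthorized" }
  | { status: "replay" }
  | { status: "ok"; chunk: Uint8Array };

interface ChunkHandlerOptions {
  stripCiphertext: boolean; // WS-A default is ON (cores hold zero ciphertext)
  isAuthorized(peerId: string, ual: string): boolean;
  loadChunk(ual: string, index: number): Uint8Array;
}

function makeCiphertextChunkHandler(opts: ChunkHandlerOptions): (req: ChunkRequest) => ChunkResponse {
  const seenNonces = new Set<string>();
  return (req) => {
    if (opts.stripCiphertext) return { status: "declined" }; // declines before auth/replay checks
    if (!opts.isAuthorized(req.requesterPeerId, req.ual)) return { status: "unauthorized" };
    if (seenNonces.has(req.nonce)) return { status: "replay" };
    seenNonces.add(req.nonce);
    return { status: "ok", chunk: opts.loadChunk(req.ual, req.chunkIndex) };
  };
}
```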
Devnet finding (M2-b curator-ack gate): curator does not re-confirm its own private CG after restart
Surfaced while running
Observed:
Suspected (not yet observed): the
Fix direction (M2 scope, separate work): on boot, a node must re-confirm the private CGs it has registered/subscribed (re-read on-chain
Does not block the #1203 cutover (that PR changes the publish/catalog path; this is the SWM gate, and is gated behind the contract-redeploy coordination regardless).
…commitment reverts The OT-RFC-49 WS-B proof-race rewrite of submitProof (reads the snapshotted challengeRoot/challengeLeafCount from the grown RandomSamplingLib.Challenge struct) changed RandomSampling's behavior but left _VERSION at 10.0.4 — identical to the live base_sepolia deployment, while its coupled storage contract already moved to 10.1.0. Bump so a redeploy is not read as a no-op. Also adds the previously-zero-assertion catalog-commitment integrity reverts (KnowledgeAssetsLifecycle), incl. IncompleteCatalogCommitment (the PR #1198 lifecycle regression): partial commitment + public-CG-with-catalog on BOTH the publish and update paths, and the already-committed zero-pair stranding guard.
OT-RFC-49 WS-C: deterministic coverage of the prover's curated branch — a curated CG proves the PUBLIC _catalog (extract catalog -> hashTripleV10 leaves -> flat-kc build -> submit), plus the unsynced-catalog -> kc-not-synced skip. Previously only the public flat-kc path and the soft-warn devnet leg covered this; the devnet strip-OFF discriminator did not fire in-window, so this is the first deterministic proof of the catalog-sampling dispatch. NOTE: 8 PRE-EXISTING failures in this file (public-path 'leaf-count-mismatch', red at HEAD before this commit, unrelated to the catalog cutover) are left as a separate follow-up — this commit is purely additive (the 2 curated tests pass).
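The curated dispatch under test can be sketched as a leaf-source choice. Several pieces are assumptions: `chooseProofLeaves`, the step shapes, using `undefined` to mean "catalog not synced locally", and SHA-256 as a stand-in for `hashTripleV10`, whose real encoding is not shown in this PR. The flat-kc build and submission are also left out. A curated CG draws its leaves from the public `_catalog` and skips with `kc-not-synced` when the catalog is missing. A public CG keeps the flat-kc path.

```typescript
import { createHash } from "crypto";

type LeafSource = "public-kc" | "curated-catalog";

type ProverStep =
  | { kind: "skip"; reason: "kc-not-synced" }
  | { kind: "build"; source: LeafSource; leaves: string[] };

// Stand-in for hashTripleV10 (not the real encoding).
function hashTriple(triple: string): string {
  return createHash("sha256").update(triple, "utf8").digest("hex");
}

// `catalog` is undefined when the curated CG's _catalog has not synced locally.
function chooseProofLeaves(isCurated: boolean, publicTriples: string[], catalog: string[] | undefined): ProverStep {
  if (!isCurated) return { kind: "build", source: "public-kc", leaves: publicTriples.map(hashTriple) };
  // Curated CGs prove the PUBLIC _catalog, never the private payload.
  if (catalog === undefined) return { kind: "skip", reason: "kc-not-synced" };
  return { kind: "build", source: "curated-catalog", leaves: catalog.map(hashTriple) };
}
```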
Verified-against-source runbook for the coordinated redeploy: the non-proxy new-address constraint (grown Challenge struct), the two RandomSamplingStorage hazards (reward/score state zeroing -> epoch-boundary pin; clearOutstandingChallenges no-op vs a fresh deploy), the version-keyed deployed-flag mechanic, the ordered procedure, post-cutover validation, and the known gaps (curated UPDATE not cut over, curator-restart re-confirm, strip-OFF hatch).
RFC-49 ciphertext→catalog cutover (WS-A..E) — full reconciliation onto the agent half
```solidity
/// so that removing it cannot shift the base slot of `_authorKaNumberHighWater`
/// or any later mapping; same discipline as `_deprecatedKnowledgeAssetsCounter`.
/// Never read or written after RFC-49.
mapping(uint256 => bytes32) private _deprecatedCiphertextChunksRoots;

/// @notice OT-RFC-49: DEAD — slot preserved. Formerly `ciphertextChunkCounts`.
/// See `_deprecatedCiphertextChunksRoots`. Superseded by `catalogLeafCounts`.
mapping(uint256 => uint32) private _deprecatedCiphertextChunkCounts;
```
```solidity
function clearOutstandingChallenges(uint72[] calldata identityIds) external onlyOwnerOrMultiSigOwner {
    for (uint256 i = 0; i < identityIds.length; i++) {
        delete nodesChallenges[identityIds[i]];
        emit NodeChallengeCleared(identityIds[i]);
    }
}
```
…in mock The OT-RFC-49 WS-B rewrite makes the prover verify against the challenge's PINNED (challengeRoot, challengeLeafCount) snapshot instead of a live getLatestMerkleRoot/getMerkleLeafCount re-read. The pre-cutover makeChallenge factory never set those fields, so every public-path build saw challengeLeafCount=undefined -> NaN -> data-corrupted(leaf-count-mismatch). Fix is test-harness only (production is correct): makeChain now pins the snapshot from the chain's reported (root, leafCount) onto any surfaced challenge, mirroring what the on-chain createChallenge does; makeChallenge defaults isCurated:false. Full random-sampling suite green (64/64).
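A sketch of both the failure mode and the harness fix. The challenge fields and the `getLatestMerkleRoot`/`getMerkleLeafCount` names come from the commit. `verifyAgainstSnapshot` and the exact factory shapes are assumptions. The bug is that `Number(undefined)` is `NaN`, and `NaN` compares unequal to everything, so an unpinned challenge can only ever report a leaf-count mismatch. Pinning the chain's reported `(root, leafCount)` onto the challenge makes the comparison meaningful again.

```typescript
interface Challenge {
  kcId: number;
  isCurated: boolean;
  challengeRoot?: string;
  challengeLeafCount?: number;
}

type VerifyResult =
  | { ok: true }
  | { ok: false; reason: "data-corrupted"; detail: "leaf-count-mismatch" | "root-mismatch" };

// The prover checks its rebuilt tree against the challenge's PINNED snapshot,
// not a live re-read of the chain.
function verifyAgainstSnapshot(ch: Challenge, builtRoot: string, builtLeafCount: number): VerifyResult {
  if (Number(ch.challengeLeafCount) !== builtLeafCount) {
    return { ok: false, reason: "data-corrupted", detail: "leaf-count-mismatch" };
  }
  if (ch.challengeRoot !== builtRoot) return { ok: false, reason: "data-corrupted", detail: "root-mismatch" };
  return { ok: true };
}

// Test-harness factories (shapes assumed).
function makeChallenge(overrides: Partial<Challenge> = {}): Challenge {
  return { kcId: 1, isCurated: false, ...overrides };
}

// makeChain pins the chain's reported (root, leafCount) onto every surfaced
// challenge, mirroring what the on-chain createChallenge does.
function makeChain(root: string, leafCount: number) {
  return {
    getLatestMerkleRoot: (): string => root,
    getMerkleLeafCount: (): number => leafCount,
    surfaceChallenge: (ch: Challenge): Challenge => ({
      ...ch,
      challengeRoot: ch.challengeRoot ?? root,
      challengeLeafCount: ch.challengeLeafCount ?? leafCount,
    }),
  };
}
```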
…FC-49 catalog cutover)
#1201 (RFC-49 catalog model + ACK domain separation) landed on main and edits the same contracts as RFC-51. Resolved 4 conflicts:
- KnowledgeAssetsLifecycle.sol: keep main's catalog model + ACK_DIGEST_VERSION; RFC-51's realized-credit removal preserved (calls gone, comments remain); _VERSION 10.1.0 -> 10.1.1.
- RandomSampling.sol: keep main's proof-race/sampling rewrite + RFC-51's 1-epoch publishing-factor read (main left _calculateNodeScore untouched); _VERSION 10.1.0 -> 10.1.1.
- EpochStorage.test.ts / RandomSampling.test.ts: version assertions reconciled (10.0.4 / 10.1.1).

Re-synced the 3 main-domain ABIs (DKGKnowledgeAssets, KnowledgeAssetsLifecycle, RandomSamplingStorage) in evm-module + chain. Verified: compile OK; focused + EpochStorage + RandomSampling unit = 68 passing; chain abi-pinning 20/20; no straggler callers of the renamed EpochStorage getters.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What this is
The agent/node half of OT-RFC-49 ("Hosting Follows Access"), reconciled onto main. This collapses the previously-stacked PRs #1170 → #1171 → #1172 → #1175 → #1173 → #1193 into one integration branch (reconciled against main's parallel "Track-C" work), plus two follow-on fixes found during validation.
No contract-redeploy dependency: this is pure node software and ships via a normal node update. The on-chain ciphertext→catalog commitment strip (#1196/#1198) is a separate stacked PR that follows this one and is gated on the coordinated contract redeploy.
Contents (17 commits, 0-behind-main)
- …core/publisher) and the graph-set index store (storage).
- …`_catalog` to outsiders without the allowlist) + members-only sync auth.
- …d5e0b11b2): the durable data-sync responder must not serve `_shared_memory` graphs (prevents a curator reverse-polluting its own CG's SWM from a member, and prevents leaking gated SWM).
- …34dba6c72): closes the M2-b silent same-root-update loss: a private-CG SWM write (share / conditionalShare / promote) is durable iff the curator applied it; curator unreachable → HTTP 503, nothing persisted. Default-OFF (config `swmAwaitCuratorAck` + per-request `awaitCuratorAck`).

Validation
- …`publisher` + `agent` + `cli` green.
- …`workspace.test.ts`), 176 legacy share tests byte-identical.
- …`devnet-test-curator-converge.sh` GATE A + B pass; `devnet-test-curator-ack-gate.sh`: share and promote: curator-up → 200 + lands, curator-down → 503 no-persist, recovery on restart.

Deliberate semantic note
When the curator-ack gate is enabled, a member cannot write while the curator is offline — including brand-new roots — the intended "don't accept state" tradeoff. It's default-off; flip per the rollout plan.
🤖 Generated with Claude Code