RFC-49 ciphertext→catalog cutover (WS-A..E) — full reconciliation onto the agent half (#1203)

Merged
branarakic merged 10 commits into integration/track-c-merge from integration/rfc49-full on Jun 17, 2026

@branarakic

Contributor

Stacked on #1201 (the RFC-49 agent half). This applies the OT-RFC-49 "Hosting Follows Access" contract/ciphertext strip — cores hold zero private ciphertext for curated CGs; the on-chain random-sampling proof target moves from the private ciphertextChunksRoot to the public _catalog (catalogRoot). Member-side encryption stays (members exchange ciphertext via SWM); only the on-chain commitment + core ACK + pricing move to the catalog.
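To make the proof-target change concrete: a catalog commitment is a (root, leafCount) pair over the public catalog triples, which the devnet run below reads back via getCatalogRoot / getCatalogLeafCount. A minimal sketch of building such a pair, assuming sha256 leaves, sorted leaf order, and a plain binary Merkle fold. The real leaf hash (hashTripleV10) and tree layout in the random-sampling package may differ, and `computeCatalogRootSketch` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

// Hypothetical leaf hash: sha256 over one N-Triples line (stand-in for the real leaf hash).
function leafHash(triple: string): Buffer {
  return createHash("sha256").update(triple, "utf8").digest();
}

// Binary Merkle fold: hash adjacent pairs; an odd last node is carried up unchanged.
function merkleRoot(leaves: Buffer[]): Buffer {
  if (leaves.length === 0) return Buffer.alloc(32); // all-zero root, like bytes32(0)
  let level = leaves;
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(
        i + 1 < level.length
          ? createHash("sha256").update(Buffer.concat([level[i], level[i + 1]])).digest()
          : level[i],
      );
    }
    level = next;
  }
  return level[0];
}

// The (root, leafCount) pair is what a catalog commitment carries on-chain.
function computeCatalogRootSketch(catalogTriples: string[]): { root: string; leafCount: number } {
  // Sorting the leaf hashes makes the root independent of triple order (an assumption).
  const leaves = catalogTriples.map(leafHash).sort(Buffer.compare);
  return { root: "0x" + merkleRoot(leaves).toString("hex"), leafCount: leaves.length };
}
```

Because only the public catalog feeds this pair, a core can rebuild and prove it without ever holding the private ciphertext.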

⚠️ Activation requires a coordinated contract redeploy at an epoch boundary (new addresses + clearOutstandingChallenges migration). This branch puts the integration line ahead of main on the cutover. Code is review-ready; merge/activation is gated on the redeploy + auth review + soak.

How it was reconciled (not a naive merge)

The strip (#1196, commit 68b80e18a) was built on a pre-Track-C base, so a wholesale merge regresses main's post-base additions. Instead:

  • 3-way merge for the overlapping files (base = the #1164 common ancestor, "Issue #1138: un-peg sync-responder CPU: A1–A4 + B0–B2 (§7 checkpoint PASS)"), keeping both main's evolution (incl. the curator-ack gate, Gate-A fix, recovery meta-replace, skipContextGraphEnsure) and the strip's cutover.
  • Surgical cutover diff applied onto ours for dkg-publisher.ts and dkg-agent-publish.ts (where ours and the strip both heavily edit the publish path).
  • Wholesale strip only for the files the strip rewrote end-to-end (the collapsed V2 chunked-ciphertext ACK path: ack-collector.ts, storage-ack-handler.ts).
  • Obsolete V2/ciphertext tests removed (the path is collapsed); cutover test updates taken from the strip.
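The reconciliation rules above amount to a per-file decision over three versions of each file (common ancestor, ours, strip). A minimal sketch, assuming whole-file string comparison; real git merges compare per hunk, and the function, type, and policy names here are hypothetical:

```typescript
// What to do with one file, given its content at base, on our line, and on the strip.
type MergeDecision =
  | "unchanged"             // nobody touched the file
  | "same-change"           // both sides made the identical edit
  | "take-ours"             // only main's line changed it
  | "take-theirs"           // only the strip changed it
  | "hunk-merge"            // both changed it: let the 3-way merge combine hunks
  | "surgical"              // both changed it heavily: hand-apply the strip's cutover hunks onto ours
  | "take-theirs-wholesale"; // both changed it, but the strip rewrote it end-to-end

// Policy for files that diverged on both sides (chosen per file by the reviewer).
type BothChangedPolicy = "auto-merge" | "surgical" | "wholesale";

function decideFile(base: string, ours: string, theirs: string, policy: BothChangedPolicy): MergeDecision {
  if (ours === theirs) return ours === base ? "unchanged" : "same-change";
  if (ours === base) return "take-theirs";
  if (theirs === base) return "take-ours";
  // Both sides diverged from base.
  if (policy === "wholesale") return "take-theirs-wholesale"; // e.g. ack-collector.ts, storage-ack-handler.ts
  if (policy === "surgical") return "surgical";               // e.g. dkg-publisher.ts, dkg-agent-publish.ts
  return "hunk-merge";
}
```

The `hunk-merge` outcome is where semantic drift can slip through unnoticed: per the commit log, dkg-agent-publish.ts auto-merged and lost the strip's catalog publish-routing, which is why its cutover hunks were then applied by hand.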

Validation (all green)

| Suite | Result |
|---|---|
| Build (10 packages incl. random-sampling) | ✅ all 10 green |
| Core | ✅ 282/282 |
| Publisher | ✅ 1163/1163 |
| EVM contracts (catalogRoot + curated random-sampling) | ✅ 723/723 |
| CLI | ✅ 1992/1992 |
| Agent | ✅ 1558+ (134 files); see note |
| rfc49-catalog-parity.e2e | ✅ rebuilt catalog root == on-chain getCatalogRoot |

Note: publish-jsonld.test async cases fail with a Hardhat nonce/automining error — pre-existing environmental flakiness (the only async-lift change here is a chunkedCommitment→catalogCommitment field rename, which doesn't touch tx sequencing). Not a cutover regression.

Behavior change to be aware of

A curated CG can now publish catalog-only (the catalog is the on-chain commitment), so "nothing shared into SWM" is no longer a precondition error for curated CGs — it proceeds. The public "No quads in shared memory" precondition still holds (tested via a public CG).
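The changed precondition, as a minimal sketch. The function name and its two inputs are hypothetical; only the error text comes from the PR.

```typescript
// Decide whether a publish from shared memory may proceed.
function checkPublishPrecondition(isCuratedCG: boolean, swmQuadCount: number): "proceed" {
  if (swmQuadCount === 0 && !isCuratedCG) {
    // Public CG: nothing shared means nothing to commit.
    throw new Error("No quads in shared memory");
  }
  // Curated CG: the public catalog is the on-chain commitment,
  // so an empty SWM no longer blocks the publish.
  return "proceed";
}
```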

Comprehensive devnet validation (curated publish → catalog cutover + cores-hold-zero, curator-ack gate, converge/recovery) to follow on this PR.

🤖 Generated with Claude Code

Branimir Rakic and others added 7 commits June 16, 2026 23:51
…ild-green)

Strip 68b80e1 reconciled via 3-way merge (base #1164) keeping main's additions
+ the curator-ack gate + Gate-A + recovery meta-replace; surgical cutover applied
to dkg-publisher.ts; ack-collector/storage-ack-handler taken wholesale (cutover
rewrites that collapsed the V2 ciphertext path). All 10 packages build green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ampling + remove obsolete V2 self-ACK test

dkg-agent-publish.ts auto-merged so the strip's catalog publish-routing drifted;
applied the strip's 3-hunk cutover onto ours (curator-ack preserved). Built the
random-sampling package (catalog-extractor; was missing from the build sweep).
rfc49-catalog-parity.e2e now PASSES (rebuilt catalog root == on-chain getCatalogRoot).
…0.3 (pre-existing drift on main, unrelated to cutover)
…xt:false)

OT-RFC-49 WS-A defaults stripCiphertext ON (cores hold zero ciphertext), so
handleGetCiphertextChunk declines before reaching the keying/auth/replay logic
these tests verify. The serving handler still exists for nodes that opt into
custody — disable the strip in the test responder so the logic is exercised.

NOTE: publish-jsonld.test async failures are pre-existing environmental Hardhat
nonce flakiness (automining/shared-chain), not a cutover regression — the only
async-lift change here is the chunkedCommitment->catalogCommitment field rename.
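The test change above hinges on where the strip check sits relative to the other checks. A minimal sketch of that ordering; the handler shape, the check callbacks, and the config layout are assumptions modelled on the commit message, not the actual handleGetCiphertextChunk signature.

```typescript
interface ResponderConfig { stripCiphertext: boolean }

// Hypothetical per-request checks the real handler performs.
interface ChunkRequestChecks {
  keyingOk(): boolean;
  authOk(): boolean;
  replayOk(): boolean;
}

type ChunkResponse = "declined:stripped" | "rejected:keying" | "rejected:auth" | "rejected:replay" | "served";

// WS-A default: cores hold zero ciphertext.
const DEFAULT_RESPONDER_CONFIG: ResponderConfig = { stripCiphertext: true };

function handleGetCiphertextChunkSketch(cfg: ResponderConfig, checks: ChunkRequestChecks): ChunkResponse {
  // With the strip ON the node holds no ciphertext, so it declines before any check runs.
  if (cfg.stripCiphertext) return "declined:stripped";
  // Custody opt-in path: the logic the tests actually want to exercise.
  if (!checks.keyingOk()) return "rejected:keying";
  if (!checks.authOk()) return "rejected:auth";
  if (!checks.replayOk()) return "rejected:replay";
  return "served";
}
```

Because the strip check short-circuits, a test that wants to reach keying/auth/replay has to build its responder with `stripCiphertext: false`.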

@github-actions github-actions Bot left a comment

Codex review skipped: filtered diff is 9391 lines (cap: 5,000). Please consider splitting this into smaller PRs for reviewability.

@otReviewAgent otReviewAgent left a comment

Operational Notice: Review Agent could not complete this review.

Business logic reviewer failed: [Errno 7] Argument list too long: '/var/lib/review-agent/toolchain/codex/node_modules/.bin/codex'

@branarakic

branarakic commented Jun 16, 2026

Contributor Author

Phase 4 — comprehensive devnet validation ✅

Ran on a fresh devnet brought up with the cutover contracts. Two layouts.

4a — no-regression on the M2 SWM gates (5 nodes: 3 cores + 2 edge curator/member):

  • devnet-test-curator-converge.sh (M2-a): GATE A + B PASS — automatic curator-REPLACE convergence, no union, no collateral loss.
  • devnet-test-curator-ack-gate.sh (M2-b): core promise PASS: curator up → 200 + lands; curator down → 503 + no local persist (no silent state). One pre-existing M2 caveat surfaced in the post-restart section (a curator doesn't re-confirm its own private CG after restart). It is not introduced by this cutover (confirmed by diffing the cutover delta: it touches only host-mode ciphertext custody, not the SWM-access/confirm gate) and is filed against #1201 (RFC-49 agent half: catalog model + SWM recovery + curator-leader convergence + strict curator-ack gate).

4b — the cutover itself (devnet-test-rfc49-catalog-sampling.sh, 6 nodes: cores 1-3 stripped, core 4 strip-OFF discriminator, edge 5/6 curator+member) — PASSED (exit 0):

  • ✅ curated publish → on-chain catalog commitment getCatalogRoot=0xce4883…09c0d, getCatalogLeafCount=4 — a catalog commitment, not a ciphertext commitment.
  • ✅ publisher genuinely emitted private ciphertext (1 chunk) — the strip is non-vacuous.
  • stripped cores 1-3 hold ZERO private ciphertext rows while holding the public _catalog (4 triples each).
  • ✅ a core submitted a random-sampling proof against the _catalog (submittedCount 3→4).
  • FROM-SWM shortcut (raw write + /shared-memory/publish, no finalize): catalog auto-injected (leafCount=4), cores hold it, and the finalize path stayed intact — catalog-only curated publish works.
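The 4b pass criteria, restated as code for readers who want to re-check them against their own devnet output. A sketch only: the snapshot shape and `cutoverFailures` are hypothetical, and the real checks live in the shell script devnet-test-rfc49-catalog-sampling.sh.

```typescript
interface CoreSnapshot {
  name: string;
  stripped: boolean;              // strip ON for cores 1-3; OFF for the discriminator
  privateCiphertextRows: number;
  catalogTriples: number;         // triples held in the public _catalog
}

interface RunSnapshot {
  catalogRoot: string;            // as read via getCatalogRoot
  catalogLeafCount: number;       // as read via getCatalogLeafCount
  emittedCiphertextChunks: number;
  submittedCountBefore: number;
  submittedCountAfter: number;
  cores: CoreSnapshot[];
}

// Returns human-readable failures; an empty list means the run passed.
function cutoverFailures(s: RunSnapshot): string[] {
  const failures: string[] = [];
  if (/^0x0*$/.test(s.catalogRoot) || s.catalogLeafCount === 0) {
    failures.push("no on-chain catalog commitment");
  }
  if (s.emittedCiphertextChunks === 0) {
    failures.push("vacuous: publisher emitted no private ciphertext");
  }
  for (const core of s.cores) {
    if (!core.stripped) continue; // the strip-OFF discriminator is not held to zero
    if (core.privateCiphertextRows !== 0) failures.push(`${core.name} holds private ciphertext`);
    if (core.catalogTriples === 0) failures.push(`${core.name} is missing the public _catalog`);
  }
  if (s.submittedCountAfter <= s.submittedCountBefore) {
    failures.push("no random-sampling proof submitted");
  }
  return failures;
}
```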

Caveats (honest scoping — one observed, one not isolated):

  • Member edge6 (observed, fine): synced the private data just after the test's check window — the log shows SWM sender-key setup receive accepted for the curated CG (a private-SWM-only exchange) + repeated Sync complete: N verified triples. The member-side ciphertext-exchange path works.
  • Strip-OFF discriminator core 4 (root cause NOT isolated): held 0 ciphertext, persistently (re-polled 2 min+ after the run). It host-mode-subscribed CG-3's nameHash but never attempted a ciphertext pull. Two explanations produce the identical symptom and I have not distinguished them: (a) cleartext-name resolution timing / one-shot announcement, or (b) the reconciliation affecting the non-shipped strip-OFF custody hatch. So non-vacuousness here rests on the (weaker) emitted-chunks check rather than the discriminator.

@branarakic branarakic left a comment

Contributor Author

Codex review findings.

```diff
-kcs.setCiphertextChunksCommitment(p.id, p.newCiphertextChunksRoot, p.newCiphertextChunkCount);
-} else if (kcs.getLatestCiphertextChunksRoot(p.id) != bytes32(0)) {
+kcs.setCatalogCommitment(p.id, p.newCatalogRoot, p.newCatalogLeafCount);
+} else if (kcs.getCatalogRoot(p.id) != bytes32(0)) {
```

Contributor Author

🔴 Bug: This makes every update to a curated KA that already has a catalog commitment require a fresh (newCatalogRoot, newCatalogLeafCount) pair, but the TS publisher update path never computes or passes those fields into v10UpdateACKProvider or updateKnowledgeCollectionV10; they default to bytes32(0), 0 in the adapter. A post-RFC-49 curated KA will therefore publish successfully and then revert on its first update with IncompleteCatalogCommitment. Please mirror the publish-side catalog partition/root computation in DKGPublisher.update, pass the pair through ACK collection and the chain adapter, and add a curated-update regression test.
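To see why the update reverts, here is a TypeScript model of the on-chain catalog-commitment guards named in this thread (IncompleteCatalogCommitment, PublicCGCannotHaveCatalogCommitment, CuratedCGRequiresCatalogCommitment). It is an illustration, not the KnowledgeAssetsLifecycle source: the check order and the input field names are assumptions.

```typescript
const ZERO_ROOT = "0x" + "00".repeat(32); // bytes32(0)

interface CatalogCommitmentInput {
  isCuratedCG: boolean;
  existingCatalogRoot: string;   // ZERO_ROOT if the KA was never catalog-committed
  deltaTokenAmount: number;      // > 0 for value-adding operations
  newCatalogRoot?: string;       // the adapter falls back to ZERO_ROOT when unset
  newCatalogLeafCount?: number;  // the adapter falls back to 0 when unset
}

type Outcome =
  | "ok"
  | "IncompleteCatalogCommitment"
  | "PublicCGCannotHaveCatalogCommitment"
  | "CuratedCGRequiresCatalogCommitment";

function checkCatalogCommitment(p: CatalogCommitmentInput): Outcome {
  const root = p.newCatalogRoot ?? ZERO_ROOT;
  const count = p.newCatalogLeafCount ?? 0;
  const hasRoot = root !== ZERO_ROOT;
  const hasCount = count !== 0;
  // Partial pair: exactly one half supplied.
  if (hasRoot !== hasCount) return "IncompleteCatalogCommitment";
  const hasPair = hasRoot && hasCount;
  if (!p.isCuratedCG) return hasPair ? "PublicCGCannotHaveCatalogCommitment" : "ok";
  if (hasPair) return "ok";
  // Curated CG with a zero pair from here on.
  // Stranding guard: an already-committed KA may not fall back to a zero pair.
  if (p.existingCatalogRoot !== ZERO_ROOT) return "IncompleteCatalogCommitment";
  // Value-adding operations on a curated CG need a catalog commitment.
  if (p.deltaTokenAmount > 0) return "CuratedCGRequiresCatalogCommitment";
  return "ok";
}
```

Feeding it an update for an already-committed curated KA with no catalog fields (the adapter's zero defaults) yields IncompleteCatalogCommitment, matching the revert described in the comment.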

```diff
 BigInt(newMerkleLeafCount),
-ciphertextRootForAckDigest(intent.newCiphertextChunksRoot),
-BigInt(intent.newCiphertextChunkCount ?? 0),
+catalogRootForAckDigest(intent.newCatalogRoot),
```

Contributor Author

🔴 Bug: Update ACKs now sign newCatalogRoot/newCatalogLeafCount, but the encrypted/curated update branch above never rebuilds that catalog root from an inline public catalog or persists it to <cg>/_catalog the way publish does. A publisher can therefore collect valid core ACKs for an arbitrary catalog commitment that no core can later serve/prove, and the contract only enforces the pair is non-zero. Please make curated updates carry the public catalog N-Triples, verify computeCatalogRoot(catalogCommittedLeaves(...)) against these fields before signing, and replace the local _catalog graph on success.
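A minimal sketch of the verify-before-sign step this comment asks for, assuming the update carries the public catalog inline. `sketchCatalogRoot` is a placeholder for computeCatalogRoot(catalogCommittedLeaves(...)), the byte-footprint formula is assumed, and only the CATALOG_ROOT_MISMATCH code comes from the PR (BYTE_SIZE_MISMATCH and MISSING_CATALOG are hypothetical).

```typescript
import { createHash } from "node:crypto";

const ZERO_ROOT = "0x" + "00".repeat(32);

// Placeholder root: sha256 leaves, sorted, folded pairwise. NOT the real computeCatalogRoot.
function sketchCatalogRoot(triples: string[]): string {
  let level: Buffer[] = triples
    .map((t) => createHash("sha256").update(t, "utf8").digest())
    .sort(Buffer.compare);
  if (level.length === 0) return ZERO_ROOT;
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(i + 1 < level.length
        ? createHash("sha256").update(Buffer.concat([level[i], level[i + 1]])).digest()
        : level[i]);
    }
    level = next;
  }
  return "0x" + level[0].toString("hex");
}

interface CuratedUpdateClaim {
  catalogNTriples: string[];   // public catalog shipped inline with the update
  newCatalogRoot: string;
  newCatalogLeafCount: number;
  newByteSize: number;
}

type AckDecision =
  | { sign: true; replaceGraph: string }
  | { sign: false; code: "CATALOG_ROOT_MISMATCH" | "BYTE_SIZE_MISMATCH" | "MISSING_CATALOG" };

function verifyCuratedUpdateSketch(cg: string, claim: CuratedUpdateClaim): AckDecision {
  // Never sign a zero or absent catalog for a curated update.
  if (claim.catalogNTriples.length === 0 || claim.newCatalogRoot === ZERO_ROOT) {
    return { sign: false, code: "MISSING_CATALOG" };
  }
  // Rebuild the root from the inline data instead of trusting the publisher's claim.
  if (sketchCatalogRoot(claim.catalogNTriples) !== claim.newCatalogRoot ||
      claim.catalogNTriples.length !== claim.newCatalogLeafCount) {
    return { sign: false, code: "CATALOG_ROOT_MISMATCH" };
  }
  // byteSize parity: the priced size must match the catalog footprint (formula assumed).
  const footprint = Buffer.byteLength(claim.catalogNTriples.join("\n") + "\n", "utf8");
  if (footprint !== claim.newByteSize) return { sign: false, code: "BYTE_SIZE_MISMATCH" };
  // On success the caller would replace <cg>/_catalog with these triples, then sign.
  return { sign: true, replaceGraph: `${cg}/_catalog` };
}
```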

…commitment reverts

The OT-RFC-49 WS-B proof-race rewrite of submitProof (reads the snapshotted
challengeRoot/challengeLeafCount from the grown RandomSamplingLib.Challenge
struct) changed RandomSampling's behavior but left _VERSION at 10.0.4 —
identical to the live base_sepolia deployment, while its coupled storage
contract already moved to 10.1.0. Bump so a redeploy is not read as a no-op.

Also adds the previously-zero-assertion catalog-commitment integrity reverts
(KnowledgeAssetsLifecycle), incl. IncompleteCatalogCommitment (the PR #1198
lifecycle regression): partial commitment + public-CG-with-catalog on BOTH the
publish and update paths, and the already-committed zero-pair stranding guard.
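Why an unchanged _VERSION matters: if deploy tooling records deployments keyed by contract name and version, a changed contract with an unchanged version looks already deployed. A minimal sketch, assuming a name@version registry; the registry shape and `planDeploy` are hypothetical, and the real version-keyed deployed-flag mechanic may key differently.

```typescript
type DeployedRegistry = Set<string>;

// Hypothetical registry key: contract name plus its _VERSION string.
const deployKey = (contract: string, version: string): string => `${contract}@${version}`;

interface ContractBuild { contract: string; version: string }

// Returns the contracts the tooling would actually deploy: anything whose
// name@version is not recorded yet. Behaviour changes are invisible to this check.
function planDeploy(registry: DeployedRegistry, builds: ContractBuild[]): string[] {
  return builds
    .filter((b) => !registry.has(deployKey(b.contract, b.version)))
    .map((b) => b.contract);
}
```

With _VERSION left at 10.0.4, RandomSampling drops out of such a plan even though its submitProof behaviour changed, which is the silent no-op the bump avoids.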
@branarakic

Contributor Author

⚠️ Significant finding: the curated UPDATE path is not cut over to the catalog model

While adding the missing guard tests, the curated UPDATE path turned out to be unimplemented for the catalog cutover, not merely untested. PUBLISH was fully cut over; UPDATE was not. Confirmed across the publisher, agent, and core ACK handler:

  • Publisher (dkg-publisher.ts) — the update path builds no catalog N-quads (catalogNquadsStr exists only on the publish path, :2059-2065), prices newByteSize off the full update quads (updateByteSize), not the catalog footprint, and its stagingQuads carries private roots — never the public _catalog.
  • Agent (dkg-agent-publish.ts): _ensureCuratedCatalogInSwm (the public-catalog projection/injection) is called only from publishFromSharedMemory (:3738); there is no catalog build or injection on any update path.
  • Core ACK handler (storage-ack-handler.ts updateHandler, :824-844) — for a curated (isEncryptedPayload) update it confirms the curation oracle and then trusts the publisher's claimed newCatalogRoot/newCatalogLeafCount and signs the ACK. It does not rebuild the catalog root from inline data, does not enforce byteSize parity, and does not persist the updated _catalog — all of which the publish handler does (:305-415). newCatalogRoot defaults to 32 zero-bytes when unset (ack-collector.ts:421).

Consequences

  1. A value-adding curated update (deltaTokenAmount > 0) would ship a zero catalog root and hit the on-chain CuratedCGRequiresCatalogCommitment revert — i.e. value-growing updates to curated KAs are effectively blocked on-chain.
  2. Even where it doesn't revert, cores never receive or re-host the updated public catalog, so a random-sampling challenge against an updated curated KA can't be served/proved by the cores.
  3. The core signs the catalog commitment on update without verifying it (the publish-path integrity guarantee is absent on update).

Why it wasn't caught earlier: the publish path's rfc49-catalog-parity.e2e + the 6-node devnet only exercise PUBLISH; there is no curated-UPDATE catalog e2e/devnet leg.

Fix scope (a focused follow-up, not a one-liner — security/correctness-sensitive, needs devnet validation):

  • Publisher: build committedCatalogLeaves → computeCatalogRoot on update, set newCatalogRoot/newCatalogLeafCount, price newByteSize off the catalog footprint, ship the public catalog inline.
  • updateHandler: rebuild + verify catalog root, enforce byteSize parity, decline CATALOG_ROOT_MISMATCH, persist <cg>/_catalog — mirroring the publish handler.
  • Agent: inject/host the updated catalog (the update analogue of _ensureCuratedCatalogInSwm).
  • Add a curated-UPDATE catalog parity e2e + a devnet leg.
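The publisher bullet above, as a minimal sketch: derive the catalog pair and the priced size from the public catalog rather than the full update quads. `buildCuratedUpdateIntent` and its fields are hypothetical, the leaf count is assumed to equal the number of catalog triples, and the root function is injected as a stand-in for computeCatalogRoot.

```typescript
interface CuratedUpdateIntentSketch {
  newCatalogRoot: string;
  newCatalogLeafCount: number;
  newByteSize: number;
  inlineCatalogNTriples: string[];
}

// rootOf stands in for computeCatalogRoot over the committed leaves; it is injected
// so this sketch does not depend on the real leaf hashing.
function buildCuratedUpdateIntent(
  publicCatalog: string[],
  rootOf: (triples: string[]) => string,
): CuratedUpdateIntentSketch {
  if (publicCatalog.length === 0) {
    throw new Error("curated update needs a public catalog to commit");
  }
  return {
    newCatalogRoot: rootOf(publicCatalog),
    newCatalogLeafCount: publicCatalog.length,
    // Priced off the catalog footprint, not the private update quads.
    newByteSize: Buffer.byteLength(publicCatalog.join("\n") + "\n", "utf8"),
    // Shipped inline so cores can rebuild the root and re-host <cg>/_catalog.
    inlineCatalogNTriples: publicCatalog,
  };
}
```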

The on-chain integrity reverts for the update path are now tested (this PR adds IncompleteCatalogCommitment / PublicCGCannotHaveCatalogCommitment coverage on both publish and update). This finding is about the off-chain producer/host/verify half on update.

Branimir Rakic added 2 commits June 17, 2026 09:35
OT-RFC-49 WS-C: deterministic coverage of the prover's curated branch — a
curated CG proves the PUBLIC _catalog (extract catalog -> hashTripleV10 leaves
-> flat-kc build -> submit), plus the unsynced-catalog -> kc-not-synced skip.
Previously only the public flat-kc path and the soft-warn devnet leg covered
this; the devnet strip-OFF discriminator did not fire in-window, so this is the
first deterministic proof of the catalog-sampling dispatch.

NOTE: 8 PRE-EXISTING failures in this file (public-path 'leaf-count-mismatch',
red at HEAD before this commit, unrelated to the catalog cutover) are left as a
separate follow-up — this commit is purely additive (the 2 curated tests pass).
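The dispatch this commit covers (curated CG → public _catalog → hashed leaves → flat-kc build → submit, or skip with kc-not-synced), as a minimal sketch. The dependency interface and function names are hypothetical; `hashLeaf` stands in for hashTripleV10.

```typescript
type CgKind = "public" | "curated";

// Hypothetical prover dependencies.
interface ProverDeps {
  extractCatalog(kcId: string): string[] | undefined; // undefined: _catalog not synced locally yet
  extractPublic(kcId: string): string[];
  hashLeaf(triple: string): string;
  buildFlatKc(leaves: string[]): { root: string; leafCount: number };
}

type ProveOutcome =
  | { action: "submit"; root: string; leafCount: number }
  | { action: "skip"; reason: "kc-not-synced" };

function proveSketch(kind: CgKind, kcId: string, deps: ProverDeps): ProveOutcome {
  let triples: string[];
  if (kind === "curated") {
    // Curated CG: prove the PUBLIC _catalog, never private ciphertext.
    const catalog = deps.extractCatalog(kcId);
    if (catalog === undefined) return { action: "skip", reason: "kc-not-synced" };
    triples = catalog;
  } else {
    triples = deps.extractPublic(kcId);
  }
  const built = deps.buildFlatKc(triples.map((t) => deps.hashLeaf(t)));
  return { action: "submit", root: built.root, leafCount: built.leafCount };
}
```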
Verified-against-source runbook for the coordinated redeploy: the non-proxy
new-address constraint (grown Challenge struct), the two RandomSamplingStorage
hazards (reward/score state zeroing -> epoch-boundary pin; clearOutstandingChallenges
no-op vs a fresh deploy), the version-keyed deployed-flag mechanic,
the ordered procedure, post-cutover validation, and the known gaps (curated
UPDATE not cut over, curator-restart re-confirm, strip-OFF hatch).

@github-actions github-actions Bot left a comment

Codex review skipped: filtered diff is 9950 lines (cap: 5,000). Please consider splitting this into smaller PRs for reviewability.

@otReviewAgent otReviewAgent left a comment

Operational Notice: Review Agent could not complete this review.

Business logic reviewer failed: [Errno 7] Argument list too long: '/var/lib/review-agent/toolchain/codex/node_modules/.bin/codex'
