fix: three pre-mainnet blockers — write-route 500 crashes (#306/#787) + public-CG host-mode ingest (#1124) #1239

Merged
branarakic merged 12 commits into main from fix/mainnet-blockers-1124-787-306
Jun 19, 2026
Bojan131 (Contributor) commented Jun 19, 2026

Summary

Fixes three pre-mainnet-blocking issues surfaced by the #1129 issue-liveness suite. All three are live-confirmed bugs; each fix is scoped to be non-breaking (positive regression controls + full agent suite green).

#787 / #306 — write routes 500-crash on malformed quads

POST /api/knowledge-assets/:name/wm/write (#306) and POST /api/shared-memory/write / …/conditional-write (#787) only checked Array.isArray(quads), so a string-shaped N-Quad ("<s> <p> <o> .") slipped through and crashed the agent write path with a TypeError → HTTP 500 instead of an actionable 4xx.

  • New isWritableQuad (daemon http-utils.ts) validates each quad at the route boundary. graph is optional here (unlike the publish path's isPublishQuad) so valid {subject,predicate,object} writes are unaffected.
  • #787 root cause also fixed: getWorkspaceGossipSigningAgent called record.agentAddress.toLowerCase() unguarded — a node-level key record (privateKey, no agentAddress) crashed it on every SWM write via that token. It now skips records whose agentAddress fails ethers.isAddress, so selection falls through to the fallback signer.
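
A minimal sketch of the route-boundary check described above. The real isWritableQuad lives in the daemon's http-utils.ts and its exact rules are not shown in this PR; the non-empty-string requirement, the `validateWriteBody` helper, and the error wording are assumptions made for illustration only.

```typescript
// Hypothetical quad shape: graph is optional on the write routes,
// unlike the publish path's isPublishQuad.
interface WritableQuad {
  subject: string;
  predicate: string;
  object: string;
  graph?: string;
}

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

// Rejects strings, arrays, null and objects with missing or non-string terms.
function isWritableQuad(value: unknown): value is WritableQuad {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const q = value as Record<string, unknown>;
  return (
    isNonEmptyString(q.subject) &&
    isNonEmptyString(q.predicate) &&
    isNonEmptyString(q.object) &&
    (q.graph === undefined || typeof q.graph === "string")
  );
}

type QuadValidation =
  | { ok: true; quads: WritableQuad[] }
  | { ok: false; status: 400; error: string };

// Hypothetical helper: validate every quad before the write path sees it,
// so a malformed body yields a 400 instead of a TypeError further down.
function validateWriteBody(quads: unknown): QuadValidation {
  if (!Array.isArray(quads)) {
    return { ok: false, status: 400, error: "quads must be an array" };
  }
  const bad = quads.findIndex((q) => !isWritableQuad(q));
  if (bad !== -1) {
    return {
      ok: false,
      status: 400,
      error: `quads[${bad}] must be an object with string subject, predicate and object`,
    };
  }
  return { ok: true, quads: quads as WritableQuad[] };
}
```

With this shape, a body of `["<s> <p> <o> ."]` fails at index 0, while `{subject, predicate, object}` without a graph still passes.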

#1124 — public CGs couldn't reach storage-ACK quorum

Host-mode cores dropped a public CG's plaintext SWM share at two gates in ingestSwmHostModeEnvelope — the isCiphertext sniff and the curated-agent authority check (verifyHostModeEnvelopeAuthority, which rejects "no agent allowlist", i.e. exactly the public case). With no host holding the data, a public CG's storage-ACK quorum was unreachable (NO_DATA_IN_SWM). Private/curated CGs were unaffected (ciphertext + allowlist), which is why they worked while public CGs never published to VM.

The fix opens both gates only for a CG positively confirmed public, and re-uses the same verification curated traffic gets — it does not weaken authentication:

  • Both-axes public classifier. isConfirmedPublicForHostMode admits a self-signed plaintext envelope only when accessPolicy === 0 (open read) and publishPolicy === 1 (open publish). Curated and unknown both → false, so a curated CG mid chain-event-race (which also surfaces as "no agent allowlist") is never misclassified as public — it stays a drop and heals via member catchup.
  • publishPolicy is re-verified on a short window for this decision. publishPolicy is mutable on-chain (ContextGraphs.updatePublishPolicy → PublishPolicyUpdated) and the resolver otherwise serves it from a ≤60s TTL cache. Because this is the first security-positive consumer of that value, the admission path passes getContextGraphOnChainPolicy(cg, { publishPolicyMaxCacheAgeMs: 5_000 }) — it accepts the cached value only if ≤5s old, else re-reads from chain (fail-closed: an RPC error/timeout leaves publishPolicy undefined → not public → drop). This bounds the window in which a host could admit a self-signed write for a CG an owner just downgraded open→curated to ~5s (vs 60s), while rate-capping the chain RPC to ~1 per window per CG so public-plaintext gossip can't amplify into a per-message eth_call. The resolution is also lazy: const confirmedPublic = !isCiphertext && await isConfirmedPublicForHostMode(...), so the dominant ciphertext/curated path pays no chain read at all. (accessPolicy is immutable on-chain — no setter/event — so its un-TTL'd cache read can never be stale-permissive.)
  • Same verifier as curated, plus transport bindings. The public branch of verifyHostModeEnvelopeAuthority runs verifyAgentEnvelope (EIP-191 signature + 5-min timestamp freshness) — identical to the curated path — then additionally requires a decodable inner WorkspacePublishRequest, binds request.contextGraphId === contextGraphId (no cross-CG replay) and request.publisherPeerId === fromPeerId (no transport spoof). The last binding matters because host catchup applies via trustedReplay, which skips the publisher↔sender transport check, so it must be enforced at ingest. Rejections return a structured HostModeRejectionCode (DECODE_FAILED, UNSIGNED, NO_AGENT_ALLOWLIST, PEER_NOT_IN_ALLOWLIST, SIG_VERIFY_FAILED, CG_MISMATCH, PUBLISHER_PEER_MISMATCH) instead of free-text matching, so the ingest gate keys its transient-race-vs-warn logging on the code.
  • Applied so the host is ACK-CAPABLE, not just retained. Admitting the envelope is necessary but not sufficient: the opaque SwmHostModeStore (member-catchup retention) is NOT what the StorageACKHandler a publisher dials reads — that reads <cg>/_shared_memory from the triple store. So for a confirmed-public CG, ingestSwmHostModeEnvelope ALSO applies the plaintext via the member apply path (handler.handle(data, fromPeerId, undefined, { trustedReplay: true }), mirroring the LU-6 catchup-replay) into that exact graph — making a non-member host sign a quorum-eligible ACK instead of declining NO_DATA_IN_SWM. The if (confirmedPublic) wrapper is the sole authority gate for this apply and is load-bearing: on a host-only core handle() can't itself distinguish curated from public (both resolve to agentGateAddresses === null && hasPrivateAccessPolicy === false), so the forced-fresh both-axes classifier is what keeps a curated CG's plaintext out of _shared_memory. For a public CG handle({ trustedReplay }) skips no cryptography — only the transport re-checks already performed by verifyHostModeEnvelopeAuthority on the same bytes.
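
The both-axes classifier and its lazy evaluation can be sketched roughly as follows. This is an illustration under stated assumptions, not the agent's code: the `PolicyResolver` type stands in for getContextGraphOnChainPolicy, and `admitsSelfSignedPlaintext` is a hypothetical name for the decision made inside ingestSwmHostModeEnvelope.

```typescript
// Hypothetical policy shape: undefined means "not resolved yet".
interface OnChainPolicy {
  accessPolicy?: number; // 0 = open read
  publishPolicy?: number; // 1 = open publish
}

// Stand-in for the injected on-chain policy resolver.
type PolicyResolver = (contextGraphId: string) => Promise<OnChainPolicy>;

// Public only when BOTH axes are positively confirmed open.
// Curated, unknown (undefined) and a throwing resolver all yield false.
async function isConfirmedPublic(
  contextGraphId: string,
  resolvePolicy: PolicyResolver,
): Promise<boolean> {
  try {
    const { accessPolicy, publishPolicy } = await resolvePolicy(contextGraphId);
    return accessPolicy === 0 && publishPolicy === 1;
  } catch {
    return false; // fail closed on RPC error or timeout
  }
}

// Lazy evaluation: ciphertext short-circuits before the resolver is called,
// so the ciphertext/curated path never triggers a policy read.
async function admitsSelfSignedPlaintext(
  isCiphertext: boolean,
  contextGraphId: string,
  resolvePolicy: PolicyResolver,
): Promise<boolean> {
  return !isCiphertext && (await isConfirmedPublic(contextGraphId, resolvePolicy));
}
```

The short-circuit `&&` is what keeps the resolver off the ciphertext path; the fail-closed `catch` is what keeps a curated CG in a chain-event race from being treated as public.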

No path admits an unauthenticated plaintext envelope into curated storage.

Verification

Live verification (local rc.18 devnet, build ece492d2f)

Notes

  • Verified on the local rc.18 devnet (not the rc.17 KiBo testnet node): main's KC→KA contract rename hard-renamed the chain adapter, and Base Sepolia still has the pre-rename contracts, so an rc.18 build can't run against that testnet yet.

🤖 Generated with Claude Code

Bojan131 and others added 2 commits June 19, 2026 10:21
…ssip-signer agentAddress (#306, #787)

#306 — POST /api/knowledge-assets/:name/wm/write and #787 —
POST /api/shared-memory/write (+ /shared-memory/conditional-write) only checked
`Array.isArray(quads)`, so a string-shaped quad ("<s> <p> <o> .") slipped through
and crashed the agent write path with a TypeError → HTTP 500. Added isWritableQuad
(graph optional, unlike the publish path's isPublishQuad) and validate every quad
at the route boundary, returning an actionable 400.

#787 root cause — getWorkspaceGossipSigningAgent() called
`record.agentAddress.toLowerCase()` unconditionally; a node-level key record
(privateKey, no agentAddress) crashed it on every SWM write via that token.
Guarded with optional chaining so such records fall through to the fallback signer.

Verified: new test issue-306-787-write-quad-validation.test.ts — string quads → 4xx
on both routes AND well-formed object quads still → 200 (no regression). cli write-path
regression suites green (191 tests), agent gossip-signer test green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…est (#1124)

Host-mode cores dropped a PUBLIC CG's plaintext SWM share at two gates in
ingestSwmHostModeEnvelope — the isCiphertext sniff and the curated-agent
authority check (which rejects 'no agent allowlist', i.e. exactly the public
case). With no host holding the data, a public CG's storage-ACK quorum was
unreachable (NO_DATA_IN_SWM). Private/curated CGs were unaffected (ciphertext +
allowlist), which is why they worked while public CGs never published to VM.

Fix opens BOTH gates, but ONLY for a CG positively confirmed public via the new
isConfirmedPublicForHostMode helper:
  - Gate 1: a non-ciphertext envelope is dropped unless the CG is confirmed
    public (plaintext is the legitimate carrier for open CGs).
  - Gate 2: a 'no agent allowlist' verdict is accepted only for a confirmed
    public CG — the envelope is already verified SIGNED (the unsigned-envelope
    check runs first), and public host-mode storage stays bounded by the per-CG
    byte cap + registration economics (same safety net as the pre-reg fail-open).

SECURITY: isConfirmedPublicForHostMode is biased so curated AND unknown both
return false. A curated CG whose on-chain policy hasn't loaded yet (chain-event
race — also surfaces as 'no agent allowlist') is therefore NEVER misclassified
as public; it stays a drop and heals via member catchup. Curated + unknown
behaviour is byte-for-byte unchanged.

Verified: host-mode-public-ingest-1124.test.ts pins the classifier safety bias
(confirmed-public→true; curated marker / private / unknown / throw → false).
Full agent suite green (1668 passed); existing host-mode tests green (23).
NOTE: end-to-end public-CG publish-to-VM quorum needs a host-mode sharded
topology (non-member storage cores) to observe — see PR description for the
testnet verification plan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…/Kye_D 🟡)

🔴 Kye-4 — isConfirmedPublicForHostMode resolved the access policy via a direct
cleartext `subscribedContextGraphs` lookup, which MISSES for a host-only core
whose subscription is keyed by the wire HASH (the exact #1124 sharded topology);
the public envelope would still be dropped. Now delegates to the shared
`getContextGraphOnChainPolicy` resolver (cache + local _meta + chain RPC,
key-independent). Only accessPolicy===0 is public; curated(1)/unknown(undefined)
→ false (safe; heals via catchup).

🟡 Kye_C — the public-CG exception keyed off the free-form `verdict.reason`
string 'no agent allowlist on context graph', coupling a log message to a
behavioral branch. verifyHostModeEnvelopeAuthority now returns a structured
`reasonCode` (HostModeRejectionCode enum); the ingest path keys off
`reasonCode === 'NO_AGENT_ALLOWLIST'`. Also hardened the public path to verify
the envelope signature self-consistency (recovers to the claimed signer) — a
public CG short-circuits the authority check BEFORE the sig verify, so a
forged/garbage signature is now rejected (unsigned was already dropped).

🟡 Kye_D — added an ingest-level test (host-mode-public-ingest-1124.test.ts) that
drives a real SIGNED plaintext WorkspacePublishRequest through ingestSwmHostModeEnvelope:
confirmed-public → STORED; curated/unknown → DROPPED; tampered signature → DROPPED.
Classifier tests now pin the getContextGraphOnChainPolicy contract directly.

Verified: 8/8 in the #1124 suite; agent host-mode/gossip/lu11 regression (42) +
publisher workspace-handler authority (19) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ared verifier (#1239 round-2)

Addresses otReviewAgent round-2 on the #1124 public-CG path:

🔴 KzQNo — the public exception used a bare ethers.verifyMessage that skipped the
shared verifier's timestamp-freshness window, so a previously-signed public
envelope could be replayed indefinitely and re-appended (evicting newer entries
from the bounded host-mode store). Moved the public acceptance INTO
SharedMemoryHandler.verifyHostModeEnvelopeAuthority behind a new
`allowSelfSignedForPublicCg` option: it now runs the SAME verifyAgentEnvelope as
curated traffic (signature + 5-min freshness), with the claimed signer as its own
one-entry allowlist (self-consistency). Only the allowlist decision diverges.
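
As a rough illustration of "same verifier, only the allowlist diverges": the sketch below assumes a simplified envelope and an injected `recover` function standing in for EIP-191 signature recovery. The field names, the symmetric clock-skew handling, and both function names are hypothetical, not the verifyAgentEnvelope API.

```typescript
const FRESHNESS_WINDOW_MS = 5 * 60 * 1000; // 5-minute window from the PR text

// Hypothetical, simplified envelope.
interface Envelope {
  agentAddress: string; // claimed signer
  timestampMs: number;
  payload: string;
  signature: string;
}

// Stand-in for signature recovery (the real path uses EIP-191 recovery).
type Recover = (payload: string, signature: string) => string;

// Shared check: freshness, signature self-consistency, then allowlist.
function verifyEnvelope(
  env: Envelope,
  allowlist: string[],
  recover: Recover,
  nowMs: number,
): boolean {
  // Assumption: skew is treated symmetrically (stale or too far in the future).
  if (Math.abs(nowMs - env.timestampMs) > FRESHNESS_WINDOW_MS) return false;
  const recovered = recover(env.payload, env.signature).toLowerCase();
  if (recovered !== env.agentAddress.toLowerCase()) return false;
  return allowlist.some((a) => a.toLowerCase() === recovered);
}

// Public path: the claimed signer is its own one-entry allowlist, so the
// freshness and signature checks are identical to the curated path.
function verifyPublicSelfSigned(env: Envelope, recover: Recover, nowMs: number): boolean {
  return verifyEnvelope(env, [env.agentAddress], recover, nowMs);
}
```

Because the public path reuses the same function, a replayed envelope older than the window is rejected exactly as it would be for curated traffic.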

🔴 KzQNk — cross-CG injection: the public path stored an envelope without checking
the inner WorkspacePublishRequest.contextGraphId, so a valid envelope for public
CG-A carrying a payload for CG-B was stored under A and could be applied to B by
catchup. verifyHostModeEnvelopeAuthority now rejects (CG_MISMATCH) when the inner
request's contextGraphId differs from the envelope CG.

🟡 KzQNt — the positive ingest test stubbed isConfirmedPublicForHostMode, bypassing
the resolver the fix depends on. Tests now stub getContextGraphOnChainPolicy (the
real dependency) so the actual classifier + both gates run, and add a cross-CG
case asserting CG_MISMATCH drops.

The agent ingest path is simplified accordingly (compute confirmedPublic once;
pass the flag; no bespoke crypto in the agent). Verified: #1124 suite 9/9; agent
host-mode/gossip/lu11 (42) + publisher workspace-handler authority (19) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…; skip keyless signers (#1239 round-3)

🔴 Kzii9 — isConfirmedPublicForHostMode gated the self-signed public host-mode
ingest path on accessPolicy===0 (READ visibility) alone. But read visibility and
write authority are separate: a public-READABLE CG can still have publishPolicy===0
(curated / PCA publishing). For such a CG (no resolved agent allowlist), the
self-signed path would let ANY key store plaintext SWM on host-mode cores and
bypass the on-chain publisher authorization. Now requires BOTH accessPolicy===0
AND publishPolicy===1 (open read AND open write); curated/unknown on either axis
→ false (getContextGraphOnChainPolicy already returns both, with a chain-RPC
fallback). Verified live earlier on a CG created publishPolicy:1, so the #1124
behaviour is unchanged for genuinely-open CGs.

🟡 KzijE — getWorkspaceGossipSigningAgent now SKIPS a record with no valid
agentAddress entirely, rather than letting it become the fallback signer (which
would emit an envelope with a missing agentAddress that downstream rejects).
Avoids the original #787 crash AND picks a usable signer.

🟡 KzijJ — the #306/#787 daemon test exits at the HTTP quad-shape boundary before
the signer is selected, so it wouldn't catch a revert. Added
gossip-signer-selection-787.test.ts: a keyless record placed ahead of a valid
signer is skipped (no crash; valid signer chosen; null when only keyless records).
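
The selection behaviour pinned by that test can be sketched as below. This is an assumption-laden illustration: `KeyRecord`, `selectGossipSigner` and the optional `preferred` argument are hypothetical, and the regex is only a format-level stand-in for ethers.isAddress (no checksum validation).

```typescript
// Hypothetical key record: node-level keys carry no agentAddress.
interface KeyRecord {
  privateKey: string;
  agentAddress?: string;
}

type UsableRecord = KeyRecord & { agentAddress: string };

// Stand-in for ethers.isAddress: checks the 0x + 40 hex shape only.
function hasValidAddress(r: KeyRecord): r is UsableRecord {
  return typeof r.agentAddress === "string" && /^0x[0-9a-fA-F]{40}$/.test(r.agentAddress);
}

// Keyless records are filtered out BEFORE any toLowerCase() call, so they
// can neither crash the selection nor become the fallback signer.
function selectGossipSigner(records: KeyRecord[], preferred?: string): KeyRecord | null {
  const usable = records.filter(hasValidAddress);
  if (preferred !== undefined) {
    const wanted = preferred.toLowerCase();
    const match = usable.find((r) => r.agentAddress.toLowerCase() === wanted);
    if (match) return match;
  }
  return usable[0] ?? null; // null when only keyless records exist
}
```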

Verified: #1124 suite 10/10 + #787 signer 3/3; agent host-mode/gossip/lu11 (33),
cli #306/#787 daemon (4), publisher workspace-handler authority (12) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…uest + sender peer (#1239 round-4)

🔴 K0FKl — a public self-signed host-mode entry is later applied via host catchup
with `trustedReplay`, which SKIPS the apply-time `publisherPeerId === fromPeerId`
transport binding. The round-2/3 public path only checked the inner CG "when a
request decoded", so:
  - a ciphertext/garbage payload (request undefined) fell through to accept, and
  - an honestly-signed envelope whose inner publisherPeerId named ANOTHER peer was
    stored, and catchup then applied the write under that spoofed publisher identity.

verifyHostModeEnvelopeAuthority's public branch now, before accepting:
  1. REQUIRES a decoded WorkspacePublishRequest (reject if none),
  2. binds request.contextGraphId to the envelope CG (cross-CG injection guard), and
  3. binds request.publisherPeerId to the actual sender fromPeerId (publisher-spoof
     guard) — mirroring the apply-time binding that trustedReplay skips.
New reasonCode PUBLISHER_PEER_MISMATCH.
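
The three steps above can be sketched as a pure check that runs after signature verification. The reason codes are the ones named in this PR; the `InnerRequest` shape, the `checkPublicBindings` name, and the use of DECODE_FAILED for a missing request are assumptions made for illustration.

```typescript
type HostModeRejectionCode =
  | "DECODE_FAILED"
  | "UNSIGNED"
  | "NO_AGENT_ALLOWLIST"
  | "PEER_NOT_IN_ALLOWLIST"
  | "SIG_VERIFY_FAILED"
  | "CG_MISMATCH"
  | "PUBLISHER_PEER_MISMATCH";

// Hypothetical subset of the decoded WorkspacePublishRequest.
interface InnerRequest {
  contextGraphId: string;
  publisherPeerId: string;
}

type Verdict = { ok: true } | { ok: false; reasonCode: HostModeRejectionCode };

function checkPublicBindings(
  request: InnerRequest | undefined,
  envelopeCgId: string,
  fromPeerId: string,
): Verdict {
  // 1. A decodable inner request is required (ciphertext/garbage is rejected).
  if (request === undefined) return { ok: false, reasonCode: "DECODE_FAILED" };
  // 2. Cross-CG injection guard.
  if (request.contextGraphId !== envelopeCgId) {
    return { ok: false, reasonCode: "CG_MISMATCH" };
  }
  // 3. Publisher-spoof guard: enforced here because trustedReplay skips it later.
  if (request.publisherPeerId !== fromPeerId) {
    return { ok: false, reasonCode: "PUBLISHER_PEER_MISMATCH" };
  }
  return { ok: true };
}
```

Returning a structured code rather than free text lets the ingest gate branch on the rejection reason without coupling to a log message.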

Test: host-mode-public-ingest-1124.test.ts adds a publisher-spoof case (inner
publisherPeerId != sender → dropped). #1124 suite 11/11; publisher
workspace-handler authority/trusted-replay (19) + agent host-mode/lu11/signer (30) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@branarakic (Contributor)

Review notes (read at HEAD ece492d2f)

First — the round-4 state is solid. I traced the actual code, not just the thread: the public-CG path now runs the same verifyAgentEnvelope as curated traffic (sig + 5-min freshness), the classifier requires both open-read and open-publish (fail-closed on unknown/throw), and the inner-request CG-binding + publisherPeerId↔sender binding are both in place. The #306/#787 fixes are clean with positive controls. Nice iteration through the otReviewAgent findings.

Two things I'd want settled before this guards mainnet — neither is a code-correctness bug, both are sufficiency/scope calls:

1. The self-signed admission gate inherits a ≤60s stale-publishPolicy window

The gate is accessPolicy === 0 && publishPolicy === 1 (isConfirmedPublicForHostMode). Checked both axes for cache staleness:

  • accessPolicy — safe by immutability. No setAccessPolicy/updateAccessPolicy/AccessPolicyUpdated in the contracts or the chain adapter, so the un-TTL'd cache read at dkg-agent-cg-registry.ts:481 can never be stale-permissive. 👍
  • publishPolicy — mutable, and the cache is only time-bounded. It has a full mutation surface (ContextGraphs.updatePublishPolicy → ContextGraphStorage.updatePublishPolicy → emit PublishPolicyUpdated), there's no event watcher in the agent, and the resolver only TTL-gates it (ON_CHAIN_PUBLISH_POLICY_CACHE_TTL_MS = 60_000, dkg-agent-cg-registry.ts:497-499).

So for ≤60s after an owner flips a CG from open→curated publish, a host with a fresh cache entry (publishPolicy=1) still classifies it public and admits a self-signed plaintext write from a non-authorized peer — which host catchup then applies under trustedReplay. Bounded and fail-closed after the TTL re-verifies via chain RPC, and it's a rare downgrade op — but this PR is the first consumer to use the cached publishPolicy for a security-positive decision, and the comment that blesses the un-TTL'd read (dkg-agent-cg-registry.ts:487-491) is explicitly scoped to the other, sender-key-gated gossip consumers — that safety argument doesn't transfer to a plaintext-admission path.

Suggestion: either a conscious "we accept the 60s window" (publishPolicy downgrades are rare), or hook PublishPolicyUpdated to invalidate the cache for the host-mode admission path specifically.

2. The PR's own end-state (quorum reached) is unobserved

Stated purpose is "public CGs reach storage-ACK quorum," but per the PR notes that exact end-state is unobserved — only the gate-level drop is live-verified, on local rc.18 devnet. So the gate fix is necessary but not demonstrated sufficient: that the quorum actually becomes reachable on a host-mode sharded topology (non-member storage cores) is still unverified. For a pre-mainnet blocker I'd want that shown end-to-end before relying on it.

Minor

The PR body still describes the original design (accessPolicy-only classifier, verdict.reason text matching) — it predates the 4 fix rounds (now both-axes + structured reasonCode). Worth updating so the code-owner reviews against what the code actually does.

…strate quorum reached (Branimir review)

Addresses the two sufficiency/scope calls in Branimir's review of PR #1239.

1. Close the ≤60s stale-publishPolicy admission window. `publishPolicy` is
   mutable on-chain (PublishPolicyUpdated) and the resolver serves it from a
   ≤60s-TTL cache, but `isConfirmedPublicForHostMode` is the first
   security-positive consumer — a stale `publishPolicy=1` would admit a
   self-signed plaintext write for up to the TTL after an owner downgrades
   open→curated. Add `getContextGraphOnChainPolicy(cg, { forcePublishPolicyChainRead })`
   which treats the publishPolicy cache as always-stale and re-reads from chain
   (fail-closed on RPC error). The host-mode gate now passes the flag; other
   callers are unaffected (param is optional). accessPolicy stays un-TTL'd —
   it is immutable on-chain.

2. Demonstrate the end-state (quorum *reached*), not just the gate drop.
   `host-mode-quorum-bridge-1124.test.ts` wires the real ACKCollector to real
   StorageACKHandlers over real stores and reaches a public-CG quorum purely
   from non-member host-mode cores holding only the host-mode-ingested
   plaintext (valid EIP-191 ACKs); negative control shows the pre-fix empty-SWM
   state returns NO_DATA_IN_SWM (quorum-blocking). The load-bearing fact:
   StorageACKHandlerConfig has no membership input, so a non-member host's ACK
   is consensus-identical to a member's. Resolver- and gate-level tests pin the
   force-fresh read.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Bojan131 (Contributor, Author)

Thanks for the careful trace, Branimir — both points were fair. Addressed both, plus the minor PR-body note, in bb19c568b.

1. Stale-publishPolicy admission window → closed (no window, not "accepted")

Went with the stricter of your two options. Rather than consciously accept the ≤60s window or wire a PublishPolicyUpdated watcher, the admission gate now re-reads publishPolicy from chain on every decision:

// isConfirmedPublicForHostMode
const { accessPolicy, publishPolicy } = await this.getContextGraphOnChainPolicy(
  contextGraphId, { forcePublishPolicyChainRead: true },
);
return accessPolicy === 0 && publishPolicy === 1;

forcePublishPolicyChainRead treats the publishPolicy cache as always-stale so the RPC re-verifies; an RPC error/timeout leaves publishPolicy undefined → not public → fail closed (the share heals via catchup once the policy re-resolves). So there is no stale-permissive window at all, and a PublishPolicyUpdated downgrade is honored on the very next envelope. The flag is opt-in — accessPolicy (immutable, as you noted) and every other caller keep the cached read. Two tests pin it: the resolver test shows the same fresh-cache state returns cached 1 without the flag and fresh-from-chain 0 with it; the gate test asserts the flag is passed.

2. End-state (quorum reached) → demonstrated

You were exactly right that the small devnet can't show it: on a fully-staked local net every core is a member of every registered CG, so I can't get a non-member storage core for a registered CG. Confirmed live — a public CG (onChainId=4) published end-to-end and reached 3/3 quorum → on-chain confirmed (block 2667), but every ACK came back source=member, including the operator-host node. That proves "public CGs reach quorum" but not your specific "via non-member storage cores."

So I demonstrated that sub-scenario deterministically at the layer that actually decides quorum — host-mode-quorum-bridge-1124.test.ts wires the real ACKCollector to real StorageACKHandlers over real stores:

  • post-fix: 3 non-member host-mode cores holding only the host-mode-ingested plaintext reach quorum, every ACK a valid EIP-191 sig over the canonical V10 digest;
  • pre-fix negative control: with the plaintext dropped (empty SWM) every host returns NO_DATA_IN_SWM — quorum unreachable.

The load-bearing fact making this faithful: StorageACKHandlerConfig has no membership input. A core signs iff role=core ∧ data-in-SWM ∧ merkle-match ∧ signer-registered — membership is never consulted, so a non-member host's ACK is consensus-identical to a member's. Reproducing it live still wants a large stake-weighted sharded topology; flagging that as the remaining sign-off step rather than claiming the local net covers it.
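
To make the "membership is never consulted" point concrete, here is a hedged sketch of the signing predicate as stated above. The `AckContext` fields, the order of checks, and every decline reason other than NO_DATA_IN_SWM are assumptions; the real StorageACKHandler is not reproduced here.

```typescript
// Hypothetical inputs to the ACK decision. Note there is no membership field.
interface AckContext {
  role: "core" | "edge";
  hasDataInSwm: boolean;
  merkleRootMatches: boolean;
  signerRegistered: boolean;
}

type AckDecision = { sign: true } | { sign: false; reason: string };

// role=core AND data-in-SWM AND merkle-match AND signer-registered.
function decideAck(ctx: AckContext): AckDecision {
  if (ctx.role !== "core") return { sign: false, reason: "NOT_CORE" }; // hypothetical code
  if (!ctx.hasDataInSwm) return { sign: false, reason: "NO_DATA_IN_SWM" };
  if (!ctx.merkleRootMatches) return { sign: false, reason: "MERKLE_MISMATCH" }; // hypothetical code
  if (!ctx.signerRegistered) return { sign: false, reason: "SIGNER_NOT_REGISTERED" }; // hypothetical code
  return { sign: true };
}

// Quorum is reached once enough hosts agree to sign.
function quorumReached(decisions: AckDecision[], required: number): boolean {
  return decisions.filter((d) => d.sign).length >= required;
}
```

Under this predicate, three non-member cores holding the ingested plaintext produce three signing decisions, while the same cores with an empty SWM all decline with NO_DATA_IN_SWM.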

3. PR body → updated

Rewritten to match the final design: both-axes classifier (accessPolicy===0 && publishPolicy===1, force-fresh publishPolicy), verifyAgentEnvelope unification + the inner-request CG / publisherPeerId bindings, structured HostModeRejectionCode, and the quorum-reached verification above.

@branarakic (Contributor)

Re-review of bb19c568b — both points genuinely closed; one follow-on from the strict choice

Confirmed in code (not just the thread):

  • #1 stale-publishPolicy window → closed. forcePublishPolicyChainRead is genuinely fail-closed (RPC timeout → publishPolicy undefined → === 1 false → drop), and you correctly left the immutable accessPolicy axis cached. You took the stricter of the two options — no window, not "accepted."
  • #2 quorum reached → demonstrated. host-mode-quorum-bridge-1124.test.ts drives the real ACKCollector + StorageACKHandler + OxigraphStore with non-member hosts and a negative control (NO_DATA_IN_SWM), plus the live all-member e2e. Sufficiency question answered.
  • Body updated (both-axes + reasonCode). 👍

One follow-on — a direct cost of the strict path you chose (on my earlier advice)

Forcing a chain read on every decision is correct, but isConfirmedPublicForHostMode runs on every host-mode envelope (line 1058, before the cheap drops). So on the dominant path (a ciphertext envelope on a curated CG), confirmedPublic is now computed-then-discarded:

  • line 1059 only reads it when !isCiphertext, and
  • verifyHostModeEnvelopeAuthority ignores allowSelfSignedForPublicCg whenever an allowlist exists (the agentGateAddresses === null branch is skipped).

i.e. the bulk of host-mode traffic now pays a synchronous eth_call to produce a value that's never used — on the very feature meant to scale public CGs. (Secondary, conditional: on a public CG the eth_call at 1058 fires before the sig check at 1072, and I didn't find a rate-limiter on the ingest path — so spammed plaintext gossip could amplify into per-message chain RPCs. Flagging as likely, not certain — I can't rule out gossipsub-level peer-scoring upstream.)

Not a correctness/security issue — the gate is intact and stronger. It's perf/robustness.

Keep the strictness, drop the cost (~2 lines)

confirmedPublic only gates anything when !isCiphertext, so compute it lazily:

// was: const confirmedPublic = await this.isConfirmedPublicForHostMode(storageCgId);
const confirmedPublic = !isCiphertext && await this.isConfirmedPublicForHostMode(storageCgId);

That removes the eth_call for all ciphertext/curated traffic for free, and it's security-preserving (a ciphertext-on-public envelope just drops and heals via catchup). If you also want to cap the residual on the public-plaintext path without giving back the strictness you just won, a short dedicated freshness window (~5s) beats force-every-time — bounds the downgrade staleness to seconds and caps the eth_call rate.

Your call on fix-first vs fast-follow (hinges on host-mode traffic volume) — I'd lean fix-first since it's trivial and it undercuts this PR's own goal. Everything else looks merge-ready.

Bojan131 and others added 2 commits June 19, 2026 15:54
… non-member host is ACK-capable (otReviewAgent 🔴)

The host-mode gate fix admitted a confirmed-public self-signed plaintext
envelope but only appended the RAW envelope to SwmHostModeStore (catchup
retention) — it never applied the quads to `<cg>/_shared_memory`, which is the
graph the StorageACKHandler reads (loadSWMQuads / sharedMemoryReadBothFilter).
So a non-member host retained the share but still DECLINEd NO_DATA_IN_SWM when a
publisher dialed it for an ACK → public-CG quorum stayed unreachable via
non-member storage cores. The prior quorum test masked this by seeding the
triple store directly instead of driving the real ingest.

Fix: in ingestSwmHostModeEnvelope, for a CONFIRMED-PUBLIC CG only, also apply
the plaintext via the member apply path — `handler.handle(data, fromPeerId,
undefined, { trustedReplay: true })` — on the same already-authority-verified
bytes (mirrors the LU-6 catchup-replay). For a public CG handle() skips no
crypto; trustedReplay skips only the transport re-checks
verifyHostModeEnvelopeAuthority already performed. The opaque append is kept for
member host-catchup serving.

SECURITY: the `if (confirmedPublic)` wrapper is the SOLE authority gate and is
load-bearing — on a host-only core handle() CANNOT distinguish curated from
public (both resolve to agentGateAddresses===null && hasPrivateAccessPolicy
===false), so confirmedPublic (accessPolicy===0 && forced-fresh publishPolicy
===1) is what guarantees public. Documented in-code so a refactor can't hoist
the apply out.

Tests:
- host-mode-public-ingest-1124.test.ts: drives the REAL ingest end-to-end into a
  REAL StorageACKHandler over the same store — confirmed-public ingest →
  _shared_memory populated → signed quorum-eligible ACK (goes RED against the
  pre-fix code). Negative controls: curated (accessPolicy=1) AND public-read/
  restricted-publish (accessPolicy=0, publishPolicy=0) → _shared_memory empty →
  NO_DATA_IN_SWM.
- host-mode-quorum-bridge-1124.test.ts: scope narrowed — it isolates the
  collector-quorum link and points to the agent test as the real-ingest guard
  (no longer asserts the unverified "this is what ingest writes" claim).
- issue-306-787-write-quad-validation.test.ts: reuse the shared live-daemon
  helper instead of a duplicated startup harness (🔵 nit).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…(Branimir review follow-on)

The force-fresh publishPolicy read (the strict choice from the #1 fix) runs on
EVERY host-mode envelope via isConfirmedPublicForHostMode at the top of
ingestSwmHostModeEnvelope — but `confirmedPublic` only gates anything when
!isCiphertext, and verifyHostModeEnvelopeAuthority ignores
allowSelfSignedForPublicCg whenever an allowlist exists. So the dominant
ciphertext/curated path was paying a synchronous eth_call to compute a
value it discards, and the public-plaintext path had no rate cap (spammed
gossip could amplify into per-message chain RPCs) — undercutting the feature
meant to scale public CGs.

Two changes, keeping the strictness:
1. Lazy-compute: `const confirmedPublic = !isCiphertext && await
   isConfirmedPublicForHostMode(...)` — skips the policy resolution entirely
   for ciphertext (security-preserving: a ciphertext-on-public envelope just
   stays on the curated path / opaque append and heals via catchup).
2. Short cache window instead of force-every-time: replace
   getContextGraphOnChainPolicy's `forcePublishPolicyChainRead` with
   `publishPolicyMaxCacheAgeMs`; the admission gate passes 5s
   (HOST_MODE_PUBLISH_POLICY_MAX_CACHE_AGE_MS). This bounds open→curated
   downgrade staleness to seconds AND rate-caps the chain RPC to ~1 per window
   per CG (the resolver writes through to the same cache). Still fail-closed on
   RPC error; accessPolicy stays immutable/un-TTL'd.
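
A sketch of how a per-caller max cache age can sit on top of a shared write-through cache. Everything here is illustrative: `createPublishPolicyResolver`, the injected `readChain` and `now` functions, and the cache layout are hypothetical; only the option name publishPolicyMaxCacheAgeMs and the 60s/5s values come from this PR.

```typescript
const DEFAULT_PUBLISH_POLICY_TTL_MS = 60_000;

interface CachedPolicy {
  publishPolicy: number;
  fetchedAt: number;
}

// readChain stands in for the chain RPC; now is injectable for tests.
function createPublishPolicyResolver(
  readChain: (cgId: string) => Promise<number>,
  now: () => number = () => Date.now(),
) {
  const cache = new Map<string, CachedPolicy>();

  return async function getPublishPolicy(
    cgId: string,
    opts: { publishPolicyMaxCacheAgeMs?: number } = {},
  ): Promise<number | undefined> {
    const maxAge = opts.publishPolicyMaxCacheAgeMs ?? DEFAULT_PUBLISH_POLICY_TTL_MS;
    const hit = cache.get(cgId);
    // Within the caller's window: serve from cache, no RPC.
    if (hit !== undefined && now() - hit.fetchedAt <= maxAge) {
      return hit.publishPolicy;
    }
    try {
      const publishPolicy = await readChain(cgId);
      // Write-through: the next caller inside the window gets a warm hit,
      // which caps the RPC rate at roughly one per window per CG.
      cache.set(cgId, { publishPolicy, fetchedAt: now() });
      return publishPolicy;
    } catch {
      return undefined; // fail closed: undefined never equals 1
    }
  };
}
```

A caller passing `{ publishPolicyMaxCacheAgeMs: 5_000 }` tightens only its own view; callers that omit the option keep the 60s default against the same entries.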

Tests: on-chain-policy reworked to prove within-window→cached/no-RPC,
beyond-window→re-verify, and the 6s entry still fresh under the 60s default.
host-mode-public-ingest: the gate passes a short window (≤10s), and a real
CIPHERTEXT envelope triggers ZERO getContextGraphOnChainPolicy calls.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Bojan131 (Contributor, Author)

Good catch — fixed both parts in f850ea5fe (fix-first, as you leaned).

1. Lazy confirmedPublic. Exactly as you wrote it:

const confirmedPublic = !isCiphertext && await this.isConfirmedPublicForHostMode(storageCgId);

So the dominant ciphertext/curated path resolves confirmedPublic = false without touching the chain — no more compute-then-discard eth_call. Security-preserving: a ciphertext-on-public envelope just stays on the curated authority path / opaque append and heals via catchup. New test asserts a real ciphertext envelope drives zero getContextGraphOnChainPolicy calls.

2. Kept the strictness, dropped the per-message RPC on the public-plaintext path too. Took your "short dedicated freshness window" over force-every-time. Replaced forcePublishPolicyChainRead with publishPolicyMaxCacheAgeMs; the admission gate passes 5s (HOST_MODE_PUBLISH_POLICY_MAX_CACHE_AGE_MS). Because the resolver writes through to the same cache, this rate-caps the chain RPC to ~1 per window per CG — so spammed public-plaintext gossip can't amplify into per-message eth_calls — while bounding open→curated downgrade staleness to seconds (not 60s). Still fail-closed on RPC error; accessPolicy stays immutable/un-TTL'd. Reworked the resolver test to prove within-window→cached/no-RPC, beyond-window→re-verify, and that a 6s entry is still fresh under the 60s default (the window is a per-caller tightening, not a global change).

Net: the gate is unchanged in strictness, ciphertext traffic pays nothing, and the public-plaintext path is rate-capped. 163 agent SWM/policy/signer tests green.

Comment thread packages/publisher/src/workspace-handler.ts Outdated
Comment thread packages/publisher/src/workspace-handler.ts
…list + enforce policy in the verifier (otReviewAgent)

Two coupled findings on verifyHostModeEnvelopeAuthority:

🔴 The self-signed public exception only fired when getContextGraphAgentGateAddresses()
returned null. A public-readable CG flipped on-chain to publishPolicy=1 (open
publish) WITHOUT clearing its old participantAgents has a non-null (stale)
allowlist, so the verifier fell into the curated branch and dropped valid open
publishers — host-mode ACK quorum stayed unreachable for that valid open state,
even though the contract's isAuthorizedPublisher ignores participants for open
publish.

🟡 The exception was a trusted boolean (allowSelfSignedForPublicCg) decoupled
from the security precondition (the forced-fresh both-axes check lived in the
agent, enforced only by comments) — any caller could set it and get self-signed
acceptance.

Fix both: replace the boolean with `resolveOpenPublishPolicy` — the caller
injects the on-chain policy resolver and the VERIFIER itself enforces
accessPolicy===0 && publishPolicy===1 (fail-closed on undefined/throw), then
takes the self-signed path INDEPENDENTLY of agentGateAddresses. The agent passes
the same forced-fresh, ~5s-window resolver, and only for non-ciphertext (the
lazy curated path still pays no chain read; the resolver shares the publishPolicy
cache window so it's a warm hit, never a 2nd RPC).
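A rough sketch of the fail-closed check described above, with the verifier owning the policy condition instead of trusting a boolean. The names `isOpenPublishConfirmed` and `acceptsSigner`, the `OnChainPolicy` shape, and the simplified curated branch are assumptions; signature verification itself is omitted.

```typescript
type OnChainPolicy = { accessPolicy?: number; publishPolicy?: number };
type PolicyResolver = (cgId: string) => Promise<OnChainPolicy | undefined>;

// True only when BOTH axes are positively confirmed; a missing resolver,
// an undefined result, or a thrown error all deny the exception.
async function isOpenPublishConfirmed(
  resolve: PolicyResolver | undefined,
  cgId: string,
): Promise<boolean> {
  if (resolve === undefined) return false;
  try {
    const policy = await resolve(cgId);
    return policy?.accessPolicy === 0 && policy?.publishPolicy === 1;
  } catch {
    return false;
  }
}

// Simplified authority decision: the open-publish path is taken
// independently of the allowlist, so a stale non-null list cannot block it.
async function acceptsSigner(
  resolve: PolicyResolver | undefined,
  cgId: string,
  signer: string,
  allowlist: string[] | null,
): Promise<boolean> {
  if (await isOpenPublishConfirmed(resolve, cgId)) return true;
  return allowlist !== null && allowlist.includes(signer);
}
```

The point of the design is that a caller can only supply the resolver; whether the exception applies is decided inside the verifier.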

Tests (workspace-handler-host-mode-authority): open-publish + stale allowlist →
accepted; same non-allowlisted signer with publishPolicy!=1 → curated reject
(SIG_VERIFY_FAILED); thrown resolver → exception NOT granted (fail-closed).
163 agent + 77 publisher tests green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread packages/agent/src/dkg-agent-swm-host.ts
Comment thread packages/publisher/test/host-mode-quorum-bridge-1124.test.ts
…ntity gate in quorum test (otReviewAgent)

🔴 Public-CG host-catchup forgery. Now that public plaintext is host-mode-stored
and served via catchup, the catchup-apply path (handle with trustedReplay)
reached a member from an UNTRUSTED relaying host with the
publisherPeerId===fromPeerId transport bind skipped — and for a public CG
(no agent gate) handle() applied the plaintext with ZERO signature verification.
A malicious host could fabricate brand-new public bytes (never seen by any
member's live ingest gate) and have members apply them.

Fix (workspace-handler.ts handle): add a public-CG self-signed authority gate
that fires ONLY on trustedReplay — require a valid envelope.agentAddress and
verifyAgentEnvelope against the claimed signer as its own one-entry allowlist
(skipTimestampFreshness, since catchup replays aged envelopes; the signature +
verifyAgentEnvelope's envelope.contextGraphId===contextGraphId bind still hold).
Scoped to trustedReplay so the LIVE public path is unchanged — it still accepts
the legacy unsigned-public producer, bound by the live publisherPeerId===
fromPeerId transport check. (Reviewed design caught that an unconditional gate
would silently fail-close legitimate unsigned-public live writes + break an
existing test.)
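The scoping of the gate to trustedReplay might look like the sketch below. It is illustrative only: `admitPublicEnvelope`, the `Envelope` shape, and the injected `recoverSigner` (standing in for real signature recovery) are hypothetical, and the live path's transport bind is not modelled.

```typescript
type Envelope = {
  contextGraphId: string;
  payload: string;
  agentAddress?: string;
  signature?: string;
};

// Hypothetical stand-in for signature recovery; returns the recovered
// signer address, or undefined when the signature does not verify.
type RecoverSigner = (envelope: Envelope) => string | undefined;

function admitPublicEnvelope(
  envelope: Envelope,
  contextGraphId: string,
  trustedReplay: boolean,
  recoverSigner: RecoverSigner,
): boolean {
  // Live path: unchanged, still accepts legacy unsigned-public producers.
  if (!trustedReplay) return true;
  // Catchup path: require a claimed signer and a signature.
  if (!envelope.agentAddress || !envelope.signature) return false;
  // The envelope must be bound to the CG it is being applied to.
  if (envelope.contextGraphId !== contextGraphId) return false;
  // The claimed signer acts as its own one-entry allowlist.
  const recovered = recoverSigner(envelope);
  return (
    recovered !== undefined &&
    recovered.toLowerCase() === envelope.agentAddress.toLowerCase()
  );
}
```

Timestamp freshness is deliberately absent from the catchup branch, since replayed envelopes are expected to be old.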

Residual (documented, inherent to open-publish): publisherPeerId OWNERSHIP
attribution is not cryptographically authenticatable on catchup for an
open-publish CG (libp2p peerId vs EVM agentAddress, no cross-binding). Bounded:
attacker needs a valid agent key; open publish lets anyone write anyway; on-chain
finalization is the system of record.

🟡 Quorum test wired the production identity gate. host-mode-quorum-bridge: the
collector previously ran with identity verification OFF, so it reached quorum on
mere signature shape. Now verifyIdentity is wired against the generated
(identityId→signer) registration, proving the ACKs are on-chain-submittable; a
load-bearing negative shows unregistered signers are rejected.

Tests: catchup signed→applied, unsigned→rejected, claimed!=recovered→rejected,
LIVE unsigned→still applies (regression guard). 82 publisher + 136 agent green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread packages/publisher/src/workspace-handler.ts Outdated
Comment thread packages/publisher/test/host-mode-quorum-bridge-1124.test.ts Outdated
…gate through the real collector (otReviewAgent)

🟡 The HostModeRejectionCode / HostModeEnvelopeAuthorityVerdict types were
inserted between SharedMemoryApplyOutcome's JSDoc and the type, orphaning the
contract doc. Moved them below SharedMemoryApplyOutcome so each exported type
keeps its own documentation.

🟡 The quorum identity-gate negative control defined a local verifyIdentity and
called it directly, so it didn't actually exercise ACKCollector's gate (would
pass even if collect() stopped calling verifyIdentity). Rewrote it to drive the
REAL ACKCollector wired with verifyIdentity rejecting every signer, asserting
collect() fails to reach quorum. Identity rejection is non-retryable, so it
fails fast (87ms, no ~31s transient-decline budget). 66 publisher tests green.
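The difference between calling a local gate directly and driving it through the collector can be illustrated with a toy model. `collectAcks` and `registryGate` below are hypothetical, synchronous stand-ins; the real ACKCollector is asynchronous and has retry budgets that this sketch ignores.

```typescript
type Ack = { identityId: number; signer: string };
type VerifyIdentity = (ack: Ack) => boolean;

// Toy collector: an ACK counts toward quorum only if the injected gate
// accepts it. A negative control must go through this function, otherwise
// it would still pass if the collector stopped calling the gate.
function collectAcks(acks: Ack[], quorum: number, verifyIdentity: VerifyIdentity) {
  const accepted = acks.filter((ack) => verifyIdentity(ack));
  return { reachedQuorum: accepted.length >= quorum, accepted };
}

// Positive wiring: an (identityId -> signer) registration lookup.
function registryGate(registry: Map<number, string>): VerifyIdentity {
  return (ack) => registry.get(ack.identityId) === ack.signer;
}
```

A reject-everything gate passed to the collector should leave it short of quorum, which is the property the rewritten negative control asserts.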

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@otReviewAgent otReviewAgent left a comment

Operational Notice: Review Agent could not complete this review.

Synthesizer produced only invalid comment anchors.

@branarakic branarakic merged commit a219686 into main Jun 19, 2026
41 checks passed
3 participants