
feat(agent,cli): gate private-CG host-mode custody on participation (non-member cores hold zero ciphertext)#1190

Closed
branarakic wants to merge 5 commits into main from feat/host-mode-participant-gate

Conversation

@branarakic

Contributor

What

A core node that is not a participant (curator or member) of a private context graph no longer custodies that CG's live shared-memory (SWM) ciphertext. Today host-mode (LU-6) custody is core-only and auto-engages for any curated CG a core discovers via the chain-event / discovery-beacon path — so any third-party core ends up holding ciphertext for CGs it has no role in. This gates that on participation: a non-participant core declines, holding zero ciphertext. Members back-fill from the curator (REPLACE-recovery), which is already the intended private-CG catch-up path.

How

  • isNodeParticipantOfCg(cgId) (new, dkg-agent-cg-resolve.ts) — true iff this node is the curator OR a local-_meta member OR an on-chain participant. ID-shape-robust (cleartext vs wire-hash) and positive-memoized (participantCgIds); a genuine participant resolves locally, so it never over-blocks its own custody.
  • Three gates in dkg-agent-swm-host.ts, all curated-only (they sit after the existing if (!curated) return, so public CGs are untouched):
    1. reconcileSwmHostModeSubscription — the primary gate: decline the auto-host subscribe (this alone starves both ingest paths, which the same handler wires).
    2. ingestSwmHostModeEnvelope — defensive: the host-mode .meta surface.
    3. ingestSwmCiphertextChunkEnvelope — defensive: the LU-11 chunk surface.
  • swmHostMode.stripNonParticipants (default on) — rollout kill-switch + A/B test control.
  • CLI plumbing fix — the entire swmHostMode config block was never forwarded config.json → agent (DKGAgent.create omitted it), so the block was inert. Now plumbed (lifecycle.ts + DkgConfig type).
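
The participant check and the primary subscribe gate described above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the PR's code: `CgMembership`, `SwmHostModeConfig`, `HostDecision`, `ParticipantGate`, and the `lookup` callback are hypothetical names, and the real `isNodeParticipantOfCg` is async and resolves membership from sources not shown here.

```typescript
// Hypothetical shapes for illustration; none of these types come from the PR.
interface CgMembership {
  curator: string;                  // node id of the CG's curator
  localMetaMembers: Set<string>;    // members recorded in local _meta
  onChainParticipants: Set<string>; // participants registered on-chain
}

interface SwmHostModeConfig {
  stripNonParticipants?: boolean;   // defaults to on, per the description
}

type HostDecision = "not-curated" | "host" | "decline";

class ParticipantGate {
  // Positive-only memo: a true answer is cached, a false one is re-evaluated.
  private readonly participantCgIds = new Set<string>();
  private readonly selfId: string;
  private readonly lookup: (cgId: string) => CgMembership | undefined;
  private readonly config: SwmHostModeConfig;

  constructor(
    selfId: string,
    lookup: (cgId: string) => CgMembership | undefined,
    config: SwmHostModeConfig = {},
  ) {
    this.selfId = selfId;
    this.lookup = lookup;
    this.config = config;
  }

  isNodeParticipantOfCg(cgId: string): boolean {
    if (this.participantCgIds.has(cgId)) return true;
    const m = this.lookup(cgId);
    if (m === undefined) return false;
    const participant =
      m.curator === this.selfId ||
      m.localMetaMembers.has(this.selfId) ||
      m.onChainParticipants.has(this.selfId);
    if (participant) this.participantCgIds.add(cgId); // negatives are never cached
    return participant;
  }

  reconcile(cgId: string, curated: boolean): HostDecision {
    if (!curated) return "not-curated"; // existing early return: the gate never fires
    const strip = this.config.stripNonParticipants ?? true;
    if (strip && !this.isNodeParticipantOfCg(cgId)) return "decline";
    return "host";
  }
}
```

With `stripNonParticipants: false` the sketch falls back to the legacy behaviour in which any core hosts a curated CG, which is the baseline the devnet harness compares against.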

Validation

  • Unit: host-mode-participant-gate.test.ts — 7/7 (curator/member ⇒ host; bystander ⇒ decline; reconcile wires vs declines).
  • Devnet (scripts/devnet-test-swm-strip.sh, 4 cores): side-by-side baseline node (strip-off) hosts the CG (Δ2 .meta) vs strip node (strip-on) holds zero (Δ0) — the discriminator that makes "zero" meaningful; the member converges from the curator; and it still converges with both bystander cores stopped (curator is the sole holder).

Scope

This PR covers the live SWM ciphertext only. The VM-payload chunked ciphertext (published KA chunks) is a separate surface (agent.share never chunks, which was confirmed) and is follow-up work; the defensive Path-3 ingest gate already covers it. Operationally, this change depends on the curator-recovery back-fill (a separate change) being present.

🤖 Generated with Claude Code

Branimir Rakic and others added 2 commits June 15, 2026 20:12
…o private ciphertext

A core that learns of a private CG via the chain-event / discovery-beacon
auto-host path no longer custodies its SWM ciphertext: host-mode subscription
is gated on isNodeParticipantOfCg (curator OR local-meta member OR on-chain
participant; id-shape-robust, positive-memoized). Members backfill from the
curator (REPLACE-recovery), so a third-party core holds nothing.

- dkg-agent-cg-resolve: isNodeParticipantOfCg helper; dkg-agent-base: participantCgIds memo.
- dkg-agent-swm-host: decline host-mode subscribe for non-participants on private CGs.
- swmHostMode.stripNonParticipants flag (default on) — rollout kill-switch + A/B baseline.
- cli: plumb the swmHostMode config block through to the agent (was inert before,
  so the kill-switch had no effect via config.json).
- unit test (7/7) + devnet harness: side-by-side baseline (strip-off core hosts,
  Δ2) vs strip (strip-on core holds zero, Δ0), member backfill, and convergence
  with both bystander cores absent.

Scope: SWM half only. VM-payload private ciphertext still reaches cores (M5/M6).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…nk/public checks

Gate ingestSwmHostModeEnvelope (host-mode .meta surface) and
ingestSwmCiphertextChunkEnvelope (LU-11 chunk surface) on isNodeParticipantOfCg
so a non-participant core holds zero private ciphertext even if a stray re-flood
or a restored persisted subscription wires the handler (the transitional /
rolling-upgrade case the Path-2 subscribe decline doesn't cover on its own).
Both are curated-only paths, so public CGs are untouched.

Harness: assert the chunk-store surface is also empty on the strip node (SWM
writes go via agent.share which never chunks — the chunk store is the VM-payload
M6 surface — so this corroborates zero leak on both surfaces), a public-CG no-op
gate (the strip gate never fires for accessPolicy=0), and a wire-hash DECLINE
log match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* isn't wrongly kept out of its own custody. Keyed by whatever id form the
* caller passes (cleartext or wire-hash) — both can map to the same true.
*/
protected readonly participantCgIds = new Set<string>();

🔴 Bug: This positive-only cache is not actually safe for the lifetime of the process. The repo already supports removing/revoking participants (remove-participant, revokedAgents), so once a node resolves true here it will keep passing isNodeParticipantOfCg() for the rest of the session and continue ingesting/serving private SWM after access was revoked. Either invalidate this set on CG ACL/curator updates or avoid memoizing positive answers until that invalidation path exists.
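
One possible shape for the invalidation the comment asks for, as a hedged sketch only: `ParticipantMemo`, `link`, and `invalidate` are hypothetical names, and the alias table is an assumption made so that both id forms (cleartext and wire-hash) of one CG are cleared together.

```typescript
// Illustrative memo with explicit invalidation; not the PR's implementation.
class ParticipantMemo {
  private readonly positive = new Set<string>();
  // Hypothetical alias table linking the two id forms of a single CG.
  private readonly aliasOf = new Map<string, string>();

  // Record that a cleartext id and a wire-hash id name the same CG.
  link(cleartextId: string, wireHashId: string): void {
    this.aliasOf.set(cleartextId, wireHashId);
    this.aliasOf.set(wireHashId, cleartextId);
  }

  // Cache a positive answer under whatever id form the caller used.
  remember(cgId: string): void {
    this.positive.add(cgId);
  }

  has(cgId: string): boolean {
    return this.positive.has(cgId);
  }

  // Call on membership or curator changes so a revoked node is re-evaluated.
  invalidate(cgId: string): void {
    this.positive.delete(cgId);
    const other = this.aliasOf.get(cgId);
    if (other !== undefined) this.positive.delete(other);
  }
}
```

Clearing both forms matters because the memo is keyed by whichever form the caller passed, so a revocation observed under one form would otherwise leave the other form cached as true.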

// ON; `stripNonParticipants:false` restores legacy auto-host (kill-switch /
// A/B baseline).
const stripNonParticipants = this.config.swmHostMode?.stripNonParticipants ?? true;
if (stripNonParticipants && !(await this.isNodeParticipantOfCg(contextGraphId))) {

🔴 Bug: Declining here only prevents a new auto-subscribe; it does not unwind an existing one. On nodes that already hosted this CG before upgrade / before stripNonParticipants was enabled, initializeSwmHostModeStore() restores the persisted host-mode subscription first, and this early return leaves that handler plus any stored ciphertext in place. The node can still serve those envelopes via host catchup, so the strip does not actually enforce zero custody. Consider explicitly unwireSwmHostModeHandler() and clearing/purging stored host-mode state when a reconciled CG is now non-participant.
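
A rough sketch of the unwind the comment suggests, under stated assumptions: `HostModeState`, `wire`, `ingest`, and `purge` are hypothetical, and only the method name `unwireSwmHostModeHandler` is taken from the comment (its body here is illustrative).

```typescript
// Illustrative in-memory model of host-mode state; not the PR's data structures.
class HostModeState {
  readonly wired = new Set<string>();            // CGs with a live handler
  readonly stored = new Map<string, string[]>(); // cgId -> held ciphertext envelopes

  wire(cgId: string): void {
    this.wired.add(cgId);
  }

  // Accepts an envelope only while a handler is wired for the CG.
  ingest(cgId: string, envelope: string): boolean {
    if (!this.wired.has(cgId)) return false;
    const held = this.stored.get(cgId) ?? [];
    held.push(envelope);
    this.stored.set(cgId, held);
    return true;
  }

  unwireSwmHostModeHandler(cgId: string): void {
    this.wired.delete(cgId);
  }

  // Drops stored ciphertext and reports how many envelopes were removed.
  purge(cgId: string): number {
    const count = this.stored.get(cgId)?.length ?? 0;
    this.stored.delete(cgId);
    return count;
  }

  // A non-participant is actively unwound, not merely refused a new subscribe.
  reconcile(cgId: string, isParticipant: boolean): "hosting" | "stripped" {
    if (isParticipant) {
      this.wire(cgId);
      return "hosting";
    }
    this.unwireSwmHostModeHandler(cgId);
    this.purge(cgId);
    return "stripped";
  }
}
```

The point of the sketch is the second branch of `reconcile`: an early return there would leave a previously restored handler and its stored envelopes in place.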

// direct call wired this handler. The Path-2 subscribe decline is the
// primary gate; this closes the residual ingest surface.
const stripNonParticipants = this.config.swmHostMode?.stripNonParticipants ?? true;
if (stripNonParticipants && !(await this.isNodeParticipantOfCg(storageCgId))) {

🔴 Bug: This defense-in-depth drop ignores how the handler was wired, so it also fires for enableSwmHostModeFor() subscriptions. That breaks the documented manual override: the API can report host-mode enabled, but every incoming envelope (and the matching LU-11 chunk path below) is discarded. If manual host-mode is meant to remain an explicit override, preserve the subscription source and bypass this gate for SUBSCRIPTION_SOURCES.MANUAL.
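
The bypass suggested in this comment could look something like the sketch below. `SUBSCRIPTION_SOURCES.MANUAL` is named in the comment; the `AUTO` member, the string values, and `shouldDropEnvelope` are assumptions introduced for illustration.

```typescript
// Assumed shape of the source enum; only MANUAL is named in the review comment.
const SUBSCRIPTION_SOURCES = { AUTO: "auto", MANUAL: "manual" } as const;
type SubscriptionSource =
  (typeof SUBSCRIPTION_SOURCES)[keyof typeof SUBSCRIPTION_SOURCES];

// Decides whether an incoming host-mode envelope should be discarded.
function shouldDropEnvelope(
  source: SubscriptionSource,
  stripNonParticipants: boolean,
  isParticipant: boolean,
): boolean {
  // An explicit manual enable is treated as an operator override of the gate.
  if (source === SUBSCRIPTION_SOURCES.MANUAL) return false;
  return stripNonParticipants && !isParticipant;
}
```

Whether a manual enable should override zero custody at all is a policy question the thread leaves open; the sketch only shows how the source would have to be carried to the ingest path for the override to work.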

The Path-2 subscribe decline is the proven core of the strip (a non-participant
core never wires the host handler, so it never ingests on either surface — devnet
G-strip Δ0). The extra participant checks inside ingestSwmHostModeEnvelope and
ingestSwmCiphertextChunkEnvelope were defence-in-depth for the transitional /
direct-call case, but they changed behaviour that existing wiring tests pin
(lu11-chunk-catchup-wiring asserts the ingester persists), and they aren't needed
for the SWM claim: SWM writes go via agent.share which never chunks, so the chunk
store is the VM-payload (M6) surface, not SWM. Defer those gates to the M6
follow-up where the affected tests are updated deliberately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* negative is re-evaluated so a freshly-joined member self-heals.
*/
async isNodeParticipantOfCg(this: DKGAgent, contextGraphId: string): Promise<boolean> {
if (this.participantCgIds.has(contextGraphId)) return true;

🔴 Bug: This positive-only memo never gets invalidated, but participation can shrink during a session. For example, removeAgentFromContextGraph() tombstones the agent and re-queues SWM subscription reconciliation; after this change isNodeParticipantOfCg() will still short-circuit to true, so a revoked core can keep or re-enable host-mode custody for that private CG. Please either clear this cache on membership/owner revocation changes or stop memoizing true across the whole process lifetime.

…(Codex #1190)

The persisted-subscription restore path (initializeSwmHostModeStore) re-wired the
host-mode handler directly, bypassing the live subscribe gate — so on a rolling
upgrade a core that hosted a private CG under the old 'any core auto-hosts'
behaviour would re-engage custody on restart, defeating zero-custody for already-
hosting nodes. Gate the restore on isNodeParticipantOfCg (same check as the
subscribe path) and shed the stale persisted record via enqueueHostModePersistence
so it stops re-arming each boot. Closes the transitional/upgrade hole.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* negative is re-evaluated so a freshly-joined member self-heals.
*/
async isNodeParticipantOfCg(this: DKGAgent, contextGraphId: string): Promise<boolean> {
if (this.participantCgIds.has(contextGraphId)) return true;

🔴 Bug: This positive-only cache never gets invalidated when the local node is removed from a private CG. reconcileSharedMemoryGossipSubscription() can re-run after a participant revocation, but once a CG id lands here isNodeParticipantOfCg() will keep returning true and the core will continue host-mode custody until restart. Either avoid memoizing positives, or clear this entry whenever membership / revokedAgents state changes.

// it stops re-arming each boot. (This is the upgrade path the live
// subscribe gate can't reach, since restore wires the handler directly.)
const stripNonParticipants = this.config.swmHostMode?.stripNonParticipants ?? true;
if (stripNonParticipants && !(await this.isNodeParticipantOfCg(cgId))) {

🔴 Bug: This restore-time participant gate runs before startup rehydrates subscribedContextGraphs/wireIdToLocalCgId (initializeSwmHostModeStore() happens before rehydrateContextGraphSubscriptions()). Persisted hash-form host-mode entries therefore have no way to resolve back to their on-chain id yet, so isNodeParticipantOfCg(cgId) can false-negative and enqueueHostModePersistence(cgId, false) deletes a valid subscription on every restart. Defer the strip until after rehydration, or restore against a cleartext/canonical id that can be resolved at this point.
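
The ordering fix proposed here (strip only after rehydration) might be sketched as below. `restoreThenStrip` and its callback parameters are hypothetical, and the real startup sequence is async and involves more state than this models; the local variable is named after the `wireIdToLocalCgId` map mentioned in the comment.

```typescript
// Illustrative three-phase restore; not the PR's startup code.
function restoreThenStrip(
  persisted: string[],                      // persisted host-mode ids, any form
  rehydrate: () => Map<string, string>,     // hypothetical: wire-hash id -> local CG id
  isParticipant: (localCgId: string) => boolean,
): { kept: string[]; shed: string[] } {
  // Phase 1: restore every persisted entry without judging it yet.
  const restored = [...persisted];

  // Phase 2: rehydrate the id mappings so hash-form ids become resolvable.
  const wireIdToLocalCgId = rehydrate();

  // Phase 3: only now apply the participant strip, against resolved ids.
  const kept: string[] = [];
  const shed: string[] = [];
  for (const id of restored) {
    const localId = wireIdToLocalCgId.get(id) ?? id;
    (isParticipant(localId) ? kept : shed).push(id);
  }
  return { kept, shed };
}
```

Running the check in phase 1 instead would evaluate a hash-form id before any mapping exists, which is the false negative the comment describes.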

Revert the restore-path participant gate too: like the ingest gates it changes
restart/init behaviour that existing tests pin (swm-sender-key-pending-by-agent
'loads persisted pending rows after restart'). The live subscribe gate
(reconcileSwmHostModeSubscription) is the devnet-proven core (G-strip Δ0); the
transitional/upgrade hardening (restore-path shed + ingest early-returns) moves to
a dedicated follow-up PR that updates the affected restart/ingest tests deliberately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@branarakic

Contributor Author

Closing as superseded — main now carries the full WS-A SWM strip (via #1203 integration/rfc49-full), which is strictly broader than this PR's rung-1 stripNonParticipants participant-gate (see dkg-agent-swm-host.ts:614: "Unlike rung-1's narrower stripNonParticipants gate, WS-A strips for [all non-participants]"). The zero-custody goal this PR targeted is achieved in main by WS-A. No unique unmerged work. Reopen if I've missed something.

@branarakic branarakic closed this Jun 17, 2026