fix(agent): stop unbounded agents/_meta growth from profile heartbeats (#1233)#1234
fix(agent): stop unbounded agents/_meta growth from profile heartbeats (#1233)#1234Jurij89 wants to merge 1 commit into
Conversation
| * minutes under a fresh UAL would otherwise grow `agents/_meta` without | ||
| * bound and stall offset-0 sync (the agents slice of #1221). | ||
| */ | ||
| persistPublishMeta?: boolean; |
There was a problem hiding this comment.
🟡 Issue: This makes an agents-registry-specific storage policy a public, caller-controlled PublishOptions switch. That pushes the invariant out to each caller (ProfileManager has to remember persistPublishMeta: false) while GossipPublishHandler hard-codes the same exception separately, so future registry publish paths can drift and arbitrary callers can also suppress _meta for normal local publishes. Consider centralizing this as internal policy, e.g. a shared shouldPersistTentativePublishMeta(contextGraphId)/system-graph helper used by both publisher and gossip paths, or keep the opt-out out of the public options surface.
| await this.store.deleteBySubjectPrefix(dataGraph, prefix); | ||
| } | ||
|
|
||
| const options: PublishOptions = { |
There was a problem hiding this comment.
🟡 Issue: The added validation never exercises the production ProfileManager.publishProfile() path that supplies persistPublishMeta: false; the publisher test calls publisher.publish() directly with the option, and the gossip test covers remote receipt. If this call site is later dropped or miswired, the new tests would still pass while local agent heartbeat publishes resume filling agents/_meta. Add a ProfileManager-level regression that publishes a profile through publishProfile() and asserts did:dkg:context-graph:agents/_meta stays empty while the agents data graph is populated.
369f94f to
5831eef
Compare
| const result = await publisher.publish({ contextGraphId: cg, quads: profileQuads(cg) }); | ||
|
|
||
| // The publish still succeeds and accounts the entity... | ||
| expect(result.kaManifest[0]?.rootEntity).toBe(ENTITY); |
There was a problem hiding this comment.
🟡 Issue: The publisher regression test proves the agents path returns a manifest and leaves agents/_meta empty, but it never verifies that the agent profile quads are still persisted. A regression that accidentally skips both the data write and the metadata write for SYSTEM_CONTEXT_GRAPHS.AGENTS would still pass this test. Add the same data-graph count/query assertion used in the gossip test so the validation covers the full intended contract: data graph is written, _meta is not.
5831eef to
5fbe5fa
Compare
|
Pushed 1. Centralized the policy (publisher.ts thread). Done exactly as suggested — removed the public 2. ProfileManager regression (profile-manager.ts thread). Added 3. Data-write assertion (publisher test thread). The AGENTS case now also asserts the data graph is written, so a regression that skipped both the data and Tests: gossip handler 13/13, publisher 2/2, profile-manager 1/1. CI note. The failing |
| // per-publish tentative `_meta` record has no consumer (agent facts are | ||
| // served from the DATA graph); persisting one per heartbeat would grow | ||
| // `agents/_meta` without bound and stall offset-0 sync. Skip it there. | ||
| if (!isAgentRegistryContextGraph(contextGraphId)) { |
There was a problem hiding this comment.
🟡 Issue: This guard only covers the local publish() terminal branch, but ProfileManager.publishProfile() switches to publisher.update() after the first successful profile publish (currentKcId is set). The local-only update path still calls storeUpdatedQuads(), which restates label metadata into did:dkg:context-graph:agents/_meta, so the production heartbeat can still create agents _meta rows even though the new integration test only exercises the first publish. Please either apply the same agents-registry predicate to the update/restatement path or adjust the profile heartbeat flow, and extend the ProfileManager test to call publishProfile() twice so this cannot drift again.
| }); | ||
| const profileManager = new ProfileManager(publisher, store); | ||
|
|
||
| await profileManager.publishProfile({ |
There was a problem hiding this comment.
🟡 Issue: This end-to-end guard only exercises the first profile write. In production, ProfileManager.publishProfile sets currentKcId after that call and subsequent heartbeats go through publisher.update, whose local update path still restamps metadata into the context graph _meta via restateLabelGraphForUpdate. So these tests can pass while the actual repeated heartbeat path resumes writing to agents/_meta. Add a second publishProfile call/assertion here, or a direct publisher.update coverage case, so the validation covers the real heartbeat/re-publish path.
There was a problem hiding this comment.
🟡 Issue: This test does not actually exercise the publisher.update() / restateLabelGraphForUpdate(skipMeta) path it is meant to guard. With the NoChainAdapter setup, DKGPublisher.publish() returns kaId: 0n, and ProfileManager only calls update() when currentKcId is truthy, so the second publishProfile() call goes through publish() again. That leaves the new agents update-path behavior effectively untested; use a publisher spy/mock that returns a nonzero kaId, or add a direct update() regression test for the agents CG.
This test does not actually exercise the publisher.update() path it claims to cover. With NoChainAdapter, the first DKGPublisher.publish() returns kaId: 0n, and ProfileManager only calls update() when currentKcId is truthy, so the second publishProfile() call falls back to another publish(). A regression in the new update/restatement skipMeta plumbing would still pass this test. Force a non-zero KC id or directly drive DKGPublisher.update() / the update handler and assert agents/_meta stays empty.
5fbe5fa to
8b10a2e
Compare
|
Pushed You're right: Fix — the same agents-registry predicate now gates every
So all four Test — |
| * profile heartbeat goes through this update() path and would otherwise keep | ||
| * filling `agents/_meta` without bound. | ||
| */ | ||
| skipMeta?: boolean; |
There was a problem hiding this comment.
🟡 Issue: restateLabelGraphForUpdate is exported from the publisher package, so this generic skipMeta?: boolean turns an agents-only storage rule into caller-controlled behavior. A future non-agents caller can accidentally skip root/privateRoot/merkleRoot and materialized-version metadata, leaving the label graph hard to query and the version guard unadvanced. Prefer deriving the agents exception inside the helper from the context/meta graph, or keep the flag behind an internal agents-specific wrapper.
| * per-publish KC/KA `_meta` tracking record has no consumer — agent facts are | ||
| * served from the DATA graph. Persisting that `_meta` on every heartbeat | ||
| * (locally, and once per gossiped peer heartbeat) grows `agents/_meta` without | ||
| * bound (#1233). Both `_meta` write sites — the publisher's local path and the |
There was a problem hiding this comment.
🟡 Issue: This comment says there are only two _meta write sites, but packages/publisher/src/publish-handler.ts still calls generateTentativeMetadata unconditionally for incoming /dkg/10.0.0/publish requests. Even if the heartbeat path currently uses gossip, the new central predicate documentation is already incomplete, which makes future drift likely. Either include/gate that handler too or narrow the comment to the paths this PR actually covers.
8b10a2e to
cfe9d78
Compare
|
Pushed
|
| * filling `agents/_meta` without bound. The exception is derived internally | ||
| * from this id, so it is never a caller-controlled meta-suppression knob. | ||
| */ | ||
| contextGraphId: string; |
There was a problem hiding this comment.
🔴 Bug: restateLabelGraphForUpdate is exported publicly from packages/publisher/src/index.ts, but this PR adds a required contextGraphId option without updating existing direct callers or preserving compatibility. Several in-repo tests still call it without this field (for example packages/publisher/test/multi-root-token-rows.test.ts:211, packages/publisher/test/materialization-lock.test.ts:159, and packages/random-sampling/test/ka-extractor.test.ts:799), so type-checking consumers will break; plain JS callers will pass undefined, causing the agents-registry skip to be bypassed and reintroducing _meta writes. Make this optional with a safe derivation/default, or update every caller and treat the API break deliberately.
restateLabelGraphForUpdate is exported from the publisher package, and this change makes contextGraphId a new required option. Existing direct callers still use the old shape (for example the publisher/random-sampling test suites), and external TypeScript consumers will now fail to compile even though dataGraph/metaGraph already identify the context graph. Preserve the existing contract by deriving the agents-registry case from the graph URI, or make the new field optional with a default non-agents path and update call sites separately.
| // #1233 — the agents registry CG never persists `_meta` (no consumer; it | ||
| // is served from the data graph). `metadataQuads` is still generated for | ||
| // the expire-timeout cleanup signature below, but skip the actual write. | ||
| if (!isAgentRegistryContextGraph(contextGraphId)) { |
There was a problem hiding this comment.
🟡 Issue: The added validation does not exercise this modified receive-side PublishHandler path, nor the analogous UpdateHandler restatement path at packages/publisher/src/update-handler.ts:270. The new tests cover local DKGPublisher publish/update, ProfileManager.publishProfile, and GossipPublishHandler, so a regression in stream receive handling or contextGraphId plumbing for agents-registry publish/update messages would still pass. Add targeted handler tests that assert agents _meta stays empty and a normal CG still writes _meta for both receive publish and receive update.
cfe9d78 to
a0ac7cb
Compare
|
Pushed
|
| // `_meta` re-stamp + merkleRoot + materialized-version writes, so the profile | ||
| // heartbeat's update() path doesn't resume filling agents/_meta. Callers that | ||
| // omit `contextGraphId` (external consumers) keep the standard write path. | ||
| if (contextGraphId && isAgentRegistryContextGraph(contextGraphId)) return true; |
There was a problem hiding this comment.
🔴 Bug: Skipping the agents _meta restamp leaves restateLabelGraphForUpdate without a current source of prior root entities. The cleanup above still discovers old roots only from metaGraph, so after an agents update that changes rootEntity (wallet/key rotation or legacy peer-id form to address form), replicas can leave the previous profile subject in the agents data graph and registry discovery may show stale/duplicate agents. The local ProfileManager has a peerId data-graph cleanup, but receive-side updates do not. Add an agents-specific data-graph cleanup/index before this return, or keep a bounded per-agent root mapping instead of per-publish _meta.
| } | ||
|
|
||
| // ── Tentative lifecycle timeout ── | ||
| const timeout = setTimeout( |
There was a problem hiding this comment.
🔴 Bug: The agents-registry branch skips writing tentative _meta, but this direct receive path still registers the publish as pending and schedules the normal tentative timeout. Because agents publishes never confirm on-chain, expireTentativePublish will not find a confirmed status and will delete the just-stored profile data after TENTATIVE_TIMEOUT_MS; restored journal entries have the same cleanup behavior. Skip the pending/timeout lifecycle for isAgentRegistryContextGraph(contextGraphId) or make expiry a no-op for that CG, and add a timer/journal regression test.
| const pub = await publisher.publish({ contextGraphId: cg, quads: profileQuads(cg) }); | ||
| await publisher.update(pub.kaId, { contextGraphId: cg, quads: profileQuads(cg) }); | ||
|
|
||
| expect( |
There was a problem hiding this comment.
🟡 Issue: This normal-CG update control is not isolating the update path. The preceding publish() already writes _meta, so countQuads(metaOf(cg)) > 0 would still pass if the update/restatement path stopped writing normal-CG metadata entirely. That leaves the regression guard for the new contextGraphId propagation ineffective. Capture the meta count after publish and assert it changes in an update-specific way, or use an update payload/metadata assertion that can only be produced by restateLabelGraphForUpdate.
…eats (#1233) The agents system context graph never confirms on-chain, so every agent-profile publish — a 5-minute heartbeat, fanned out across peers via gossip — wrote a tentative `_meta` tracking record under a fresh, never-repeated UAL, and nothing ever pruned superseded ones. agents/_meta grew without bound (112k quads / ~386 records per agent on a long-lived node), making the CG expensive to serve and stalling offset-0 sync — the agents-CG-only slice of #1221. The per-publish `_meta` record has no consumer for the agents registry: agent facts are served from the DATA graph (ContextGraphMetaProjection never enumerates the per-KA `_meta` index) and the CG never confirms, so the record is write-only. A consumer audit confirmed the DATA graph itself IS load-bearing (cross-peer SWM encryption keys, peer discovery, curator/membership sync, the agent directory) and must stay; only the `_meta` record is removable. Skip persisting it at the two highest-volume, side-effect-free paths, gated on a single shared `dkg-core` predicate `isAgentRegistryContextGraph`: - the gossip publish receiver — the DOMINANT source, re-inserting a stub per received peer heartbeat (the offset-0 storm); - the local publish terminal. The data graph is always written; every other context graph is unaffected. Scoped deliberately: the update-restatement and direct-protocol receive paths are NOT skipped here — there the `_meta` is load-bearing (it is the source for prior-root cleanup on rootEntity change, and it drives the tentative-expiry lifecycle), so simply skipping it would regress those. Bounding/pruning the record on those paths (and a one-time prune of existing stores) is tracked as a follow-up in #1233. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01G6SgdmcAV3AfpWQ3mYiFx6
a0ac7cb to
41ce51f
Compare
|
Reworked to address the load-bearing- You're right that on the update-restatement and direct-protocol receive paths the Kept (gated on
Reverted to base (was the source of both 🔴s): the update-restatement ( Follow-up (#1233): bound/prune the per-agent record on those remaining paths (rather than skip it, so prior-root cleanup + lifecycle keep working), plus a one-time prune of existing stores. Tests: gossip handler 13/13, publisher opt-out 2/2, ProfileManager 1/1; the reverted suites are back to base, all green. |
otReviewAgent
left a comment
There was a problem hiding this comment.
Review Agent completed this review and found no issues.
Summary
agentssystem context graph never confirms on-chain, so every agent-profile publish — a 5-minute heartbeat, fanned out across peers via gossip — appended a tentative_metatracking record under a fresh, never-repeated UAL, and nothing ever pruned superseded ones.agents/_metagrew without bound (112k quads / ~386 records per agent on a long-lived node), making the CG expensive to serve and stalling offset-0 sync — the agents-CG-only slice of Residual ~9–10-core oxigraph peg in rc.18 despite #1164 (A1–A4), under live multi-peer sync on an operation-heavy CG #1221._metarecord has no consumer for the agents registry: agent facts are served from the data graph and the CG never confirms. An adversarial consumer audit confirmed the agents data graph itself IS load-bearing (cross-peer SWM encryption keys, peer discovery, curator/membership sync, the agent directory) — so only the_metarecord is safe to touch, not the CG.dkg-corepredicateisAgentRegistryContextGraph:_metais load-bearing (it is the prior-root source for data-graph cleanup onrootEntitychange, and it drives the tentative-expiry lifecycle), so skipping it would regress those. Bounding/pruning the per-agent record on those paths — plus a one-time prune of existing stores — is the follow-up.Related
Sync page retrystorm).Files changed
packages/core/src/genesis.ts+index.tsisAgentRegistryContextGraph(cgId)helper (single source of truth)packages/agent/src/gossip-publish-handler.ts_metainsert for the agents registry CG (the dominant source)packages/publisher/src/dkg-publisher.tspackages/agent/test/gossip-publish-handler.test.ts_meta; non-registry CG keeps itpackages/agent/test/profile-manager-meta-skip.test.tsProfileManager.publishProfile()populates the data graph but leavesagents/_metaemptypackages/publisher/test/dkg-publisher-meta-optout.test.ts_meta; a normal CG keeps_metaTest plan
vitest run gossip-publish-handler(agent) — 13/13vitest run profile-manager-meta-skip(agent) — 1/1vitest run dkg-publisher-meta-optout(publisher) — 2/2Tornado: agentswm-ack-quorum/sync-responder;Bura: clidaemon-http-behavior-extraSIGINT timing), green onmainand untouched by this PRstatus="tentative"stubs in 2h33m pre-fix; the gossip-receiver guard removes the dominant peer-fanned source.🤖 Generated with Claude Code