Integration: data-store & query rework (C-track #1150–53 + A1 #1141 + #1155) #1156
branarakic wants to merge 27 commits into
One 1-triple KA publish leaves ~134 resident quads (live-measured); ~97% of them are bookkeeping, and ~30 quads are copies of five values. Combined with hot-path graph-name scans (STRSTARTS(STR(?g)), SELECT DISTINCT ?g; by the adapter's own account the "dominant idle-node CPU cost"), this drives the rc.17 idle-CPU saturation. The RFC specifies:
- Phase 0: dead code.
- Phase 1: zero-reader drops (~-24/KA).
- Phase 2: dedupe via small reader migrations (~-25).
- Phase 3: aggressive decision points (UAL/token collapse, URN merge, provenance-events flag; ~45-50/KA).
- Query-side fixes: a graph registry to kill name scans, and event-driven reconcilers.

Every verdict cites its writers and readers by file:line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
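The graph-registry fix named above can be pictured with a small in-memory sketch. It is a hypothetical model, not the adapter's API: GraphRegistry, scanGraphsByPrefix, and the urn:g:<cg>/... URI layout are illustrative assumptions. The scan mirrors a FILTER(STRSTARTS(STR(?g), prefix)) pass and touches every graph name on every call, while the registry is updated from the write path and answers per-CG and roster lookups from a map.

```typescript
// Hypothetical sketch: contrasts a prefix scan over all graph names with an
// event-maintained registry keyed by context graph (CG) id. Names are illustrative.

/** Baseline: in-memory equivalent of FILTER(STRSTARTS(STR(?g), prefix)). */
function scanGraphsByPrefix(allGraphs: readonly string[], prefix: string): string[] {
  // Every call walks every graph name in the store: O(total graphs).
  return allGraphs.filter((g) => g.startsWith(prefix));
}

/** Registry: membership is recorded when graphs are created or dropped. */
class GraphRegistry {
  private readonly byCg = new Map<string, Set<string>>();

  /** Called from the write path when a graph is created for a CG. */
  onGraphCreated(cgId: string, graphUri: string): void {
    let graphs = this.byCg.get(cgId);
    if (!graphs) {
      graphs = new Set<string>();
      this.byCg.set(cgId, graphs);
    }
    graphs.add(graphUri);
  }

  /** Called from the write path when a graph is dropped; empty CGs leave the roster. */
  onGraphDropped(cgId: string, graphUri: string): void {
    const graphs = this.byCg.get(cgId);
    if (!graphs) return;
    graphs.delete(graphUri);
    if (graphs.size === 0) this.byCg.delete(cgId);
  }

  /** Per-CG lookup: cost proportional to that CG's graphs, not the whole store. */
  graphsOf(cgId: string): string[] {
    const graphs = this.byCg.get(cgId);
    return graphs ? Array.from(graphs).sort() : [];
  }

  /** CG roster without a SELECT DISTINCT ?g pass over every graph. */
  contextGraphIds(): string[] {
    return Array.from(this.byCg.keys()).sort();
  }
}
```

The trade-off is the usual one for an index: every graph create/drop must notify the registry (the same discipline the projection tests rely on with markDirty), in exchange for lookups that no longer scale with the total number of graphs.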
Implements docs/rfcs/ka-metadata-trim.md: one 1-triple KA publish drops from ~134 resident quads to ~50 (~40 with metadata.provenanceEvents=false). Every removed or relocated triple is justified by a writer+reader audit, and every migrated reader reads both the old and new shapes.

- Phase 0: dead AssertionPublished writer+gate; orphan kcUal read; dangling partition bnode.
- Phase 1: zero-reader drops (kaCount, blockTimestamp, publisherAddress, chainId, blockNumber, tokenId, publicTripleCount, AuthorshipProof block, Publication node, URN type/contextGraph/wasGeneratedBy rows). publishedAt is KEPT after adversarial review found the kafka-plugin discovery reader.
- Phase 2: KC/KA type rows -> predicate-based counters; single rootEntity (entity alias dropped outside the signed seal); wm/swm pointers written only on divergence; fromLayer/toLayer + wasAssociatedWith derived/optional; publicSnapshotRef collapsed; WM marker updated (not deleted) at VM flip.
- Phase 3: UAL + <ual>/<n> collapsed to one node (read-both in resolveKA, access-handler incl. <ual>/<n> fallback for old clients, RS prover, sync, kafka discovery, async-lift, EPCIS, counters); metadata.provenanceEvents config (default true) gating all four lifecycle event writers; ShareTransition dropped (node-ui receipt reads the seal subject, legacy fallback retained); partition CONSTRUCT-copy -> documented minimal shape (REMAP keeps wholesale move). Lifecycle-URN->seal merge DEFERRED with a worked plan (TODO(rfc-ka-trim) at assertionLifecycleUri) after the audit surfaced signed-material collision + identity double-allocation hazards.
- Adversarial-review fixes: kafka read-both queries; provenanceEvents gating on discard/update; multi-root private access attestation always matches the served triples (computePrivateRoot fallback); stale re-promote stays a no-op; isAlreadyConfirmed read-both vs the minimal partition shape.

Author-signed seal material untouched.
Suites green: core 1039, query 261, random-sampling 62, publisher 1150, agent 1226, cli 2031, kafka-plugin 169, node-ui 1456 (+tsc/builds).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
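The read-both pattern behind the Phase 3 UAL collapse can be sketched over a plain triple array. This is an assumption-laden model, not the real resolveKA code: the short predicate names dkg:partOf and dkg:rootEntity, the <ual>/<n> token URIs, and the resolveRootEntities helper are illustrative. It reads the collapsed shape first, then the legacy token rows in /<n> order, de-duplicating entities that appear in both shapes.

```typescript
// Hypothetical sketch of a "read-both" lookup across the legacy and collapsed
// KA shapes. Predicate names are shortened and simplified for illustration.

interface Triple {
  s: string;
  p: string;
  o: string;
}

const PART_OF = "dkg:partOf"; // legacy: <ual>/<n> dkg:partOf <ual>
const ROOT_ENTITY = "dkg:rootEntity"; // collapsed: <ual> dkg:rootEntity ?e; legacy: <ual>/<n> dkg:rootEntity ?e

/** Parses the trailing /<n> of a legacy token URI; non-numeric tails sort as 0. */
function tokenIndex(tokenUri: string): number {
  const n = Number(tokenUri.slice(tokenUri.lastIndexOf("/") + 1));
  return Number.isFinite(n) ? n : 0;
}

/** Returns the KA's root entities, reading both shapes and de-duplicating. */
function resolveRootEntities(triples: readonly Triple[], ual: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  const add = (entity: string): void => {
    if (!seen.has(entity)) {
      seen.add(entity);
      result.push(entity);
    }
  };

  // Collapsed shape: the UAL node itself carries the root entity (input order).
  for (const t of triples) {
    if (t.s === ual && t.p === ROOT_ENTITY) add(t.o);
  }

  // Legacy shape: token nodes that are partOf the UAL, ordered by their /<n> suffix.
  const tokens = triples
    .filter((t) => t.p === PART_OF && t.o === ual)
    .map((t) => t.s)
    .sort((a, b) => tokenIndex(a) - tokenIndex(b));
  for (const token of tokens) {
    for (const t of triples) {
      if (t.s === token && t.p === ROOT_ENTITY) add(t.o);
    }
  }
  return result;
}
```

A store holding only legacy rows, only the collapsed shape, or both (the dual-shape case during migration) resolves to the same entity list, which is the property a read-both reader needs.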
- sync-verify: collapsed-shape KCs are now Merkle-verified on sync (self-map in both duplicated impls, multi-map roots, dual-shape dedupe guard); a wrong Merkle root on a collapsed KC is now rejected instead of accepted on trust.
- node-ui receipt: replaced the never-binding lifecycle join with the member-entity join pinned by the URN/URI tail correspondence; fixed the identical latent bug in the legacy hop too; added real executed-query tests.
- multi-root access: conditional collapse. Multi-root KAs re-emit per-token pairing rows (manifest order) alongside the collapsed shape, so legacy <ual>/<n> resolves exactly root N with a matching attestation; single-root publishes (the dominant case) keep the full collapse. The handler serves the first root that has a private bag; the F3 recompute guard is retained.
- graph-viz wasGeneratedBy: documentation-only correction (generic matcher, no stranded feature); RFC + inline comment amended.

Tests: agent unit + 9 sync-path hardhat suites green; publisher full 1232; node-ui full 1423 (+ new executed-query tests); graph-viz 140; builds green. Full agent lane delegated to CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
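The sync-verify change (reject a wrong Merkle root instead of accepting it on trust) follows the usual recompute-and-compare pattern. The sketch uses a generic binary SHA-256 Merkle tree built with Node's crypto module; the leaf encoding (canonical triple strings, sorted for determinism), the odd-node duplication rule, and the function names are assumptions, not the DKG's actual Merkle construction.

```typescript
import { createHash } from "node:crypto";

// Generic binary Merkle tree over SHA-256; NOT the DKG's exact leaf encoding.

function sha256(data: Buffer | string): Buffer {
  return createHash("sha256").update(data).digest();
}

/** Computes a Merkle root over canonical triple strings, sorted for determinism. */
function merkleRoot(canonicalTriples: readonly string[]): Buffer {
  if (canonicalTriples.length === 0) return sha256("");
  let level = [...canonicalTriples].sort().map((t) => sha256(t));
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const left = level[i];
      // Odd node count: pair the last node with itself (one common convention).
      const right = i + 1 < level.length ? level[i + 1] : left;
      next.push(sha256(Buffer.concat([left, right])));
    }
    level = next;
  }
  return level[0];
}

/** Returns false (reject) when the claimed root does not match the received triples. */
function verifyCollapsedKc(canonicalTriples: readonly string[], claimedRootHex: string): boolean {
  const computed = merkleRoot(canonicalTriples).toString("hex");
  return computed === claimedRootHex.toLowerCase().replace(/^0x/, "");
}
```

The key behavioral point is the boolean: a sync path built on this would drop the KC when verifyCollapsedKc returns false rather than ingesting it unverified.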
…gration/data-store-and-query-rework

Conflicts:
- packages/storage/src/adapters/sparql-http.ts
- packages/storage/test/sparql-http.test.ts

…ion/data-store-and-query-rework

Conflicts:
- packages/agent/src/dkg-agent-lifecycle.ts

…n/data-store-and-query-rework

Conflicts:
- packages/agent/src/sync/responder/sync-handler.ts
Both fail on their own source branches in isolation; neither is a behavior regression. Fixed as part of assembling integration/data-store-and-query-rework so the combined suite is green. Should be upstreamed to the source PRs.

- sync-responder-concurrent-interleaving (A1 #1141): the multi-graph page-query probe regex omitted DISTINCT, but readSwmMetaRowsPage correctly emits SELECT DISTINCT ?g ... for the multi-graph VALUES join. The single-graph arm already tolerated DISTINCT; gave the multi-graph arm parity.
- agent.part-13 curator-authority (C2 #1151): C2 repointed getContextGraphCreator at the ContextGraphMetaProjection cache; the test forges foreign-creator state via a raw store.insert that bypasses markDirty, leaving the projection stale. Real sync paths call markDirtyFromQuads after insert, so the code is correct; the test now invalidates the projection after its out-of-band write.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
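The first test fix comes down to regex tolerance. A hypothetical reconstruction: buildMetaPageQuery stands in for readSwmMetaRowsPage, and its query text, like both probe regexes, is illustrative rather than copied from the real code or test. The multi-graph arm emits SELECT DISTINCT ?g ... with a VALUES join; a probe anchored on SELECT ?g misses it, while an optional (?:DISTINCT\s+)? group accepts both forms.

```typescript
// Hypothetical page-query builder: single-graph arm vs multi-graph VALUES join.
function buildMetaPageQuery(graphs: readonly string[], limit: number, offset: number): string {
  if (graphs.length === 1) {
    return `SELECT ?s ?p ?o WHERE { GRAPH <${graphs[0]}> { ?s ?p ?o } } ORDER BY ?s LIMIT ${limit} OFFSET ${offset}`;
  }
  // Multi-graph arm: ?g is bound from VALUES and selected with DISTINCT,
  // matching the SELECT DISTINCT ?g ... form described for readSwmMetaRowsPage.
  const values = graphs.map((g) => `<${g}>`).join(" ");
  return `SELECT DISTINCT ?g ?s ?p ?o WHERE { VALUES ?g { ${values} } GRAPH ?g { ?s ?p ?o } } ORDER BY ?g ?s LIMIT ${limit} OFFSET ${offset}`;
}

// Strict probe (illustrative): only matches "SELECT ?g ...", so it misses the DISTINCT form.
const strictMultiGraphProbe = /^SELECT\s+\?g\b[\s\S]*\bVALUES\s+\?g\b/;

// Tolerant probe: DISTINCT is optional, giving the multi-graph arm parity.
const multiGraphProbe = /^SELECT\s+(?:DISTINCT\s+)?\?g\b[\s\S]*\bVALUES\s+\?g\b/;

function isMultiGraphPageQuery(query: string): boolean {
  return multiGraphProbe.test(query);
}
```

Making the keyword optional rather than required keeps the probe valid whether or not the query builder later drops DISTINCT, which is the same tolerance the single-graph arm is described as having.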
Live idle-CPU A/B measurement (handoff context)

Ran the projection-flag A/B on a real data-rich node to validate what this branch does and does not fix for idle-node CPU. Posting full results since the work is being handed off.

Setup: HariSeldon testnet node, 1.1 GB store, native …

Result
The projection flag does NOT fix idle CPU. Both phases saturate ~4.5–9 cores, and the means are within run noise (the boot-time C1 GraphSetIndexStore seed scan plus testnet sync confound the first ~7 samples of each phase). There is a real secondary signal: the on phase reached a sustained genuine-idle window (4 consecutive samples at 1–6%) that off never touched, and the median dropped from 445% to 204%, consistent with the …

The actual bottleneck (identical in both phases)

The C4 (#1153) slow-query telemetry caught the dominant cost: …

This is the 30-second metrics sweep …

Two actionable findings

…
Net

Merging this branch is necessary but not sufficient for idle CPU; empirically confirmed. When Proposal A (the typed graph registry: CG roster + per-CG …

Reproduce

```sh
# integration build is packages/cli/dist (already built on this branch)
export DKG_HOME=~/.dkg
DKG_LIST_CONTEXT_GRAPHS_PROJECTION=on node packages/cli/dist/cli.js start   # vs unset for baseline
# sample: ps -o %cpu= -p $(pgrep -f oxigraph-v0) ; grep "slow query" ~/.dkg/daemon.log
node packages/cli/dist/cli.js stop
```

Daemon lifecycle is …

Flag state reminder for the takeover

…
🤖 Generated with Claude Code
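The A/B summary above rests on three statistics per phase: mean, median, and the longest run of consecutive idle samples once the boot-confounded prefix is discarded. A minimal helper for that arithmetic follows, assuming samples are ps %cpu values in percent of one core (so 450 ≈ 4.5 cores); summarizePhase and its warmup/idleThreshold parameters are hypothetical. A handful of near-idle samples can pull the median down sharply while barely moving a mean dominated by 400–900% readings, which is how a falling median can coexist with means inside run noise.

```typescript
// Hypothetical helper for summarizing one phase of CPU samples.
// Values are percent of one core (450 ≈ 4.5 cores).

interface PhaseSummary {
  mean: number;
  median: number;
  longestIdleRun: number; // consecutive samples at or below idleThreshold
}

function summarizePhase(samples: readonly number[], warmup: number, idleThreshold: number): PhaseSummary {
  // Drop the boot/sync-confounded prefix before summarizing.
  const xs = samples.slice(warmup);
  if (xs.length === 0) throw new Error("no samples left after warmup");

  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;

  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  // Longest streak of consecutive samples at or below the idle threshold.
  let longestIdleRun = 0;
  let run = 0;
  for (const x of xs) {
    run = x <= idleThreshold ? run + 1 : 0;
    longestIdleRun = Math.max(longestIdleRun, run);
  }
  return { mean, median, longestIdleRun };
}
```

Reporting the idle-run length alongside mean and median keeps a sustained idle window visible even when it is too short to move the averages.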
Closing as redundant; Track-C reconciled into …

Verified at content level (not just patch-id): this branch is 169 commits behind main; 6/7 of its distinctive SWM-admission identifiers exist in main (…
What
Integration branch combining the data-store volume work (#1155) with the query/scan + sync half of the #1138 scalability wave, assembled so the two halves can be tested, flag-flipped, and measured together — and so RFC Part 2 (Proposal A) can be built on a single stable base instead of four moving branches.
Merged onto main in dependency order: … listGraphs DISTINCT ?g scan · ContextGraphMetaProjection (event-driven per-CG meta, Part-2 Proposal B) · projection-backed listContextGraphs · host-sweep bound + slow-query tagging

Conflicts resolved (each compiled + tested)
- storage/.../sparql-http.ts (C1 ↔ C4): kept C1's removal of the in-adapter listGraphs cache (GraphSetIndexStore now owns graph-list indexing) and C4's slow-query telemetry. These were entangled: C4's maybeEmitSlowQuery shares this.now, so the monotonic clock stays for timing while the cache machinery (graphListCache, scanGraphs, in-flight coalescing, LIST_GRAPHS_CACHE_TTL_MS) is dropped. listGraphs() is now a direct scan carrying a source tag.
- agent/.../dkg-agent-lifecycle.ts (A1 ↔ C-track): both appended distinct helpers at the same location (listGraphFamily/listGraphsByPrefix vs getSharedMemorySubGraphAdmission/isKnownContextGraphUri); kept both.
- agent/.../sync/responder/sync-handler.ts (#1155 ↔ A1): A1's bounded-page-serving structure won (#1155's only hunk here was the read-both delta arm). Re-ported #1155's read-both collapsed-UAL arm into A1's durableDeltaWhereClauseForGraphs (graph-plan.ts): the legacy partOf token-row arm plus the collapsed ?ualc dkg:rootEntity ; dkg:batchId arm. Without it, collapsed-shape KAs bind no ?deltaBatch and are re-sent on every delta once DKG_SYNC_DELTA is enabled (correct, but defeats the optimization).

Two test fixes (pre-existing on their source branches, not regressions)
Both were proven failing on their own branch in isolation (built + run on A1 and C2 in a throwaway worktree); both are test-only and should be upstreamed to the source PRs:
- sync-responder-concurrent-interleaving.test.ts (A1 #1141): the multi-graph page-query probe regex omitted DISTINCT, but readSwmMetaRowsPage correctly emits SELECT DISTINCT ?g … for the multi-graph VALUES join. The single-graph arm already tolerated DISTINCT; gave the multi-graph arm parity.
- agent.part-13.test.ts curator-authority (C2 #1151): C2 repointed getContextGraphCreator at the ContextGraphMetaProjection cache. The test forges foreign-creator state via a raw store.insert that bypasses markDirty, leaving the projection stale, so registration was rejected with the wrong message. Real sync paths call markDirtyFromQuads after insert, so the code is correct; the test now invalidates the projection after its out-of-band write.

Verification
- DKG_LIST_CONTEXT_GRAPHS_PROJECTION defaults OFF: C3's projection-backed listContextGraphs is dead code until enabled. Merging this branch does not by itself reduce the idle-node enumeration cost; the flag must be flipped and measured.
- The listDeclaredContextGraphIds() enumeration scan still survives (live, uncached SELECT DISTINCT ?ctxGraph + STRSTARTS arm), as do the 15-min SWM-cleanup double scan, the 30s metrics COUNTs, and the STRSTARTS graph-name sites. These are the RFC Part 2 / Proposal A residual (a typed graph registry in oxigraph: CG roster + per-CG cg → hasGraph → g membership), to be built on top of this branch.

Next
Flip DKG_LIST_CONTEXT_GRAPHS_PROJECTION + DKG_SYNC_DELTA on a test node, measure idle CPU, and confirm the surviving scans match the Proposal-A residual.

🤖 Generated with Claude Code