Integration: data-store & query rework (C-track #1150–53 + A1 #1141 + #1155) by branarakic · Pull Request #1156 · OriginTrail/dkg

branarakic · 2026-06-13T11:40:51Z

What

Integration branch combining the data-store volume work (#1155) with the query/scan + sync half of the #1138 scalability wave, assembled so the two halves can be tested, flag-flipped, and measured together — and so RFC Part 2 (Proposal A) can be built on a single stable base instead of four moving branches.

Merged onto main in dependency order:

| Stream | PR | Role |
| --- | --- | --- |
| C1 | #1150 GraphSetIndexStore | write-through graph-name index; eliminates the steady-state `listGraphs` `DISTINCT ?g` scan |
| C2/C3/C4 | #1151/#1152/#1153 | `ContextGraphMetaProjection` (event-driven per-CG meta, Part-2 Proposal B) · projection-backed `listContextGraphs` · host-sweep bound + slow-query tagging |
| A1 | #1141 | sync responder bounded-graph page serving (the ~95% sync win) |
| #1155 | feat/ka-metadata-trim | per-KA metadata trim, ~134 → ~50 quads/KA |
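
To make the C1 row concrete, here is a minimal sketch of a write-through graph-name index. It is not the actual GraphSetIndexStore: the `QuadStore` interface, `ScanningStore`, and `GraphSetIndex` are hypothetical names, and the API is synchronous for brevity where the real adapter is presumably async. The idea it illustrates is that the expensive distinct-graph scan runs once to seed the index, after which every write keeps the index current.

```typescript
// Hypothetical minimal store shape; the real adapter interface in packages/storage may differ.
interface Quad { subject: string; predicate: string; object: string; graph: string }

interface QuadStore {
  insert(quads: Quad[]): void;
  dropGraph(graph: string): void;
  listGraphs(): string[];
}

// Stand-in backend whose listGraphs() does a full scan, like SELECT DISTINCT ?g.
class ScanningStore implements QuadStore {
  scans = 0;
  private quads: Quad[] = [];
  insert(quads: Quad[]): void { this.quads.push(...quads); }
  dropGraph(graph: string): void { this.quads = this.quads.filter((q) => q.graph !== graph); }
  listGraphs(): string[] {
    this.scans += 1;
    return Array.from(new Set(this.quads.map((q) => q.graph)));
  }
}

// Write-through decorator: one seed scan, then the index tracks every write.
class GraphSetIndex implements QuadStore {
  private readonly inner: QuadStore;
  private graphs: Set<string> | null = null;

  constructor(inner: QuadStore) { this.inner = inner; }

  private index(): Set<string> {
    if (this.graphs === null) this.graphs = new Set(this.inner.listGraphs());
    return this.graphs;
  }
  insert(quads: Quad[]): void {
    const index = this.index(); // seed before writing so only one scan ever runs
    this.inner.insert(quads);
    quads.forEach((q) => index.add(q.graph));
  }
  dropGraph(graph: string): void {
    const index = this.index();
    this.inner.dropGraph(graph);
    index.delete(graph);
  }
  listGraphs(): string[] { return Array.from(this.index()); }
}

const backend = new ScanningStore();
const store = new GraphSetIndex(backend);
store.insert([{ subject: 's', predicate: 'p', object: 'o', graph: 'urn:g1' }]);
store.insert([{ subject: 's', predicate: 'p', object: 'o', graph: 'urn:g2' }]);
store.dropGraph('urn:g1');
const graphs = store.listGraphs();
console.log(graphs, backend.scans);
```

The trade-off in this sketch is that any write reaching the backend without passing through the decorator leaves the index stale, so a design like this only holds if the decorator is the sole write path or is reseeded on demand.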

Conflicts resolved (each compiled + tested)

storage/.../sparql-http.ts (C1 ↔ C4) — kept C1's in-adapter listGraphs cache removal (GraphSetIndexStore owns graph-list indexing now) and C4's slow-query telemetry. These were entangled: C4's maybeEmitSlowQuery shares this.now, so the monotonic clock stays for timing while the cache machinery (graphListCache, scanGraphs, in-flight coalescing, LIST_GRAPHS_CACHE_TTL_MS) is dropped. listGraphs() is now a direct scan carrying a source tag.
agent/.../dkg-agent-lifecycle.ts (A1 ↔ C-track) — both appended distinct helpers at the same location (listGraphFamily/listGraphsByPrefix vs getSharedMemorySubGraphAdmission/isKnownContextGraphUri); kept both.
agent/.../sync/responder/sync-handler.ts (#1155 ↔ A1) — A1's bounded-page-serving structure won (#1155's only hunk here was the read-both delta arm). Re-ported #1155's read-both collapsed-UAL arm into A1's durableDeltaWhereClauseForGraphs (graph-plan.ts): legacy partOf token-row arm + collapsed ?ualc dkg:rootEntity ; dkg:batchId arm. Without it, collapsed-shape KAs bind no ?deltaBatch and are therefore re-sent on every delta once DKG_SYNC_DELTA is enabled (correct, but defeats the optimization).
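
The sparql-http.ts resolution keeps a monotonic clock purely for slow-query timing. A minimal sketch of that pattern follows; the `TimedRunner` class and its constructor are hypothetical and are not the adapter's maybeEmitSlowQuery. Only the event field names (source, operation, elapsedMs, thresholdMs) are taken from the slow-query log line quoted later in this thread.

```typescript
// Field names mirror the slow-query log line in this PR; everything else is hypothetical.
interface SlowQueryEvent {
  source: string;
  operation: string;
  elapsedMs: number;
  thresholdMs: number;
}

class TimedRunner {
  private readonly now: () => number;
  private readonly thresholdMs: number;
  private readonly emit: (event: SlowQueryEvent) => void;

  constructor(now: () => number, thresholdMs: number, emit: (event: SlowQueryEvent) => void) {
    this.now = now;
    this.thresholdMs = thresholdMs;
    this.emit = emit;
  }

  // Times exec() with the injected clock and reports it if it crosses the threshold.
  run<T>(operation: string, source: string | undefined, exec: () => T): T {
    const startedAt = this.now();
    try {
      return exec();
    } finally {
      const elapsedMs = this.now() - startedAt;
      if (elapsedMs >= this.thresholdMs) {
        // An untagged caller is reported as "unknown".
        this.emit({ source: source ?? 'unknown', operation, elapsedMs, thresholdMs: this.thresholdMs });
      }
    }
  }
}

// Fake monotonic clock so the example is deterministic; production code would
// inject something like () => performance.now().
let clock = 0;
const events: SlowQueryEvent[] = [];
const runner = new TimedRunner(() => clock, 10000, (event) => { events.push(event); });

runner.run('select', 'listGraphs', () => { clock += 5; return 0; });      // fast: no event
runner.run('select', undefined, () => { clock += 30001; return 0; });     // slow and untagged
runner.run('select', 'metrics', () => { clock += 30001; return 0; });     // slow and tagged
console.log(events);
```

Injecting the clock rather than reading it globally is what lets the timing path be tested without real delays.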

Two test fixes — pre-existing on their source branches, not regressions

Both were proven failing on their own branch in isolation (built + run on A1 and C2 in a throwaway worktree); both are test-only and should be upstreamed to the source PRs:

sync-responder-concurrent-interleaving.test.ts (A1 #1141): the multi-graph page-query probe regex omitted DISTINCT, but readSwmMetaRowsPage correctly emits SELECT DISTINCT ?g … for the multi-graph VALUES join. The single-graph arm already tolerated DISTINCT; the multi-graph arm now has parity.
agent.part-13.test.ts curator-authority (C2 #1151): C2 repointed getContextGraphCreator at the ContextGraphMetaProjection cache. The test forges foreign-creator state via a raw store.insert that bypasses markDirty, leaving the projection stale, so registration was rejected with the wrong message. Real sync paths call markDirtyFromQuads after insert, so the code is correct; the test now invalidates the projection after its out-of-band write.
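
The first fix amounts to making DISTINCT optional in the probe pattern. The sketch below illustrates the idea only: both regexes and the sample query strings are hypothetical stand-ins, not the patterns or queries used in the test file.

```typescript
// Hypothetical probe patterns; the real test's regexes are not shown in this PR.
const strictProbe = /SELECT\s+\?g\b/i;
const tolerantProbe = /SELECT\s+(?:DISTINCT\s+)?\?g\b/i;

// Hypothetical query shapes for the single-graph and multi-graph arms.
const singleGraphQuery = 'SELECT ?g ?s ?p ?o WHERE { GRAPH ?g { ?s ?p ?o } }';
const multiGraphQuery =
  'SELECT DISTINCT ?g ?s ?p ?o WHERE { VALUES ?g { <urn:a> <urn:b> } GRAPH ?g { ?s ?p ?o } }';

const strictMatchesMulti = strictProbe.test(multiGraphQuery);
const tolerantMatchesMulti = tolerantProbe.test(multiGraphQuery);
const tolerantMatchesSingle = tolerantProbe.test(singleGraphQuery);
console.log({ strictMatchesMulti, tolerantMatchesMulti, tolerantMatchesSingle });
```

The tolerant pattern accepts both shapes, so the probe no longer depends on whether the query builder deduplicates graph bindings.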

Verification

Full build green (turbo, 20/20 packages).
~7,500 tests green across every touched package: storage 209 · query 261 · publisher 1162 · agent 1282 · cli 2021 · kafka-plugin 169 · node-ui 1423.

⚠️ Flag state + remaining work

DKG_LIST_CONTEXT_GRAPHS_PROJECTION defaults OFF — C3's projection-backed listContextGraphs is dead code until enabled. Merging this branch does not by itself reduce the idle-node enumeration cost; the flag must be flipped + measured.
The listDeclaredContextGraphIds() enumeration scan still survives (live, uncached SELECT DISTINCT ?ctxGraph + STRSTARTS arm), as do the 15-min SWM-cleanup double scan, the 30s metrics COUNTs, and the STRSTARTS graph-name sites. These are the RFC Part 2 / Proposal A residual — a typed graph registry in oxigraph (CG roster + per-CG cg → hasGraph → g membership) — to be built on top of this branch.

One 1-triple KA publish leaves ~134 resident quads (live-measured); ~97% is bookkeeping, ~30 quads are copies of five values. Combined with hot-path graph-name scans (STRSTARTS(STR(?g)), SELECT DISTINCT ?g — the adapter's own "dominant idle-node CPU cost"), this drives the rc.17 idle-CPU saturation. The RFC specifies: Phase 0 dead code, Phase 1 zero-reader drops (~-24/KA), Phase 2 dedupe via small reader migrations (~-25), Phase 3 aggressive decision points (UAL/token collapse, URN merge, provenance-events flag, ~45-50/KA), and the query-side fixes (graph registry to kill name scans, event-driven reconcilers). Every verdict cites its writers and readers by file:line. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Implements docs/rfcs/ka-metadata-trim.md: one 1-triple KA publish drops from ~134 resident quads to ~50 (~40 with metadata.provenanceEvents=false), with every removed/relocated triple justified by a writer+reader audit and every migrated reader reading both old and new shapes.

- Phase 0: dead AssertionPublished writer+gate; orphan kcUal read; dangling partition bnode.
- Phase 1: zero-reader drops (kaCount, blockTimestamp, publisherAddress, chainId, blockNumber, tokenId, publicTripleCount, AuthorshipProof block, Publication node, URN type/contextGraph/wasGeneratedBy rows). publishedAt KEPT after adversarial review found the kafka-plugin discovery reader.
- Phase 2: KC/KA type rows -> predicate-based counters; single rootEntity (entity alias dropped outside the signed seal); wm/swm pointers written only on divergence; fromLayer/toLayer + wasAssociatedWith derived/optional; publicSnapshotRef collapsed; WM marker updated (not deleted) at VM flip.
- Phase 3: UAL+<ual>/<n> collapsed to one node (read-both in resolveKA, access-handler incl. <ual>/<n> fallback for old clients, RS prover, sync, kafka discovery, async-lift, EPCIS, counters); metadata.provenanceEvents config (default true) gating all four lifecycle event writers; ShareTransition dropped (node-ui receipt reads the seal subject, legacy fallback retained); partition CONSTRUCT-copy -> documented minimal shape (REMAP keeps wholesale move). Lifecycle-URN->seal merge DEFERRED with a worked plan (TODO(rfc-ka-trim) at assertionLifecycleUri) after the audit surfaced signed-material collision + identity double-allocation hazards.
- Adversarial-review fixes: kafka read-both queries; provenanceEvents gating on discard/update; multi-root private access attestation always matches the served triples (computePrivateRoot fallback); stale re-promote stays a no-op; isAlreadyConfirmed read-both vs the minimal partition shape.

Author-signed seal material untouched.

Suites green: core 1039, query 261, random-sampling 62, publisher 1150, agent 1226, cli 2031, kafka-plugin 169, node-ui 1456 (+tsc/builds).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- sync-verify: collapsed-shape KCs now Merkle-verified on sync (self-map in both duplicated impls, multi-map roots, dual-shape dedupe guard) — a wrong merkle root on a collapsed KC is now rejected instead of accepted on trust.
- node-ui receipt: replaced the never-binding lifecycle join with the member-entity join pinned by the URN/URI tail correspondence; the identical latent bug in the legacy hop fixed too; real executed-query tests added.
- multi-root access: conditional collapse — multi-root KAs re-emit per-token pairing rows (manifest order) alongside the collapsed shape, so legacy <ual>/<n> resolves exactly root N with matching attestation; single-root publishes (dominant case) keep the full collapse. Handler serves the first root that has a private bag; F3 recompute guard retained.
- graph-viz wasGeneratedBy: documentation-only correction (generic matcher, no stranded feature) — RFC + inline comment amended.

Tests: agent unit + 9 sync-path hardhat suites green; publisher full 1232; node-ui full 1423 (+ new executed-query tests); graph-viz 140; builds green. Full agent lane delegated to CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Both fail on their own source branches in isolation; neither is a behavior regression. Fixed as part of assembling integration/data-store-and-query-rework so the combined suite is green. Should be upstreamed to the source PRs.

- sync-responder-concurrent-interleaving (A1 #1141): the multi-graph page-query probe regex omitted DISTINCT, but readSwmMetaRowsPage correctly emits SELECT DISTINCT ?g ... for the multi-graph VALUES join. The single-graph arm already tolerated DISTINCT; gave the multi-graph arm parity.
- agent.part-13 curator-authority (C2 #1151): C2 repointed getContextGraphCreator at the ContextGraphMetaProjection cache; the test forges foreign-creator state via raw store.insert that bypasses markDirty, leaving the projection stale. Real sync paths call markDirtyFromQuads after insert, so the code is correct; the test now invalidates the projection after its out-of-band write.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

branarakic · 2026-06-13T13:11:12Z

Live idle-CPU A/B measurement (handoff context)

Ran the projection-flag A/B on a real data-rich node to validate what this branch does and does not fix for idle-node CPU. Posting full results since the work is being handed off.

Setup: HariSeldon testnet node, 1.1 GB store, native oxigraph-server backend, booted on this integration branch, idle. Sampled oxigraph-server %CPU 15×8s per phase after 50s stabilization. DKG_SYNC_DELTA is not flippable here — only A1 (#1141) is merged, not A5 (delta activation) — so this is projection-only (DKG_LIST_CONTEXT_GRAPHS_PROJECTION).

Result

| `DKG_LIST_CONTEXT_GRAPHS_PROJECTION` | mean oxigraph CPU | median | range |
| --- | --- | --- | --- |
| off | 454% | 445% | 12.8 – 894% |
| on | 426% | 204% | 1.1 – 981% |

The projection flag does NOT fix idle CPU. Both phases saturate ~4.5–9 cores; the means are within run noise (the boot-time C1 GraphSetIndexStore seed scan + testnet sync confound the first ~7 samples each phase). There is a real secondary signal: the on phase reached a sustained genuine-idle window (4 consecutive samples @ 1–6%) that off never touched, and median dropped 445%→204% — consistent with the listContextGraphs enumeration being relieved by the projection. But that is not the bottleneck.

The actual bottleneck (identical in both phases)

The C4 (#1153) slow-query telemetry caught the dominant cost:

SELECT (COUNT(DISTINCT ?kc) AS ?c) WHERE { GRAPH ?g { ?kc … } }
   source=unknown  operation=select  elapsedMs≈30001  thresholdMs=10000   (~13×/phase)

This is the 30-second metrics sweep — COUNT(DISTINCT) over all named graphs (metrics-queries.ts + lifecycle.ts:1871-1900, the getTotalKAs/KCs/triples + getContextGraphCount getters). On a 1.1 GB store it TIMES OUT at the 30s SPARQL HTTP limit every 30 seconds. A wildcard GRAPH ?g COUNT(DISTINCT) over a large RocksDB store drives heavy multi-core read amplification + compaction — it is the #1 idle driver, and nothing in the C-track touches it (it is not behind the projection flag).

Two actionable findings

The dominant query is emitted as source=unknown — C4's slow-query tagging does not tag the metrics COUNT, so its own telemetry mis-attributes the single biggest cost. Tag the metrics getters (and any other untagged reconciler reads) so the telemetry is usable.
Priority-1 for the Proposal-A follow-up is the metrics COUNT, not the CG enumeration. A maintained counter (or registry-backed count) for KC/KA/triple totals + CG count kills the 30s timeout sweep. The enumeration/listContextGraphs work the projection already half-relieves is secondary. Per the adversarial review on the design, do NOT hand-maintain these as naive integers without an invalidation contract — they are operator-visible (dashboards) and writes arrive from gossip/sync/migration outside a single chokepoint; back them on the same dirty-set discipline C2's ContextGraphMetaProjection already establishes (markDirty/markDirtyFromQuads).
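
A minimal sketch of what "maintained counter with an invalidation contract" could look like, under stated assumptions: `DirtyTrackedTotals` and the `recount` callback are hypothetical, and the `markDirty` / `markDirtyFromQuads` method names are borrowed from the description of C2's projection without knowing their real signatures. The point is that totals are served from a cache and only graphs marked dirty are recounted, instead of sweeping every named graph every 30 seconds.

```typescript
// Hypothetical cache of per-graph counts, refreshed only for graphs marked dirty.
class DirtyTrackedTotals {
  private readonly recount: (graph: string) => number;
  private readonly counts = new Map<string, number>();
  private readonly dirty = new Set<string>();

  constructor(recount: (graph: string) => number) { this.recount = recount; }

  markDirty(graph: string): void { this.dirty.add(graph); }

  // Assumed shape: every writer (publish, gossip, sync, migration) reports its quads here.
  markDirtyFromQuads(quads: { graph: string }[]): void {
    quads.forEach((q) => this.markDirty(q.graph));
  }

  total(): number {
    // Recount only the graphs touched since the last read.
    this.dirty.forEach((graph) => this.counts.set(graph, this.recount(graph)));
    this.dirty.clear();
    let sum = 0;
    this.counts.forEach((n) => { sum += n; });
    return sum;
  }
}

// Stand-in for the store: graph name -> number of counted subjects in that graph.
const backing = new Map<string, number>([['urn:g1', 3], ['urn:g2', 4]]);
let recounts = 0;
const totals = new DirtyTrackedTotals((graph) => {
  recounts += 1;
  return backing.get(graph) ?? 0;
});

totals.markDirtyFromQuads([{ graph: 'urn:g1' }, { graph: 'urn:g2' }]);
const first = totals.total();   // recounts both graphs
const second = totals.total();  // served from cache, no recount
backing.set('urn:g1', 10);      // out-of-band write that skips markDirty
const stale = totals.total();   // still the old total: this is the hazard
totals.markDirty('urn:g1');
const fresh = totals.total();   // recounts only urn:g1
console.log({ first, second, stale, fresh, recounts });
```

The `stale` read shows why the contract matters: a writer that skips `markDirty` silently freezes an operator-visible number. Note also that summing per-graph counts equals a global COUNT(DISTINCT ?kc) only if each ?kc is counted in a single graph; if the same subject can appear in several graphs, the cache would need to track identities rather than integers.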

Net

Merging this branch is necessary but not sufficient for idle CPU — empirically confirmed. When Proposal A (the typed graph registry: CG roster + per-CG cg → hasGraph → g membership, in an oxigraph system graph) is built on top of this branch, it should back the metrics counters first, then repoint the enumeration / STRSTARTS sites.

Reproduce

# integration build is packages/cli/dist (already built on this branch)
export DKG_HOME=~/.dkg
DKG_LIST_CONTEXT_GRAPHS_PROJECTION=on  node packages/cli/dist/cli.js start   # vs unset for baseline
# sample: ps -o %cpu= -p $(pgrep -f oxigraph-v0)   ; grep "slow query" ~/.dkg/daemon.log
node packages/cli/dist/cli.js stop

Daemon lifecycle is dkg start / stop / status (NOT dkg daemon …). Raw 30-sample run is at /tmp/measure-projection.out on the dev box.

Flag state reminder for the takeover

DKG_LIST_CONTEXT_GRAPHS_PROJECTION defaults OFF — C3's projection-backed listContextGraphs is dead code until enabled; merging alone changes nothing for the enumeration.
The read-both collapsed-UAL delta arm was re-ported into A1's durableDeltaWhereClauseForGraphs (graph-plan.ts) but is inert until A5 lands the DKG_SYNC_DELTA activation — correct and ready, just not exercisable on this branch.

🤖 Generated with Claude Code

branarakic · 2026-06-17T11:35:26Z

Closing as redundant — Track-C reconciled into main via #1201 (integration/track-c-merge).

Verified at content level (not just patch-id): this branch is 169 commits behind main; 6/7 of its distinctive SWM-admission identifiers exist in main (createResponderSubGraphRegistrationMemo, createSubGraphNameMemo, filterSharedMemoryMetaQuads, isSharedMemoryBucketDescendantDataGraph, effectiveRegisteredSubGraphNames, subGraphRegistrationMemo), and the 7th — the "honor child CG collisions in SWM admission" fix (3795aec11) — is present in main under renamed identifiers (graph-plan.ts:643: childCgUri + isKnownContextGraph, the same child-CG-collision skip logic). No unique unmerged work found. Reopen if I've missed something.

Jurij Skornik and others added 27 commits June 12, 2026 06:30

fix(agent): bound sync responder graph serving

1e1e318

feat(storage): add graph set index decorator

246a81c

fix(agent): project context graph metadata

601679c

feat(agent): flag projection-backed context graph list

371a8be

fix(agent): bound host sweep and tag sparql queries

5d92563

fix(sync): page responder graph reads in store

5fafada

fix(sync): stabilize responder graph paging

0ebd5c4

fix(sync): align SWM descendant verification

8799b89

fix(sync): harden responder paging and SWM verification

fe88baa

fix(sync): bootstrap SWM subgraph registration

b0250c2

fix(sync): honor child CG collisions in SWM admission

3795aec

fix(sync): keep SWM registration prelude unpaged

f2c1d92

fix(sync): preserve unpaged SWM registration preludes

200cf74

fix(sync): bound SWM registration preludes

d8da674

fix(sync): trust only local SWM subgraph admission

fce7d3c

fix(sync): seed SWM admission from durable registrations

f919e6b

refactor(sync): share SWM descendant graph parser

b5836a3

fix(sync): stabilize SWM admission paging

c5bce43

fix(sync): separate durable registrations from SWM admission

ec9c589

merge: C1 #1150 GraphSetIndexStore into integration/data-store-and-qu…

3ce7476

…ery-rework

merge: C2/C3/C4 (#1151/#1152/#1153) projection + host-sweep into inte…

610f141

…gration/data-store-and-query-rework # Conflicts: # packages/storage/src/adapters/sparql-http.ts # packages/storage/test/sparql-http.test.ts

merge: A1 #1141 sync responder bound-graph page serving into integrat…

0fdead7

…ion/data-store-and-query-rework # Conflicts: # packages/agent/src/dkg-agent-lifecycle.ts

merge: #1155 per-KA metadata trim (data-store volume) into integratio…

4452a33

…n/data-store-and-query-rework # Conflicts: # packages/agent/src/sync/responder/sync-handler.ts

branarakic closed this Jun 17, 2026

branarakic wants to merge 27 commits into main from integration/data-store-and-query-rework
