test: reproducing tests for all 25 high/pre-mainnet issues (red while live, green when fixed)#1129
Open
Bojan131 wants to merge 23 commits into
Encodes confirmed-live GitHub issues from the rc.17 QA sweep as runnable tests using the it.fails / test.fixme convention: each asserts the CORRECT behaviour, fails today (bug live → it.fails reports pass → CI green), and flips RED when fixed (signalling the issue can close). Zero chain/network mocks; written against a real node + live devnet.
Tier 1 (run in the normal turbo test CI lanes):
- #1125 skill.md (dynamic) placeholder: cli/skill-md-dynamic-section
- #675 #184 sub-graph view scoping: query/subgraph-view-scoping
- #416 escaper control bytes: core/escape-rdf-literal-control-chars
- #709 EPCIS document-container in events: epcis/event-type-container-filter
- #15 .jsonld @context ingest: cli/rdf-parser-jsonld
- #787 #306 #158 #309 #757 daemon routes: cli/issue-liveness-daemon-routes (real edge daemon vs shared Hardhat)
Tier 2 (manual-run devnet suite, pnpm test:devnet:issue-liveness):
- #705 #923 peer lifecycle-meta replication
- #872 imported Markdown source-byte replication (a CONTROL test proves SWM replicated, so the it.fails can't pass wrongly)
Deferred with rationale (see docs/testing/ISSUE_LIVENESS_TESTS.md): #614 and #1091 are audit-grade contract / design-property issues where a speculative test would give false signal; UI count-caps (#1112/#1113/#1015) and #966 need fixtures too heavy for CI.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was referenced Jun 11, 2026
Per manager request: every high-priority issue gets a test that reproduces it (it.fails / it.skip-with-recipe), so a fix can flip it green and stay in CI.
Runnable it.fails repros (14):
- unit: #11 (op-wallets plaintext), #1121 #1122 (async-lift encryption + canonicalization), plus existing #184 #675 #757
- devnet: #886 #1093 #1094 #1095 #1096 #1097 #1098 #1104 (devnet/issue-liveness/high-issues.test.ts; 11 pass = bugs live)
Documented it.skip stubs with exact repro recipes (11), where a faithful test needs a fixture/design/topology that doesn't exist yet (a wrong test is worse):
- #1091 #614 contract/design (grindable seed, billing-window sweep)
- #1124 host-mode sharded topology (devnet cores are all CG members)
- #1099 gossip-retention timing (repros on testnet, not fast local devnet)
- #1013 #936 publisher-runtime / 2-replica-reconcile harness
- #999 #1008 load-dependent store saturation (verified live on testnet)
- #723 emergent network-wide RS metric
- #462 MessageHandler ACL harness (skill_request has no authz)
- #1078 layer-scoped private-store API
The 9 fix-in-flight highs (#886, #1093-#1099, #1104) are fixed on PR #1107; when it merges their it.fails repros start passing → unwrap them. Full map in docs/testing/ISSUE_LIVENESS_TESTS.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was referenced Jun 11, 2026
Turbo runs tasks in a strict env, so `pnpm test:issue-liveness` (`RUN_ISSUE_LIVENESS=1 turbo run test`) would not have reached the vitest process — the gated liveness repros would silently skip (green) instead of running red. Add RUN_ISSUE_LIVENESS to globalPassThroughEnv so the dedicated issue-liveness command actually activates the repros. Verified: storage liveness test goes RED under the command, stays skipped on the default lane. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- cli daemon-routes: gate the `beforeAll` behind RUN_ISSUE_LIVENESS so the default lane no longer spins a real daemon when all repros are skipped; and create the #757 CG as CURATED (accessPolicy:1) so the curator-only access check is actually exercised (an open CG has no curator-moderated join flow). - package.json test:issue-liveness filter: the CLI package is `@origintrail-official/dkg` (not `-cli`), and core was missing — both are now included so the command actually runs every gated liveness suite. - core/escape-rdf, cli/rdf-parser, cli/skill-md (Tier-1 #416/#15/#1125): converted from the dropped `it.fails` convention to the gated plain-`it()` convention for uniformity. #416 now lower-cases the output before comparing, since RDF `\u` UCHAR hex is case-insensitive (a lowercase-hex fix must not keep it red). - random-sampling #1091: wrap the prevrandao/automine RPC mutations in try/finally so `evm_setAutomine(true)` is always restored (an exception mid-flight no longer poisons the shared Hardhat node), and compare the FULL (cgId, kaId, chunkId) draw tuple so a prediction can't look successful while having picked a different context graph. - devnet high-issues: the seed CG is now PUBLIC (accessPolicy:0, renamed SEED_CG) so the #1098/#886 cross-node replication repros aren't masked by the subscriber lacking curated-CG membership. Validated: #416 and #1091 skip by default and go RED under RUN_ISSUE_LIVENESS=1; the real prover e2e still passes (automine restored); all edited files parse. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
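The #416 point about case-insensitive `\u` hex can be illustrated with a small sketch. This is not the repository's test: `escapeControlChars` is a hypothetical stand-in for the escaper under test, and `containsEscapedControlChar` is an illustrative helper that lower-cases the output before looking for the escape, so either hex casing is accepted.

```typescript
// Hypothetical stand-in for the escaper under test (assumption, not the real
// core implementation): rewrites C0 control characters as \uXXXX escapes.
function escapeControlChars(input: string, upperHex: boolean): string {
  return input.replace(/[\u0000-\u001f]/g, (ch) => {
    const hex = ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
    return "\\u" + (upperHex ? hex.toUpperCase() : hex);
  });
}

// Case-insensitive check: lower-case the output, then look for the
// lower-case form of the expected escape.
function containsEscapedControlChar(output: string, codePoint: number): boolean {
  const expected = "\\u" + ("0000" + codePoint.toString(16)).slice(-4);
  return output.toLowerCase().indexOf(expected) !== -1;
}
```

With this shape, a fix that emits `\u000b` and one that emits `\u000B` both satisfy the check, which is the behaviour the commit describes.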
…filter, control-char nit) - Gate flag now parses explicitly (`process.env.RUN_ISSUE_LIVENESS === '1'`) across all liveness suites, so RUN_ISSUE_LIVENESS=0/false no longer enables the intentionally-red repros. Verified: =0 skips, =1 runs red. - test:issue-liveness filter adds @origintrail-official/dkg-evm-module so the contract liveness file (#614/#1091 recipes) is compiled/executed on the lane. - #416 comment: replaced a raw vertical-tab byte with the literal `�` text (hidden control chars trip diffs/tooling). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
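A minimal sketch of the explicit gate parse described above, contrasted with a truthiness check. The helper names are hypothetical; only the `=== '1'` comparison and the `RUN_ISSUE_LIVENESS` variable come from the commit message. The environment is passed in as a plain object so the sketch does not depend on Node typings.

```typescript
type EnvLike = Record<string, string | undefined>;

// Explicit parse: only the exact string "1" enables the repros.
function livenessEnabled(env: EnvLike): boolean {
  return env["RUN_ISSUE_LIVENESS"] === "1";
}

// Truthiness check, shown only for contrast: any non-empty string enables
// the repros, including "0" and "false".
function livenessEnabledLoose(env: EnvLike): boolean {
  return Boolean(env["RUN_ISSUE_LIVENESS"]);
}
```

In a suite this would typically be called as `livenessEnabled(process.env)` and fed to whatever skip mechanism the runner offers (Vitest, for example, provides `describe.skipIf`); how the repository wires it is not shown here.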
… fix-agnostic #11/#15) - devnet high-issues (RED): probe ALL cores for a working publisher (not a fixed candidate set), then pick the pre-subscribed peer AFTER the publisher is known (any core != pubNode). Previously node 2 was reserved as the pre-sub peer and excluded from publishers, so the whole suite aborted if node 2 was the only core that could still reach publish quorum. #1098 still uses a peer distinct from the publisher. - #15 (rdf-parser): assert the fix-agnostic INVARIANT — a `.jsonld` doc with `@context` parses, OR `.jsonld` is no longer advertised — so the documented option-A fix (stop advertising) turns it green instead of leaving it red. - #11 (op-wallets): scan EVERY persisted file under the data dir (and the bare hex form), not just `wallets.json`, so a fix that moves secrets into an encrypted keystore / renames the artifact still turns it green. Verified: #11 and #15 reproduce red under RUN_ISSUE_LIVENESS=1; high-issues parses clean. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
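The fix-agnostic scan for #11 can be sketched as below. This is an illustration under assumptions: the persisted files are modelled as an in-memory map from path to contents (a real test would walk the data directory on disk), and `filesLeakingKey` is a hypothetical helper name. Searching for the bare hex also catches the `0x`-prefixed form, since the bare hex is a substring of it.

```typescript
// Returns the path of every persisted file whose contents include the
// private key, 0x-prefixed or bare, in any letter case.
// `files` maps path -> contents (assumption: contents already read as text).
function filesLeakingKey(files: Record<string, string>, privateKey: string): string[] {
  const bareHex = privateKey.replace(/^0x/i, "").toLowerCase();
  const leaking: string[] = [];
  for (const path of Object.keys(files)) {
    if (files[path].toLowerCase().indexOf(bareHex) !== -1) {
      leaking.push(path);
    }
  }
  return leaking;
}
```

The assertion then becomes "no file leaks the key" rather than "`wallets.json` does not contain it", so moving or renaming the artifact cannot keep the test red after a genuine fix.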
…1097/#309) - #1124 (RED): removed the agent unit repro. `ingestSwmHostModeEnvelope` drops a public-CG plaintext share at TWO independent gates — the `isCiphertext` sniff AND the curated-agent authority check (which rejects a CG with no agent allowlist, i.e. the public-CG case). Since the fix must change both, isolating either gate is a false signal (stub authority → false green; don't → false red). #1124 is back to a documented pending stub that needs the host-mode sharded fixture exercising the full public-CG ingest path. - #1097 (RED): assert the one-shot flow actually WORKS — create returns 2xx and publish-by-assertionName returns 200 with a success status — instead of merely `!== 500` (a 404/409/422 would have falsely passed a "flow works" test). - #309 (yellow): assert `defaultAgentAddress` matches a real `0x…40` EVM address rather than just `toBeDefined()` (null/"" would still leave WM-query scoping broken). All-25 map is now 11 CI unit/integration + 8 devnet multi-node + 6 documented pending/emergent (#614 #1099 #1124 #723 #999 #1008). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
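For the #309 tightening, a sketch of what "matches a real `0x…40` EVM address" could look like. The helper name is hypothetical and the pattern assumes the usual form of `0x` followed by 40 hexadecimal digits; it does not validate an EIP-55 checksum.

```typescript
// 0x followed by exactly 40 hex digits (no checksum validation).
const EVM_ADDRESS_PATTERN = /^0x[0-9a-fA-F]{40}$/;

// Narrowing guard: rejects null, undefined, "" and malformed strings, all of
// which a bare toBeDefined()-style check could let through.
function isEvmAddress(value: unknown): value is string {
  return typeof value === "string" && EVM_ADDRESS_PATTERN.test(value);
}
```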
…flakiness) - #1094: assert the wm/pull-from {layer:vm} edit path actually WORKS — 200 + no error body, then read the KA back as an editable WM draft — instead of merely `!== 500` (a 404/409/422 would have looked fixed). - #1098 / #886: replace fixed 8s/12s sleeps with a pollUntil() against a generous deadline (60s/90s). Replication latency on a slow devnet no longer flips these into latency tests; they only fail if the KA NEVER materializes. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
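A minimal sketch of a deadline-based poll in the spirit of the `pollUntil()` mentioned above. The signature is an assumption; the repository's helper may differ. The probe returns `undefined` until the condition holds, and the poll only fails once the deadline is exhausted.

```typescript
// Polls `probe` until it yields a defined value or the deadline passes.
// Rejects with an Error if the condition is never met in time.
async function pollUntil<T>(
  probe: () => Promise<T | undefined>,
  deadlineMs: number,
  intervalMs: number = 1000,
): Promise<T> {
  const deadline = Date.now() + deadlineMs;
  for (;;) {
    const value = await probe();
    if (value !== undefined) return value;
    // Stop if the next attempt would start after the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error("pollUntil: condition not met within " + deadlineMs + " ms");
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Compared with a fixed sleep, a fast devnet returns as soon as the KA materializes, while a slow one gets the full deadline before the test is declared red.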
…drop stray artifact) - #1091 (RED): wrap the whole repro in takeSnapshot/revertSnapshot so its chain mutations (REC2 createChallenge, pinned prevrandao, mined blocks) are rolled back and can't leak into the regular prover E2E that reuses the shared Hardhat fixture under RUN_ISSUE_LIVENESS=1. Verified: #1091 still red, prover still passes, state isolated. - Reverted packages/evm-module/deployments/localhost_contracts.json — it was swept into an earlier commit by `git add -u` (a local-deploy artifact with branch/commit/timestamp churn), not part of this PR. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
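The snapshot isolation described above follows a try/finally shape, sketched here under assumptions: `withSnapshot` and the `send` callback are hypothetical names, and the sketch calls the Hardhat Network JSON-RPC methods `evm_snapshot` and `evm_revert` directly rather than the repository's `takeSnapshot` / `revertSnapshot` helpers.

```typescript
// Minimal JSON-RPC sender shape (assumption: supplied by the test fixture).
type RpcSend = (method: string, params: unknown[]) => Promise<unknown>;

// Runs `body` against the chain, then rolls the chain back to the state
// captured before it, whether `body` resolved or threw.
async function withSnapshot<T>(send: RpcSend, body: () => Promise<T>): Promise<T> {
  const snapshotId = await send("evm_snapshot", []);
  try {
    return await body();
  } finally {
    await send("evm_revert", [snapshotId]);
  }
}
```

The same shape applies to any shared-node mutation that has to be undone on every exit path, such as re-enabling automine after a repro turned it off.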
The hardhat e2e fixture regenerates this file with the current branch/commit/timestamp on every deploy; an earlier commit re-captured that churn. Reset to main's version — it is not part of this PR. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…dingTable test Wire the per-issue reproduction tests into CI so the high/pre-mainnet bugs are actually exercised there, not just runnable on demand. Adds a dedicated "Issue-liveness repros" lane that runs every repro under RUN_ISSUE_LIVENESS=1. The lane is RED while the bugs are live and each test flips GREEN when its bug is fixed; it is INFORMATIONAL (must not be a required check) so it never blocks unrelated PRs. The normal package lanes still gate these files OFF (skip), so they stay green/mergeable. So a red in this lane always means "a repro caught its bug" and never an unrelated suite failure: - each package exposes a `test:liveness` script listing ONLY its repro files (turbo task, cache:false); the root `test:issue-liveness` runs them with `--continue` so all packages report, not just the first to fail. - #1091 drives a real Hardhat chain, so the lane compiles the EVM contracts first (the shared build skips Solidity). Harden three repros so their red can only come from the real bug, not a setup/transport failure (Codex review on #1129): - #462: assert the attacker's skill_request was actually DELIVERED (victim emits MESSAGE_RECEIVED after decrypt+verify+parse) before asserting the handler didn't run — a transport/signature regression now turns it RED instead of falsely green. - #306: assert the KA create precondition succeeds, so the wm/write 4xx is quad-shape validation and not a missing-KA 404. - #158: assert the exact 404 the issue requires, not any 4xx, so a wrong remap to 400/403/422 can't masquerade as fixed. Also fix a pre-existing stale assertion unrelated to these repros: the ShardingTable unit test hardcoded version '10.0.2' but the contract is '10.0.3' — the only CI failure that was a test-maintenance issue rather than a caught bug. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
EpochStorage.sol is at _VERSION "10.0.3" but its unit test still asserted "10.0.2" — the same test-maintenance drift as the ShardingTable fix in the previous commit, surfacing as the Solidity [2/4] shard failure. Scanned every `.version()).to.equal(...)` assertion against its contract's `_VERSION`: ShardingTable and EpochStorage were the only two stale ones (Conviction- StakingStorage / RandomSampling tests were already bumped). No product change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
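The scan described above could be automated along these lines. This is a hedged sketch, not the procedure actually used: the function names are hypothetical, and the patterns assume the contract declares its version as `_VERSION = "x.y.z"` and the test asserts it as `.version()).to.equal('x.y.z')`.

```typescript
// Extracts the version literal from a `_VERSION = "x.y.z"` declaration.
function contractVersion(soliditySource: string): string | undefined {
  const m = /_VERSION\s*=\s*"([^"]+)"/.exec(soliditySource);
  return m ? m[1] : undefined;
}

// Collects every literal asserted via `.version()).to.equal("x.y.z")`.
function assertedVersions(testSource: string): string[] {
  const re = /\.version\(\)\)\.to\.equal\(\s*['"]([^'"]+)['"]\s*\)/g;
  const found: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(testSource)) !== null) found.push(m[1]);
  return found;
}

// Asserted versions that no longer match the contract's current version.
function staleAssertions(soliditySource: string, testSource: string): string[] {
  const current = contractVersion(soliditySource);
  return assertedVersions(testSource).filter((v) => v !== current);
}
```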
…et lane Fold the per-issue regression tests into the normal test suites instead of a separate opt-in lane: - Remove the RUN_ISSUE_LIVENESS gate from all 13 repro files. They now run as ordinary tests in their packages' existing CI lanes (Tornado: agent / publisher / core+storage+chain, Bura: cli / query, Kosava: epcis bundle / random-sampling), failing while their bug is live and passing once it is fixed, like any regression test. - Drop the dedicated CI lane and its plumbing (per-package test:liveness scripts, root test:issue-liveness, turbo task, globalPassThroughEnv). package.json/turbo.json are back to main's state. - Add "Tornado: devnet integration (multi-node publish/sync)": boots a 6-node devnet via scripts/devnet.sh (same pattern as the node-ui e2e lane) and runs devnet/issue-liveness — the inherently multi-node publish → quorum → replication coverage that cannot run in single-process lanes. bootstrap.cjs is not part of the lane: the suite probes for a publisher and seeds its own data, and bootstrap's seed publishes abort on a quorum-degraded devnet, which would kill the job before any test runs. Verified locally: all 8 package repro sets fail on their bug assertions in plain `vitest run` (no env), and a full dry-run of the devnet lane (clean → start 6 → suite) boots all 6 nodes and fails only on real API assertions (8 failed / 4 passed / 6 skipped), no connection errors. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…o test/issue-liveness-suite # Conflicts: # packages/evm-module/test/unit/EpochStorage.test.ts # pnpm-lock.yaml
, #462) These two tests were red not because their issues are unfixed, but because each was written against an outdated view of HOW the (already-merged) fix works. Both production fixes are present on this branch; the tests just had to exercise the real mechanism. All other red liveness lanes were verified to be tests correctly catching genuinely-unfixed issues — those stay red. - query/subgraph-view-scoping (#184/#675): the merged #675 fan-out (discoverRegisteredSubGraphNames) unions sub-graph WM data only for sub-graphs registered in the ROOT _meta graph. The test seeded the WM data graphs but never registered the sub-graph, so the fan-out found nothing to union and the WM-view/#184-scoping assertions failed. Seed the `research-alpha` SubGraph registration in _meta (urn:dkg:subgraph:…, rdf:type SubGraph, schema:name), mirroring the passing sub-graph-query.test.ts. 3/3 green. - agent/issue-462-skill-acl: #462 is fixed — MessageHandler exposes the setSkillAcl gate and the daemon (lifecycle.ts) installs default-deny for every node; a bare library MessageHandler stays accept-all for back-compat. The test built a bare handler and never installed the gate, so the unauthorized skill_request was (correctly, for a bare handler) accepted. Install the same default-deny gate the daemon wires so the test exercises the real #462 layer; refresh the stale header comment. 1/1 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-verified live on a fresh 6-node devnet: after a confirmed VM publish the KA descriptor reads state=published, status=vm-confirmed, memoryLayer=VM, publishedUal present — with NO discarded/vm-confirmed contradiction. #1095's substantive defect (contradictory state + no coherent published signal) is fixed. The test was asserting events[].includes('published'), but the publish transition is recorded as descriptor STATE; the provenance log keeps created/promoted rows and never adds a separate 'published' row, so the old assertion was a false negative against the real implementation. Assert the published state + publishedUal instead. Verified green on a clean devnet. #1097 and #1098 remain RED (verified genuinely broken live: the documented one-shot publish 500s without an undocumented promote:true; a pre-subscribed peer materializes the published KA only ~1/3 of the time) and have been reopened on GitHub. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
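The revised #1095 assertion checks descriptor state rather than the provenance event log. A sketch of that check follows; the field names (`state`, `status`, `memoryLayer`, `publishedUal`) are taken from the message above, while the interface shape and helper names are assumptions for illustration.

```typescript
// Assumed descriptor shape; only the fields named in the commit message.
interface KaDescriptor {
  state?: string;
  status?: string;
  memoryLayer?: string;
  publishedUal?: string;
}

// Published signal: state says published and a non-empty UAL is present.
function isCoherentlyPublished(d: KaDescriptor): boolean {
  return d.state === "published"
    && typeof d.publishedUal === "string"
    && d.publishedUal.length > 0;
}

// The contradiction named in #1095: discarded state with a vm-confirmed status.
function hasDiscardedConfirmedContradiction(d: KaDescriptor): boolean {
  return d.state === "discarded" && d.status === "vm-confirmed";
}
```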
…uite # Conflicts: # packages/agent/test/issue-936-tokenid-determinism.test.ts # packages/agent/test/op-wallets-at-rest-encryption.test.ts # packages/publisher/test/async-lift-canonicalization-and-encryption.test.ts # packages/storage/test/issue-1078-private-layer-scope.test.ts
…iewChallengeForSeed API The RandomSampling preview API changed to single-arg previewChallengeForSeed(seed) (reads chronos.getCurrentEpoch() internally) when PR #1226 landed the weighted BIT draw. The #1091 liveness repro still called the old 2-arg selector, so it reverted (require(false)) before reaching its assertion — a stale crash, not a real repro. Fixed to the 1-arg API. The test now reaches its assertion and is RED for the RIGHT reason: a node still reconstructs the seed from public block data and predicts its own (cgId,kaId,chunkId) draw pre-mine. #1091 remains live — #1226 is a partial mitigation only (the contract NatSpec says the prevrandao/blockhash seed is still proposer-grindable; durable fix is commit-reveal/VRF). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
What this is
A test suite that encodes confirmed-live GitHub issues as runnable tests, so the team can (a) prove which open issues are still real and (b) get an automatic signal when one gets fixed. This is the TDD pass requested off the back of the rc.17 QA sweep.
All 25 high / pre-mainnet issues now have a test entry. Every test was written against a real node / live devnet (zero chain or network mocks) and validated against a known-fixed build so a red is a genuine live bug, not a broken test.
Convention: red while the bug is live, green when it's fixed
Each test asserts the correct behaviour the issue asks for. While the bug is live that assertion fails — the test is RED, and that red is the point: it proves the test actually catches the bug. When the bug is fixed it goes GREEN and stays green.
So this PR is expected to be red: every red test is a live, reproduced bug. The fix PRs (#1107, #1132) are the ones that go green; as each fix merges, the matching liveness test here flips to green. (The earlier `it.fails` "green-while-broken" convention was dropped in favour of this.)
HIGH-priority coverage: all 25, across three tiers
CI unit / integration (11): run in the normal `turbo test` lanes (Tornado / Bura / Kosava), red today:
- `packages/agent/test/op-wallets-at-rest-encryption.test.ts`
- `packages/query/test/subgraph-view-scoping.test.ts`
- `packages/agent/test/issue-462-skill-acl.test.ts`
- `packages/cli/test/issue-liveness-daemon-routes.test.ts` (real daemon)
- `packages/agent/test/issue-936-tokenid-determinism.test.ts`
- `packages/publisher/test/issue-1013-async-finalization-honesty.test.ts`
- `packages/storage/test/issue-1078-private-layer-scope.test.ts`
- `packages/random-sampling/test/e2e-hardhat-chain.test.ts` (real Hardhat)
- `packages/publisher/test/async-lift-canonicalization-and-encryption.test.ts`
Devnet multi-node (8): publish → quorum → replication bugs that can't be reproduced in a single process; run on the devnet harness (`./scripts/devnet.sh start 6` + bootstrap, `pnpm test:devnet:issue-liveness`):
- #886, #1093, #1094, #1095, #1096, #1097, #1098, #1104: `devnet/issue-liveness/high-issues.test.ts`. A `CONTROL` test proves SWM actually replicated so the repros can't pass for the wrong reason. These also turn green when #1107 merges.
Pending fixture / emergent (6): `it.skip` with the exact repro recipe, because a deterministic green-able test would be a false positive:
Full map + rationale: `docs/testing/ISSUE_LIVENESS_TESTS.md`.
🤖 Generated with Claude Code
How they run (important): gated off the default lanes
The repros assert post-fix behaviour, so they are RED while the bug is live. To avoid making every package's default test lane fail (which would block unrelated PRs and poison local `pnpm test`), they are gated behind `RUN_ISSUE_LIVENESS`:
- `pnpm test` / CI lanes → repros are skipped → lanes stay green / mergeable.
- `pnpm test:issue-liveness` (`RUN_ISSUE_LIVENESS=1`) → repros run → RED while live, GREEN once fixed.
So this PR does not break the normal lanes; the red signal lives on the dedicated issue-liveness run. The multi-node tier already follows the same opt-in pattern (`pnpm test:devnet:issue-liveness`, needs a live 6-node devnet).