test: reproducing tests for all 25 high/pre-mainnet issues (red while live, green when fixed) #1129

Open
Bojan131 wants to merge 23 commits into main from test/issue-liveness-suite

Bojan131 commented Jun 11, 2026

What this is

A test suite that encodes confirmed-live GitHub issues as runnable tests, so the team can (a) prove which open issues are still real and (b) get an automatic signal when one gets fixed. This is the TDD pass requested off the back of the rc.17 QA sweep.

All 25 high / pre-mainnet issues now have a test entry. Every test was written against a real node / live devnet (zero chain or network mocks) and validated against a known-fixed build so a red is a genuine live bug, not a broken test.

Convention: red while the bug is live, green when it's fixed

Each test asserts the correct behaviour the issue asks for. While the bug is live that assertion fails — the test is RED, and that red is the point: it proves the test actually catches the bug. When the bug is fixed it goes GREEN and stays green.

  • Bug live → test RED (it caught the bug)
  • Bug fixed → test GREEN (close the issue)

So this PR is expected to be red — every red test is a live, reproduced bug. The fix PRs (#1107, #1132) are the ones that go green; as each fix merges, the matching liveness test here flips to green. (The earlier it.fails "green-while-broken" convention was dropped in favour of this.)
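
To make the difference between the two conventions concrete, here is a minimal sketch of how each one maps "bug live" to a reported colour. The names `Convention`, `assertionPasses` and `reportedColour` are hypothetical and exist only for this illustration; they are not code from the suite.

```typescript
// Hypothetical model of the two conventions; not code from the suite.
type Convention = 'plain-it' | 'it.fails';
type Colour = 'RED' | 'GREEN';

// The test body asserts the CORRECT behaviour, so the assertion itself
// fails exactly when the bug is live.
function assertionPasses(bugLive: boolean): boolean {
  return !bugLive;
}

// plain it(): the reported colour follows the assertion.
// it.fails:   the reported colour is inverted (a failing body reports as a pass).
function reportedColour(convention: Convention, bugLive: boolean): Colour {
  const passed = assertionPasses(bugLive);
  const reportedPass = convention === 'plain-it' ? passed : !passed;
  return reportedPass ? 'GREEN' : 'RED';
}
```

Under this model the plain `it()` convention is the only one where a red result and a live bug coincide, which is the behaviour the convention above asks for.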

HIGH-priority coverage — all 25, across three tiers

CI unit / integration (11) — run in the normal turbo test lanes (Tornado / Bura / Kosava), red today:

| Issue | Test |
| --- | --- |
| #11 | packages/agent/test/op-wallets-at-rest-encryption.test.ts |
| #184, #675 | packages/query/test/subgraph-view-scoping.test.ts |
| #462 | packages/agent/test/issue-462-skill-acl.test.ts |
| #757 | packages/cli/test/issue-liveness-daemon-routes.test.ts (real daemon) |
| #936 | packages/agent/test/issue-936-tokenid-determinism.test.ts |
| #1013 | packages/publisher/test/issue-1013-async-finalization-honesty.test.ts |
| #1078 | packages/storage/test/issue-1078-private-layer-scope.test.ts |
| #1091 | packages/random-sampling/test/e2e-hardhat-chain.test.ts (real Hardhat) |
| #1121, #1122 | packages/publisher/test/async-lift-canonicalization-and-encryption.test.ts |

Devnet multi-node (8) — publish → quorum → replication bugs that can't be reproduced in a single process; run on the devnet harness (./scripts/devnet.sh start 6 + bootstrap, pnpm test:devnet:issue-liveness):

#886, #1093, #1094, #1095, #1096, #1097, #1098, #1104: devnet/issue-liveness/high-issues.test.ts. A CONTROL test proves SWM actually replicated so the repros can't pass for the wrong reason. These also turn green when #1107 merges.

Pending fixture / emergent (6): it.skip with the exact repro recipe, because a deterministic green-able test would be a false positive:

#614, #1099, #1124, #723, #999, #1008

Full map + rationale: docs/testing/ISSUE_LIVENESS_TESTS.md.

🤖 Generated with Claude Code

How they run (important): gated off the default lanes

The repros assert post-fix behaviour, so they are RED while the bug is live. To avoid making every package's default test lane fail (which would block unrelated PRs and poison local pnpm test), they are gated behind RUN_ISSUE_LIVENESS:

  • Default pnpm test / CI lanes → repros are skipped → lanes stay green / mergeable.
  • pnpm test:issue-liveness (RUN_ISSUE_LIVENESS=1) → repros run: RED while live, GREEN once fixed.

So this PR does not break the normal lanes; the red signal lives on the dedicated issue-liveness run. The multi-node tier already follows the same opt-in pattern (pnpm test:devnet:issue-liveness, needs a live 6-node devnet).

Encodes confirmed-live GitHub issues from the rc.17 QA sweep as runnable
tests using the it.fails / test.fixme convention: each asserts the CORRECT
behaviour, fails today (bug live → it.fails reports pass → CI green), and
flips RED when fixed (signalling the issue can close). Zero chain/network
mocks; written against a real node + live devnet.

Tier 1 (run in the normal turbo test CI lanes):
  #1125 skill.md (dynamic) placeholder        — cli/skill-md-dynamic-section
  #675 #184 sub-graph view scoping            — query/subgraph-view-scoping
  #416 escaper control bytes                   — core/escape-rdf-literal-control-chars
  #709 EPCIS document-container in events      — epcis/event-type-container-filter
  #15  .jsonld @context ingest                 — cli/rdf-parser-jsonld
  #787 #306 #158 #309 #757 daemon routes       — cli/issue-liveness-daemon-routes
       (real edge daemon vs shared Hardhat)

Tier 2 (manual-run devnet suite, pnpm test:devnet:issue-liveness):
  #705 #923 peer lifecycle-meta replication
  #872 imported Markdown source-byte replication
  (a CONTROL test proves SWM replicated, so the it.fails can't pass wrongly)

Deferred with rationale (see docs/testing/ISSUE_LIVENESS_TESTS.md): #614 and
#1091 are audit-grade contract / design-property issues where a speculative
test would give false signal; UI count-caps (#1112/#1113/#1015) and #966 need
fixtures too heavy for CI.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was referenced Jun 11, 2026
Comment thread devnet/issue-liveness/vitest.config.ts Outdated
Comment thread devnet/issue-liveness/automated.test.ts Outdated
Comment thread devnet/issue-liveness/automated.test.ts Outdated
Comment thread packages/epcis/test/event-type-container-filter.test.ts Outdated
Per manager request — every high-priority issue gets a test that reproduces it
(it.fails / it.skip-with-recipe), so a fix can flip it green and stay in CI.

Runnable it.fails repros (14):
  unit     — #11 (op-wallets plaintext), #1121 #1122 (async-lift encryption +
             canonicalization), plus existing #184 #675 #757
  devnet   — #886 #1093 #1094 #1095 #1096 #1097 #1098 #1104
             (devnet/issue-liveness/high-issues.test.ts; 11 pass = bugs live)

Documented it.skip stubs with exact repro recipes (11) — where a faithful test
needs a fixture/design/topology that doesn't exist yet (a wrong test is worse):
  #1091 #614  contract/design (grindable seed, billing-window sweep)
  #1124       host-mode sharded topology (devnet cores are all CG members)
  #1099       gossip-retention timing (repros on testnet, not fast local devnet)
  #1013 #936  publisher-runtime / 2-replica-reconcile harness
  #999 #1008  load-dependent store saturation (verified live on testnet)
  #723        emergent network-wide RS metric
  #462        MessageHandler ACL harness (skill_request has no authz)
  #1078       layer-scoped private-store API

The 9 fix-in-flight highs (#886, #1093-#1099, #1104) are fixed on PR #1107 —
when it merges their it.fails repros start passing → unwrap them.

Full map in docs/testing/ISSUE_LIVENESS_TESTS.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread packages/cli/test/issue-liveness-daemon-routes.test.ts
Comment thread packages/random-sampling/test/e2e-hardhat-chain.test.ts
Comment thread packages/random-sampling/test/e2e-hardhat-chain.test.ts Outdated
Comment thread packages/core/test/escape-rdf-literal-control-chars.test.ts Outdated
Turbo runs tasks in a strict env, so `pnpm test:issue-liveness`
(`RUN_ISSUE_LIVENESS=1 turbo run test`) would not have reached the vitest
process — the gated liveness repros would silently skip (green) instead of
running red. Add RUN_ISSUE_LIVENESS to globalPassThroughEnv so the dedicated
issue-liveness command actually activates the repros. Verified: storage liveness
test goes RED under the command, stays skipped on the default lane.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread package.json Outdated
Comment thread packages/cli/test/issue-liveness-daemon-routes.test.ts Outdated
Comment thread devnet/issue-liveness/high-issues.test.ts Outdated
- cli daemon-routes: gate the `beforeAll` behind RUN_ISSUE_LIVENESS so the
  default lane no longer spins a real daemon when all repros are skipped; and
  create the #757 CG as CURATED (accessPolicy:1) so the curator-only access
  check is actually exercised (an open CG has no curator-moderated join flow).
- package.json test:issue-liveness filter: the CLI package is
  `@origintrail-official/dkg` (not `-cli`), and core was missing — both are now
  included so the command actually runs every gated liveness suite.
- core/escape-rdf, cli/rdf-parser, cli/skill-md (Tier-1 #416/#15/#1125):
  converted from the dropped `it.fails` convention to the gated plain-`it()`
  convention for uniformity. #416 now lower-cases the output before comparing,
  since RDF `\u` UCHAR hex is case-insensitive (a lowercase-hex fix must not
  keep it red).
- random-sampling #1091: wrap the prevrandao/automine RPC mutations in
  try/finally so `evm_setAutomine(true)` is always restored (an exception
  mid-flight no longer poisons the shared Hardhat node), and compare the FULL
  (cgId, kaId, chunkId) draw tuple so a prediction can't look successful while
  having picked a different context graph.
- devnet high-issues: the seed CG is now PUBLIC (accessPolicy:0, renamed
  SEED_CG) so the #1098/#886 cross-node replication repros aren't masked by the
  subscriber lacking curated-CG membership.

Validated: #416 and #1091 skip by default and go RED under RUN_ISSUE_LIVENESS=1;
the real prover e2e still passes (automine restored); all edited files parse.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
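
The full-tuple comparison for #1091 described above can be sketched as follows. The `Draw` shape and both helper names are hypothetical, and the partial comparison is only an assumed example of a check that ignores the context graph, not necessarily what the earlier test did.

```typescript
// Hypothetical shape of one sampling draw; field names follow the commit message.
interface Draw {
  cgId: string;
  kaId: string;
  chunkId: string;
}

// Partial comparison (assumed for illustration): can report a match even
// when the two draws picked different context graphs.
function sameKaAndChunk(a: Draw, b: Draw): boolean {
  return a.kaId === b.kaId && a.chunkId === b.chunkId;
}

// Full comparison: every component of the (cgId, kaId, chunkId) tuple must agree.
function sameDraw(a: Draw, b: Draw): boolean {
  return a.cgId === b.cgId && a.kaId === b.kaId && a.chunkId === b.chunkId;
}

// Example values are made up.
const predicted: Draw = { cgId: '7', kaId: '3', chunkId: '0' };
const actual: Draw = { cgId: '9', kaId: '3', chunkId: '0' };
```

With these made-up values, `sameKaAndChunk(predicted, actual)` is true while `sameDraw(predicted, actual)` is false, which is the false-success case the commit closes.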
Comment thread packages/agent/test/issue-1124-host-mode-plaintext.test.ts Outdated
Comment thread package.json Outdated
Comment thread packages/core/test/escape-rdf-literal-control-chars.test.ts Outdated
…filter, control-char nit)

- Gate flag now parses explicitly (`process.env.RUN_ISSUE_LIVENESS === '1'`)
  across all liveness suites, so RUN_ISSUE_LIVENESS=0/false no longer enables the
  intentionally-red repros. Verified: =0 skips, =1 runs red.
- test:issue-liveness filter adds @origintrail-official/dkg-evm-module so the
  contract liveness file (#614/#1091 recipes) is compiled/executed on the lane.
- #416 comment: replaced a raw vertical-tab byte with its literal escaped text
  (hidden control chars trip diffs/tooling).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
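
A minimal sketch of why the explicit parse matters, assuming the earlier gate was a plain truthiness check (the commit does not show the old code). The environment is passed in as a parameter so the sketch does not depend on Node typings; `looseGate` and `strictGate` are hypothetical names.

```typescript
// Hypothetical helpers; env is a parameter so the sketch needs no Node typings.
type Env = { [name: string]: string | undefined };

// Truthiness gate (assumed old behaviour): any non-empty string enables the
// repros, including "0" and "false".
function looseGate(env: Env): boolean {
  return Boolean(env['RUN_ISSUE_LIVENESS']);
}

// Explicit gate: only the exact string "1" enables the repros.
function strictGate(env: Env): boolean {
  return env['RUN_ISSUE_LIVENESS'] === '1';
}
```

In a suite, the boolean would typically select between running and skipping, for example `const liveIt = strictGate(process.env) ? it : it.skip;`.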
Comment thread devnet/issue-liveness/high-issues.test.ts Outdated
Comment thread packages/cli/test/rdf-parser-jsonld.test.ts Outdated
Comment thread packages/agent/test/op-wallets-at-rest-encryption.test.ts Outdated
… fix-agnostic #11/#15)

- devnet high-issues (RED): probe ALL cores for a working publisher (not a fixed
  candidate set), then pick the pre-subscribed peer AFTER the publisher is known
  (any core != pubNode). Previously node 2 was reserved as the pre-sub peer and
  excluded from publishers, so the whole suite aborted if node 2 was the only
  core that could still reach publish quorum. #1098 still uses a peer distinct
  from the publisher.
- #15 (rdf-parser): assert the fix-agnostic INVARIANT — a `.jsonld` doc with
  `@context` parses, OR `.jsonld` is no longer advertised — so the documented
  option-A fix (stop advertising) turns it green instead of leaving it red.
- #11 (op-wallets): scan EVERY persisted file under the data dir (and the bare
  hex form), not just `wallets.json`, so a fix that moves secrets into an
  encrypted keystore / renames the artifact still turns it green.

Verified: #11 and #15 reproduce red under RUN_ISSUE_LIVENESS=1; high-issues
parses clean.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
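
The fix-agnostic scan for #11 can be sketched like this. The sketch uses a hypothetical in-memory map of path to contents in place of walking the real data dir, and `filesLeakingKey` is an illustrative name, not the helper used in the test.

```typescript
// Hypothetical in-memory stand-in for "every persisted file under the data dir":
// relative path -> file contents.
type PersistedFiles = { [path: string]: string };

// Returns the paths whose contents contain the private key in plaintext.
// Searching for the bare hex form also matches the 0x-prefixed form, because
// the bare form is a substring of it. Comparison is case-insensitive.
function filesLeakingKey(files: PersistedFiles, privateKey: string): string[] {
  const lower = privateKey.toLowerCase();
  const bare = lower.indexOf('0x') === 0 ? lower.slice(2) : lower;
  return Object.keys(files).filter(function (path) {
    return files[path].toLowerCase().indexOf(bare) !== -1;
  });
}
```

The repro is then green only when the returned list is empty, regardless of which file name or format a fix chooses for the encrypted keystore.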
Comment thread packages/agent/test/issue-1124-host-mode-plaintext.test.ts Outdated
Comment thread devnet/issue-liveness/high-issues.test.ts Outdated
Comment thread packages/cli/test/issue-liveness-daemon-routes.test.ts Outdated
…1097/#309)

- #1124 (RED): removed the agent unit repro. `ingestSwmHostModeEnvelope` drops a
  public-CG plaintext share at TWO independent gates — the `isCiphertext` sniff
  AND the curated-agent authority check (which rejects a CG with no agent
  allowlist, i.e. the public-CG case). Since the fix must change both, isolating
  either gate is a false signal (stub authority → false green; don't → false
  red). #1124 is back to a documented pending stub that needs the host-mode
  sharded fixture exercising the full public-CG ingest path.
- #1097 (RED): assert the one-shot flow actually WORKS — create returns 2xx and
  publish-by-assertionName returns 200 with a success status — instead of merely
  `!== 500` (a 404/409/422 would have falsely passed a "flow works" test).
- #309 (yellow): assert `defaultAgentAddress` matches a real `0x…40` EVM address
  rather than just `toBeDefined()` (null/"" would still leave WM-query scoping
  broken).

All-25 map is now 11 CI unit/integration + 8 devnet multi-node + 6 documented
pending/emergent (#614 #1099 #1124 #723 #999 #1008).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
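
The two tightened assertions (#1097 and #309) come down to replacing a negative check with a positive one. A rough sketch, with hypothetical helper names and the assumption that an EVM address is written as `0x` plus 40 hex digits:

```typescript
// Loose check from before the change: anything except 500 counts as "works",
// so a 404, 409 or 422 passes.
function looksFixedLoosely(status: number): boolean {
  return status !== 500;
}

// Tight check: only a 2xx response counts as the flow working.
function isSuccess(status: number): boolean {
  return status >= 200 && status < 300;
}

// A 20-byte EVM address: "0x" followed by exactly 40 hex digits.
const EVM_ADDRESS = /^0x[0-9a-fA-F]{40}$/;

// Stricter than a toBeDefined()-style check: null and "" are rejected.
function isEvmAddress(value: unknown): boolean {
  return typeof value === 'string' && EVM_ADDRESS.test(value);
}
```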
Comment thread devnet/issue-liveness/high-issues.test.ts Outdated
Comment thread devnet/issue-liveness/high-issues.test.ts Outdated
…flakiness)

- #1094: assert the wm/pull-from {layer:vm} edit path actually WORKS — 200 + no
  error body, then read the KA back as an editable WM draft — instead of merely
  `!== 500` (a 404/409/422 would have looked fixed).
- #1098 / #886: replace fixed 8s/12s sleeps with a pollUntil() against a generous
  deadline (60s/90s). Replication latency on a slow devnet no longer flips these
  into latency tests; they only fail if the KA NEVER materializes.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
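
The polling change for #1098 / #886 can be illustrated with a deterministic model. This is not the suite's `pollUntil()`: the signature is assumed, and the sketch is synchronous with an injected hypothetical `Clock` so it runs without timers, whereas a real helper would await between attempts.

```typescript
// Hypothetical clock abstraction so the sketch runs deterministically.
interface Clock {
  now(): number;            // current time in ms
  sleep(ms: number): void;  // wait (here: advance) by ms
}

// Calls check() every intervalMs until it returns true or deadlineMs elapses.
// Returns true as soon as check() succeeds, false only if it never does in time.
function pollUntil(check: () => boolean, deadlineMs: number, intervalMs: number, clock: Clock): boolean {
  const giveUpAt = clock.now() + deadlineMs;
  while (true) {
    if (check()) return true;
    if (clock.now() >= giveUpAt) return false;
    clock.sleep(intervalMs);
  }
}

// Fake clock for demonstration: sleeping just advances a counter.
function fakeClock(): Clock {
  let t = 0;
  return {
    now: function () { return t; },
    sleep: function (ms: number) { t += ms; },
  };
}
```

With a 60s deadline, a KA that materializes after 20s passes, where a single check after a fixed 8s sleep would have failed; the poll fails only if the condition never becomes true before the deadline.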
Comment thread package.json Outdated
Comment thread packages/random-sampling/test/e2e-hardhat-chain.test.ts
Comment thread packages/evm-module/deployments/localhost_contracts.json Outdated
Bojan131 and others added 2 commits June 12, 2026 13:03
…drop stray artifact)

- #1091 (RED): wrap the whole repro in takeSnapshot/revertSnapshot so its chain
  mutations (REC2 createChallenge, pinned prevrandao, mined blocks) are rolled
  back and can't leak into the regular prover E2E that reuses the shared Hardhat
  fixture under RUN_ISSUE_LIVENESS=1. Verified: #1091 still red, prover still
  passes, state isolated.
- Reverted packages/evm-module/deployments/localhost_contracts.json — it was
  swept into an earlier commit by `git add -u` (a local-deploy artifact with
  branch/commit/timestamp churn), not part of this PR.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
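
The isolation pattern behind the #1091 change is snapshot, run, then revert in a `finally`. A minimal sketch against a hypothetical in-memory `FakeChain`; the real repro talks to a Hardhat node asynchronously, and `withSnapshot` is an illustrative name.

```typescript
// Hypothetical in-memory stand-in for a shared chain fixture.
class FakeChain {
  private state: string[] = [];
  private snapshots: string[][] = [];

  mutate(entry: string): void { this.state.push(entry); }
  read(): string[] { return this.state.slice(); }

  // Records a copy of the current state and returns its id.
  takeSnapshot(): number {
    this.snapshots.push(this.state.slice());
    return this.snapshots.length - 1;
  }

  // Restores the state recorded under the given id.
  revertSnapshot(id: number): void {
    this.state = this.snapshots[id].slice();
  }
}

// Runs body against the chain and always rolls its mutations back,
// even when body throws (as an intentionally-red repro does).
function withSnapshot(chain: FakeChain, body: () => void): void {
  const id = chain.takeSnapshot();
  try {
    body();
  } finally {
    chain.revertSnapshot(id);
  }
}
```

Because the revert sits in `finally`, a red assertion inside the repro still leaves the shared fixture in its pre-repro state for the tests that follow.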
The hardhat e2e fixture regenerates this file with the current
branch/commit/timestamp on every deploy; an earlier commit re-captured that
churn. Reset to main's version — it is not part of this PR.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Comment thread package.json Outdated
Comment thread turbo.json Outdated
Comment thread packages/cli/test/issue-liveness-daemon-routes.test.ts Outdated
Comment thread packages/cli/test/issue-liveness-daemon-routes.test.ts Outdated
Comment thread packages/agent/test/issue-462-skill-acl.test.ts
…dingTable test

Wire the per-issue reproduction tests into CI so the high/pre-mainnet bugs
are actually exercised there, not just runnable on demand. Adds a dedicated
"Issue-liveness repros" lane that runs every repro under RUN_ISSUE_LIVENESS=1.
The lane is RED while the bugs are live and each test flips GREEN when its bug
is fixed; it is INFORMATIONAL (must not be a required check) so it never blocks
unrelated PRs. The normal package lanes still gate these files OFF (skip), so
they stay green/mergeable.

So a red in this lane always means "a repro caught its bug" and never an
unrelated suite failure:
- each package exposes a `test:liveness` script listing ONLY its repro files
  (turbo task, cache:false); the root `test:issue-liveness` runs them with
  `--continue` so all packages report, not just the first to fail.
- #1091 drives a real Hardhat chain, so the lane compiles the EVM contracts
  first (the shared build skips Solidity).

Harden three repros so their red can only come from the real bug, not a
setup/transport failure (Codex review on #1129):
- #462: assert the attacker's skill_request was actually DELIVERED (victim
  emits MESSAGE_RECEIVED after decrypt+verify+parse) before asserting the
  handler didn't run — a transport/signature regression now turns it RED
  instead of falsely green.
- #306: assert the KA create precondition succeeds, so the wm/write 4xx is
  quad-shape validation and not a missing-KA 404.
- #158: assert the exact 404 the issue requires, not any 4xx, so a wrong
  remap to 400/403/422 can't masquerade as fixed.

Also fix a pre-existing stale assertion unrelated to these repros: the
ShardingTable unit test hardcoded version '10.0.2' but the contract is
'10.0.3' — the only CI failure that was a test-maintenance issue rather than
a caught bug.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
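
The #462 hardening is an ordering of checks: confirm delivery first, then interpret "handler did not run". A small sketch of that decision, with hypothetical names (`Observed`, `classify`) standing in for what the test observes on the victim node:

```typescript
type Outcome = 'RED: setup/transport failure' | 'RED: bug live' | 'GREEN: fixed';

// Hypothetical observations a repro could collect from the victim node.
interface Observed {
  delivered: boolean;   // victim emitted MESSAGE_RECEIVED for the request
  handlerRan: boolean;  // the skill handler executed for the unauthorised caller
}

// Check the precondition first: "handler did not run" only means the ACL
// worked if the request actually arrived.
function classify(o: Observed): Outcome {
  if (!o.delivered) return 'RED: setup/transport failure';
  return o.handlerRan ? 'RED: bug live' : 'GREEN: fixed';
}
```

Without the delivery check, the undelivered case would be indistinguishable from a working ACL and would report green.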
Comment thread turbo.json Outdated
Comment thread packages/random-sampling/package.json Outdated
EpochStorage.sol is at _VERSION "10.0.3" but its unit test still asserted
"10.0.2" — the same test-maintenance drift as the ShardingTable fix in the
previous commit, surfacing as the Solidity [2/4] shard failure. Scanned every
`.version()).to.equal(...)` assertion against its contract's `_VERSION`:
ShardingTable and EpochStorage were the only two stale ones
(ConvictionStakingStorage / RandomSampling tests were already bumped). No product change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
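
The version-drift scan could be approximated with two regular expressions. This is a sketch over in-memory source strings, assuming the contract declares its version as `_VERSION = "x.y.z"` and the test uses the `.version()).to.equal(...)` form quoted above; the function names are hypothetical.

```typescript
// Extracts the version a contract declares. Assumes a declaration of the form
//   _VERSION = "10.0.3";
function declaredVersion(contractSource: string): string | null {
  const m = /_VERSION\s*=\s*"([^"]+)"/.exec(contractSource);
  return m ? m[1] : null;
}

// Extracts the version a test asserts. Matches the pattern named in the commit:
//   .version()).to.equal('10.0.2')
function assertedVersion(testSource: string): string | null {
  const m = /\.version\(\)\)\.to\.equal\(\s*['"]([^'"]+)['"]\s*\)/.exec(testSource);
  return m ? m[1] : null;
}

// True when both versions were found and they differ.
function isStale(contractSource: string, testSource: string): boolean {
  const declared = declaredVersion(contractSource);
  const asserted = assertedVersion(testSource);
  return declared !== null && asserted !== null && declared !== asserted;
}
```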
Comment thread devnet/issue-liveness/automated.test.ts
Comment thread turbo.json Outdated
Comment thread packages/random-sampling/package.json Outdated
…et lane

Fold the per-issue regression tests into the normal test suites instead of
a separate opt-in lane:

- Remove the RUN_ISSUE_LIVENESS gate from all 13 repro files. They now run
  as ordinary tests in their packages' existing CI lanes (Tornado: agent /
  publisher / core+storage+chain, Bura: cli / query, Kosava: epcis bundle /
  random-sampling), failing while their bug is live and passing once it is
  fixed, like any regression test.
- Drop the dedicated CI lane and its plumbing (per-package test:liveness
  scripts, root test:issue-liveness, turbo task, globalPassThroughEnv).
  package.json/turbo.json are back to main's state.
- Add "Tornado: devnet integration (multi-node publish/sync)": boots a
  6-node devnet via scripts/devnet.sh (same pattern as the node-ui e2e
  lane) and runs devnet/issue-liveness — the inherently multi-node
  publish → quorum → replication coverage that cannot run in
  single-process lanes. bootstrap.cjs is not part of the lane: the suite
  probes for a publisher and seeds its own data, and bootstrap's seed
  publishes abort on a quorum-degraded devnet, which would kill the job
  before any test runs.

Verified locally: all 8 package repro sets fail on their bug assertions in
plain `vitest run` (no env), and a full dry-run of the devnet lane
(clean → start 6 → suite) boots all 6 nodes and fails only on real API
assertions (8 failed / 4 passed / 6 skipped), no connection errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread .github/workflows/ci.yml
Comment thread packages/query/test/subgraph-view-scoping.test.ts
Comment thread packages/cli/test/skill-md-dynamic-section.test.ts
Comment thread packages/core/test/escape-rdf-literal-control-chars.test.ts
Bojan131 and others added 2 commits June 18, 2026 12:00
…o test/issue-liveness-suite

# Conflicts:
#	packages/evm-module/test/unit/EpochStorage.test.ts
#	pnpm-lock.yaml
, #462)

These two tests were red not because their issues are unfixed, but because
each was written against an outdated view of HOW the (already-merged) fix
works. Both production fixes are present on this branch; the tests just had
to exercise the real mechanism. All other red liveness lanes were verified
to be tests correctly catching genuinely-unfixed issues — those stay red.

- query/subgraph-view-scoping (#184/#675): the merged #675 fan-out
  (discoverRegisteredSubGraphNames) unions sub-graph WM data only for
  sub-graphs registered in the ROOT _meta graph. The test seeded the WM
  data graphs but never registered the sub-graph, so the fan-out found
  nothing to union and the WM-view/#184-scoping assertions failed. Seed the
  `research-alpha` SubGraph registration in _meta (urn:dkg:subgraph:…, rdf:type
  SubGraph, schema:name), mirroring the passing sub-graph-query.test.ts. 3/3 green.

- agent/issue-462-skill-acl: #462 is fixed — MessageHandler exposes the
  setSkillAcl gate and the daemon (lifecycle.ts) installs default-deny for
  every node; a bare library MessageHandler stays accept-all for back-compat.
  The test built a bare handler and never installed the gate, so the
  unauthorized skill_request was (correctly, for a bare handler) accepted.
  Install the same default-deny gate the daemon wires so the test exercises
  the real #462 layer; refresh the stale header comment. 1/1 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
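
Why the #184/#675 test was red despite the fix can be shown with a simplified model of the fan-out. This is not `discoverRegisteredSubGraphNames` itself; `WmGraphs` and `fanOut` are hypothetical, and triples are represented as plain strings.

```typescript
// Hypothetical model: sub-graph name -> working-memory triples stored for it.
type WmGraphs = { [subGraph: string]: string[] };

// Unions WM data only for sub-graphs that are registered; seeded data for an
// unregistered sub-graph is never visited.
function fanOut(registered: string[], wm: WmGraphs): string[] {
  let out: string[] = [];
  for (let i = 0; i < registered.length; i++) {
    const triples = wm[registered[i]];
    if (triples) out = out.concat(triples);
  }
  return out;
}
```

Seeding data for `research-alpha` without registering it corresponds to calling `fanOut([], wm)`, which returns nothing; adding the registration corresponds to `fanOut(['research-alpha'], wm)`.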
Re-verified live on a fresh 6-node devnet: after a confirmed VM publish the KA
descriptor reads state=published, status=vm-confirmed, memoryLayer=VM,
publishedUal present — with NO discarded/vm-confirmed contradiction. #1095's
substantive defect (contradictory state + no coherent published signal) is
fixed. The test was asserting events[].includes('published'), but the publish
transition is recorded as descriptor STATE; the provenance log keeps
created/promoted rows and never adds a separate 'published' row, so the old
assertion was a false negative against the real implementation. Assert the
published state + publishedUal instead. Verified green on a clean devnet.

#1097 and #1098 remain RED (verified genuinely broken live: the documented
one-shot publish 500s without an undocumented promote:true; a pre-subscribed
peer materializes the published KA only ~1/3 of the time) and have been
reopened on GitHub.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
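
The #1095 assertion change moves from the provenance log to the descriptor. A minimal sketch, assuming a descriptor object with the four fields quoted above; the `KaDescriptor` interface and both helper names are hypothetical.

```typescript
// Hypothetical descriptor shape; field names are the ones quoted in the commit message.
interface KaDescriptor {
  state: string;
  status: string;
  memoryLayer: string;
  publishedUal?: string;
}

// Old style of check: look for a 'published' row in the provenance events.
function hasPublishedEvent(events: string[]): boolean {
  return events.indexOf('published') !== -1;
}

// New style of check: the descriptor state says published and a UAL is present.
function isPublished(d: KaDescriptor): boolean {
  return d.state === 'published' && typeof d.publishedUal === 'string' && d.publishedUal.length > 0;
}
```

For a KA whose log holds only `created` and `promoted` rows, the old check is false even when the descriptor reports `state=published` with a UAL, which is the false negative described above.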
Bojan131 and others added 2 commits June 19, 2026 09:29
…uite

# Conflicts:
#	packages/agent/test/issue-936-tokenid-determinism.test.ts
#	packages/agent/test/op-wallets-at-rest-encryption.test.ts
#	packages/publisher/test/async-lift-canonicalization-and-encryption.test.ts
#	packages/storage/test/issue-1078-private-layer-scope.test.ts
…iewChallengeForSeed API

The RandomSampling preview API changed to single-arg previewChallengeForSeed(seed)
(reads chronos.getCurrentEpoch() internally) when PR #1226 landed the weighted BIT
draw. The #1091 liveness repro still called the old 2-arg selector, so it reverted
(require(false)) before reaching its assertion — a stale crash, not a real repro.

Fixed to the 1-arg API. The test now reaches its assertion and is RED for the RIGHT
reason: a node still reconstructs the seed from public block data and predicts its
own (cgId,kaId,chunkId) draw pre-mine. #1091 remains live — #1226 is a partial
mitigation only (the contract NatSpec says the prevrandao/blockhash seed is still
proposer-grindable; durable fix is commit-reveal/VRF).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>