feat(evm-module): O(log) value-weighted RandomSampling via Fenwick BIT + lazy settlement#1226
Conversation
…ment) Add the RandomSampling scaling RFC: replace the O(retries·N·D) twin linear scan over every CG with an O(log) Fenwick/BIT prefix draw plus lazy weight settlement, to support 5k-100k+ context graphs within block gas. Rev 2 (review-hardened) before implementation: - Exclusion retry draw subtracts excluded leaves DURING the Fenwick descent (renormalization-exact); the prior "advance to next nonzero leaf" wording skewed the distribution and would fail the renormalization test. - Fixed power-of-two BIT capacity instead of a growing size (a growing mapping-backed Fenwick silently corrupts the internal node at every 2^k boundary). - leaf>0 => CG active promoted to an enforced invariant (eligibility gate is already isContextGraphActive). - _bitUpdate gets a real capacity require() guard (an out-of-range id would otherwise skip the loop yet still mutate bitTotal). - Renamed the BIT leaf mapping to cgWeight to avoid colliding with merkle leaves; tightened settle-on-miss scope, preview/draw divergence, fairness tolerance, and gas-aware migration. Line refs verified against release/rc18. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Read-only script + captured results against base_sepolia_v10 (epoch 521, 75 CGs). Findings that inform the RFC's open decisions: - 67/75 CGs are zero-weight and 50 sit at D=521 (never finalized) -> the status-quo twin scan replays ~O(D) SLOADs per CG (~110M gas implied for a single draw at N=75): the O(N*D) cliff is already present, not hypothetical. - eligibility == active confirmed (75/75) -> weight-only BIT sound today. - global-total oracle validated (active Sum weight == getTotalValueAtEpoch). - Open Decision #1 -> recommend settle-on-miss only for V10 (dormancy is zero-weight, which the BIT skips for free; no future-dated backlog seen); defer keeper/epoch-due index pending mainnet data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ests New fixed-capacity Fenwick/BIT over CG ids for O(log) value-weighted challenge selection (RFC: scalable-weighted-cg-sampling). Implements the rev-2 corrected algorithms: - findStrictGtExcluding subtracts excluded leaves DURING the descent (renormalization-exact); findStrictGt mirrors the strict-> straddle. - Fixed power-of-two BIT_CAPACITY (immutable, constructor arg); no growth. - _bitUpdate has a real capacity require() guard (id 0 / id>=capacity revert). - settle reconciles a leaf to ContextGraphValueStorage truth; seed/seedMany are backfill-window-gated migration primitives; cgWeight named to avoid colliding with merkle leaves. 11 unit tests pass: parity vs brute-force, renormalization equivalence (incl. the exclude-one-stays-uniform regression), fixed-capacity across 2^k boundaries (growth-corruption regression), guard, seed gating, settle self-heal. Both regression tests verified to FAIL on the original buggy algorithms (mutation-tested). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dening From the adversarial review (no correctness bugs found; rev-2 fixes confirmed faithfully implemented). Hardening: - _bitUpdate id guard is now unconditional (fires before the delta==0 short-circuit). - finishBackfill reverts NotBackfilling on re-call (loud, not a silent re-emit). Added tests (19 total): seedMany happy + length-mismatch, constructor non-pow2/zero capacity revert, settleMany, cgSettledEpoch + settle no-op path, WeightSet/ BackfillFinalized events + finishBackfill re-call guard, BIT_CAPACITY-1 boundary leaf, empty-tree out-of-range sentinel (caller must gate on bitTotal>0), and workingTotal underflow on duplicated excluded ids. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… dedicated storage) Per Phase-0 evidence and review: - #1 settlement strategy -> settle-on-spend + settle-on-miss only for V10; keeper (settleMany, already shipped permissionless) and epoch-due index deferred, reversible without redeploy; trigger to revisit logged. - #2 fairness tolerance -> provisional +/-2% draw-freq, <1% wasted-retry as a CI confidence check (not an on-chain invariant); test lands in Phase 2 where drift is observable. - #4 storage placement -> dedicated CGWeightTreeStorage (built). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tle hooks + migration Replace the O(N·D) twin linear scan in RandomSampling with the O(log) Fenwick draw (CGWeightTreeStorage), with lazy settlement. Full integration: RandomSampling: - _pickWeightedChallenge split into _pickKc (KC retry + :648/:649 filters + leaf draw, verbatim), _pickWeightedChallengeView (preview, no settle), and _pickWeightedChallengeFull (production, settle-on-miss). Seed threading is preserved bit-for-bit (verified by a draw-level parity oracle test). - createChallenge reverts ChallengeDrawPaused while the BIT is backfill-locked. - Eligibility is now checked on the DRAWN CG only (1 SLOAD, not O(N)); a deactivated CG is treated as a miss (Invariant 2 backstop — the leaf-zeroing deactivation hook stays deferred; deactivateContextGraph has no prod callers). - previewChallengeForSeed routes to the view core (documented staleness divergence). KnowledgeAssetsLifecycle: settle-on-spend after all three addCGValueForEpochRange sites (publish/extend/update) so the BIT sees fresh weights. Deploy: CGWeightTreeStorage added to 031/052 dependencies. No auto-unlock step (avoids the unit-fixture trap); a new scripts/migrate-cg-weight-tree.ts seeds active leaves off-chain-computed in batches, asserts bitTotal==Σ + the global-total cross-check, then finishBackfill (no-op seed on a fresh chain). Tests: 759 passing (full suite). RS fixtures unlock the tree + settle on seed; decay tests settle after epoch advance (modeling settle-on-miss). New: draw-level parity oracle, fairness ±2pp bar (settled state), and the backfill-lock gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n-spend The on-chain _VERSION must change when behavior does (so a redeploy is not mistaken for a no-op): - RandomSampling 10.2.0 -> 10.3.0 (draw path rewritten to the Fenwick index). - KnowledgeAssetsLifecycle 10.1.1 -> 10.1.2 (PATCH: settle-on-spend added; no ACK/attestation change, so the EIP-712 domain "10.1" stays stable). Also sharpened the _pickKc note: if deactivateContextGraph ever gets a caller, leaf-zeroing becomes mandatory (not a backstop) — >MAX_CG_RETRIES deactivated CGs could otherwise exhaust the retry budget and revert the draw. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Confirms the headline scaling property empirically: - draw (previewChallengeForSeed) gas is IDENTICAL at N=1/1k/10k (148,126) — O(log of a fixed capacity), constant in CG count (old twin scan was ~110M gas at N=75 per Phase-0). - settle gas by replay depth D: 620k (D=0) → 1.82M (D=50), ~24k/epoch — the settle-on-spend / settle-on-miss component, bounded and amortizing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ro-migration go-live) A clean deploy (devnet / clean testnet redeploy with a fresh ledger) has nothing to seed, so bring the value-weighted draw live automatically: 057 calls CGWeightTreeStorage.finishBackfill() on development+testnet (deployer is the Hub owner; owner-checked + idempotent). mainnet is skipped (deliberate operator step — seed first). Tagged 'CGWeightTreeFinishBackfill' depending on CGWeightTreeStorage so the unit fixture still gets a LOCKED tree. Verified: full deploy logs "value-weighted draw is LIVE"; full suite 761 passing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ain gas Real multi-node run (devnet-rs-validation.sh, clean redeploy, 057 auto-unlocked the BIT): publish (settle-on-spend) -> createChallenge (Fenwick draw) -> submitProof across proof periods. VERDICT PASS — 30/30 KAs published, 16 proofs submitted, 0 data-corrupted / 0 no-eligible-cg / 0 errors. Real on-chain gas (all txs ok): createChallenge ~310k avg (constant; old O(N·D) scan was ~110M at N=75 — ~350x reduction), submitProof ~298k, publish ~757k avg (incl. settle-on-spend; 1.37M first-publish cold-leaf max). Adds the gas-scan tool. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| ContextGraphValueStorage public contextGraphValueStorage; | ||
|
|
||
| /// @notice Fixed power-of-two capacity; valid CG ids are 1..BIT_CAPACITY-1. | ||
| uint256 public immutable BIT_CAPACITY; |
| function settle(uint256 cg) public { | ||
| uint256 currentEpoch = chronos.getCurrentEpoch(); | ||
| if (currentEpoch > 0) contextGraphValueStorage.finalizeCGValueUpTo(cg, currentEpoch - 1); | ||
| uint256 truth = contextGraphValueStorage.getCGValueAtEpoch(cg, currentEpoch); | ||
| _setWeight(cg, truth); | ||
| cgSettledEpoch[cg] = currentEpoch; | ||
| } |
| function settle(uint256 cg) public { | ||
| uint256 currentEpoch = chronos.getCurrentEpoch(); | ||
| if (currentEpoch > 0) contextGraphValueStorage.finalizeCGValueUpTo(cg, currentEpoch - 1); | ||
| uint256 truth = contextGraphValueStorage.getCGValueAtEpoch(cg, currentEpoch); | ||
| _setWeight(cg, truth); | ||
| cgSettledEpoch[cg] = currentEpoch; | ||
| } |
| function settle(uint256 cg) public { | ||
| uint256 currentEpoch = chronos.getCurrentEpoch(); | ||
| if (currentEpoch > 0) contextGraphValueStorage.finalizeCGValueUpTo(cg, currentEpoch - 1); | ||
| uint256 truth = contextGraphValueStorage.getCGValueAtEpoch(cg, currentEpoch); | ||
| _setWeight(cg, truth); | ||
| cgSettledEpoch[cg] = currentEpoch; | ||
| } |
| uint256 chosenCg = cgWeightTreeStorage.findStrictGtExcluding(r, exhausted); | ||
| (uint256 ka, uint256 chunk, bool found) = _pickKc(chosenCg, cgSeed, currentEpoch); | ||
| if (found) return (chosenCg, ka, chunk); | ||
| cgWeightTreeStorage.settle(chosenCg); // settle-on-miss: heal over-stated (expired) leaf |
There was a problem hiding this comment.
🔴 Bug: settle-on-miss only persists when a later retry succeeds. If every retry misses and the function reaches the final revert, the external cgWeightTreeStorage.settle(chosenCg) calls are rolled back with the rest of the transaction. That means an all-expired stale tree, or a draw that hits more than MAX_CG_RETRIES stale/empty CGs before an active one, never self-heals and can keep making createChallenge revert on every attempt until an external keeper/manual settleMany succeeds. The retry path needs a way to persist cleanup independently of a successful challenge, or the design must not rely on createChallenge-side healing for exhausted draws.
There was a problem hiding this comment.
🔴 Bug: settle(cg) only persists if _pickWeightedChallengeFull eventually returns a challenge. When all MAX_CG_RETRIES attempts hit expired/stale leaves, line 716 reverts and rolls back every settle-on-miss update, so the same stale BIT weights remain for the next createChallenge. A cluster of expired high-weight CGs can therefore keep challenge generation reverting, or make most attempts fail even while low-weight live CGs exist. The old scan recomputed true current weights and did not have this liveness failure; the fix needs a non-reverting cleanup path or an enforced keeper/pre-settle mechanism before relying on lazy settlement for expired leaves.
There was a problem hiding this comment.
🔴 Bug: settle only persists if a later retry in the same createChallenge succeeds. If the first MAX_CG_RETRIES draws all hit stale/expired CGs, the function later reverts and rolls back every settlement, leaving those stale leaves in bitTotal for the next proof period. A cohort of expired high-weight CGs can therefore repeatedly consume the retry budget and prevent reaching a live lower-weight CG unless an external settleMany happens; the old ledger scan would have assigned those expired CGs zero current weight.
There was a problem hiding this comment.
🔴 Bug: The settle-on-miss write is inside createChallenge, but if all MAX_CG_RETRIES attempts miss the function reverts at the end of the loop, rolling back every settle(cg). A cohort of expired/stale high-weight CGs can therefore keep dominating the BIT and cause repeated challenge failures without making progress, especially when live CG weight is small; the advertised self-healing only persists when a later retry finds a valid KA in the same transaction. Persist stale-leaf settlement outside the reverting challenge path or require/run a keeper before relying on this for liveness.
There was a problem hiding this comment.
🔴 Bug: settle(cg) only persists if the same createChallenge call eventually finds a challenge. When the loop exhausts MAX_CG_RETRIES and reverts at the end, every stale-leaf settlement done in prior attempts is rolled back, so a cohort of more than five expired/unchallengeable high-weight CGs can keep causing NoEligibleKnowledgeAsset without ever being healed. That is a liveness regression versus the old ledger scan, because live low-weight CGs may exist but challenges keep reverting until an external keeper manually settles enough stale leaves. The miss-settlement path needs to commit independently, or the no-challenge path must not revert after doing useful settlement work.
| const func: DeployFunction = async function (hre: HardhatRuntimeEnvironment) { | ||
| const { ethers, deployments } = hre; | ||
|
|
||
| if (hre.network.config.environment === 'mainnet') { |
There was a problem hiding this comment.
🔴 Bug: This auto-unlocks CGWeightTreeStorage on every non-mainnet environment without seeding any existing CG weights. base_sepolia_v10 is configured as environment: 'testnet' and already has existing context graphs/value ledger data, so deploying this PR there will flip backfillLocked=false with bitTotal==0; RandomSampling will then stop pausing but will ignore all existing KAs and revert NoEligibleContextGraph until new spends happen to settle leaves. The unlock script should only run automatically on a provably fresh ledger, or run the seeding migration before finishBackfill.
| // Subsequent attempts with zero-total = all eligible CGs | ||
| // exhausted; the next-period sampling tick will retry. | ||
| // O(1) working total = bitTotal - sum of excluded leaves (padded zero slots are no-ops). | ||
| uint256 wt = cgWeightTreeStorage.workingTotal(exhausted); |
There was a problem hiding this comment.
🟡 Issue: previewChallengeForSeed(seed, targetEpoch) keeps the old API/NatSpec that previews weights for an arbitrary targetEpoch, but the CG draw now uses the current BIT snapshot via workingTotal/findStrictGtExcluding; targetEpoch is only applied later for KC expiry. Any caller previewing a past or future epoch gets a CG selected from the wrong weight distribution. Either reject non-current epochs/update the API contract, or add a separate preview path that reads ledger weights for the requested epoch.
| /// @dev View selection core: renormalizing weighted draw over the BIT, WITHOUT | ||
| /// settle-on-miss. Backs `previewChallengeForSeed`. Mirrors the production | ||
| /// `_pickWeightedChallengeFull` exactly except it cannot self-heal stale leaves. | ||
| function _pickWeightedChallengeView( |
There was a problem hiding this comment.
🟡 Issue: _pickWeightedChallengeView and _pickWeightedChallengeFull duplicate the same retry, working-total, seed, and exclusion logic, with the only behavioral difference being settle-on-miss. This makes the preview and production paths easy to drift the next time draw semantics change, especially because the comments promise they mirror exactly. Consider extracting the shared attempt loop pieces, or centralizing the draw in one helper with the settle step isolated, so seed threading and error semantics have one implementation.
There was a problem hiding this comment.
🟡 Issue: The view and production pickers duplicate the retry loop, exhaustion bookkeeping, zero-total revert mapping, and seed progression. This is the highest-risk part of the sampling algorithm, and future changes now have to be made in two places to preserve distribution parity. Consider a single retry driver with a settleOnMiss flag/hook so only the miss side effect differs.
| // spend / settle-on-miss), so no O(D) on-chain work happens here. | ||
| import hre from 'hardhat'; | ||
|
|
||
| const BATCH = 200; // CGs per seedMany tx (each seed is O(log); tune down if a tx nears the gas cap) |
There was a problem hiding this comment.
🟡 Issue: The migration script hard-codes a count-based batch size (BATCH = 200) even though each seedMany entry performs O(log capacity) storage writes. For mainnet-scale nonzero CGs this can exceed the block gas limit and forces operators to edit code and retry manually. A gas-aware batching loop using estimateGas or a configurable target gas limit would keep the migration runbook maintainable and less environment-dependent.
|
|
||
| // BIT leaf is a snapshot from seed time; reconcile it to the decayed ledger | ||
| // truth (models settle-on-miss / the keeper). After settle the leaf is 0. | ||
| await CGWeightTreeStorage.connect(opSigner).settle(cgId); |
There was a problem hiding this comment.
🟡 Issue: This expiry test pre-settles the BIT leaf before calling the view picker, so it does not exercise the production settle-on-miss path in _pickWeightedChallengeFull. A regression where createChallenge fails to settle/recompute/exclude a stale expired leaf would still pass. Add a non-view/createChallenge test that leaves the stale leaf in place, triggers the draw, and asserts the leaf is reconciled and the draw either moves to the live CG or reverts as expected.
There was a problem hiding this comment.
🟡 Issue: These expiry/self-heal cases pre-settle the stale leaf before calling previewChallengeForSeed, so they do not validate the production settle-on-miss path in _pickWeightedChallengeFull. A regression where createChallenge fails to settle the missed CG, excludes the wrong weight, or does not persist the heal after a successful redraw would not be caught. Add a createChallenge test with an expired high-weight CG plus a live low-weight CG, assert the challenge lands on the live CG, and assert the expired leaf is zeroed after the transaction commits.
| // active-weight share. Deterministic seed sequence, so this is a fixed | ||
| // pass/fail check, not a flaky statistical one. | ||
| // ----------------------------------------------------------------------- | ||
| it('fairness: draw frequency within ±2pp of weight share (settled state)', async () => { |
There was a problem hiding this comment.
🟡 Issue: The advertised fairness acceptance check only covers the fully settled preview path. The RFC’s stated bar is for a drift workload with publish/expire/extend behavior and integrated draw+settle hooks, including wasted-retry rate; this test cannot catch skew introduced by stale leaves, settle-on-miss behavior, or under-statement latency. Add a drift-regime test that uses the production draw path and asserts both the ±2pp distribution and <1% wasted retries.
There was a problem hiding this comment.
🟡 Issue: The test is labeled as the RFC fairness acceptance criterion, but it only samples a fully settled static weight vector. The stated acceptance bar is for a realistic publish/expire/extend drift regime with settle-on-miss only, including wasted-retry rate < 1%; none of that workload or retry-rate evidence is exercised here. Please either rename this as settled-state parity coverage or add the drift-regime test so the lazy-settlement fairness claim is actually validated in CI.
There was a problem hiding this comment.
🟡 Issue: This is labeled as the RFC fairness acceptance test, but it only covers a fully settled static state. The stated risk is the lazy-settlement drift regime, including publish/expire/extend churn and wasted retries; this test never advances into stale leaves, never exercises under-statement, and never measures wasted-retry rate. The claimed ±2% / <1% fairness bar is therefore not actually validated.
There was a problem hiding this comment.
🟡 Issue: This is the only asserted fairness bar, but it covers only a fully settled previewChallengeForSeed path. The RFC’s stated risk is lazy-settlement drift, including publish/expire/extend behavior and wasted retries, and it says the ±2% / <1% bar should be checked against the integrated draw + settle hooks. As written, the test proves the exact Fenwick distribution in the no-drift case, but it does not validate the deferred-keeper decision or catch sampling skew from stale leaves. Add a drift-regime test that drives createChallenge through publish/expire/extend or scheduled-value scenarios and records true-weight deviation plus wasted-retry rate.
There was a problem hiding this comment.
🟡 Issue: This is labeled as the RFC fairness acceptance criterion, but it only samples a fully settled static tree and does not cover the drift regime described in the RFC: publish/expire/extend workload plus wasted-retry rate < 1%. As written it can pass while settle-on-miss/under-statement behavior violates the documented fairness bar. Add a workload test that advances epochs, expires leaves, extends/publishes value, and checks both draw-share deviation and wasted retry rate.
There was a problem hiding this comment.
🟡 Issue: The fairness acceptance test only covers a fully settled, static weight vector. The RFC decision being validated is specifically the drift regime with settle-on-miss only under publish/expire/extend activity, but this test creates long-lived KAs once, settles the leaves via seedCGValue, and then samples previewChallengeForSeed. That does not validate under-statement, over-statement, wasted-retry rate, or the keeper deferral trigger. Add an integrated workload that mutates/extends/expires value and drives the production draw/settlement path before claiming the ±2pp fairness bar is covered.
There was a problem hiding this comment.
🟡 Issue: The fairness acceptance test only covers a fully settled snapshot and only checks per-CG draw frequency. The RFC/measurement evidence sets the bar as a drift-regime test over publish/expire/extend with both ±2% share and wasted-retry rate <1%, but this test never advances through stale leaves, future/extended value, or measures wasted retries. A regression that keeps settled distributions correct while producing excessive retry waste or skew under lazy settlement would still pass the current validation.
There was a problem hiding this comment.
🟡 Issue: The advertised fairness acceptance test only covers a settled, static weight vector through previewChallengeForSeed. It does not exercise the lazy-settlement drift regime called out by the RFC: publish/expire/extend traffic, stale over/under-stated leaves, production createChallenge settle-on-miss, or wasted-retry rate. This would still pass if stale leaves skew production sampling beyond the ±2% / <1% retry bar, so add an integrated drift workload test before treating the fairness criterion as validated.
| /* getter may not exist on older deploys; skip cross-check */ | ||
| } | ||
| console.log(`bitTotal=${bitTotal} (== Σseeded ✓) ledger getTotalValueAtEpoch=${globalTotal}`); | ||
| if (globalTotal !== 0n && globalTotal !== bitTotal) { |
There was a problem hiding this comment.
🟡 Issue: The migration validation only warns when getTotalValueAtEpoch(currentEpoch) differs from the seeded BIT total, then still calls finishBackfill() below. The RFC says this cross-check is part of the unlock validation; leaving it non-blocking can make the migration report success and enable draws with missing or mis-seeded weights. Make this mismatch fatal unless an explicit override is supplied and recorded.
| const __dirname = path.dirname(fileURLToPath(import.meta.url)); | ||
| const evmDir = path.resolve(__dirname, '..'); | ||
| // The worktree has no .env (gitignored); fall back to the main checkout's .env. | ||
| for (const p of [path.join(evmDir, '.env'), '/Users/aleatoric/dev/dkg/packages/evm-module/.env']) { |
There was a problem hiding this comment.
🔵 Nit: This script includes a developer-specific absolute .env fallback path. That makes the checked-in utility less portable and can confuse future runs outside the original workstation; prefer only repo-relative paths or an explicit env var such as DOTENV_CONFIG_PATH.
…ither) - 057: only auto-finishBackfill on a PROVABLY FRESH ledger (ContextGraphStorage counter == 0). Prevents unlocking with an empty BIT over an existing ledger (would orphan existing CGs until re-settled). Clean redeploy => counter 0 => safe. - RandomSampling: extract shared `_drawAttempt` view helper so the view + production draw share ONE selection implementation (parity oracle guards it). Correct the settle-on-miss framing: it persists only when createChallenge commits; on the all-miss path the tx reverts and rolls back the settles, but the per-block seed re-rolls so the node self-recovers next attempt/period — keeper/settle-on-spend are optimizations, NOT a liveness backstop. Clarify previewChallengeForSeed NatSpec: CG selection uses the current BIT snapshot; targetEpoch only gates KC expiry. - migrate script: conservative env-configurable batch (default 25, MIGRATE_BATCH); make the global-total cross-check FATAL before finishBackfill (override with MIGRATE_ALLOW_TOTAL_MISMATCH=1 for the known inactive-CG case). - phase0 script: drop the hard-coded dev .env path; use DOTENV_CONFIG_PATH. - CGWeightTreeStorage: inherit IInitializable; slither-disable the benign reentrancy (trusted Hub storage callees) + calls-loop/costly-loop on the batch settle path, with justifications. (BIT_CAPACITY stays UPPER — matches CHUNK_BYTE_SIZE.) Full suite 761 passing; draw-level parity oracle still bit-identical; clean deploy still auto-unlocks; ABI unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks for the review — all 18 bot findings triaged and addressed in d322530 (full suite 761 green, draw-level parity oracle still bit-identical, clean deploy still auto-unlocks, ABI unchanged). otReviewAgent
github-advanced-security (Slither)All on
🤖 Generated with Claude Code |
| // >MAX_CG_RETRIES deactivated-but-weighted CGs would each consume a retry and could | ||
| // exhaust the budget, reverting the draw even while active CGs exist (the legacy | ||
| // pre-filter scan could not starve this way). See RFC Invariant 2. | ||
| if (!contextGraphStorage.isContextGraphActive(chosenCg)) return (0, 0, false); |
There was a problem hiding this comment.
🔴 Bug: Filtering inactive CGs only after the BIT draw is not equivalent to the previous active-CG prefilter. deactivateContextGraph leaves the CG's value ledger positive, and CGWeightTreeStorage.settle() reconciles only to ledger truth, not to isContextGraphActive, so a deactivated CG keeps a positive BIT leaf forever. Enough inactive weighted CGs can consume the bounded retry budget and make createChallenge revert despite active weighted CGs being present. The deactivation path needs to zero the BIT leaf, or settlement/tree weighting must enforce cgWeight > 0 => active.
The new draw path makes cgWeight > 0 => active a cross-contract invariant, but it is only documented here, not enforced where ContextGraphStorage.deactivateContextGraph flips the active flag. If any Hub contract or the Hub owner uses that existing API, the BIT leaf remains positive; _pickKc treats it as a miss, but settle() will not zero it because ledger value can still be positive. More than MAX_CG_RETRIES such leaves can make draws revert while active weighted CGs still exist. This should be encapsulated in a single deactivation path that also settles/zeros the tree leaf, rather than relying on future callers to remember the invariant.
There was a problem hiding this comment.
🔴 Bug: Inactive CGs are no longer removed from the weighted total before drawing; they are only detected after the Fenwick tree selects them and then consume one of the five CG retries. A deactivated CG keeps its nonzero cgWeight because deactivateContextGraph does not zero the BIT leaf, so more than MAX_CG_RETRIES inactive/high-weight CGs can make createChallenge revert with NoEligibleKnowledgeAsset even while active weighted CGs remain. This regresses the old _isCGEligible behavior where inactive CGs were excluded from the adjusted total. The deactivation path needs to enforce inactive => leaf 0 or the draw must exclude inactive weight without burning the bounded retry budget.
The BIT invariant cgWeight > 0 => CG active is only enforced as a comment/backstop. If deactivateContextGraph is called on weighted CGs, their weight remains in bitTotal, so the draw can repeatedly land on inactive CGs and burn the bounded retry budget; with more than MAX_CG_RETRIES inactive weighted CGs, challenges can revert even while active weighted CGs exist. The deactivation path should zero/settle the BIT leaf, or the draw total should exclude inactive weight at the source.
There was a problem hiding this comment.
🔴 Bug: Inactive CGs are filtered only after the weighted draw, while deactivateContextGraph does not zero the CG's BIT weight. Deactivated high-weight CGs therefore remain in bitTotal and can consume all CG retries, causing createChallenge to revert even when active weighted CGs exist. The previous scan skipped inactive CGs before computing the draw total, so the cgWeight > 0 => active invariant needs to be enforced in the deactivation/migration path rather than treated as a best-effort miss.
There was a problem hiding this comment.
🔴 Bug: Deactivated CGs with nonzero BIT weight remain in bitTotal: _pickKa treats them as a miss, but neither this path nor CGWeightTreeStorage.settle() zeroes inactive leaves. Unlike the old full scan, which filtered inactive CGs out before drawing, several high-weight deactivated CGs can consume all MAX_CG_RETRIES and make createChallenge revert even while active weighted CGs exist. Enforce cgWeight > 0 => active by zeroing the leaf on deactivation or in settlement before relying on the BIT draw.
| let w = 0n; | ||
| try { | ||
| w = await cgValue.getCGValueAtEpoch(cg, currentEpoch); | ||
| } catch (e) { |
There was a problem hiding this comment.
🟡 Issue: The migration script silently converts a per-CG getCGValueAtEpoch failure into weight 0 and continues. The later global-total check is also optional/caught, so a transient RPC issue or a real ledger invariant break can still lead to finishBackfill() with an under-seeded tree. For an irreversible migration that controls challenge weighting, this should accumulate read errors and abort before seeding/unlocking unless an explicit, audited override is used.
| lifetime, | ||
| value, | ||
| ); | ||
| // Mirror production settle-on-spend: reconcile the BIT leaf to the new |
There was a problem hiding this comment.
🟡 Issue: The RandomSampling tests bypass the new production settle-on-spend hook by writing directly to ContextGraphValueStorage and then calling CGWeightTreeStorage.settle() manually. That means the suite would still pass if the KnowledgeAssetsLifecycle publish/update/extend hooks were removed, pointed at the wrong tree, or failed after a lifecycle refactor. Add at least one lifecycle-path test that publishes, updates, and/or extends through KnowledgeAssetsLifecycle and asserts cgWeight/bitTotal changed without any manual settle call, then creates a challenge from that value.
There was a problem hiding this comment.
🟡 Issue: The random-sampling tests seed ContextGraphValueStorage directly and then manually call CGWeightTreeStorage.settle, so they do not verify the actual settle-on-spend hooks added to KnowledgeAssetsLifecycle.publish, extendKnowledgeAssetLifetime, and update. A regression where those hooks stop running would still pass these tests while real publishes leave the BIT leaf zero or stale, causing createChallenge to revert or draw from stale weights. Add facade-level publish/update/extend coverage that asserts cgWeight/bitTotal change without a manual settle.
There was a problem hiding this comment.
🟡 Issue: The random-sampling coverage bypasses the production settle-on-spend hook by writing ContextGraphValueStorage directly and then manually calling CGWeightTreeStorage.settle. The new draw depends on KnowledgeAssetsLifecycle.publish/update/extendKnowledgeAssetLifetime keeping the BIT populated, but CI would still pass if those hook calls were removed or miswired. Add at least one lifecycle-level test that publishes/updates/extends through KnowledgeAssetsLifecycle and asserts cgWeight, bitTotal, and createChallenge work without any manual settle.
There was a problem hiding this comment.
🟡 Issue: The new picker coverage seeds ContextGraphValueStorage directly and then manually calls CGWeightTreeStorage.settle, so it bypasses the production settle-on-spend hook added to KnowledgeAssetsLifecycle publish/update/extend. The existing V10 lifecycle tests do not assert cgWeight/bitTotal after those real flows, so a wiring, Hub registration, or authorization regression in the actual hook could still pass the BIT picker tests. Add at least one lifecycle-path assertion that publish/update/extend refreshes the BIT leaf without a manual settle.
…test, clean preview API) No backwards-compat constraint (per maintainer), so make the clean change: - previewChallengeForSeed DROPS the vestigial targetEpoch param — the CG draw can only read the current BIT snapshot, so the param was misleading. Now previews the current epoch internally. Updated all call sites + abi (no off-chain callers). - NEW deterministic createChallenge settle-on-miss test (Test F): a dominant STALE CG is drawn first -> KC expired -> settle-on-miss reconciles its leaf to 0 -> excluded -> draw lands on the live CG -> SUCCESS commits, so the reconcile PERSISTS. Exercises the production state-changing path (the prior decay tests only used the view path). - migrate: a per-CG getCGValueAtEpoch failure is now FATAL (was a silent seed-0), so a read failure can't drop a leaf and unlock with missing weight. Full suite 762 passing; gas benchmark updated for the new preview signature. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Second review pass addressed in 0c333fd (full suite 762 passing). Per maintainer guidance we're not preserving backwards-compat, so I took the clean break where it helps.
CI note: the one red lane ( 🤖 Generated with Claude Code |
| * diverge from the state-changing `_generateChallenge` draw, which self-heals — best-effort | ||
| * prediction only, not guaranteed between settle cycles (RFC §Security). | ||
| */ | ||
| function previewChallengeForSeed( |
There was a problem hiding this comment.
🔴 Bug: previewChallengeForSeed is a public contract API documented for nodes/indexers, but this change removes the existing (bytes32,uint256) selector instead of adding a compatibility overload. Any consumer still using the old ABI will call a missing selector and fail. Keep an overload that delegates to the current-epoch implementation (or explicitly rejects non-current epochs) unless this PR also coordinates every external caller migration.
There was a problem hiding this comment.
🔴 Bug: previewChallengeForSeed now accepts only bytes32, but the duplicated chain ABI still exposes previewChallengeForSeed(bytes32,uint256) and MAX_KC_RETRIES. Any external consumer using packages/chain/abi/RandomSampling.json will call a dead selector after this deployment. Please update the chain ABI/clients or preserve the old wrapper if this API must remain stable.
There was a problem hiding this comment.
🟡 Issue: previewChallengeForSeed bypasses the backfillLocked gate enforced by _generateChallenge. During a migration, createChallenge reverts ChallengeDrawPaused, but preview still reads the partially seeded or empty BIT and can return a challenge or a different NoEligible* error that production cannot create. That makes the public preview API diverge from the production picker despite the surrounding docs saying they share behavior. Consider applying the same pause check here, or clearly separating this as a raw tree/debug preview.
| uint256(p.tokenAmount) | ||
| ); | ||
| // settle-on-spend: reconcile the BIT weight leaf to the new ledger truth. | ||
| cgWeightTreeStorage.settle(p.contextGraphId); |
There was a problem hiding this comment.
🟡 Issue: The new settle-on-spend hook is only validated indirectly by the devnet note; the automated RandomSampling tests bypass KnowledgeAssetsLifecycle by writing ContextGraphValueStorage directly and then calling CGWeightTreeStorage.settle() manually. That means a regression where publish, extendKnowledgeAssetLifetime, or update stop settling the BIT leaf would not be caught by the unit/integration suite. Please add at least one production-path test that publishes/extends/updates through KnowledgeAssetsLifecycle and asserts cgWeight/bitTotal changed and createChallenge can draw without any manual settle.
| ## Verdict: PASS (11 PASS / 0 WARN / 0 FAIL) | ||
| - PREFLIGHT: 4/4 nodes up; 4/4 cores have an RS prover. | ||
| - PUBLISH: owner seed-publish registered the CG; **30/30 KAs published**. | ||
| - OBSERVE (per-period proof success): core1 100% (5/5), core2 100% (5/5), core3 75% (3/4), |
There was a problem hiding this comment.
🔵 Nit: The devnet validation evidence is internally inconsistent: the observe summary says 16 proofs were submitted, but the gas table later reports 32 successful submitProof transactions for the same run. Either the gas scan included extra transactions or the summary count is wrong; as written, the evidence cannot be reconciled and should be corrected before relying on these numbers.
… total submitProof txs) Addresses a review nit: the "16 proofs" is the rs-validation per-period-success metric (scoped to the ~5-min observation window); the gas table's 32 submitProof txs is every successful submit across the chain's full lifetime — a superset, not a contradiction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Third pass — CI is now fully green (the earlier red lane was the flaky
Net: green CI, all actionable points addressed. Ready for human review. 🤖 Generated with Claude Code |
| return; | ||
| } | ||
|
|
||
| const currentEpoch: bigint = await chronos.getCurrentEpoch(); |
There was a problem hiding this comment.
🔴 Bug: The migration takes a single off-chain snapshot of currentEpoch/weights and then seeds the BIT over multiple transactions, while publish/update/extend remain live and call settle into the same tree. A concurrent spend during the backfill window can be applied by settle-on-spend and then overwritten by a later seedMany batch using the stale snapshot; the final validation compares against the same stale expectedTotal, so the tree can be unlocked with weights that no longer match the ledger. The migration path should either pause CG value writes while backfillLocked, seed from canonical on-chain truth at execution time, or re-read/revalidate the latest ledger state after the last batch before finishBackfill.
|
|
||
| const seed = ethers.keccak256(ethers.toUtf8Bytes('gas-seed')); | ||
|
|
||
| const measure = async () => RS.previewChallengeForSeed.estimateGas(seed); |
There was a problem hiding this comment.
🟡 Issue: The scaling gas check estimates previewChallengeForSeed while the tree is intentionally backfill-locked, so it bypasses the production createChallenge gate plus the state-changing work (setNodeChallenge, proof-period update, and settle-on-miss). This can pass even if the real path regresses in gas or remains paused. Add an unlocked createChallenge gas assertion/transaction for at least the no-miss path and one stale-leaf miss path to make the automated evidence match the production claim.
There was a problem hiding this comment.
🟡 Issue: The gas benchmark measures previewChallengeForSeed on a backfill-locked tree and only scales populated leaves to ~10k. That skips the production createChallenge path, the 5-exclusion retry path, settle-on-miss finalization cost, migration batch cost, and the advertised 100k target. Because the RFC explicitly calls worst-case challenge gas variable and partly attacker-influenced, this evidence is too narrow to validate the gas safety claim. Add benchmarks for production draw with 0 and 5 exclusions at 1k/10k/100k, plus representative cold/warm settle-on-miss and migration batches.
There was a problem hiding this comment.
🟡 Issue: The scaling gas evidence estimates previewChallengeForSeed, not createChallenge, and uses a dominant real leaf so the draw lands on the happy path without exclusions or settle-on-miss. That misses the expensive production cases introduced by this PR: up to MAX_CG_RETRIES excluded/stale CGs plus first-touch settle replay depth. The devnet report also uses one public CG, so it does not fill this gap. Add gas coverage for production createChallenge with 0 and 5 exclusions, stale leaves with realistic D, and enough weighted CGs to exercise the full retry path.
…set (KA) terminology "Knowledge Collection" is deprecated; the current term is Knowledge Asset. Full rename (no backwards-compat constraint, per maintainer): - Storage API: getContextGraphKCCount/At/List → getContextGraphKaCount/At/List; contextGraphKCList → contextGraphKaList (ContextGraphStorage + ContextGraphs facade). - RandomSampling: _pickKc → _pickKa, MAX_KC_RETRIES → MAX_KA_RETRIES, kcCount/kcSeed → kaCount/kaSeed; all "KC"/"Knowledge Collection" comments → KA/"Knowledge Asset". - Tests: createKC → createKa + call-site/comment updates; ABIs regenerated. - Docs: RFC + devnet-validation swept to KA terminology. - Also reworded the backfill-lock gate comment: it's not "migration" for a clean deploy (057 unlocks at deploy on a fresh ledger); only an upgrade-in-place that seeds existing CGs holds it locked during seeding. Scoped out: the ~50 pre-existing KC/KCS prose comments in KnowledgeAssetsLifecycle.sol (publish/update docs) — left for a focused follow-up to avoid churning that contract. Full suite 762 passing; ABIs (ContextGraphStorage, RandomSampling) regenerated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eightTreeStorage comments Comment-only sweep (no code/ABI change): "Knowledge Collection"→"Knowledge Asset", KCS→KAS (knowledge-asset storage), KC/KCs→KA/KAs across KnowledgeAssetsLifecycle.sol prose (publish/update/extend docs) and the two stray KC comments in CGWeightTreeStorage. Compiles clean; ABIs unchanged (comments only). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| * indexers (eth_call) where the gas cost is not charged. | ||
| */ | ||
| function getContextGraphKCList( | ||
| function getContextGraphKaList( |
There was a problem hiding this comment.
🔴 Bug: Renaming the public CG list getters to getContextGraphKa* removes the existing getContextGraphKCCount/At/List selectors, but the chain/agent packages still ship and call the old ABI/methods (packages/chain/src/evm-adapter-context-graph.ts calls getContextGraphKCCount and getContextGraphKCAt). Against these upgraded contracts, VM reconciliation and hosted-CG sync will fail at runtime. Keep backward-compatible aliases or update all SDK/agent ABIs and callers in the same PR.
Renaming these public getters removes the existing getContextGraphKC* ABI while repo consumers still call it: packages/chain/src/evm-adapter-context-graph.ts invokes getContextGraphKCCount/getContextGraphKCAt, and packages/agent gates reconciliation on those adapter methods. As written, generated/deployed contracts no longer expose the selectors those clients expect, so CG reconciliation breaks after this deploy. Keep backward-compatible wrapper methods (delegating to the new Ka names) or update every adapter/ABI consumer in the same change.
There was a problem hiding this comment.
🔴 Bug: Renaming the on-chain KC list getters to getContextGraphKa* removes the existing selectors, but the runtime chain adapter and packaged ABI still call getContextGraphKCCount / getContextGraphKCAt (packages/chain/src/evm-adapter-context-graph.ts:474-484, packages/chain/abi/ContextGraphStorage.json). After this deploy, agent reconciliation/prover sync calls will revert against ContextGraphStorage, so nodes will stop discovering registered KAs by CG. Keep backward-compatible aliases or update the chain package/agent interfaces and ABI in the same PR.
Renaming the public getContextGraphKC* selectors to getContextGraphKa* leaves duplicated client surfaces stale: packages/chain/src/evm-adapter-context-graph.ts still calls getContextGraphKCCount/At, and packages/chain/abi/ContextGraphStorage.json still advertises the old selectors. Node reconciliation/indexing paths using that adapter will encode calls to functions that no longer exist. Either keep compatibility aliases or update the chain package ABI and adapter in the same PR.
There was a problem hiding this comment.
🔴 Bug: These public getters were renamed from getContextGraphKCList/Count/At to getContextGraphKaList/Count/At with no compatibility wrappers, while ContextGraphStorage.version() is still 10.0.2. Function names are part of the ABI selector, so existing indexers/SDKs using the old selectors now fail even though the storage behavior is unchanged. Keep the old KC-named wrappers as deprecated aliases, or bump/coordinate this as an explicit breaking contract API change.
The public getContextGraphKCList/Count/At selectors are removed and replaced with Ka names without compatibility aliases, while ContextGraphStorage.version() is unchanged. This is an avoidable ABI break for SDKs, indexers, and any consumers compiled against the existing contract surface. Keep deprecated wrappers delegating to the new functions, or make this an explicit versioned breaking change with coordinated client migration.
There was a problem hiding this comment.
🔴 Bug: Renaming the exported getContextGraphKC* getters to getContextGraphKa* changes the ABI selectors, but this contract still reports version 10.0.2 and keeps no deprecated aliases. That makes the change hard for external consumers and indexers to feature-detect, and any caller compiled against the previous ABI will fail at runtime. Either keep thin getContextGraphKC* aliases for a transition period or bump/version this ABI break explicitly and update the chain ABI pins accordingly.
…tests + ABIs) Repo-wide finish of the Knowledge Collection → Knowledge Asset rename across the whole evm-module (per maintainer: full rename, this branch): Contracts (code + comments; ABIs regenerated for the conviction layer): - conviction: kcEpochs→kaEpochs, kcStartEpoch→kaStartEpoch, InvalidConvictionKcEpochs→InvalidConvictionKaEpochs (PublishingConviction, IDKGPublishingConvictionNFT, DKGPublishingConvictionNFT). - KAL: `kcs` local → `kas`; DKGKnowledgeAssets: `kc` struct local → `ka`; ContextGraphStorage: "kcIndex" revert string → "kaIndex". - Profile/ProfileLib/ProfileStorage + ContextGraphValueStorage + DKGKnowledgeAssets prose → "Knowledge Asset"; KCS→KAS. Fixed a stale /api/kc comment → /api/knowledge-assets. Tests/helpers: KCSignaturesData→KASignaturesData, getKCSignaturesData→getKASignaturesData, _autoBridgeKCToDefaultCG→_autoBridgeKAToDefaultCG, KCSContract→KASContract, TEST_KC_BYTE_SIZE→TEST_KA_BYTE_SIZE, createKC→createKa, *KcId→*KaId, kc*→ka* locals; renamed helpers kc-helpers.ts→ka-helpers.ts, v10-kc-helpers.ts→v10-ka-helpers.ts + imports. Deliberately NOT renamed (not our terminology): the external `kcTools` import (assertion-tools) and the legacy `KnowledgeCollection` contract name/type + its `contracts.kc` instance. Full suite 762 passing; contracts compile; ABIs regenerated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| } | ||
| console.log(`weighted (active, nonzero) CGs: ${cgIds.length}, Σweight=${expectedTotal}`); | ||
|
|
||
| // 2) On-chain: seed in batches (O(k.log) per batch). |
There was a problem hiding this comment.
🟡 Issue: The migration script is deployment-critical but has no automated validation around its upgrade path: scanning existing CGs, batching seedMany, handling global-total mismatches, and unlocking. The recorded devnet run explicitly covers only a clean redeploy/auto-unlock path, so regressions in the actual existing-ledger migration could slip through. Add a fork/local test or script test that starts with pre-existing CG value, runs this migrator, verifies bitTotal/leaf parity and finishBackfill, then proves createChallenge works after unlock.
There was a problem hiding this comment.
🟡 Issue: The upgrade migration path is not exercised by the added validation. The devnet evidence is explicitly a clean redeploy/auto-unlock run, while the migration script's critical flow is seeding an existing non-empty ledger, validating totals, calling finishBackfill, and then proving createChallenge works after unlock. Without an integration test or rehearsal for this script plus the 057 non-empty skip path, regressions in deployment address lookup, batch seeding, total validation, or owner/onlyContracts permissions could leave upgraded networks paused or seeded incorrectly without CI/devnet catching it.
There was a problem hiding this comment.
🟡 Issue: The migration script's scan/seed/validate/unlock flow is not exercised end-to-end by tests; the unit tests only cover seedMany/finishBackfill primitives. Since this script is the operator path for existing CGs, regressions in deployment lookup, signer permissions, batching/idempotency, or the global-total validation would only be found during a live migration rehearsal. Add a Hardhat script-level test or fixture-backed migration test with pre-existing weighted CGs.
…tGraphKa* methods The KC→KA rename of ContextGraphStorage's public getters broke the off-chain chain adapter at RUNTIME: evm-adapter-context-graph.ts called cgs.getContextGraphKCCount/At (via an untyped ethers Contract, so not compile-caught) against the renamed contract, and packages/chain shipped stale bundled ABIs. The SWM host / reconciler use these. - evm-adapter-context-graph.ts: internal calls → cgs.getContextGraphKaCount/At. (The adapter's own wrapper method names + agent consumers keep their off-chain KC naming — they call through the wrapper; that's a separate cosmetic layer.) - Synced the 5 changed bundled ABIs (ContextGraphStorage, RandomSampling, +3 conviction) from evm-module/abi so they match the deployed contracts (RandomSampling's was also stale for the whole BIT PR — a pre-existing sync gap). chain: tsc clean; 259 unit tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| function settle(uint256 cg) public { | ||
| uint256 currentEpoch = chronos.getCurrentEpoch(); | ||
| if (currentEpoch > 0) contextGraphValueStorage.finalizeCGValueUpTo(cg, currentEpoch - 1); | ||
| uint256 truth = contextGraphValueStorage.getCGValueAtEpoch(cg, currentEpoch); |
There was a problem hiding this comment.
🔴 Bug: settle() writes the ledger truth into the BIT leaf without checking whether the context graph is active. That violates the new cgWeight > 0 => active invariant: an inactive CG with positive ledger value can remain in, or be re-added to, bitTotal by settleMany/settle-on-miss. RandomSampling then repeatedly draws it and treats it as a miss, and enough inactive weighted CGs can exhaust MAX_CG_RETRIES and revert even when active CGs exist. Make settlement active-aware and force inactive CG weights to zero, or wire deactivation so the leaf cannot be reintroduced.
| /// @notice `kcEpochs` was 0 or exceeded the account's `lockDurationEpochs`. | ||
| error InvalidConvictionKcEpochs(uint256 lockDurationEpochs, uint256 kcEpochs); | ||
| /// @notice `kaEpochs` was 0 or exceeded the account's `lockDurationEpochs`. | ||
| error InvalidConvictionKaEpochs(uint256 lockDurationEpochs, uint256 kaEpochs); |
There was a problem hiding this comment.
🟡 Issue: This error rename changes the custom-error selector while PublishingConviction still reports version 10.0.4 and the chain ABI pin table was not updated. Off-chain error decoding and pinning tests will see unexplained drift. Treat this like the other public ABI changes: either preserve the old error name until a planned breaking release, or bump/re-pin the ABI with a migration note.
Per the codebase convention (on-chain version must change when the ABI/selectors do): - ContextGraphStorage 10.0.2 → 10.0.3 (getContextGraphKa* getter rename = selector change). - PublishingConviction 10.0.4 → 10.0.5 (InvalidConvictionKaEpochs = error selector change). No behavior change; version() return only (ABI shape unchanged). Addresses PR #1226 review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Latest round addressed in Real items — fixed:
🔴 Re-raises (already addressed): settle-on-miss rollback (self-recovering via per-block seed re-roll; Test F proves success-path persistence), fairness-drift unit test + gas benchmark scope (covered by the 4-node devnet PASS: real CI note: the only red lane was 🤖 Generated with Claude Code |
| for (uint256 k = 0; k < n; k++) settle(cgs[k]); | ||
| } | ||
|
|
||
| function _setWeight(uint256 cg, uint256 weight) internal { |
There was a problem hiding this comment.
🟡 Issue: _setWeight only reaches the _bitUpdate capacity guard when weight != cached, so calls like seed(BIT_CAPACITY, 0) or settle(BIT_CAPACITY) with zero truth silently succeed and can even update cgSettledEpoch. That makes the fixed-capacity invariant fail late, at the first nonzero update, instead of at migration/backfill time. Validate cg != 0 && cg < BIT_CAPACITY before the cached-weight comparison.
| // failing CG excluded from the adjusted total until either a KC is | ||
| // found or the retry budget runs out. | ||
| uint256[] memory exhaustedCgs = new uint256[](MAX_CG_RETRIES); | ||
| uint256[] memory exhausted = new uint256[](MAX_CG_RETRIES); |
There was a problem hiding this comment.
🔵 Nit: exhaustedCount is maintained but never used because the full padded array is passed into workingTotal/findStrictGtExcluding. This leaves the exclusion contract implicit: callers must pass distinct entries and rely on zero padding, while duplicate entries can underflow. Either pass the active count through the tree helpers or drop the unused counter and document/enforce distinctness at the API boundary.
…orage, PublishingConviction) The chain ABI-pinning drift detector (CH-5) caught the intentional rename: - ContextGraphStorage: getContextGraphKCCount/At/List → getContextGraphKaCount/At/List - PublishingConviction: error InvalidConvictionKcEpochs → InvalidConvictionKaEpochs Reviewed + intentional (adapter updated in 6a63e71, contracts bumped to 10.0.3 / 10.0.5). Confirmed only packages/chain bundles these ABIs; no other package needs regen. Pins repinned. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| // there is no reentrancy surface. State-after-call ordering is required (finalize must run | ||
| // before the value read). | ||
| // slither-disable-next-line reentrancy-no-eth,reentrancy-benign,reentrancy-events | ||
| function settle(uint256 cg) public { |
There was a problem hiding this comment.
🔴 Bug: settle() remains callable while backfillLocked is true, so the migration seeding path and live settle-on-spend can race. During an upgrade backfill, seedMany() writes snapshot weights with _setWeight, while publishes/updates are still allowed and call cgWeightTreeStorage.settle(...); if a publish settles CG X after the snapshot is computed but before a later seed batch writes X, the seed overwrites the newer leaf and bitTotal loses that value until some future manual settlement. Since createChallenge is the only paused path, this can leave newly published value underweighted or invisible after unlock. The backfill window needs to also exclude live settlement/value-changing writes, or seed must reconcile against current ledger state without overwriting newer settles.
Summary
Closes the one mainnet scaling cliff in RandomSampling: the value-weighted CG draw did two full linear scans over every CG ever created, and each per-CG weight read replayed every un-applied epoch —
O(retries · 2 · N · D). Phase-0 onbase_sepolia_v10showed this is already real (≈110M gas for a single draw at just N=75).This replaces the twin scan with a fixed-capacity Fenwick/BIT (
O(log)prefix draw) plus lazy settlement (weights are an approximate, eventually-consistent index over the existingcgValueDiffledger — safe because the KC-expiry filter,RandomSampling.sol:648, means proof validity never depends on weight freshness).Result: the draw is now ~constant ~310k gas regardless of CG count (devnet-measured), supporting 5k–100k+ CGs. RFC:
docs/rfcs/scalable-weighted-cg-sampling.md.What's in here
CGWeightTreeStorage— fixed power-of-two-capacity Fenwick over CG ids;findStrictGtExcludingsubtracts excluded leaves during the descent (renormalization-exact);settle/seed/settleMany; explicitcgWeightleaves.RandomSamplingdraw rewrite —_pickWeightedChallengesplit into_pickKc(KC filters + leaf draw, verbatim), aviewcore (backspreviewChallengeForSeed), and a state-changing wrapper with settle-on-miss.createChallengerevertsChallengeDrawPausedwhile the index is backfill-locked. Eligibility checked on the drawn CG only (1 SLOAD, not O(N)).KnowledgeAssetsLifecycle— settle-on-spend after eachaddCGValueForEpochRange(publish/extend/update).057auto-finishBackfillon non-mainnet (zero-migration go-live on a clean deploy);scripts/migrate-cg-weight-tree.tsfor an eventual mainnet seed.Validation
docs/rfcs/devnet-validation-bit-draw.md): publish →createChallenge→submitProofacross proof periods; 30/30 KAs, 16 proofs, 0 data-corrupted / 0 no-eligible-cg / 0 errors. On-chain gas:createChallenge~310k avg (≈350× below the old cliff),publish~757k (incl. settle-on-spend; 1.37M cold first-publish).Resolved open decisions (see RFC)
settleManyalready ships); Phase-0 showed dormancy is zero-weight (skipped for free) and no future-dated backlog.2^knode).Reviewer notes / deferred (not blockers)
createChallengegas is variable: up toMAX_CG_RETRIESsettle-on-miss calls, eachO(D)first-touch finalize. Bounded & amortizing; on a clean ledger D stays small.deactivateContextGraphhas no callers, so the per-draw active-check suffices. If it's ever wired, leaf-zeroing becomes mandatory (else >MAX_CG_RETRIESdeactivated-but-weighted CGs could exhaust the retry budget) — noted in code.finishBackfillvia the migration script);057only auto-unlocks dev/testnet.🤖 Generated with Claude Code