
Component audit snapshot + 6 ideation follow-ups (replay tooling, golden vectors, linoss delegation, population cap)#11

Open
bionicbutterfly13 wants to merge 14 commits into main from chore/audit-snapshot-and-followups


@bionicbutterfly13

Owner

Summary

This branch captures a clean snapshot of in-flight work plus six evidence-grounded
follow-ups that came out of an open-ended /ideate pass over the codebase. All
work is green: 1271 tests passing, ruff clean, byte-stability golden vectors intact.

It has two layers:

  1. Baseline snapshot: the previously uncommitted worktree, split into logical commits
    (post-audit gap-closing src+tests, doc reconciliation, the 10 component benchmark
    scripts, the component benchmark audit report, the ideation artifact, and a
    .gitignore entry for the .serena tool cache).
  2. Six ideation follow-ups: the ranked, red-teamed survivors of the ideation pass,
    each addressing a finding from the component benchmark audit.

Ideation follow-ups

| Item | Change | Audit finding |
|---|---|---|
| S5 | Scope the cross-platform float-hash claim honestly: the README promised "mismatch by construction" and doc 21 justified it with a false "plain NumPy ops" premise; the core is BLAS-routed (@). Corrected both; BLAS identity belongs in a separate diagnostic field, not the hash. | Cross-cutting (folded-not-proven) |
| S4 | tests/golden/ byte-stability corpus pinning post_state_hash, content digests, snapshot bytes, and the rng_state_out pickle path the other vectors miss. Records the blessing NumPy version; carries a do-not-re-bless discipline note. | P2 (no golden-vector test) |
| S3-A | elume.envelope.replay(operation, scenario, n=2) -> ReplayResult(matched, hashes, outputs), re-exported; both quickstarts now call it instead of hand-rolling a double-run assert (quickstart code smoke-tested). | Operator-pain (no replay tooling) |
| S6-P1 | Optional max_population cap on generational growth (caps children/gen, never deletes; lineage/no-dangling-parents invariant preserved). Kills the measured ~131k-strategy OOM hazard. | P0-1 (unbounded growth) |
| S1 | elume.linoss.solver now delegates the oscillator step to the linoss-dynamics runtime instead of carrying a verbatim duplicate of the IM/IMEX math. A thin adapter keeps validation, ReLU-A, vector-dt, and the metrics contract. New test_linoss_runtime_parity.py guards against drift; the misnamed old "parity" test is renamed to characterization. | P0-3 (phantom dependency / no delegation) |
| S3-B | Pre-image segment diff tool: build_preimage / diff_preimage localize a hash mismatch to a named segment (platform, result, rng_state_out, provider snapshot, …); EnvelopeOutput now carries provider_snapshot_out; preimage_from_envelope pairs with replay(). Hash byte-identical (golden vectors unchanged). | Operator-pain (opaque mismatch) |

Notes for reviewers

  • Large PR by design: it bundles the baseline snapshot with the follow-ups, which
    build on it. The history reads cleanly commit by commit; happy to split into
    stacked PRs if preferred.
  • ab1d188 is a corrective commit: the S1 commit (6c1e24b) lost solver.py + docs to
    an atomic git add failure (stale renamed pathspec), and ab1d188 restores them. The
    on-disk code was always correct and tested; this only fixes history.
  • Full ranked ideation (7 survivors, 29 eliminated, red-team verdicts) is in
    .planning/20260606-ideation-open-ended.md.

Deferred (not in this PR)

  • S2 — IMEX silent-NaN guard (now logically belongs in linoss-dynamics, or as an
    isfinite check in the adapter).
  • S6 Phase 2 — true population truncation with a content-addressed lineage store.
  • S7 — single shared freeze utility (low priority).

Test status

1271 passed
ruff: All checks passed
golden byte-stability vectors: intact (hash unchanged across the S3-B refactor)

…omponent tests

- Recursive genotype freezing (FrozenDict) on Strategy
- Simplify the in-memory provider
- Expand unit/contract/integration coverage across components
- Remove the placeholder contract test

The README claimed cross-platform drift 'surfaces as a hash mismatch by
construction', and doc 21 justified deferring a pinned numeric backend by
asserting the core uses 'plain NumPy ops'. Both are false: Hopfield
(basins/hopfield.py) and the LinOSS encoder/solver use BLAS-routed matmul
(@ / np.dot), and the 5-field platform fingerprint does not capture the
BLAS backend, threads, or SIMD. Two hosts with the same fingerprint but
different BLAS can match-hash over divergent floats.

Scope the claim in both README bullets and correct doc 21's premise;
note that BLAS identity belongs in a separate diagnostic field, not the
canonical pre-image. No code change.
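The commit itself makes no code change, but the split it recommends can be sketched with a dataclass: fingerprint fields feed the hash, while BLAS/threads/SIMD details ride along in a diagnostics field excluded from both the digest and equality. This is a sketch under assumptions: `PlatformRecord`, `current_record`, `fingerprint_digest`, and the five field names are hypothetical stand-ins, not Elume's actual fingerprint.

```python
import hashlib
import json
import platform
import sys
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class PlatformRecord:
    # Hypothetical five-field fingerprint: only these values enter the hash.
    system: str
    machine: str
    python: str
    numpy: str
    byteorder: str
    # Diagnostic-only context (e.g. BLAS vendor, thread count, SIMD level).
    # compare=False keeps it out of __eq__; fingerprint_digest ignores it too.
    diagnostics: Dict[str, str] = field(default_factory=dict, compare=False)

    def fingerprint_digest(self) -> str:
        """BLAKE2b-256 over the five fingerprint fields only."""
        fields = [self.system, self.machine, self.python, self.numpy, self.byteorder]
        blob = json.dumps(fields, separators=(",", ":")).encode("utf-8")
        return hashlib.blake2b(blob, digest_size=32).hexdigest()


def current_record(numpy_version: str, diagnostics: Dict[str, str]) -> PlatformRecord:
    """Build a record for this host; numpy_version is passed in to stay stdlib-only."""
    return PlatformRecord(
        system=platform.system(),
        machine=platform.machine(),
        python=platform.python_version(),
        numpy=numpy_version,
        byteorder=sys.byteorder,
        diagnostics=diagnostics,
    )
```

Two records that differ only in BLAS vendor get the same digest here, which is exactly the gap the commit describes: the platform tag alone cannot vouch for bit-identical floats, so BLAS identity is reported beside the hash rather than folded into it.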
Pins exact digests/bytes for the determinism-critical envelope surfaces so a
NumPy/pickle/canonical-JSON change that silently shifts the stored byte layout
fails loudly instead of passing green:

- compute_post_state_hash (platform tag pinned via the override, so the vector
  is host-independent)
- strategy_content_digest, content-addressed root_hash, full-mode snapshot_bytes
- rng_state_out(default_rng(SEED)) — the pickle-of-numpy-bit-generator path the
  other vectors don't reach (they take rng_state_out as opaque bytes), plus a
  round-trip-to-seed-state check

Records the blessing NumPy version (2.4.4) for self-explaining failure messages
rather than hard-gating it, and carries a regeneration block + a do-not-re-bless
discipline note. Verified non-tautological: tampering with the platform tag fails the
assertion.
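The bless-then-pin discipline, including the tamper check, can be sketched with the standard library alone. Assumptions: `canonical_digest`, `bless`, and `check_golden` are hypothetical helpers, and in a real corpus the blessed digests are literal strings committed under tests/golden/, not computed inside the test (computing them at test time would make the check tautological). Here `bless` only stands in for those committed literals.

```python
import hashlib
import json
from typing import Dict

# Recorded for a self-explaining failure message; deliberately not enforced.
BLESSED_NUMPY_VERSION = "2.4.4"


def canonical_digest(record: dict) -> str:
    """BLAKE2b-256 over canonical JSON (sorted keys, no insignificant whitespace)."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.blake2b(blob, digest_size=32).hexdigest()


def bless(records: Dict[str, dict]) -> Dict[str, str]:
    """Regeneration helper: run deliberately, then commit its output as literals."""
    return {name: canonical_digest(rec) for name, rec in records.items()}


def check_golden(name: str, record: dict, golden: Dict[str, str], running_numpy: str) -> None:
    """Fail loudly, with version context, if a pinned digest has shifted."""
    expected = golden[name]
    actual = canonical_digest(record)
    if actual != expected:
        raise AssertionError(
            f"golden vector {name!r} drifted: expected {expected}, got {actual}. "
            f"Blessed under NumPy {BLESSED_NUMPY_VERSION}, running NumPy {running_numpy}. "
            "Find the cause before re-blessing."
        )


if __name__ == "__main__":
    record = {"platform": "linux-x86_64", "schema_version": 1, "result": [1, 2, 3]}
    golden = bless({"post_state": record})  # stands in for committed literals
    check_golden("post_state", record, golden, running_numpy="2.4.4")  # passes
    tampered = dict(record, platform="darwin-arm64")
    try:
        check_golden("post_state", tampered, golden, running_numpy="2.4.4")
    except AssertionError as err:
        print("tamper detected:", err)
```

Recording the blessing version in the message, rather than gating on it, means a NumPy upgrade that leaves bytes unchanged still passes, while one that shifts them fails with enough context to diagnose.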
…art (S3-A)

Adds elume.envelope.replay(operation, scenario, n=2) -> ReplayResult(matched,
hashes, outputs), re-exported from elume.envelope. Runs an op n times on one
input and reports whether every post_state_hash matches — the core replay-safety
contract, previously hand-rolled as a double-run assert in the quickstart.

- operation accepts a registered op name (resolved) or an Operation object
- ReplayResult also carries outputs so callers inspect results without re-running
- n < 2 raises ValueError; a non-operation raises TypeError
- README + docs/quickstart.md now call replay() instead of resolve()+run()+run();
  quickstart code smoke-tested (replay matched: True)

This is S3-A; the pre-image segment diff tool (Unit B) is a separate follow-on.
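A minimal sketch of the replay contract described above. The signature follows the commit message (`replay(operation, scenario, n=2) -> ReplayResult(matched, hashes, outputs)`), but the body is a guess: `REGISTRY` and `_post_state_hash` are hypothetical stand-ins for Elume's operation registry and compute_post_state_hash, and operations are plain callables rather than Elume Operation objects.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple, Union

# Hypothetical registry standing in for Elume's registered operations.
REGISTRY: Dict[str, Callable[[dict], Any]] = {}


@dataclass(frozen=True)
class ReplayResult:
    matched: bool
    hashes: Tuple[str, ...]
    outputs: Tuple[Any, ...]


def _post_state_hash(output: Any) -> str:
    """Stand-in for compute_post_state_hash: BLAKE2b over canonical JSON."""
    blob = json.dumps(output, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.blake2b(blob, digest_size=32).hexdigest()


def replay(operation: Union[str, Callable[[dict], Any]], scenario: dict, n: int = 2) -> ReplayResult:
    """Run one operation n times on one input and report whether every hash matches."""
    if n < 2:
        raise ValueError("replay needs n >= 2 runs to compare")
    if isinstance(operation, str):
        operation = REGISTRY[operation]  # resolve a registered name (KeyError if unknown)
    if not callable(operation):
        raise TypeError("operation must be a registered name or a callable operation")
    outputs = tuple(operation(scenario) for _ in range(n))
    hashes = tuple(_post_state_hash(out) for out in outputs)
    return ReplayResult(matched=len(set(hashes)) == 1, hashes=hashes, outputs=outputs)


if __name__ == "__main__":
    REGISTRY["double"] = lambda s: {"y": [2 * v for v in s["x"]]}
    result = replay("double", {"x": [1, 2, 3]})
    print(result.matched, result.outputs[0])
```

Returning the outputs alongside the hashes lets a caller inspect a mismatching run without re-executing it, which is the same reason the commit gives for including them.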
…S6-P1)

The loop spawned len(population)-elite_k children per generation, growing ~2x/gen
with no bound (measured ~131k strategies in 141s at 15 generations -- a real OOM
hazard). Adds an optional max_population to evolve_one_generation (threaded through
AutoStrategyEvolver) that reduces the number of children spawned as the population
nears the ceiling, so the population lands exactly on it and then plateaus.

Crucially this does NOT delete from the provider, so the immutable-past /
no-dangling-parents invariant holds (covered by a test). max_population<1 raises;
None (default) is uncapped and byte-identical to prior behavior.

This is S6 Phase 1. Phase 2 (true truncation with a content-addressed lineage
store + transitive-ancestor retention) remains a separate, deferred task.
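The capping arithmetic can be sketched as a pure function. Everything here is hypothetical: `children_to_spawn` and `simulate` are illustrative names (the real change lives inside evolve_one_generation), population size is assumed to count every retained strategy since nothing is deleted, and uncapped growth is assumed to add len(population) - elite_k children per generation, as the commit describes.

```python
from typing import List, Optional


def children_to_spawn(population_size: int, elite_k: int, max_population: Optional[int] = None) -> int:
    """How many children to spawn this generation, honoring an optional ceiling."""
    if max_population is not None and max_population < 1:
        raise ValueError("max_population must be >= 1 (or None for uncapped)")
    uncapped = max(0, population_size - elite_k)  # prior behavior
    if max_population is None:
        return uncapped
    # Headroom is clamped at zero: the cap only limits spawning, it never deletes.
    headroom = max(0, max_population - population_size)
    return min(uncapped, headroom)


def simulate(initial: int, elite_k: int, generations: int, max_population: Optional[int] = None) -> List[int]:
    """Population size after each generation; strategies are only ever added."""
    sizes = [initial]
    for _ in range(generations):
        sizes.append(sizes[-1] + children_to_spawn(sizes[-1], elite_k, max_population))
    return sizes


if __name__ == "__main__":
    print(simulate(8, 2, 5))                     # roughly doubles each generation
    print(simulate(8, 2, 5, max_population=40))  # lands on 40, then plateaus
```

With a cap, spawning shrinks to the remaining headroom, so the size hits the ceiling exactly and stays there; with `max_population=None` the function returns the uncapped count, matching the "None is byte-identical to prior behavior" default.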
… duplicate math (S1)

elume.linoss.solver carried a verbatim duplicate of the IM/IMEX oscillator math
that linoss-dynamics already owns (declared as a dependency but never imported) —
two copies of the physics that could silently drift. solver.py now imports
linoss-dynamics and delegates the actual state step + energy diagnostics to it.

Elume keeps only the integration layer: input validation, the paper's ReLU-A
clamp (the runtime does not clamp), vector-dt support (the runtime is scalar-dt
only, so the diagonal system is stepped component-wise), and the 10-key metrics
contract. Behavior is preserved byte-for-byte (all prior solver tests pass).

- NEW tests/contract/test_linoss_runtime_parity.py: pins elume == linoss-dynamics
  byte-parity on the shared domain (non-neg A, scalar dt, all forcing shapes),
  plus the two adapter value-adds (ReLU-A clamp, vector dt). Guards against drift.
- RENAME test_linoss_solver_parity.py -> _characterization.py: it never tested
  runtime parity (it pins Elume's metrics/energy contract); the name was a misnomer.
- AGENTS.md + benchmark-report finding #1 updated to reflect real delegation.

Resolves the phantom-dependency / no-delegation gap (report P0-3).
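The adapter's two value-adds (ReLU-A clamp and vector dt over a scalar-dt runtime) can be sketched in plain Python. `runtime_step_im` is a hypothetical stand-in for the linoss-dynamics scalar-dt step, whose real name and signature this PR does not show; it implements the implicit (IM) scheme from the LinOSS paper as we read it, z_n = z_{n-1} + dt(-A y_n + f_n) and y_n = y_{n-1} + dt z_n, solved in closed form for one diagonal component. `adapter_step` is likewise illustrative.

```python
from typing import List, Sequence, Tuple, Union


def runtime_step_im(y: float, z: float, a: float, f: float, dt: float) -> Tuple[float, float]:
    """Hypothetical scalar-dt IM step for one component of y'' = -a*y + f.

    Substituting z_n into y_n = y + dt*z_n gives
    (1 + dt^2 * a) * y_n = y + dt*z + dt^2 * f.
    """
    y_next = (y + dt * z + dt * dt * f) / (1.0 + dt * dt * a)
    z_next = z + dt * (-a * y_next + f)
    return y_next, z_next


def adapter_step(
    y: Sequence[float],
    z: Sequence[float],
    a: Sequence[float],
    f: Sequence[float],
    dt: Union[float, Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Hypothetical adapter: validate, ReLU-clamp A, then step each diagonal component."""
    n = len(y)
    if not (len(z) == len(a) == len(f) == n):
        raise ValueError("y, z, a, f must have the same length")
    dts = [float(dt)] * n if isinstance(dt, (int, float)) else [float(d) for d in dt]
    if len(dts) != n:
        raise ValueError("vector dt must match the state dimension")
    if any(d <= 0 for d in dts):
        raise ValueError("dt must be positive")
    a_relu = [max(0.0, ai) for ai in a]  # ReLU(A); the runtime itself does not clamp
    y_out: List[float] = []
    z_out: List[float] = []
    # A is diagonal, so components are decoupled: a scalar-dt runtime can
    # step each one with its own dt without changing the math.
    for yi, zi, ai, fi, di in zip(y, z, a_relu, f, dts):
        yn, zn = runtime_step_im(yi, zi, ai, fi, di)
        y_out.append(yn)
        z_out.append(zn)
    return y_out, z_out
```

Because the components never couple, stepping them one at a time with per-component dt reproduces a vector-dt update exactly. The clamp also keeps 1 + dt²·a ≥ 1, so the implicit solve can never divide by zero.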
…tches (S3-B)

A failed replay used to yield only two differing 64-character hashes, with no way to
see why they diverged. This adds segment-level diffing:

- hashing.py: compute_post_state_hash now builds named pre-image segments
  (platform, schema_version, scenario_operation, result, result_arrays,
  rng_state_out, provider_snapshot, provider_snapshot_arrays). The hash is
  byte-identical to before (golden vectors unchanged) — it is just the
  concatenation of the same segments.
- build_preimage() returns per-segment BLAKE2b digests; diff_preimage(a, b)
  reports which named segment(s) diverged (or matched).
- EnvelopeOutput now carries provider_snapshot_out (frozen, default None) so a
  run's pre-image can be rebuilt from (input, output) without re-running.
- preimage_from_envelope(input, output) pairs with replay(): build a pre-image
  from each mismatching run and diff them.

All re-exported from elume.envelope. Verified: rebuilt pre-image hashes to the
recorded post_state_hash; diffs localize platform/result/array/rng/snapshot
divergence to the right segment (cross-platform branch tested via the platform
override). This is S3-B; byte-offset granularity within a segment is out of scope.
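The segment idea can be sketched with hashlib alone. `preimage_digests` and `diverged_segments` are hypothetical names, not the signatures of build_preimage / diff_preimage; the segment names are taken from the commit message. The sketch assumes each segment's bytes are already self-delimiting, since the full hash here is plain BLAKE2b over the ordered concatenation, mirroring the commit's description.

```python
import hashlib
from typing import Dict, List, Tuple

# Order matters: the full hash is taken over the segments in this order.
SEGMENT_NAMES = (
    "platform",
    "schema_version",
    "scenario_operation",
    "result",
    "result_arrays",
    "rng_state_out",
    "provider_snapshot",
    "provider_snapshot_arrays",
)


def preimage_digests(segments: Dict[str, bytes]) -> Tuple[str, Dict[str, str]]:
    """Return (full hash, per-segment digests). Assumes self-delimiting segments."""
    missing = [name for name in SEGMENT_NAMES if name not in segments]
    if missing:
        raise KeyError(f"missing pre-image segments: {missing}")
    full = hashlib.blake2b(digest_size=32)
    per_segment: Dict[str, str] = {}
    for name in SEGMENT_NAMES:
        data = segments[name]
        full.update(data)  # incremental updates == hashing the concatenation
        per_segment[name] = hashlib.blake2b(data, digest_size=32).hexdigest()
    return full.hexdigest(), per_segment


def diverged_segments(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Names of segments whose digests differ, in pre-image order."""
    return [name for name in SEGMENT_NAMES if a[name] != b[name]]


if __name__ == "__main__":
    run_a = {name: f"{name}:v1".encode() for name in SEGMENT_NAMES}
    run_b = dict(run_a, rng_state_out=b"rng_state_out:v2")
    hash_a, seg_a = preimage_digests(run_a)
    hash_b, seg_b = preimage_digests(run_b)
    print(hash_a == hash_b, diverged_segments(seg_a, seg_b))
```

Because the full digest covers exactly the bytes the per-segment digests cover, adding the segment view leaves the top-level hash unchanged, which is the property the commit checks against the golden vectors.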
The S1 commit (6c1e24b) was meant to include the solver.py delegation refactor
plus the AGENTS.md and benchmark-report notes, but an atomic 'git add' failure
(a stale renamed pathspec) silently dropped them, so 6c1e24b landed with only
the new parity test and the test rename. This commit adds the actual delegation
of the oscillator step to linoss-dynamics and the accompanying doc updates.
End state is unchanged from what was tested (1271 passing, ruff clean).
@ecc-tools

ecc-tools Bot commented Jun 6, 2026


Analyzing 200 commits...

@ecc-tools

ecc-tools Bot commented Jun 6, 2026


Analysis Complete

Generated ECC bundle from 14 commits | Confidence: 60%

View Pull Request #12

Repository Profile

| Attribute | Value |
|---|---|
| Language | Python |
| Framework | Not detected |
| Commit Convention | conventional |
| Test Directory | separate |

Changed Files (78)

| Metric | Value |
|---|---|
| Files changed | 78 |
| Additions | 9597 |
| Deletions | 317 |

Top hotspots

| Path | Status | +/- |
|---|---|---|
| reference_service/src/reference_service/visualization.py | added | +1490 / -0 |
| docs/elume-component-benchmark-report.md | added | +802 / -0 |
| benchmarks/bench_cognition.py | added | +652 / -0 |
| benchmarks/bench_models.py | added | +596 / -0 |
| benchmarks/bench_envelope.py | added | +583 / -0 |

Top directories

| Directory | Files | Total changes |
|---|---|---|
| benchmarks | 10 | 4731 |
| reference_service/src/reference_service | 2 | 1549 |
| docs | 5 | 1458 |
| tests/unit | 10 | 335 |
| tests/unit/envelope | 4 | 296 |

Analysis Depth Readiness (evidence-backed, 57%)

ECC Tools uses this to decide whether recommendations should stay at commit-history/setup guidance or expand into CI, security, harness, reference-set, AI-routing, and team backlog work.

| Area | Status | Evidence / Next Step |
|---|---|---|
| Commit history | Ready | 14 commits sampled |
| CI/CD signals | Missing | Add workflow files or CI troubleshooting evidence so ECC Tools can reason about pipeline setup. |
| Security evidence | Missing | Add AgentShield, audit, SARIF, SBOM, or security review evidence so recommendations can cover security posture. |
| Harness configuration | Ready | tests/unit/adapters/memevolve/test_ingest.py, tests/unit/adapters/memevolve/test_shaping.py |
| Reference/eval evidence | Ready | benchmarks/bench_basins.py, benchmarks/bench_cognition.py, benchmarks/bench_embedders.py |
| AI routing and cost controls | Ready | .planning/20260606-ideation-open-ended.md, docs/plans/archon-adoption-phase-1.md, docs/plans/phase-2-handoff.md |
| Team handoff and project tracking | Missing | Add roadmap, runbook, project, Linear, or follow-up tracking docs so generated work can land in a team queue. |

Reference Set Readiness (2/7, 29%)

| Area | Status | Evidence / Next Step |
|---|---|---|
| Deep analyzer corpus | Missing | Add analyzer fixture, golden, benchmark, or reference-set files that can catch analyzer regressions. |
| RAG/evaluator comparison | Present | benchmarks/bench_basins.py, benchmarks/bench_cognition.py, benchmarks/bench_embedders.py |
| PR salvage/review corpus | Missing | Add stale-PR, review-thread, reopen-flow, or salvage reference cases for queue cleanup automation. |
| Discussion triage corpus | Missing | Add public discussion triage fixtures, golden cases, or reference sets for informational, answered, and no-response classifications. |
| Harness compatibility | Present | tests/unit/adapters/memevolve/test_ingest.py, tests/unit/adapters/memevolve/test_shaping.py |
| Security evidence | Missing | Attach security evidence such as SBOMs, SARIF, audit reports, or AgentShield evidence packs. |
| CI failure-mode evidence | Missing | Add captured CI failure logs, dry-run fixtures, or troubleshooting docs for common workflow failure modes. |

Likely Future Issues (1)

| Severity | Signal | Why it may show up |
|---|---|---|
| HIGH | Schema or model changes may ship without migration follow-up | 4 schema/model paths changed; 0 migration files changed |

  • Schema or model changes may ship without migration follow-up: The PR changes schema or model-facing files but does not include any obvious migration artifact.

Suggested Follow-up Work (1)

| Type | Suggested title | Targets |
|---|---|---|
| PR | db: add migration follow-up for src/elume/models/basin.py + src/elume/models/belief.py | src/elume/models/basin.py, src/elume/models/belief.py |

  • db: add migration follow-up for src/elume/models/basin.py + src/elume/models/belief.py: Backfill the missing migration artifact before another schema or model change lands on top.

Copy-ready bodies

db: add migration follow-up for src/elume/models/basin.py + src/elume/models/belief.py

## Summary
- Add the missing migration or schema rollout step for the recently changed schema surface.

## Why
- Backfill the missing migration artifact before another schema or model change lands on top.

## Touched paths
- `src/elume/models/basin.py`
- `src/elume/models/belief.py`

## Validation
- Create the migration or schema rollout artifact used by this repo.
- Run the repo migration / schema validation flow and verify the changed models still match production expectations.

Detected Workflows (5)

| Workflow | Description |
|---|---|
| feature-development | Standard feature implementation workflow |
| feature-development-with-tests-and-docs | Implements a new feature or capability, accompanied by updates to documentation and relevant tests. |
| documentation-reconciliation-or-expansion | Updates, reconciles, or expands documentation across multiple docs/ files and/or the main README. |
| test-suite-expansion-or-golden-vector-addition | Adds new tests, expands coverage, or introduces golden vector/corpus tests for determinism and regression. |
| refactor-or-internal-api-change-with-contract-tests | Refactors internal logic or delegates implementation to a dependency, ensuring contract tests pin behavior. |

Generated Instincts (28)

| Domain | Count |
|---|---|
| git | 4 |
| code-style | 9 |
| testing | 6 |
| workflow | 9 |

After merging, import with:

/instinct-import .claude/homunculus/instincts/inherited/elume-instincts.yaml

Files

  • .claude/ecc-tools.json
  • .claude/skills/elume/SKILL.md
  • .agents/skills/elume/SKILL.md
  • .agents/skills/elume/agents/openai.yaml
  • .claude/identity.json
  • .codex/config.toml
  • .codex/AGENTS.md
  • .codex/agents/explorer.toml
  • .codex/agents/reviewer.toml
  • .codex/agents/docs-researcher.toml
  • .claude/homunculus/instincts/inherited/elume-instincts.yaml
  • .claude/commands/feature-development.md
  • .claude/commands/feature-development-with-tests-and-docs.md
  • .claude/commands/documentation-reconciliation-or-expansion.md

ECC Tools | Everything Claude Code
