Component audit snapshot + 6 ideation follow-ups (replay tooling, golden vectors, linoss delegation, population cap) by bionicbutterfly13 · Pull Request #11 · bionicbutterfly13/elume

bionicbutterfly13 · 2026-06-06T17:08:42Z

Summary

This branch captures a clean snapshot of in-flight work plus six evidence-grounded
follow-ups that came out of an open-ended /ideate pass over the codebase. All
work is green: 1271 tests passing, ruff clean, byte-stability golden vectors intact.

It has two layers:

- Baseline snapshot: the previously uncommitted worktree, committed as logical commits (post-audit gap-closing src+tests, doc reconciliation, the 10 component benchmark scripts, the component benchmark audit report, the ideation artifact, plus a .gitignore entry for the .serena tool cache).
- Six ideation follow-ups: the ranked, red-teamed survivors from the ideation, each addressing a finding from the component benchmark audit.

Ideation follow-ups

Item	Change	Audit finding
S5	Scope the cross-platform float-hash claim honestly — the README promised "mismatch by construction" and doc 21 justified it with a false "plain NumPy ops" premise; the core is BLAS-routed (`@`). Corrected both; BLAS identity belongs in a separate diagnostic field, not the hash.	Cross-cutting (folded-not-proven)
S4	`tests/golden/` byte-stability corpus pinning `post_state_hash`, content digests, snapshot bytes, and the `rng_state_out` pickle path the other vectors miss. Records the blessing NumPy version; carries a do-not-re-bless discipline note.	P2 (no golden-vector test)
S3-A	`elume.envelope.replay(operation, scenario, n=2) -> ReplayResult(matched, hashes, outputs)`, re-exported; both quickstarts now call it instead of hand-rolling a double-run assert (quickstart code smoke-tested).	Operator-pain (no replay tooling)
S6-P1	Optional `max_population` cap on generational growth (caps children/gen, never deletes — lineage/no-dangling-parents invariant preserved). Kills the measured ~131k-strategy OOM hazard.	P0-1 (unbounded growth)
S1	`elume.linoss.solver` now delegates the oscillator step to the `linoss-dynamics` runtime instead of carrying a verbatim duplicate of the IM/IMEX math. Thin adapter keeps validation, ReLU-A, vector-`dt`, and the metrics contract. New `test_linoss_runtime_parity.py` guards against drift; renamed the misnamed old "parity" test to characterization.	P0-3 (phantom dependency / no delegation)
S3-B	Pre-image segment diff tool: `build_preimage` / `diff_preimage` localize a hash mismatch to a named segment (platform, result, rng_state_out, provider snapshot, …); `EnvelopeOutput` now carries `provider_snapshot_out`; `preimage_from_envelope` pairs with `replay()`. Hash byte-identical (golden vectors unchanged).	Operator-pain (opaque mismatch)

Notes for reviewers

- Large PR by design: it bundles the baseline snapshot with the follow-ups, which build on the snapshot. Reviewing commit-by-commit reads cleanly; happy to split into stacked PRs if preferred.
- ab1d188 is a corrective commit: the S1 commit (6c1e24b) lost solver.py and the docs to an atomic git add failure (a stale renamed pathspec), and ab1d188 restores them. The on-disk code was always correct and tested; this only fixes history.
- The full ranked ideation (7 survivors, 29 eliminated, red-team verdicts) is in .planning/20260606-ideation-open-ended.md.

Deferred (not in this PR)

- S2: IMEX silent-NaN guard (this now logically belongs in linoss-dynamics, or as an isfinite check in the adapter).
- S6 Phase 2: true population truncation with a content-addressed lineage store.
- S7: single shared freeze utility (low priority).

Test status

- 1271 passed
- ruff: All checks passed
- golden byte-stability vectors: intact (hash unchanged across the S3-B refactor)

…omponent tests
- Recursive genotype freezing (FrozenDict) on Strategy
- Simplify the in-memory provider
- Expand unit/contract/integration coverage across components
- Remove the placeholder contract test
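A minimal sketch of what recursive genotype freezing can look like, assuming a genotype is a nested structure of dicts, lists, and sets. `FrozenDict` and `freeze` below are hypothetical stand-ins written for illustration; Elume's actual classes are not shown in this PR.

```python
from collections.abc import Mapping
from typing import Any, Iterator, Optional


class FrozenDict(Mapping):
    """Read-only, hashable mapping (hypothetical stand-in for Elume's FrozenDict)."""

    def __init__(self, data: Mapping) -> None:
        self._data = dict(data)
        self._hash: Optional[int] = None

    def __getitem__(self, key: Any) -> Any:
        return self._data[key]

    def __iter__(self) -> Iterator[Any]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __hash__(self) -> int:
        # Order-independent; works because freeze() makes every nested value hashable.
        if self._hash is None:
            self._hash = hash(frozenset(self._data.items()))
        return self._hash

    def __repr__(self) -> str:
        return f"FrozenDict({self._data!r})"


def freeze(value: Any) -> Any:
    """Recursively replace dicts, lists, and sets with immutable equivalents."""
    if isinstance(value, Mapping):
        return FrozenDict({k: freeze(v) for k, v in value.items()})
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(freeze(v) for v in value)
    return value
```

Freezing recursively (not just the top-level dict) is what makes the genotype hashable end to end, and an accidental in-place edit such as `genotype["layers"] = ...` fails with `TypeError` instead of mutating shared state.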

The README claimed cross-platform drift 'surfaces as a hash mismatch by construction', and doc 21 justified deferring a pinned numeric backend by asserting the core uses 'plain NumPy ops'. Both are false: Hopfield (basins/hopfield.py) and the LinOSS encoder/solver use BLAS-routed matmul (@ / np.dot), and the 5-field platform fingerprint does not capture the BLAS backend, threads, or SIMD. Two hosts with the same fingerprint but different BLAS can match-hash over divergent floats. Scope the claim in both README bullets and correct doc 21's premise; note that BLAS identity belongs in a separate diagnostic field, not the canonical pre-image. No code change.

Pins exact digests/bytes for the determinism-critical envelope surfaces, so that a NumPy, pickle, or canonical-JSON change that silently shifts the stored byte layout fails loudly instead of passing green:
- compute_post_state_hash (platform tag pinned via the override, so the vector is host-independent)
- strategy_content_digest, content-addressed root_hash, full-mode snapshot_bytes
- rng_state_out(default_rng(SEED)): the pickle-of-NumPy-bit-generator path the other vectors don't reach (they take rng_state_out as opaque bytes), plus a round-trip-to-seed-state check

Records the blessing NumPy version (2.4.4) so failure messages explain themselves, rather than hard-gating on it, and carries a regeneration block plus a do-not-re-bless discipline note. Verified non-tautological: tampering with the platform tag fails the assertion.
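The golden-vector pattern can be sketched with the standard library alone. This assumes BLAKE2b digests (the S3-B commit names BLAKE2b for segment digests) and sorted-key, compact JSON as the canonical encoding; `canonical_json`, `digest`, and `check_golden` are hypothetical helpers. In a real corpus the expected digests are literal strings committed under `tests/golden/`, never recomputed at test time.

```python
import hashlib
import json
from typing import Any


def canonical_json(obj: Any) -> bytes:
    """Deterministic JSON encoding: sorted keys, no whitespace, ASCII only."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True).encode("ascii")


def digest(payload: bytes) -> str:
    # 32-byte BLAKE2b -> 64 hex characters.
    return hashlib.blake2b(payload, digest_size=32).hexdigest()


def check_golden(name: str, payload: bytes, golden: dict[str, str], blessed_with: str) -> None:
    """Fail loudly, with context, if a pinned surface's bytes have shifted."""
    expected = golden[name]
    actual = digest(payload)
    if actual != expected:
        raise AssertionError(
            f"golden vector {name!r} changed: expected {expected}, got {actual}. "
            f"Vectors were blessed under {blessed_with}; do not re-bless without "
            "confirming the byte-layout change is intentional."
        )
```

Putting the blessing version in the message (rather than skipping the test on a version mismatch) keeps the check hard while making a NumPy upgrade the first suspect when it fires.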

…art (S3-A)

Adds elume.envelope.replay(operation, scenario, n=2) -> ReplayResult(matched, hashes, outputs), re-exported from elume.envelope. It runs an op n times on one input and reports whether every post_state_hash matches: the core replay-safety contract, previously hand-rolled as a double-run assert in the quickstart.
- operation accepts a registered op name (resolved) or an Operation object
- ReplayResult also carries outputs, so callers can inspect results without re-running
- n < 2 raises ValueError; a non-operation raises TypeError
- README + docs/quickstart.md now call replay() instead of resolve()+run()+run(); quickstart code smoke-tested (replay matched: True)

This is S3-A; the pre-image segment diff tool (Unit B) is a separate follow-on.
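A minimal sketch of the replay contract as the commit describes it: an op name or Operation in, n runs, and a result carrying matched, hashes, and outputs. The `Operation`, `Output`, and `REGISTRY` types are hypothetical stand-ins; Elume's real envelope types and op resolver are not shown in this PR.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Output:
    """Hypothetical run output carrying its post-state hash."""
    value: Any
    post_state_hash: str


@dataclass(frozen=True)
class Operation:
    """Hypothetical operation: a name plus a callable run(scenario) -> Output."""
    name: str
    run: Callable[[Any], Output]


@dataclass(frozen=True)
class ReplayResult:
    matched: bool
    hashes: tuple[str, ...]
    outputs: tuple[Output, ...]


REGISTRY: dict[str, Operation] = {}  # hypothetical op registry


def replay(operation: Union[str, Operation], scenario: Any, n: int = 2) -> ReplayResult:
    if n < 2:
        raise ValueError("replay needs n >= 2 runs to compare")
    if isinstance(operation, str):
        operation = REGISTRY[operation]  # resolve a registered op name
    if not isinstance(operation, Operation):
        raise TypeError(f"expected an op name or Operation, got {type(operation).__name__}")
    outputs = tuple(operation.run(scenario) for _ in range(n))
    hashes = tuple(o.post_state_hash for o in outputs)
    # Replay-safe means every run produced the same post-state hash.
    return ReplayResult(matched=len(set(hashes)) == 1, hashes=hashes, outputs=outputs)
```

Returning the outputs alongside the hashes is what lets a caller go straight from `matched=False` to inspecting the divergent runs without executing the op again.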

…S6-P1)

The loop spawned len(population) - elite_k children per generation, so the population grew ~2x per generation with no bound (measured ~131k strategies in 141s at 15 generations, a real OOM hazard). This adds an optional max_population to evolve_one_generation (threaded through AutoStrategyEvolver) that reduces the number of children spawned as the population nears the ceiling, lands exactly on it, and then plateaus. Crucially, this does NOT delete from the provider, so the immutable-past / no-dangling-parents invariant holds (covered by a test). max_population < 1 raises; None (the default) is uncapped and byte-identical to prior behavior.

This is S6 Phase 1. Phase 2 (true truncation with a content-addressed lineage store and transitive-ancestor retention) remains a separate, deferred task.
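The capping rule can be sketched as a pure function. How Elume computes the reduced child count is not spelled out here, so clipping to the remaining headroom is an assumption; `children_to_spawn` and `simulate` are hypothetical helpers, not the signature of `evolve_one_generation`. Because nothing is ever deleted, the population size is cumulative.

```python
from typing import Optional


def children_to_spawn(population_size: int, elite_k: int, max_population: Optional[int] = None) -> int:
    """Children spawned this generation; existing strategies are never deleted."""
    if max_population is not None and max_population < 1:
        raise ValueError("max_population must be >= 1 or None")
    uncapped = max(population_size - elite_k, 0)
    if max_population is None:
        return uncapped  # prior behavior: population roughly doubles each generation
    headroom = max(max_population - population_size, 0)
    return min(uncapped, headroom)  # land exactly on the ceiling, then plateau


def simulate(start: int, elite_k: int, generations: int, max_population: Optional[int] = None) -> list[int]:
    """Population size after each generation, starting from `start`."""
    sizes = [start]
    for _ in range(generations):
        sizes.append(sizes[-1] + children_to_spawn(sizes[-1], elite_k, max_population))
    return sizes
```

With `start=4, elite_k=2`, the uncapped run follows 4, 6, 10, 18, 34, 66, 130 (each step is 2n - 2), while a cap of 50 gives 4, 6, 10, 18, 34, 50, 50: growth is unchanged until the cap binds, which is why `None` stays byte-identical to the old behavior.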

… duplicate math (S1)

elume.linoss.solver carried a verbatim duplicate of the IM/IMEX oscillator math that linoss-dynamics already owns (declared as a dependency but never imported): two copies of the physics that could silently drift. solver.py now imports linoss-dynamics and delegates the actual state step and energy diagnostics to it. Elume keeps only the integration layer: input validation, the paper's ReLU-A clamp (the runtime does not clamp), vector-dt support (the runtime is scalar-dt only, so the diagonal system is stepped component-wise), and the 10-key metrics contract. Behavior is preserved byte-for-byte (all prior solver tests pass).
- NEW tests/contract/test_linoss_runtime_parity.py: pins elume == linoss-dynamics byte-parity on the shared domain (non-negative A, scalar dt, all forcing shapes), plus the two adapter value-adds (ReLU-A clamp, vector dt). Guards against drift.
- RENAME test_linoss_solver_parity.py -> _characterization.py: it never tested runtime parity (it pins Elume's metrics/energy contract); the name was a misnomer.
- AGENTS.md + benchmark-report finding #1 updated to reflect real delegation.

Resolves the phantom-dependency / no-delegation gap (report P0-3).
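The adapter's division of labor can be sketched in pure Python. Everything here is hypothetical: `runtime_step_im` stands in for whatever scalar-dt step linoss-dynamics actually exposes (its API is not shown in this PR) and is written as the implicit LinOSS-IM update for one diagonal component; `adapter_step` shows three of the value-adds the commit names (validation, the ReLU-A clamp, and component-wise stepping for vector dt). Energy diagnostics and the metrics contract are omitted.

```python
import math
from typing import Sequence, Union


def runtime_step_im(y: float, z: float, a: float, bu: float, dt: float) -> tuple[float, float]:
    """Hypothetical stand-in for a scalar-dt runtime step (implicit LinOSS-IM).

    Solves z' = z + dt*(-a*y' + bu) and y' = y + dt*z' for one diagonal component.
    """
    y_next = (y + dt * z + dt * dt * bu) / (1.0 + dt * dt * a)
    z_next = z + dt * (-a * y_next + bu)
    return y_next, z_next


def adapter_step(
    y: Sequence[float],
    z: Sequence[float],
    a: Sequence[float],
    bu: Sequence[float],
    dt: Union[float, Sequence[float]],
) -> tuple[list[float], list[float]]:
    """Validate inputs, clamp A with ReLU, and step each component with its own dt."""
    n = len(y)
    if not (len(z) == len(a) == len(bu) == n):
        raise ValueError("y, z, a, bu must have the same length")
    dts = [float(dt)] * n if isinstance(dt, (int, float)) else list(dt)
    if len(dts) != n or any(not math.isfinite(d) or d <= 0 for d in dts):
        raise ValueError("dt must be positive and finite, scalar or one per component")
    a_relu = [max(ai, 0.0) for ai in a]  # the runtime itself does not clamp
    pairs = [runtime_step_im(y[i], z[i], a_relu[i], bu[i], dts[i]) for i in range(n)]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

Because A is diagonal, each component evolves independently, so looping a scalar-dt step over components with per-component dt is equivalent to a vector-dt step; with a uniform dt the result matches a single scalar-dt call, which is the shared domain a parity test can pin.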

…tches (S3-B)

A failed replay used to yield an opaque 64-char divergence with no way to see why. This adds segment-level diffing:
- hashing.py: compute_post_state_hash now builds named pre-image segments (platform, schema_version, scenario_operation, result, result_arrays, rng_state_out, provider_snapshot, provider_snapshot_arrays). The hash is byte-identical to before (golden vectors unchanged); it is just the concatenation of the same segments.
- build_preimage() returns per-segment BLAKE2b digests; diff_preimage(a, b) reports which named segment(s) diverged (or that all matched).
- EnvelopeOutput now carries provider_snapshot_out (frozen, default None), so a run's pre-image can be rebuilt from (input, output) without re-running.
- preimage_from_envelope(input, output) pairs with replay(): build a pre-image from each mismatching run and diff them.

All are re-exported from elume.envelope. Verified: the rebuilt pre-image hashes to the recorded post_state_hash, and diffs localize platform/result/array/rng/snapshot divergence to the right segment (the cross-platform branch is tested via the platform override). This is S3-B; byte-offset granularity within a segment is out of scope.
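A minimal sketch of segment-level diffing over raw segment bytes. The segment names come from the commit; the functions mirror the commit's `build_preimage`/`diff_preimage` names, but their signatures here are guesses (the real ones are not shown in this PR). The PR describes the real hash as the concatenation of the segments; length-prefixing each segment below is an assumption added so segment boundaries stay unambiguous.

```python
import hashlib
from typing import Mapping

# Canonical segment order, as named in the S3-B commit.
SEGMENTS = (
    "platform", "schema_version", "scenario_operation", "result",
    "result_arrays", "rng_state_out", "provider_snapshot", "provider_snapshot_arrays",
)


def _frame(name: str, data: bytes) -> bytes:
    # Assumption: name + NUL + 8-byte big-endian length prefix per segment.
    return name.encode() + b"\x00" + len(data).to_bytes(8, "big") + data


def post_state_hash(segments: Mapping[str, bytes]) -> str:
    """Single 64-hex-char hash over all segments in canonical order."""
    h = hashlib.blake2b(digest_size=32)
    for name in SEGMENTS:
        h.update(_frame(name, segments[name]))
    return h.hexdigest()


def build_preimage(segments: Mapping[str, bytes]) -> dict[str, str]:
    """Per-segment BLAKE2b digests."""
    return {name: hashlib.blake2b(segments[name], digest_size=32).hexdigest() for name in SEGMENTS}


def diff_preimage(a: Mapping[str, str], b: Mapping[str, str]) -> list[str]:
    """Names of segments whose digests differ, in canonical order (empty if all match)."""
    return [name for name in SEGMENTS if a[name] != b[name]]
```

Comparing per-segment digests rather than raw bytes keeps the diff cheap and lets two hosts compare pre-images without shipping the full payloads; the trade-off, as the commit notes, is that localization stops at segment granularity.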

The S1 commit (6c1e24b) was meant to include the solver.py delegation refactor plus the AGENTS.md and benchmark-report notes, but an atomic 'git add' failure (a stale renamed pathspec) silently dropped them, so 6c1e24b landed with only the new parity test and the test rename. This commit adds the actual delegation of the oscillator step to linoss-dynamics and the accompanying doc updates. End state is unchanged from what was tested (1271 passing, ruff clean).

ecc-tools · 2026-06-06T17:08:50Z

Analyzing 200 commits...

ecc-tools · 2026-06-06T17:09:37Z

Analysis Complete

Generated ECC bundle from 14 commits | Confidence: 60%

View Pull Request #12

Repository Profile

Attribute	Value
Language	Python
Framework	Not detected
Commit Convention	conventional
Test Directory	`separate`

Changed Files (78)

Metric	Value
Files changed	78
Additions	9597
Deletions	317

Top hotspots

Path	Status	+/-
`reference_service/src/reference_service/visualization.py`	added	+1490 / -0
`docs/elume-component-benchmark-report.md`	added	+802 / -0
`benchmarks/bench_cognition.py`	added	+652 / -0
`benchmarks/bench_models.py`	added	+596 / -0
`benchmarks/bench_envelope.py`	added	+583 / -0

Top directories

Directory	Files	Total changes
`benchmarks`	10	4731
`reference_service/src/reference_service`	2	1549
`docs`	5	1458
`tests/unit`	10	335
`tests/unit/envelope`	4	296

Analysis Depth Readiness (evidence-backed, 57%)

ECC Tools uses this to decide whether recommendations should stay at commit-history/setup guidance or expand into CI, security, harness, reference-set, AI-routing, and team backlog work.

Area	Status	Evidence / Next Step
Commit history	Ready	`14 commits sampled`
CI/CD signals	Missing	Add workflow files or CI troubleshooting evidence so ECC Tools can reason about pipeline setup.
Security evidence	Missing	Add AgentShield, audit, SARIF, SBOM, or security review evidence so recommendations can cover security posture.
Harness configuration	Ready	`tests/unit/adapters/memevolve/test_ingest.py`, `tests/unit/adapters/memevolve/test_shaping.py`
Reference/eval evidence	Ready	`benchmarks/bench_basins.py`, `benchmarks/bench_cognition.py`, `benchmarks/bench_embedders.py`
AI routing and cost controls	Ready	`.planning/20260606-ideation-open-ended.md`, `docs/plans/archon-adoption-phase-1.md`, `docs/plans/phase-2-handoff.md`
Team handoff and project tracking	Missing	Add roadmap, runbook, project, Linear, or follow-up tracking docs so generated work can land in a team queue.

Reference Set Readiness (2/7, 29%)

Area	Status	Evidence / Next Step
Deep analyzer corpus	Missing	Add analyzer fixture, golden, benchmark, or reference-set files that can catch analyzer regressions.
RAG/evaluator comparison	Present	`benchmarks/bench_basins.py`, `benchmarks/bench_cognition.py`, `benchmarks/bench_embedders.py`
PR salvage/review corpus	Missing	Add stale-PR, review-thread, reopen-flow, or salvage reference cases for queue cleanup automation.
Discussion triage corpus	Missing	Add public discussion triage fixtures, golden cases, or reference sets for informational, answered, and no-response classifications.
Harness compatibility	Present	`tests/unit/adapters/memevolve/test_ingest.py`, `tests/unit/adapters/memevolve/test_shaping.py`
Security evidence	Missing	Attach security evidence such as SBOMs, SARIF, audit reports, or AgentShield evidence packs.
CI failure-mode evidence	Missing	Add captured CI failure logs, dry-run fixtures, or troubleshooting docs for common workflow failure modes.

Likely Future Issues (1)

Severity	Signal	Why it may show up
HIGH	Schema or model changes may ship without migration follow-up	4 schema/model paths changed; 0 migration files changed

Schema or model changes may ship without migration follow-up: The PR changes schema or model-facing files but does not include any obvious migration artifact.

Suggested Follow-up Work (1)

Type	Suggested title	Targets
PR	`db: add migration follow-up for src/elume/models/basin.py + src/elume/models/belief.py`	`src/elume/models/basin.py`, `src/elume/models/belief.py`

db: add migration follow-up for src/elume/models/basin.py + src/elume/models/belief.py: Backfill the missing migration artifact before another schema or model change lands on top.

Copy-ready bodies

db: add migration follow-up for src/elume/models/basin.py + src/elume/models/belief.py

## Summary
- Add the missing migration or schema rollout step for the recently changed schema surface.

## Why
- Backfill the missing migration artifact before another schema or model change lands on top.

## Touched paths
- `src/elume/models/basin.py`
- `src/elume/models/belief.py`

## Validation
- Create the migration or schema rollout artifact used by this repo.
- Run the repo migration / schema validation flow and verify the changed models still match production expectations.

Detected Workflows (5)

Workflow	Description
feature-development	Standard feature implementation workflow
feature-development-with-tests-and-docs	Implements a new feature or capability, accompanied by updates to documentation and relevant tests.
documentation-reconciliation-or-expansion	Updates, reconciles, or expands documentation across multiple docs/ files and/or the main README.
test-suite-expansion-or-golden-vector-addition	Adds new tests, expands coverage, or introduces golden vector/corpus tests for determinism and regression.
refactor-or-internal-api-change-with-contract-tests	Refactors internal logic or delegates implementation to a dependency, ensuring contract tests pin behavior.

Generated Instincts (28)

Domain	Count
git	4
code-style	9
testing	6
workflow	9

After merging, import with:

/instinct-import .claude/homunculus/instincts/inherited/elume-instincts.yaml

Files

.claude/ecc-tools.json
.claude/skills/elume/SKILL.md
.agents/skills/elume/SKILL.md
.agents/skills/elume/agents/openai.yaml
.claude/identity.json
.codex/config.toml
.codex/AGENTS.md
.codex/agents/explorer.toml
.codex/agents/reviewer.toml
.codex/agents/docs-researcher.toml
.claude/homunculus/instincts/inherited/elume-instincts.yaml
.claude/commands/feature-development.md
.claude/commands/feature-development-with-tests-and-docs.md
.claude/commands/documentation-reconciliation-or-expansion.md

ECC Tools | Everything Claude Code

bionicbutterfly13 added 14 commits May 20, 2026 23:12

Add realtime reference service visualization panel

5e7c75d

chore: gitignore .serena tool cache

c425dac

chore: post-audit gap closing — tighten record immutability, expand c…

f924ec7


docs: reconcile component and readiness docs

422a5c5

test(benchmarks): add per-component promise-and-benchmark scripts

a3c6f75

docs: add component benchmark audit report and belief-state explainer

fe1d32b

docs: ideation artifact (open-ended, 7 survivors)

7efae11

bionicbutterfly13 wants to merge 14 commits into main from chore/audit-snapshot-and-followups
