fix(verify-gate): keep TEST-PROOF-* on sorry-grep — lake build exceeds gate budget#230
Merged
Merged
Conversation
…s gate budget PR #227 reverted TEST-PROOF-* steps to `cd proofs && lake build` on the assumption smithy's ELAN_HOME deploy made it possible. It does work — `lake` runs cleanly — but the lake build against Mathlib on a cold elan cache routinely exceeds 30 minutes. PR #229 bumped the gate budget to 60min and the FIRST PR after #229 ALSO hit the new ceiling. The dedicated "Lean proof typecheck (lake build)" CI workflow has `actions/cache@v4` on `proofs/.lake/build`, so its run completes in seconds on warm cache. The rivet verification gate doesn't cache that path, so it pays the full Mathlib cold-compile cost every run. Right structural fix: the verify-gate's purpose is "surface drift between YAML claims and reality". A sorry-free grep on the proof file IS a meaningful drift check: it catches "agent accidentally left a sorry". Full type-checking is the dedicated workflow's job — defense in depth, separation of concerns. Reverts to the sorry-grep + file-exists pattern from before #227. Future work: add lake cache to verify-gate so this restriction lifts. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rivet verification gate✅ 19/19 passed
Filter: Failed artifacts(none) Updated automatically by |
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
3 tasks
avrabe
added a commit
that referenced
this pull request
May 20, 2026
v0.10.0 ships: **Mermaid emission (M1 + M2 + M3)** - spar-mermaid foundation crate with `emit_flowchart` (#220) - `spar emit --format mermaid` CLI subcommand (#222) - `emit_class_diagram` + `emit_requirement_diagram` + matching CLI flags `--format mermaid-class` / `mermaid-req` (#228) **Soundness deepening** - Lean 4 sorry-free proofs of end-to-end latency monotonicity and ARINC 653 partition isolation, alongside the pre-existing RTA / EDF / Network Calculus proofs (#223) - Kani BMC harnesses on generated-code AADL contract preservation (thread Period, port Direction, bus access right) — spar's Logika-equivalent strategy for verified codegen (#224) **Safety analysis** - EMV2 error-propagation traversal across the AADL connection graph (closes the #1 gap vs OSATE/HAMR in safety-case reviews) (#225) **Verification infrastructure** - Rivet-driven verification gate that executes every artifact's `fields.steps[].run` commands and posts a sticky PR comment with pass/fail counts and failed artifact IDs (#221) - Workflow tuning: gate timeout 30→60 min for future Mathlib-heavy workloads; TEST-PROOF-* stay on sorry-grep until lake cache lands (#227, #229, #230) **Chore** - Pruned stale dev artifacts (.playwright-mcp logs + dashboard-render PNGs) and tightened gitignore (#226) Bumps Cargo.toml + vscode-spar/package.json from 0.9.3 → 0.10.0. The release workflow's `check-versions` job enforces tag/Cargo/vsix agreement, so these must move together with the v0.10.0 tag push. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Reverts #227's
lake buildstep back to sorry-grep + file-exists. The verify-gate doesn't cache Mathlib build artifacts, so cold-compile takes >60min — past even #229's bumped timeout.Why this changed mid-cycle
cd proofs && lake buildas the verification step.ELAN_HOMEto runner env.lake buildnow that Lean works.proofs/.lake/build; the verify-gate doesn't.Right structural decision
The verify-gate's job is drift detection, not exhaustive verification:
! grep -E '^[[:space:]]*sorry[[:space:]]*(--.*)?$' Latency.leancatches "agent accidentally left a sorry" — the most likely failure mode for a proof artifact.Two checks on the same source, two budgets, one shared trust boundary. The verify-gate is the fast layer; the dedicated workflow is the slow layer.
Future work
A
actions/cache@v4step onproofs/.lake/build(with key based onproofs/lakefile.toml+proofs/lean-toolchainhashes) inside the verify-gate workflow would let us restorelake buildcleanly. Smithy team has the bigger picture on whether caching is added at workflow level or at sccache-style runner level.Test plan
rivet validatecleanCo-Authored-By: Claude Opus 4.7 noreply@anthropic.com