
perf: persist grind preprocessing caches across calls #12657

Draft

sgraf812 wants to merge 1 commit into master from sg/grind-persistent-preprocess-caches

Conversation

@sgraf812
Contributor

Summary

This PR persists per-call traversal caches in markNestedSubsingletons, canonImpl, and unfoldReducible across grind preprocessing calls. Previously, these steps created fresh HashMaps on each invocation, re-traversing every shared subexpression. For workloads that generate many facts with growing shared structure (e.g. nested Nat subtraction chains), total work was O(n²).

Changes:

  • Add persistent cache fields to Grind.State (markSubsingletonCache, unfoldReducibleCache) and Canon.State (visitCache)
  • markNestedSubsingletons / markProof: load/save traversal cache from State instead of starting fresh
  • canonImpl: load/save visit cache from Canon.State instead of starting fresh
  • unfoldReducibleCached: new helper using Meta.transformWithCache with persistent ExprStructEq cache, replacing Sym.unfoldReducible in both preprocessImpl and preprocessLight
  • Add strategic shareCommon calls in preprocessImpl to restore pointer identity before cached steps
  • Use unfoldReducibleCached inside markNestedSubsingletons.preprocess (which previously called Sym.unfoldReducible with a fresh cache)
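The load/save pattern described in the bullets above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `VisitState`, `visit`, and the traversal body are hypothetical stand-ins for the real `Grind.State` / `Canon.State` fields named above.

```lean
import Lean
open Lean Std

-- Hypothetical sketch: the traversal cache lives in persistent state
-- instead of being recreated per invocation, so shared subterms that
-- were processed by an earlier call are not re-traversed.
structure VisitState where
  -- Keyed by structural equality so shared subterms hit the cache.
  visitCache : HashMap ExprStructEq Expr := ∅

partial def visit (e : Expr) : StateM VisitState Expr := do
  if let some r := (← get).visitCache[(⟨e⟩ : ExprStructEq)]? then
    return r  -- cache hit: subterm already processed, possibly in an earlier call
  let r ← match e with
    | .app f a => do pure (Expr.app (← visit f) (← visit a))
    | _        => pure e
  modify fun s => { s with visitCache := s.visitCache.insert ⟨e⟩ r }
  return r
```

Because the key is structural equality on whole subterms, the cache only pays off when shared structure is actually shared (or re-shared via `shareCommon`, as the bullet above notes) before the cached step runs.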

Benchmark results

On a nested (+ s₂ - s₂) chain at n=100:

| Metric                  | Before | After | Reduction |
|-------------------------|--------|-------|-----------|
| grind canon             | ~95ms  | ~6ms  | 94%       |
| grind mark subsingleton | ~73ms  | ~19ms | 74%       |
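The benchmark shape might be reconstructed roughly like this; a hypothetical n = 3 instance, since the actual test (grind_add_sub_cancel_fvar.lean, see the test plan below) runs at n = 100:

```lean
-- Hypothetical small instance of the nested chain: each layer appends
-- another `+ s₂ - s₂`, so every prefix is a shared subterm of the next
-- layer, which is what makes per-call fresh caches quadratic overall.
example (x s₂ : Nat) : x + s₂ - s₂ + s₂ - s₂ + s₂ - s₂ = x := by grind
```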

Scaling analysis

Profiling across problem sizes reveals that the grind core (internalization, congruence closure) still dominates at larger n and scales poorly:

| n    | grind total | simp  | mark sub | canon |
|------|-------------|-------|----------|-------|
| 100  | 245ms       | 185ms | 19ms     | 6ms   |
| 200  | 1.5s        | 476ms | 66ms     | 17ms  |
| 400  | 13.6s       | 1.36s | 254ms    | 53ms  |
| 600  | 74s         | 2.73s | 558ms    | 108ms |
| 1000 | >120s       | -     | -        | -     |

Preprocessing (simp + mark sub + canon) totals ~3.4s at n=600 and scales roughly as O(n^1.5). The grind core accounts for 70s of the 74s total at n=600 and scales approximately as O(n^4); that core scaling is a separate issue, outside the scope of this preprocessing change.
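The exponent estimates can be rough-checked from adjacent rows of the table, assuming power-law scaling; a back-of-the-envelope sketch, not part of the PR:

```lean
-- Assuming time ∝ n^k between two rows, k = log(t₂/t₁) / log(n₂/n₁).
-- Fits from the n=400 → n=600 rows above:
#eval Float.log (74.0 / 13.6) / Float.log (600.0 / 400.0)  -- grind total: ≈ 4.2
#eval Float.log (2.73 / 1.36) / Float.log (600.0 / 400.0)  -- simp: ≈ 1.7
```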

Test plan

  • All 375 grind tests pass
  • New scaling regression test grind_add_sub_cancel_fvar.lean checks that a 4× increase in problem size (n=25 → n=100) results in a < 10× increase in time

🤖 Generated with Claude Code

This PR persists per-call traversal caches in `markNestedSubsingletons`,
`canonImpl`, and `unfoldReducible` across grind preprocessing calls. Previously
these steps created fresh HashMaps each invocation, re-traversing all shared
subexpressions. For workloads that generate many facts with growing shared
structure (e.g. nested Nat subtraction chains), total work was O(n²). Persisting
the caches and adding strategic `shareCommon` calls to restore pointer identity
reduces `grind canon` time by ~94% and `grind mark subsingleton` time by ~74%
on the included benchmark at n=100.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sgraf812 sgraf812 requested a review from leodemoura as a code owner February 23, 2026 18:49
@sgraf812 sgraf812 added the changelog-tactics User facing tactics label Feb 23, 2026
@sgraf812 sgraf812 marked this pull request as draft February 23, 2026 18:50
@nomeata
Collaborator

nomeata commented Feb 23, 2026

!bench

@leanprover-radar

leanprover-radar commented Feb 23, 2026

Benchmark results for f81340c against 8f80881 are in! @nomeata

  • build//instructions: -7.5G (-0.06%)

Large changes (3✅)

  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RupAddResult//instructions: -5.4G (-10.00%)
  • grind_bitvec2.lean//instructions: -10.8G (-5.37%)
  • grind_list2.lean//instructions: -2.3G (-3.97%)

Medium changes (1✅)

  • cbv tactic (System F normalization)//instructions: -3.4G (-2.41%)

Small changes (11✅, 7🟥)

  • build/module/Init.Data.Char.Ordinal//instructions: -127.7M (-1.64%)
  • build/module/Init.Data.Range.Polymorphic.Fin//instructions: -120.7M (-3.60%) (reduced significance based on absolute threshold)
  • build/module/Init.Data.Vector.Algebra//instructions: -255.9M (-3.04%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Canon//instructions: +124.1M (+1.05%) (reduced significance based on *//lines)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Main//instructions: +36.0M (+0.28%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.MarkNestedSubsingletons//bytes .olean: +7kiB (+11.71%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.MarkNestedSubsingletons//bytes .olean.private: +300kiB (+87.51%) (reduced significance based on *//lines)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.MarkNestedSubsingletons//bytes .olean.server: +464B (+14.80%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.MarkNestedSubsingletons//instructions: +1.8G (+42.03%) (reduced significance based on *//lines)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Simp//instructions: +236.3M (+5.65%) (reduced significance based on *//lines)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.CNF//instructions: -68.3M (-3.88%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Clause//instructions: -231.3M (-3.68%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.CompactLRATCheckerSound//instructions: -23.6M (-1.51%)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.Lemmas//instructions: -926.7M (-4.77%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RatAddResult//instructions: -733.7M (-5.64%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RatAddSound//instructions: -589.4M (-2.83%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RupAddSound//instructions: -654.6M (-3.65%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.LRATCheckerSound//instructions: -23.5M (-0.71%)

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Feb 23, 2026
@mathlib-lean-pr-testing

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 8f80881c2f7611c61d0228dac94b27582b44d5d0 --onto e7e3588c973226e85d79ed0f3154cfca79b61c6f. You can force Mathlib CI using the force-mathlib-ci label. (2026-02-23 19:52:32)

@leanprover-bot
Collaborator

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 8f80881c2f7611c61d0228dac94b27582b44d5d0 --onto 8038a8b8904f89ad9542c8eda11379f8f006eab1. You can force reference manual CI using the force-manual-ci label. (2026-02-23 19:52:34)
