
perf: persist grind preprocessing caches across calls #12657

Draft

sgraf812 wants to merge 1 commit into master from sg/grind-persistent-preprocess-caches

Conversation

@sgraf812
Contributor

Summary

This PR persists per-call traversal caches in markNestedSubsingletons, canonImpl, and unfoldReducible across grind preprocessing calls. Previously, these steps created fresh HashMaps on each invocation, re-traversing every shared subexpression. For workloads that generate many facts with growing shared structure (e.g. nested Nat subtraction chains), total work was O(n²).

Changes:

  • Add persistent cache fields to Grind.State (markSubsingletonCache, unfoldReducibleCache) and Canon.State (visitCache)
  • markNestedSubsingletons / markProof: load/save traversal cache from State instead of starting fresh
  • canonImpl: load/save visit cache from Canon.State instead of starting fresh
  • unfoldReducibleCached: new helper using Meta.transformWithCache with persistent ExprStructEq cache, replacing Sym.unfoldReducible in both preprocessImpl and preprocessLight
  • Add strategic shareCommon calls in preprocessImpl to restore pointer identity before cached steps
  • Use unfoldReducibleCached inside markNestedSubsingletons.preprocess (which previously called Sym.unfoldReducible with a fresh cache)
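The load/save pattern described in the bullets above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `VisitState`, `visit`, and the traversal body are hypothetical stand-ins for the real `Grind.State` / `Canon.State` fields named above.

```lean
import Lean
open Lean Std

-- Hypothetical sketch: the traversal cache lives in persistent state
-- instead of being recreated per invocation, so shared subterms that
-- were processed by an earlier call are not re-traversed.
structure VisitState where
  -- Keyed by structural equality so shared subterms hit the cache.
  visitCache : HashMap ExprStructEq Expr := ∅

partial def visit (e : Expr) : StateM VisitState Expr := do
  if let some r := (← get).visitCache[(⟨e⟩ : ExprStructEq)]? then
    return r  -- cache hit: subterm already processed, possibly in an earlier call
  let r ← match e with
    | .app f a => do pure (Expr.app (← visit f) (← visit a))
    | _        => pure e
  modify fun s => { s with visitCache := s.visitCache.insert ⟨e⟩ r }
  return r
```

Because the key is structural equality on whole subterms, the cache only pays off when shared structure is actually shared (or re-shared via `shareCommon`, as the bullet above notes) before the cached step runs.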

Benchmark results

On a nested (+ s₂ - s₂) chain at n=100:

| Metric                  | Before | After | Reduction |
|-------------------------|--------|-------|-----------|
| grind canon             | ~95ms  | ~6ms  | 94%       |
| grind mark subsingleton | ~73ms  | ~19ms | 74%       |
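The benchmark shape might be reconstructed roughly like this; a hypothetical n = 3 instance, since the actual test (grind_add_sub_cancel_fvar.lean, see the test plan below) runs at n = 100:

```lean
-- Hypothetical small instance of the nested chain: each layer appends
-- another `+ s₂ - s₂`, so every prefix is a shared subterm of the next
-- layer, which is what makes per-call fresh caches quadratic overall.
example (x s₂ : Nat) : x + s₂ - s₂ + s₂ - s₂ + s₂ - s₂ = x := by grind
```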

Scaling analysis

Profiling across problem sizes reveals that the grind core (internalization, congruence closure) still dominates at larger n and scales poorly:

| n    | grind total | simp  | mark sub | canon |
|------|-------------|-------|----------|-------|
| 100  | 245ms       | 185ms | 19ms     | 6ms   |
| 200  | 1.5s        | 476ms | 66ms     | 17ms  |
| 400  | 13.6s       | 1.36s | 254ms    | 53ms  |
| 600  | 74s         | 2.73s | 558ms    | 108ms |
| 1000 | >120s       | -     | -        | -     |

Preprocessing (simp + mark sub + canon) totals ~3.4s at n=600 and scales roughly as O(n^1.5). The grind core accounts for 70s of the 74s total at n=600 and scales approximately as O(n^4); that core scaling is a separate issue, outside the scope of this preprocessing change.
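The exponent estimates can be rough-checked from adjacent rows of the table, assuming power-law scaling; a back-of-the-envelope sketch, not part of the PR:

```lean
-- Assuming time ∝ n^k between two rows, k = log(t₂/t₁) / log(n₂/n₁).
-- Fits from the n=400 → n=600 rows above:
#eval Float.log (74.0 / 13.6) / Float.log (600.0 / 400.0)  -- grind total: ≈ 4.2
#eval Float.log (2.73 / 1.36) / Float.log (600.0 / 400.0)  -- simp: ≈ 1.7
```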

Test plan

  • All 375 grind tests pass
  • New scaling regression test grind_add_sub_cancel_fvar.lean checks that a 4× increase in problem size (n=25 → n=100) results in a < 10× increase in time

🤖 Generated with Claude Code

This PR persists per-call traversal caches in `markNestedSubsingletons`,
`canonImpl`, and `unfoldReducible` across grind preprocessing calls. Previously
these steps created fresh HashMaps each invocation, re-traversing all shared
subexpressions. For workloads that generate many facts with growing shared
structure (e.g. nested Nat subtraction chains), total work was O(n²). Persisting
the caches and adding strategic `shareCommon` calls to restore pointer identity
reduces `grind canon` time by ~94% and `grind mark subsingleton` time by ~74%
on the included benchmark at n=100.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sgraf812 sgraf812 requested a review from leodemoura as a code owner February 23, 2026 18:49
@sgraf812 sgraf812 added the changelog-tactics User facing tactics label Feb 23, 2026
@sgraf812 sgraf812 marked this pull request as draft February 23, 2026 18:50
@nomeata
Collaborator

nomeata commented Feb 23, 2026

!bench

@leanprover-radar

leanprover-radar commented Feb 23, 2026

Benchmark results for f81340c against 8f80881 are in! @nomeata

  • build//instructions: -7.5G (-0.06%)

Large changes (3✅)

  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RupAddResult//instructions: -5.4G (-10.00%)
  • grind_bitvec2.lean//instructions: -10.8G (-5.37%)
  • grind_list2.lean//instructions: -2.3G (-3.97%)

Medium changes (1✅)

  • cbv tactic (System F normalization)//instructions: -3.4G (-2.41%)

Small changes (11✅, 7🟥)

  • build/module/Init.Data.Char.Ordinal//instructions: -127.7M (-1.64%)
  • build/module/Init.Data.Range.Polymorphic.Fin//instructions: -120.7M (-3.60%) (reduced significance based on absolute threshold)
  • build/module/Init.Data.Vector.Algebra//instructions: -255.9M (-3.04%) (reduced significance based on absolute threshold)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Canon//instructions: +124.1M (+1.05%) (reduced significance based on *//lines)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Main//instructions: +36.0M (+0.28%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.MarkNestedSubsingletons//bytes .olean: +7kiB (+11.71%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.MarkNestedSubsingletons//bytes .olean.private: +300kiB (+87.51%) (reduced significance based on *//lines)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.MarkNestedSubsingletons//bytes .olean.server: +464B (+14.80%)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.MarkNestedSubsingletons//instructions: +1.8G (+42.03%) (reduced significance based on *//lines)
  • 🟥 build/module/Lean.Meta.Tactic.Grind.Simp//instructions: +236.3M (+5.65%) (reduced significance based on *//lines)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.CNF//instructions: -68.3M (-3.88%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Clause//instructions: -231.3M (-3.68%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.CompactLRATCheckerSound//instructions: -23.6M (-1.51%)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.Lemmas//instructions: -926.7M (-4.77%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RatAddResult//instructions: -733.7M (-5.64%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RatAddSound//instructions: -589.4M (-2.83%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.Formula.RupAddSound//instructions: -654.6M (-3.65%) (reduced significance based on absolute threshold)
  • build/module/Std.Tactic.BVDecide.LRAT.Internal.LRATCheckerSound//instructions: -23.5M (-0.71%)

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Feb 23, 2026
@mathlib-lean-pr-testing

Mathlib CI status (docs):

  • ❗ Batteries/Mathlib CI will not be attempted unless your PR branches off the nightly-with-mathlib branch. Try git rebase 8f80881c2f7611c61d0228dac94b27582b44d5d0 --onto e7e3588c973226e85d79ed0f3154cfca79b61c6f. You can force Mathlib CI using the force-mathlib-ci label. (2026-02-23 19:52:32)

@leanprover-bot
Collaborator

Reference manual CI status:

  • ❗ Reference manual CI will not be attempted unless your PR branches off the nightly-with-manual branch. Try git rebase 8f80881c2f7611c61d0228dac94b27582b44d5d0 --onto 8038a8b8904f89ad9542c8eda11379f8f006eab1. You can force reference manual CI using the force-manual-ci label. (2026-02-23 19:52:34)
