Add post-mono MIR optimizations by cjgillot · Pull Request #156858 · rust-lang/rust

cjgillot · 2026-05-23T17:34:09Z

This is mostly a rebase of #131650 by @saethlin.

MIR optimizations are limited because they run on polymorphic code: they cannot know every concrete type, nor its layout.
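As a small illustration of the limitation above (a sketch only, not code from this PR; the function name `describe` is made up for the example): the branch below depends on the layout of `T`, so an optimization working on the generic body cannot decide it, while each monomorphic instance has a known answer.

```rust
use std::mem::size_of;

/// Hypothetical example function. In its polymorphic form the condition
/// depends on the layout of `T`, which is unknown until `T` is substituted.
fn describe<T>() -> &'static str {
    if size_of::<T>() == 0 {
        "zero-sized"
    } else {
        "has storage"
    }
}
```

Once instantiated, for example as `describe::<()>()` or `describe::<u64>()`, the size is a known constant and the branch could in principle be resolved before codegen.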

To work around this limitation, @saethlin added a MIR traversal that monomorphizes on the fly (#121421). We also already have a pass (#139088) that is explicitly waiting for post-mono MIR passes to exist.

This PR creates a build_codegen_mir query. That query has a peculiar Steal<Cow<'tcx, Body<'tcx>>> return type. This allows reusing optimized_mir when the body is already monomorphic, and freeing the memory when we do need to clone it. Even with this device, we still have a sizeable max-rss regression.
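To make the shape of that return type concrete, here is a toy sketch of the borrow-or-clone idea. It is not the compiler's code: `Body`, `Steal` and `build_codegen_body` below are simplified stand-ins invented for illustration (the real `Steal` lives in rustc's data structures and the real `Body<'tcx>` is far larger), and the string replacement only mimics type substitution.

```rust
use std::borrow::Cow;
use std::cell::RefCell;

/// Toy stand-in for a MIR body (assumption: not the real `Body<'tcx>`).
#[derive(Clone, Debug, PartialEq)]
struct Body {
    statements: Vec<String>,
    is_polymorphic: bool,
}

/// Toy stand-in for a "steal once" cell: the value can be taken exactly once,
/// after which the cell holds nothing and the memory can be released.
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    /// Takes the value out; panics if it was already taken.
    fn steal(&self) -> T {
        self.value.borrow_mut().take().expect("value already stolen")
    }

    fn is_stolen(&self) -> bool {
        self.value.borrow().is_none()
    }
}

/// Hypothetical analogue of the query: borrow the optimized body when it is
/// already monomorphic, otherwise clone it and substitute the concrete type.
fn build_codegen_body<'a>(optimized: &'a Body, concrete_ty: &str) -> Steal<Cow<'a, Body>> {
    if !optimized.is_polymorphic {
        // No copy: the result just points at the existing body.
        return Steal::new(Cow::Borrowed(optimized));
    }
    let mut owned = optimized.clone();
    for stmt in &mut owned.statements {
        // Crude textual substitution standing in for monomorphization.
        *stmt = stmt.replace("<T>", &format!("<{concrete_ty}>"));
    }
    owned.is_polymorphic = false;
    Steal::new(Cow::Owned(owned))
}
```

In this sketch, a consumer calls `steal()` once and drops the result when it is done, so an owned clone does not outlive its single use, while a borrowed body costs no extra allocation.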

All this allows removing just-in-time monomorphization from codegen code. Follow-up PRs can try migrating transforms that currently happen at codegen time to post-mono MIR passes.

cjgillot · 2026-05-23T17:34:30Z

@bors try @rust-timer queue

Add post-mono MIR optimizations

rust-bors · 2026-05-23T19:50:17Z

☀️ Try build successful (CI)
Build commit: c40ae76 (c40ae76fdfbb0d687aafd24fdcd2354ede04422c, parent: 54333ff079780f803f65dcee30c544050b35f544)

rust-timer · 2026-05-23T20:31:25Z

Finished benchmarking commit (c40ae76): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.1%, 3.4%]	80
Regressions ❌ (secondary)	0.7%	[0.1%, 3.1%]	70
Improvements ✅ (primary)	-0.4%	[-0.7%, -0.3%]	4
Improvements ✅ (secondary)	-0.4%	[-1.4%, -0.0%]	6
All ❌✅ (primary)	0.8%	[-0.7%, 3.4%]	84

Max RSS (memory usage)

Results (primary 11.7%, secondary 2.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	12.1%	[0.8%, 53.9%]	48
Regressions ❌ (secondary)	3.3%	[0.7%, 6.7%]	18
Improvements ✅ (primary)	-8.1%	[-8.1%, -8.1%]	1
Improvements ✅ (secondary)	-5.8%	[-5.8%, -5.8%]	1
All ❌✅ (primary)	11.7%	[-8.1%, 53.9%]	49

Cycles

Results (primary 2.7%, secondary 2.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	3.6%	[1.8%, 5.5%]	15
Regressions ❌ (secondary)	3.4%	[1.9%, 4.4%]	10
Improvements ✅ (primary)	-4.1%	[-4.6%, -3.6%]	2
Improvements ✅ (secondary)	-3.3%	[-4.5%, -2.1%]	2
All ❌✅ (primary)	2.7%	[-4.6%, 5.5%]	17

Binary size

Results (primary -0.2%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.6%]	15
Regressions ❌ (secondary)	0.3%	[0.0%, 0.5%]	5
Improvements ✅ (primary)	-0.3%	[-3.0%, -0.0%]	46
Improvements ✅ (secondary)	-0.1%	[-0.4%, -0.0%]	34
All ❌✅ (primary)	-0.2%	[-3.0%, 0.6%]	61

Bootstrap: 510.282s -> 521.256s (2.15%)
Artifact size: 400.55 MiB -> 398.41 MiB (-0.53%)

cjgillot · 2026-05-23T22:00:56Z

@bors try @rust-timer queue

Add post-mono MIR optimizations

rust-bors · 2026-05-24T00:10:55Z

☀️ Try build successful (CI)
Build commit: 0a3713a (0a3713a7df23eb1f82606bf484689d5bf5886931, parent: 54333ff079780f803f65dcee30c544050b35f544)

rust-timer · 2026-05-24T00:53:31Z

Finished benchmarking commit (0a3713a): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.2%, 16.5%]	27
Regressions ❌ (secondary)	0.3%	[0.1%, 0.7%]	16
Improvements ✅ (primary)	-0.4%	[-0.8%, -0.1%]	17
Improvements ✅ (secondary)	-0.4%	[-0.7%, -0.0%]	8
All ❌✅ (primary)	0.5%	[-0.8%, 16.5%]	44

Max RSS (memory usage)

Results (primary 7.3%, secondary 1.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	8.4%	[1.0%, 36.4%]	35
Regressions ❌ (secondary)	2.5%	[0.8%, 4.2%]	8
Improvements ✅ (primary)	-2.5%	[-7.8%, -0.7%]	4
Improvements ✅ (secondary)	-2.7%	[-2.7%, -2.7%]	1
All ❌✅ (primary)	7.3%	[-7.8%, 36.4%]	39

Cycles

Results (primary 2.0%, secondary 4.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	5.4%	[1.7%, 17.4%]	5
Regressions ❌ (secondary)	4.8%	[3.5%, 7.0%]	5
Improvements ✅ (primary)	-3.5%	[-4.6%, -2.3%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.0%	[-4.6%, 17.4%]	8

Binary size

Results (primary -0.2%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.6%]	16
Regressions ❌ (secondary)	0.3%	[0.0%, 0.5%]	5
Improvements ✅ (primary)	-0.3%	[-3.0%, -0.0%]	46
Improvements ✅ (secondary)	-0.1%	[-0.4%, -0.0%]	34
All ❌✅ (primary)	-0.2%	[-3.0%, 0.6%]	62

Bootstrap: 510.282s -> 514.101s (0.75%)
Artifact size: 400.55 MiB -> 398.40 MiB (-0.54%)

cjgillot · 2026-05-24T08:33:30Z

@bors try @rust-timer queue

Kobzol · 2026-05-25T13:51:29Z

rustc-perf supports dhat, massif and bytehound (https://github.com/rust-lang/rustc-perf/tree/master/collector#preparation).

RalfJung · 2026-05-26T06:54:25Z

To work around this limitation @saethlin added a MIR traversal which monomorphizes one the run (#121421).

I can't parse this sentence, could you fix the grammar please? :)

This PR creates a build_codegen_mir query. That query has a peculiar Steal<Cow<'tcx, Body<'tcx>>> return type. This allows reusing optimized_mir when the body is already monomorphic, and also to free memory when we need to clone it. With this device we still have a sizeable max-rss regression.

At some point in the future it could be interesting to use this query in Miri, so that we don't monomorphize the same function / MIR block over and over again.

rustbot · 2026-06-12T00:05:30Z

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

rust-bors · 2026-06-12T06:58:16Z

☔ The latest upstream changes (presumably #157794) made this pull request unmergeable. Please resolve the merge conflicts.