Skip to content

Speed up clippy by not calling empty check_foo methods.#157762

Draft
nnethercote wants to merge 10 commits into
rust-lang:mainfrom
nnethercote:runtime_lint_pass
Draft

Speed up clippy by not calling empty check_foo methods.#157762
nnethercote wants to merge 10 commits into
rust-lang:mainfrom
nnethercote:runtime_lint_pass

Conversation

@nnethercote

@nnethercote nnethercote commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Most clippy lints are put into RuntimeCombinedLateLintPass. It's defined via a macro which expands out to something like this:

impl<'tcx> LateLintPass<'tcx> for RuntimeCombinedLateLintPass<'tcx> {
    check_item(&mut self, context: &LateContext<'tcx>, i: &'tcx Item<'tcx>) {
        for pass in self.passes.iter_mut() {
            pass.check_item(context, i);
        }
    }

    ... // and another 31 similar methods
}

A function like the above check_item is called (via dynamic dispatch) for every HIR node. In a default Clippy run there are 230 late lints enabled. Which means the loop executes 230 times, each time doing a call.

However, most lints only implement a small fraction of the 32 checking methods; many of them only implement one of them. Which means that most of the calls in the loop are to functions that do nothing.

(It's a similar story for early lints, but there are only ~50 of them.)

This commit changes things so that the unnecessary calls are avoided.

  • First, RuntimeCombined{Early,Late}LintPass are changed so instead of having a single list of passes, it has one list per check_foo method.
  • Then, only passes that impl check_foo get added to the check_foo list. This means we avoid calling empty check_foo methods.
  • This requires knowing which of the empty default check_foo methods are overridden. This commit adds a check_foo_needed method that returns a boolean to indicate this; each one default to false.
  • Writing check_foo_needed methods by hand would be error-prone, so the commit adds a new attribute proc macro #[runtime_lint_pass] which detects when a check_foo is defined for a pass and adds the corresponding check_foo_needed method.
  • Also, because each pass can end up on multiple lists within RuntimeCombined{Early,Late}LintPass, {Early,Late}LintPassObject is changed from using Box to using Rc<RefCell>. RefCell has a small overhead but this is dwarfed by the wins from avoiding empty check_foo calls.

It's a lot of plumbing changes, but it results in some huge wins: in the best cases clippy's runtime is reduced by 30%.

By using `retain` instead of `into_iter`/`filter`/`collect`.
It doesn't define any `check_*` methods so there's no point adding it to
the passes list. (It's only there for `HardwiredLints::lint_vec`.) This
makes it more like `SoftLints`.
@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-clippy Relevant to the Clippy team. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 11, 2026
Comment thread compiler/rustc_lint/Cargo.toml Outdated
All the other paired methods in this trait have the form
`check_foo`/`check_foo_post`.
Currently the use points need to handle method attributes, due to a
single low-value doc comment in each macro's body. This commit moves
those doc comments so the use points can be simplified. The comments are
also made more accurate -- there are now multiple `_post` methods and
the comments now cover all of them, not just one of them.
That reflects how they're mostly used and avoids the need for some long
signatures. And it matches `{Early,Late}LintPassObject` nicely. Also
make them public so that clippy can use them.
There's a lot of repetition here that can be avoided.
Most clippy lints are put into `RuntimeCombinedLateLintPass`. It's
defined via a macro which expands out to something like this:
```
impl<'tcx> LateLintPass<'tcx> for RuntimeCombinedLateLintPass<'tcx> {
    check_item(&mut self, context: &LateContext<'tcx>, i: &'tcx Item<'tcx>) {
        for pass in self.passes.iter_mut() {
            pass.check_item(context, i);
        }
    }

    ... // and another 31 similar methods
}
```
A function like the above `check_item` is called (via dynamic dispatch)
for every HIR node. In a default Clippy run there are 230 late lints
enabled. Which means the loop executes 230 times, each time doing a
call.

However, most lints only implement a small fraction of the 32 checking
methods; many of them only implement one of them. Which means that most
of the calls in the loop are to functions that do nothing.

(It's a similar story for early lints, but there are only ~50 of them.)

This commit changes things so that the unnecessary calls are avoided.
- First, `RuntimeCombined{Early,Late}LintPass` are changed so instead of
  having a single list of passes, it has one list per `check_foo`
  method.
- Then, only passes that impl `check_foo` get added to the `check_foo`
  list. This means we avoid calling empty `check_foo` methods.
- This requires knowing which of the empty default `check_foo` methods
  are overridden. This commit adds a `check_foo_needed` method that
  returns a boolean to indicate this; each one default to `false`.
- Writing `check_foo_needed` methods by hand would be error-prone, so
  the commit adds a new attribute proc macro `#[runtime_lint_pass]`
  which detects when a `check_foo` is defined for a pass and adds the
  corresponding `check_foo_needed` method.
- Also, because each pass can end up on multiple lists within
  `RuntimeCombined{Early,Late}LintPass`, `{Early,Late}LintPassObject` is
  changed from using `Box` to using `Rc<RefCell>`. `RefCell` has a small
  overhead but this is dwarfed by the wins from avoiding empty
  `check_foo` calls.

It's a lot of plumbing changes, but it results in some huge wins: in the
best cases clippy's runtime is reduced by 30%.
@nnethercote nnethercote changed the title runtime_lint_pass Speed up clippy by not calling empty check_foo methods. Jun 12, 2026
@nnethercote

Copy link
Copy Markdown
Contributor Author

clippy doesn't run on rustc-perf on CI, but I did some local measurements. First, instruction counts are great, with the best results at -15%!

image

@nnethercote

nnethercote commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Next, wall-time, where the results are even better. wall-time is pretty noisy on my machine so you can't trust any single number too much, but the trend is clear, with multiple results exceeding -30%:

image

@nnethercote

nnethercote commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

Finally, why is wall-time so much better than instruction counts? There must be something microarchitectural happening. And there is... here are the branch-misses (misprediction) results, with reductions of -97% in the best case!

image

This is because the PR avoids so many dynamically dispatched method calls, which involve indirect branches for the vtable lookups.

@nnethercote

Copy link
Copy Markdown
Contributor Author

cc @blyxyas @ada4a

@nnethercote

Copy link
Copy Markdown
Contributor Author

Non-clippy runs shouldn't have their perf affected. Let's check:

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 12, 2026
rust-bors Bot pushed a commit that referenced this pull request Jun 12, 2026
Speed up clippy by not calling empty `check_foo` methods.
@nnethercote nnethercote mentioned this pull request Jun 12, 2026
@nnethercote

Copy link
Copy Markdown
Contributor Author

This PR currently overlaps with #157689. My plan is for that PR to merge first, doing all the preliminary cleanups, leaving just the final (and main) commit in this PR.

@rust-bors

rust-bors Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

☀️ Try build successful (CI)
Build commit: fa5d275 (fa5d275fc9625c4ca4b4a39e6b52b91e932cae49, parent: 09a371361240e42b0d69438fd1179efcf212e576)

@rust-timer

This comment has been minimized.

@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (fa5d275): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking means the PR may be perf-sensitive. Consider adding rollup=never if this change is not fit for rolling up.

@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.2% [-0.2%, -0.2%] 1
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results (primary -0.8%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.8% [-0.8%, -0.8%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.8% [-0.8%, -0.8%] 1

Cycles

Results (primary -2.5%, secondary -3.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.5% [-2.5%, -2.5%] 1
Improvements ✅
(secondary)
-3.0% [-3.0%, -3.0%] 1
All ❌✅ (primary) -2.5% [-2.5%, -2.5%] 1

Binary size

This perf run didn't have relevant results for this metric.

Bootstrap: 517.481s -> 517.738s (0.05%)
Artifact size: 401.31 MiB -> 400.84 MiB (-0.12%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 12, 2026
@rust-bors

rust-bors Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

☔ The latest upstream changes (presumably #157779) made this pull request unmergeable. Please resolve the merge conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-clippy Relevant to the Clippy team. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants