Speed up clippy by not calling empty check_foo methods.#157762
Speed up clippy by not calling empty check_foo methods.#157762nnethercote wants to merge 10 commits into
check_foo methods.#157762Conversation
These structs can own the `Vec`.
By using `retain` instead of `into_iter`/`filter`/`collect`.
It doesn't define any `check_*` methods so there's no point adding it to the passes list. (It's only there for `HardwiredLints::lint_vec`.) This makes it more like `SoftLints`.
All the other paired methods in this trait have the form `check_foo`/`check_foo_post`.
a14a893 to
4b8ebe2
Compare
Currently the use points need to handle method attributes, due to a single low-value doc comment in each macro's body. This commit moves those doc comments so the use points can be simplified. The comments are also made more accurate -- there are now multiple `_post` methods and the comments now cover all of them, not just one of them.
That reflects how they're mostly used and avoids the need for some long
signatures. And it matches `{Early,Late}LintPassObject` nicely. Also
make them public so that clippy can use them.
There's a lot of repetition here that can be avoided.
Most clippy lints are put into `RuntimeCombinedLateLintPass`. It's
defined via a macro which expands out to something like this:
```
impl<'tcx> LateLintPass<'tcx> for RuntimeCombinedLateLintPass<'tcx> {
check_item(&mut self, context: &LateContext<'tcx>, i: &'tcx Item<'tcx>) {
for pass in self.passes.iter_mut() {
pass.check_item(context, i);
}
}
... // and another 31 similar methods
}
```
A function like the above `check_item` is called (via dynamic dispatch)
for every HIR node. In a default Clippy run there are 230 late lints
enabled. Which means the loop executes 230 times, each time doing a
call.
However, most lints only implement a small fraction of the 32 checking
methods; many of them only implement one of them. Which means that most
of the calls in the loop are to functions that do nothing.
(It's a similar story for early lints, but there are only ~50 of them.)
This commit changes things so that the unnecessary calls are avoided.
- First, `RuntimeCombined{Early,Late}LintPass` are changed so instead of
having a single list of passes, it has one list per `check_foo`
method.
- Then, only passes that impl `check_foo` get added to the `check_foo`
list. This means we avoid calling empty `check_foo` methods.
- This requires knowing which of the empty default `check_foo` methods
are overridden. This commit adds a `check_foo_needed` method that
returns a boolean to indicate this; each one default to `false`.
- Writing `check_foo_needed` methods by hand would be error-prone, so
the commit adds a new attribute proc macro `#[runtime_lint_pass]`
which detects when a `check_foo` is defined for a pass and adds the
corresponding `check_foo_needed` method.
- Also, because each pass can end up on multiple lists within
`RuntimeCombined{Early,Late}LintPass`, `{Early,Late}LintPassObject` is
changed from using `Box` to using `Rc<RefCell>`. `RefCell` has a small
overhead but this is dwarfed by the wins from avoiding empty
`check_foo` calls.
It's a lot of plumbing changes, but it results in some huge wins: in the
best cases clippy's runtime is reduced by 30%.
4b8ebe2 to
f3d9ca4
Compare
runtime_lint_passcheck_foo methods.
|
Non-clippy runs shouldn't have their perf affected. Let's check: @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Speed up clippy by not calling empty `check_foo` methods.
|
This PR currently overlaps with #157689. My plan is for that PR to merge first, doing all the preliminary cleanups, leaving just the final (and main) commit in this PR. |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (fa5d275): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking means the PR may be perf-sensitive. Consider adding rollup=never if this change is not fit for rolling up. @rustbot label: -S-waiting-on-perf -perf-regression Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.8%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.5%, secondary -3.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis perf run didn't have relevant results for this metric. Bootstrap: 517.481s -> 517.738s (0.05%) |
|
☔ The latest upstream changes (presumably #157779) made this pull request unmergeable. Please resolve the merge conflicts. |



Most clippy lints are put into
RuntimeCombinedLateLintPass. It's defined via a macro which expands out to something like this:A function like the above
check_itemis called (via dynamic dispatch) for every HIR node. In a default Clippy run there are 230 late lints enabled. Which means the loop executes 230 times, each time doing a call.However, most lints only implement a small fraction of the 32 checking methods; many of them only implement one of them. Which means that most of the calls in the loop are to functions that do nothing.
(It's a similar story for early lints, but there are only ~50 of them.)
This commit changes things so that the unnecessary calls are avoided.
RuntimeCombined{Early,Late}LintPassare changed so instead of having a single list of passes, it has one list percheck_foomethod.check_fooget added to thecheck_foolist. This means we avoid calling emptycheck_foomethods.check_foomethods are overridden. This commit adds acheck_foo_neededmethod that returns a boolean to indicate this; each one default tofalse.check_foo_neededmethods by hand would be error-prone, so the commit adds a new attribute proc macro#[runtime_lint_pass]which detects when acheck_foois defined for a pass and adds the correspondingcheck_foo_neededmethod.RuntimeCombined{Early,Late}LintPass,{Early,Late}LintPassObjectis changed from usingBoxto usingRc<RefCell>.RefCellhas a small overhead but this is dwarfed by the wins from avoiding emptycheck_foocalls.It's a lot of plumbing changes, but it results in some huge wins: in the best cases clippy's runtime is reduced by 30%.