-
Notifications
You must be signed in to change notification settings - Fork 1.6k
egraphs: Add some select optimizations
#6843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
|
I'll take a closer look after you do whatever changes you have planned to fix CI; just out of curiosity, though, what motivated looking more at |
jameysharp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has some overlap with my PR #6214. I haven't gotten back to that in a while and I'm glad you're picking this up! Although you've marked this a draft, I got excited and reviewed it anyway since I'd reasoned through some of this already.
Huh, I don't remember there being issues with I'm looking at |
|
I'm still looking into the CI failure, but I've had to back out one of the optimizations, I think there is a somewhat similar optimization that we can pursue with It might be worth thinking about this and submitting a future PR more targeted towards that since this one is getting kinda big. |
|
This is looking great! I hadn't thought of bswap/bitrev/popcount also being in the "truthiness-preserving" category. It might help to introduce a "truthy" term that extracts the argument from a "truthiness-preserving" operation, then use that in both the (decl multi truthy ...)
(rule (truthy (sextend _ x)) x)
(rule (truthy (uextend _ x)) x)
(rule (truthy (bmask _ x)) x)
(rule (truthy (ineg _ x)) x)
(rule (truthy (bswap _ x)) x)
(rule (truthy (bitrev _ x)) x)
(rule (truthy (popcnt _ x)) x)
(rule (simplify (bmask ty v)) (if-let x (truthy v)) (bmask ty x))
(rule (simplify (select ty v lhs rhs)) (if-let c (truthy v)) (select ty c lhs rhs))The Definitely feel free to stop at any point though; you're right that there's quite a bit going on here already. |
|
Oh, it occurs to me that rotate left/right also preserves truthiness. Now I'm wondering how to systematically find all instructions where the result is zero if and only if the input is zero... |
Yeah, that's a nice simplification of these rules!
I always think about Z3 with these things, but I have no idea how to actually do that. And I think we would need a way to rewrite CLIF ops into Z3 expressions, right? |
|
If I'm not mistaken, the test case that's failing in CI is (module
(func (export "f32.no_fold_lt_select") (param $x f32) (param $y f32) (result f32)
(select (local.get $x) (local.get $y) (f32.lt (local.get $x) (local.get $y)))
))
(assert_return (invoke "f32.no_fold_lt_select" (f32.const 0.0) (f32.const nan)) (f32.const nan))According to With this PR, after the |
This transform is wrong for the inputs in the attached testcase. It was introduced in bytecodealliance#5546 and has been sort of unnoticed until bytecodealliance#6843 caused some wasmtime testcode to fire this optimization rule and produce this counterexample.
Yeah, that's pretty much it! I was waiting for an OK to make sure this wouldn't affect wasmtime before publishing the PR removing the Once this is rebased on top of #6859 it should start passing the tests. |
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
These were unsound since we only represent the results of `{i,f}cmp` as
`-1` for vectors, and not for scalars, scalars get the output of `1`
Cleans up the previous logic a bit. This also adds `rotl`/`rotr` as thruth preserving expressions. As well as `select` when it's branches match certain constants. Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
This also cleans up some merge conflicts that happened when rebasing.
|
With #6859 now merged this should start passing CI. I've also added a entrypoint to the |
jameysharp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great and I'm hopeful that CI will pass this time. 😁
I'd love to see some Sightglass numbers from this. (Someday I will remember how to use /bench_x64; is today that day?) With the wide range of cases that you've covered now, I imagine these patterns might fire fairly often, and it would be exciting if this made a measurable difference.
As future work, I think there might be value in a corresponding falsy helper that can be used to swap arguments to select. If nothing else, there's a dedicated wasm instruction that corresponds to (eq ty v 0) so I bet we'd see that match frequently.
(rule (simplify (select ty v t f)) (if-let c (falsy v)) (select ty c f t))| (rule (simplify (ne cty v zero @ (iconst ty (u64_from_imm64 0)))) | ||
| (if-let c (truthy v)) | ||
| (if-let (value_type ty) c) | ||
| (ne cty c zero)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job spotting (ne v 0) as also truthy!
I'm having a little trouble following the constraints on this rule. I think what you're doing is that the operands to icmp must both have the same type, so if truthy unwraps v to a different type then the rule doesn't apply. Is that right?
Contrary to what I said before about re-using the existing iconst, I think this rule will be better if we reconstruct the constant at whatever type c has. In the case where the types do match, we'll go through the GVN map and discover that the new iconst is identical to the old one, which is a little silly; but the rule will also succeed when the types don't match, making it more useful.
| (rule (simplify (ne cty v zero @ (iconst ty (u64_from_imm64 0)))) | |
| (if-let c (truthy v)) | |
| (if-let (value_type ty) c) | |
| (ne cty c zero)) | |
| (rule (simplify (ne cty v (iconst _ (u64_from_imm64 0)))) | |
| (if-let c (truthy v)) | |
| (if-let (value_type ty) c) | |
| (ne cty c (iconst ty (u64_from_imm64 0)))) |
That said, I believe this rule is also correct the way you wrote it; feel free to merge as-is.
In either case, it would be nice to add a comment, either saying that we're constraining value_type to be equal to the type of zero, or saying that we're extracting the type of c to make sure that the zero has the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a little trouble following the constraints on this rule. I think what you're doing is that the operands to icmp must both have the same type, so if truthy unwraps v to a different type then the rule doesn't apply. Is that right?
Yeah, that was pretty much the intent.
I think this rule will be better if we reconstruct the constant at whatever type c has.
Oh, that's really neat! I don't know how I didn't spot that.
Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Jamey Sharp <jamey@minilop.net>
|
Looks like the bench-api crate is failing to build, but I've ran sightglass manually, and sadly we don't get any improvements. Results |
|
Oh well, thanks for checking. My approval still stands; feel free to merge! |
…odealliance#6859) This transform is wrong for the inputs in the attached testcase. It was introduced in bytecodealliance#5546 and has been sort of unnoticed until bytecodealliance#6843 caused some wasmtime testcode to fire this optimization rule and produce this counterexample.
* egraphs: Remove extends before selects when possible
* egraphs: Rewrite `select+{i,fcmp}` when select arms match cmp result
* egraphs: Simplify double `bmask`
* egraphs: Simplify double `bswap`
* egraphs: Modularize select optimizations
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* egraphs: Remove `bmask+{i,f}cmp` optimizations
These were unsound since we only represent the results of `{i,f}cmp` as
`-1` for vectors, and not for scalars, scalars get the output of `1`
* egraphs: Optimize `bitrev` in `select` and `bmask`
* egraphs: Optimize `popcnt` in `select` and `bmask`
* egraphs: Optimize double `bitrev`
* egraphs: Add `thruthy` expression matcher
Cleans up the previous logic a bit.
This also adds `rotl`/`rotr` as thruth preserving expressions. As well as
`select` when it's branches match certain constants.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* egraphs: Make `icmp ne x, 0` a thruthy expression
* egraphs: Add `icmp ne x, 0` as an expression that can delete thruthy inputs
* egraphs: Add tests for comutative versions of the `icmp ne` thruthy rules
* egraphs: Restrict `icmp ne` optimization to arms with the
This also cleans up some merge conflicts that happened when rebasing.
* egraphs: Restore some accidentally deleted tests
* egraphs: Use `t` and `f` as names for select arms
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* egraphs: Allow `icmp ne` opt to match arms of different types
Co-authored-by: Jamey Sharp <jamey@minilop.net>
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
Co-authored-by: Jamey Sharp <jamey@minilop.net>
👋 Hey,
I was playing around with a project that uses cranelift as it's backend, and found some IR patterns where we could do better optimizations.
This PR adds a few optimizations:
We try to remove
{u,s}extendandireducefrom the controlling input ofselect. In many cases we can avoid them sinceselectsupports pretty much any input type.When we have
(select c (iconst -1) (iconst 0))we can remove theselectif we know that the controlling value already generates a-1or0. I've implemented this foricmpandfcmpbmask, but it ended up messing with theicmpandfcmprules, and I'm not sure how to deal with that.Delete consecutive
bmask'sDelete consecutive
bswap'sI've ran sightglass on these and it shows no difference. I've also fuzzed this change for a couple of hours and it also didn't trigger anything.