Skip to content

Conversation

@afonso360
Copy link
Contributor

👋 Hey,

I was playing around with a project that uses cranelift as it's backend, and found some IR patterns where we could do better optimizations.

This PR adds a few optimizations:

  • We try to remove {u,s}extend and ireduce from the controlling input of select. In many cases we can avoid them since select supports pretty much any input type.

  • When we have (select c (iconst -1) (iconst 0)) we can remove the select if we know that the controlling value already generates a -1 or 0. I've implemented this for icmp and fcmp

    • I wanted to add a base rule that would turn any other value into bmask, but it ended up messing with the icmp and fcmp rules, and I'm not sure how to deal with that.
  • Delete consecutive bmask's

  • Delete consecutive bswap's

I've ran sightglass on these and it shows no difference. I've also fuzzed this change for a couple of hours and it also didn't trigger anything.

@afonso360 afonso360 requested a review from a team as a code owner August 13, 2023 22:00
@afonso360 afonso360 requested review from abrown and removed request for a team August 13, 2023 22:00
@github-actions github-actions bot added cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language labels Aug 13, 2023
@github-actions
Copy link

Subscribe to Label Action

cc @cfallin, @fitzgen

Details This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

  • cfallin: isle
  • fitzgen: isle

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

@afonso360 afonso360 marked this pull request as draft August 14, 2023 13:31
@abrown
Copy link
Member

abrown commented Aug 14, 2023

I'll take a closer look after you do whatever changes you have planned to fix CI; just out of curiosity, though, what motivated looking more at select? I vaguely recall it being problematic in the past...

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has some overlap with my PR #6214. I haven't gotten back to that in a while and I'm glad you're picking this up! Although you've marked this a draft, I got excited and reviewed it anyway since I'd reasoned through some of this already.

@afonso360
Copy link
Contributor Author

I'll take a closer look after you do whatever changes you have planned to fix CI; just out of curiosity, though, what motivated looking more at select? I vaguely recall it being problematic in the past...

Huh, I don't remember there being issues with select, I think select_spectre_guard is the instruction that shall not be touched, right? Or are you referring to something else?

I'm looking at select because I was looking at some benchmarks for a project that uses cranelift and noticed some CLIF code that was missing some optimization opportunities. These optimizations aren't really what makes the difference, but It's just something that jumped out at me from looking at that code.

@afonso360
Copy link
Contributor Author

I'm still looking into the CI failure, but I've had to back out one of the optimizations, (bmask (fcmp x)) -> (fcmp x) is wrong since for scalars icmp/fcmp actually return 1 or 0. Only for vectors do they return -1, which is slightly confusing but ¯\_(ツ)_/¯

I think there is a somewhat similar optimization that we can pursue with (select c 1 0) however we no longer have the convenient intermediary bint to replace bmask. Doing something like (select c 1 0) -> (and (bmask c) 1) would work, but I'm not entirely sure how great that is.

It might be worth thinking about this and submitting a future PR more targeted towards that since this one is getting kinda big.

@jameysharp
Copy link
Contributor

This is looking great! I hadn't thought of bswap/bitrev/popcount also being in the "truthiness-preserving" category.

It might help to introduce a "truthy" term that extracts the argument from a "truthiness-preserving" operation, then use that in both the bmask and select cases.

(decl multi truthy ...)
(rule (truthy (sextend _ x)) x)
(rule (truthy (uextend _ x)) x)
(rule (truthy (bmask _ x)) x)
(rule (truthy (ineg _ x)) x)
(rule (truthy (bswap _ x)) x)
(rule (truthy (bitrev _ x)) x)
(rule (truthy (popcnt _ x)) x)

(rule (simplify (bmask ty v)) (if-let x (truthy v)) (bmask ty x))
(rule (simplify (select ty v lhs rhs)) (if-let c (truthy v)) (select ty c lhs rhs))

The (select c 1 0) case could be generalized with another truthy rule, matching whenever the true branch is a non-zero constant and the false branch is a zero constant. That might be more useful than rewriting (select c -1 0) to bmask, even?

Definitely feel free to stop at any point though; you're right that there's quite a bit going on here already.

@jameysharp
Copy link
Contributor

Oh, it occurs to me that rotate left/right also preserves truthiness. Now I'm wondering how to systematically find all instructions where the result is zero if and only if the input is zero...

@afonso360
Copy link
Contributor Author

It might help to introduce a "truthy" term that extracts the argument from a "truthiness-preserving" operation, then use that in both the bmask and select cases.

Yeah, that's a nice simplification of these rules!

Now I'm wondering how to systematically find all instructions where the result is zero if and only if the input is zero...

I always think about Z3 with these things, but I have no idea how to actually do that. And I think we would need a way to rewrite CLIF ops into Z3 expressions, right?

@jameysharp jameysharp requested review from jameysharp and removed request for abrown August 18, 2023 15:23
@jameysharp
Copy link
Contributor

If I'm not mistaken, the test case that's failing in CI is

(module
  (func (export "f32.no_fold_lt_select") (param $x f32) (param $y f32) (result f32)
    (select (local.get $x) (local.get $y) (f32.lt (local.get $x) (local.get $y)))
))
(assert_return (invoke "f32.no_fold_lt_select" (f32.const 0.0) (f32.const nan)) (f32.const nan))

According to wasmtime compile --emit-clif that compiles to this:

function u0:0(i64 vmctx, i64, f32, f32) -> f32 fast {
    gv0 = vmctx
    gv1 = load.i64 notrap aligned readonly gv0+8
    gv2 = load.i64 notrap aligned gv1
    stack_limit = gv2

                                block0(v0: i64, v1: i64, v2: f32, v3: f32):
@003d                               v5 = fcmp lt v2, v3
@003d                               v6 = uextend.i32 v5
@003e                               v7 = select v6, v2, v3
@003f                               jump block1(v7)

                                block1(v4: f32):
@003f                               return v4
}

With this PR, after the uextend is removed, the select/fcmp pair matches the egraph rules to rewrite to fmin_pseudo.f32, and I guess that produces the wrong result on this input?

afonso360 added a commit to afonso360/wasmtime that referenced this pull request Aug 18, 2023
This transform is wrong for the inputs in the attached testcase.

It was introduced in bytecodealliance#5546 and has been sort of unnoticed until bytecodealliance#6843 caused
some wasmtime testcode to fire this optimization rule and produce this
counterexample.
@afonso360
Copy link
Contributor Author

afonso360 commented Aug 18, 2023

With this PR, after the uextend is removed, the select/fcmp pair matches the egraph rules to rewrite to fmin_pseudo.f32, and I guess that produces the wrong result on this input?

Yeah, that's pretty much it! I was waiting for an OK to make sure this wouldn't affect wasmtime before publishing the PR removing the f{min,max}_pseudo rules, which are wrong. Sorry for the delay.

Once this is rebased on top of #6859 it should start passing the tests.

github-merge-queue bot pushed a commit that referenced this pull request Aug 21, 2023
This transform is wrong for the inputs in the attached testcase.

It was introduced in #5546 and has been sort of unnoticed until #6843 caused
some wasmtime testcode to fire this optimization rule and produce this
counterexample.
afonso360 and others added 6 commits August 21, 2023 19:36
Cleans up the previous logic a bit.

This also adds `rotl`/`rotr` as thruth preserving expressions. As well as
`select` when it's branches match certain constants.

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
This also cleans up some merge conflicts that happened when rebasing.
@afonso360 afonso360 marked this pull request as ready for review August 21, 2023 19:02
@afonso360
Copy link
Contributor Author

With #6859 now merged this should start passing CI.

I've also added a entrypoint to the thruthy rule for (icmp ne v 0), and also added that as a truthiness preserving expression.

Copy link
Contributor

@jameysharp jameysharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great and I'm hopeful that CI will pass this time. 😁

I'd love to see some Sightglass numbers from this. (Someday I will remember how to use /bench_x64; is today that day?) With the wide range of cases that you've covered now, I imagine these patterns might fire fairly often, and it would be exciting if this made a measurable difference.

As future work, I think there might be value in a corresponding falsy helper that can be used to swap arguments to select. If nothing else, there's a dedicated wasm instruction that corresponds to (eq ty v 0) so I bet we'd see that match frequently.

(rule (simplify (select ty v t f)) (if-let c (falsy v)) (select ty c f t))

Comment on lines 130 to 133
(rule (simplify (ne cty v zero @ (iconst ty (u64_from_imm64 0))))
(if-let c (truthy v))
(if-let (value_type ty) c)
(ne cty c zero))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job spotting (ne v 0) as also truthy!

I'm having a little trouble following the constraints on this rule. I think what you're doing is that the operands to icmp must both have the same type, so if truthy unwraps v to a different type then the rule doesn't apply. Is that right?

Contrary to what I said before about re-using the existing iconst, I think this rule will be better if we reconstruct the constant at whatever type c has. In the case where the types do match, we'll go through the GVN map and discover that the new iconst is identical to the old one, which is a little silly; but the rule will also succeed when the types don't match, making it more useful.

Suggested change
(rule (simplify (ne cty v zero @ (iconst ty (u64_from_imm64 0))))
(if-let c (truthy v))
(if-let (value_type ty) c)
(ne cty c zero))
(rule (simplify (ne cty v (iconst _ (u64_from_imm64 0))))
(if-let c (truthy v))
(if-let (value_type ty) c)
(ne cty c (iconst ty (u64_from_imm64 0))))

That said, I believe this rule is also correct the way you wrote it; feel free to merge as-is.

In either case, it would be nice to add a comment, either saying that we're constraining value_type to be equal to the type of zero, or saying that we're extracting the type of c to make sure that the zero has the same type.

Copy link
Contributor Author

@afonso360 afonso360 Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a little trouble following the constraints on this rule. I think what you're doing is that the operands to icmp must both have the same type, so if truthy unwraps v to a different type then the rule doesn't apply. Is that right?

Yeah, that was pretty much the intent.

I think this rule will be better if we reconstruct the constant at whatever type c has.

Oh, that's really neat! I don't know how I didn't spot that.

afonso360 and others added 2 commits August 21, 2023 21:10
Co-authored-by: Jamey Sharp <jamey@minilop.net>
Co-authored-by: Jamey Sharp <jamey@minilop.net>
@afonso360
Copy link
Contributor Author

Looks like the bench-api crate is failing to build, but I've ran sightglass manually, and sadly we don't get any improvements.

Results
     Running `target/debug/sightglass-cli benchmark --processes 5 --iterations-per-process 5 --engine ./main.so --engine ./opt-select.so`

execution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [9966144 10602864.64 13498656] main.so
  [9952064 11139933.44 26247104] opt-select.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [715466496 839635031.68 1025560928] main.so
  [749480032 821139647.16 1066050816] opt-select.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [199200 244094.72 403680] main.so
  [191488 249337.60 390976] opt-select.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [539328 646485.76 1049376] main.so
  [556704 634736.68 798432] opt-select.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [1427180032 1502991738.48 1606635646] main.so
  [1447102577 1527005523.92 1715667119] opt-select.so

instantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [271264 329072.64 416160] main.so
  [267264 334138.88 644704] opt-select.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [18601799200 19312205390.44 19938090464] main.so
  [18438769888 19192694238.76 20311795325] opt-select.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [271569902 329694138.04 502450045] main.so
  [257652896 327933358.56 541305504] opt-select.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [123519936 133585408.76 146265844] main.so
  [121208800 132945476.16 176276736] opt-select.so

@jameysharp
Copy link
Contributor

Oh well, thanks for checking. My approval still stands; feel free to merge!

@afonso360 afonso360 added this pull request to the merge queue Aug 21, 2023
Merged via the queue into bytecodealliance:main with commit af35da6 Aug 21, 2023
@afonso360 afonso360 deleted the opt-select branch August 21, 2023 21:38
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this pull request Aug 22, 2023
…odealliance#6859)

This transform is wrong for the inputs in the attached testcase.

It was introduced in bytecodealliance#5546 and has been sort of unnoticed until bytecodealliance#6843 caused
some wasmtime testcode to fire this optimization rule and produce this
counterexample.
eduardomourar pushed a commit to eduardomourar/wasmtime that referenced this pull request Aug 22, 2023
* egraphs: Remove extends before selects when possible

* egraphs: Rewrite `select+{i,fcmp}` when select arms match cmp result

* egraphs: Simplify double `bmask`

* egraphs: Simplify double `bswap`

* egraphs: Modularize select optimizations

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* egraphs: Remove `bmask+{i,f}cmp` optimizations

These were unsound since we only represent the results of `{i,f}cmp` as
`-1` for vectors, and not for scalars, scalars get the output of `1`

* egraphs: Optimize `bitrev` in `select` and `bmask`

* egraphs: Optimize `popcnt` in `select` and `bmask`

* egraphs: Optimize double `bitrev`

* egraphs: Add `thruthy` expression matcher

Cleans up the previous logic a bit.

This also adds `rotl`/`rotr` as thruth preserving expressions. As well as
`select` when it's branches match certain constants.

Co-Authored-By: Jamey Sharp <jsharp@fastly.com>

* egraphs: Make `icmp ne x, 0` a thruthy expression

* egraphs: Add `icmp ne x, 0` as an expression that can delete thruthy inputs

* egraphs: Add tests for comutative versions of the `icmp ne` thruthy rules

* egraphs: Restrict `icmp ne` optimization to arms with the

This also cleans up some merge conflicts that happened when rebasing.

* egraphs: Restore some accidentally deleted tests

* egraphs: Use `t` and `f` as names for select arms

Co-authored-by: Jamey Sharp <jamey@minilop.net>

* egraphs: Allow `icmp ne` opt to match arms of different types

Co-authored-by: Jamey Sharp <jamey@minilop.net>

---------

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
Co-authored-by: Jamey Sharp <jamey@minilop.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift Issues related to the Cranelift code generator isle Related to the ISLE domain-specific language

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants