-
Notifications
You must be signed in to change notification settings - Fork 1.6k
riscv64: Optimize bitselect+{i,f}cmp codegen
#6876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
riscv64: Optimize bitselect+{i,f}cmp codegen
#6876
Conversation
bitselect+cmp codegenbitselect+{i,f}cmp codegen
|
This might be good for a rule like on x64 where much of the logic of bitselect can be skipped if the mask is known to have a particular pattern? (e.g. generated by icmp or fcmp) |
|
Yeah, that's a neat way of de-duplicating these rules. I also think it might be worth copying that Edit: Hmm actually I don't think we can quite deduplicate this as neatly since we have different ways of generating a Edit2: Wait, now I'm confused. We're already sort of skipping the main bitselect lowering by using |
|
Oh this may not be applicable for riscv64, but on x64 at least there's no bitselect instruction so it uses the and/or/not/etc combo by default, but x64 does have lane-based select so if a bitselect's input mask is a comparison then the and/or/not/etc combo can be skipped in favor of a single lane-based instruction. I figured the same might be possible for riscv64 where bitselect could skip the combo of instructions and generate a single
On x64 I believe this is handled by
Right yeah, my thinking was that like on x64 the lowering for |
|
Oh, okay so it's pretty much the same on RISC-V. We don't have a direct
Here's where this diverges. In the regular In this PR Does that make sense? I think I might be missing something, sorry if I am. -- Yeah I think this is where x64 and RISC-V are different. Reading the spec for the On RISC-V our masks are completely different we only use the lowest |
alexcrichton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha I see, and makes sense, thanks for explaining! In that case I don't think a pattern similar to x64 fits well here, so I think what you've written is the way to go 👍
|
A different way to deduplicate these rules would be along the lines of But I'm not sure that would be more broadly useful so I'm not sure it's worth doing. Just a thought in case it helps with something later. |
👋 Hey,
This is a followup to #6874 where
f{min,max}_pseudowas removed and replaced it withbitselect+fcmp. Here we optimize that pattern into a mask generation instruction andvmerge.vvmthat merges both inputs.This allows us to avoid the quite long sequence for bitselect (4 instructions) and also vector mask expansion (1 instruction) in these patterns.
For tests here I'm relying mostly on wasmtimes wast testsuite.