Description
The "shift left by vector" and "shift right by vector" operations don't lower to hardware operations on many platforms. On WebAssembly and SSE4.2, they all fall back to scalar operations. On AVX2, they're only vectorized for 32-bit and 64-bit operands.
It seems like a bit of a footgun to expose such operations in a way that looks identical to the much faster "shift by scalar". For example, clatter uses a shift by vector as part of an RNG, and switching to a different algorithm that only needs a shift by scalar is faster in practice.
It also makes it easier to accidentally use a shift by vector when a shift by scalar would do. You'd probably expect the two code snippets below to produce identical code:

    let x: u32x4<S> = y >> 5;
    let x: u32x4<S> = y >> u32x4::splat(simd, 5);

All the other operations take an `impl SimdInto<Self, S>` as the right-hand side and just call `splat` internally, so it's reasonable to assume that's what happens for the shifts as well. In this case, however, the latter snippet is slower.
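
To make the distinction concrete, here is a minimal sketch of the two shift forms using plain `[u32; 4]` arrays in place of the library's vector types. The helper names `shr_by_scalar` and `shr_by_vector` are hypothetical and are not part of any library's API. They only model the semantics, roughly what a scalar fallback would compute lane by lane, not how any backend actually lowers the operation.

```rust
/// Shift every lane right by the same amount (the "shift by scalar" form).
/// Hypothetical helper, for illustration only.
fn shr_by_scalar(v: [u32; 4], n: u32) -> [u32; 4] {
    v.map(|lane| lane >> n)
}

/// Shift each lane right by its own amount (the "shift by vector" form).
/// Hypothetical helper, for illustration only.
fn shr_by_vector(v: [u32; 4], n: [u32; 4]) -> [u32; 4] {
    std::array::from_fn(|i| v[i] >> n[i])
}

fn main() {
    let y = [0x100, 0x200, 0x400, 0x800];

    // A splatted shift amount gives the same result as a scalar one,
    // so the per-lane generality buys nothing in this case.
    let by_scalar = shr_by_scalar(y, 5);
    let by_splat = shr_by_vector(y, [5; 4]);
    assert_eq!(by_scalar, by_splat);
    println!("uniform shift:  {:?}", by_scalar);

    // Only genuinely different per-lane amounts need the vector form.
    let per_lane = shr_by_vector(y, [1, 2, 3, 4]);
    println!("per-lane shift: {:?}", per_lane);
}
```

Since the results are identical whenever the shift amount is a splat, the only difference between the two snippets in the issue is which operation gets selected, which is why the second one being slower is surprising.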