After #115 lands, here's what needs to be done to make x86 SIMD (SSE4.2 and AVX2) feature-complete:
- Load/store interleaved. There's a specialized implementation for `load_interleaved_128_u32x16` on SSE4.2; all other operations (stores, other types, AVX2) use the fallback implementation. I think it's implementable for all types, and may be faster if specialized for AVX2, but I haven't tried. (Implement all interleaved load/store ops on x86 #140) We currently import `Fallback` to implement these operations, which could increase code size. If implementing this using native intrinsics isn't possible, we could just regenerate the fallback operations in the x86 code to avoid pulling in all of `Fallback`. (A rough sketch of the de-interleaving shuffle pattern is in the first example after this list.)
- f32 to u32 conversions (maybe the other way around too?). As a TODO comment in the codegen states, we currently just do an f32 to i32 conversion and pretend it's an f32 to u32 conversion. (Implement all float<->int conversions on x86 #134) This is just broken for numbers above `i32::MAX`. This StackOverflow post goes into detail on how to polyfill u32 -> f32 (the second example after this list sketches that approach); not sure about the other way. As part of this, we should add documentation to the f32 -> u32 conversions noting that they are polyfilled on x86, and that f32 -> i32 -> u32 will be faster if the full range isn't needed.
- (Maybe?) AVX2-specialized multiply and shift operations on `i8x16` and `u8x16`. These must be polyfilled by widening to 16-bit, performing the operation, and then narrowing back to 8-bit. Currently, the AVX2 backend widens into two separate `__m128` vectors just like the SSE4.2 backend does, but it could use a single `__m256` instead (the third example after this list sketches this). I'm not sure if this is actually faster.
- Everything with "precise" in the name, which currently seems to be ignored in the tests. (Nail down the semantics for `min/max` and `min_precise/max_precise` #133, Implement new min/max/min_precise/max_precise semantics #136)
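
For the interleaved load/store item, here's a minimal sketch of the kind of shuffle a native SSE path could use, assuming a 4-channel u32 de-interleave (which is just a 4x4 32-bit transpose). The function name and the channel grouping are made up for illustration; the real ops in the crate may interleave with a different stride.

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Hypothetical sketch: de-interleave four 4-channel u32 vectors
/// (r0 = [a0 b0 c0 d0], r1 = [a1 b1 c1 d1], ...) into per-channel
/// vectors ([a0 a1 a2 a3], [b0 b1 b2 b3], ...) using unpack intrinsics.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse4.2")]
unsafe fn deinterleave_4ch_u32(
    r0: __m128i,
    r1: __m128i,
    r2: __m128i,
    r3: __m128i,
) -> [__m128i; 4] {
    // Interleave 32-bit lanes pairwise.
    let t0 = _mm_unpacklo_epi32(r0, r1); // a0 a1 b0 b1
    let t1 = _mm_unpackhi_epi32(r0, r1); // c0 c1 d0 d1
    let t2 = _mm_unpacklo_epi32(r2, r3); // a2 a3 b2 b3
    let t3 = _mm_unpackhi_epi32(r2, r3); // c2 c3 d2 d3
    // Then combine 64-bit halves to finish the transpose.
    [
        _mm_unpacklo_epi64(t0, t2), // a0 a1 a2 a3
        _mm_unpackhi_epi64(t0, t2), // b0 b1 b2 b3
        _mm_unpacklo_epi64(t1, t3), // c0 c1 c2 c3
        _mm_unpackhi_epi64(t1, t3), // d0 d1 d2 d3
    ]
}
```

Interleaved stores would use the same pattern in reverse.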
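
For the conversion item, a minimal sketch of the u32 -> f32 polyfill along the lines of the StackOverflow approach: split each lane into 16-bit halves, convert each half exactly with the signed conversion, and recombine. The function name is hypothetical; this isn't necessarily how the codegen would express it.

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Hypothetical sketch of a u32 -> f32 polyfill: both 16-bit halves convert
/// exactly via the signed i32 -> f32 conversion, and the final add is the
/// only rounding step, so the result is correctly rounded.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse2")]
unsafe fn cvt_u32_to_f32(v: __m128i) -> __m128 {
    let lo = _mm_and_si128(v, _mm_set1_epi32(0xFFFF)); // low 16 bits of each lane
    let hi = _mm_srli_epi32::<16>(v);                  // high 16 bits of each lane
    let lo_f = _mm_cvtepi32_ps(lo);
    let hi_f = _mm_cvtepi32_ps(hi);
    // hi_f * 2^16 is exact (it only changes the exponent), so there's no double rounding.
    _mm_add_ps(_mm_mul_ps(hi_f, _mm_set1_ps(65536.0)), lo_f)
}
```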
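
For the 8-bit multiply/shift item, a sketch of what the single-`__m256i` path could look like for a wrapping `u8x16` multiply (the function name is made up; the shifts would follow the same widen/operate/narrow shape):

```rust
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

/// Hypothetical sketch: wrapping u8x16 multiply via one __m256i instead of
/// two __m128i vectors. Widen to 16 bits, multiply, then narrow back.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn mul_u8x16_avx2(a: __m128i, b: __m128i) -> __m128i {
    // Zero-extend each 8-bit lane to 16 bits; the whole vector fits in one __m256i.
    let a16 = _mm256_cvtepu8_epi16(a);
    let b16 = _mm256_cvtepu8_epi16(b);
    let prod = _mm256_mullo_epi16(a16, b16);
    // Keep only the low byte of each product so the unsigned-saturating pack
    // below can't clamp anything.
    let low = _mm256_and_si256(prod, _mm256_set1_epi16(0x00FF));
    // packus works per 128-bit lane, so a cross-lane permute restores lane order.
    let packed = _mm256_packus_epi16(low, low);
    _mm256_castsi256_si128(_mm256_permute4x64_epi64::<0b1000>(packed))
}
```

Whether this beats the two-`__m128i` version probably comes down to the cost of the cross-lane permute at the end, so it would need benchmarking either way.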