Remaining work for x86 intrinsics #119

@valadaptive

After #115 lands, here's what needs to be done to make x86 SIMD (SSE4.2 and AVX2) feature-complete:

  • Load/store interleaved. There's a specialized implementation for load_interleaved_128_u32x16 on SSE4.2; all other operations (stores, other types, AVX2) use the fallback implementation. I think it's implementable for all types, and may be faster if specialized for AVX2, but I haven't tried. (Implement all interleaved load/store ops on x86 #140)

    We currently import Fallback to implement these operations, which could increase code size. If implementing this using native intrinsics isn't possible, we could just regenerate the fallback operations in the x86 code to avoid pulling in all of Fallback.

  • f32 to u32 conversions (and maybe the other way around too?). As a TODO comment in the codegen states, we currently just do an f32 to i32 conversion and pretend it's an f32 to u32 conversion. (Implement all float<->int conversions on x86 #134)

    This gives wrong results for values above i32::MAX. This StackOverflow post goes into detail on how to polyfill u32->f32; I'm not sure about the other direction.

    As part of this, we should add documentation to f32 -> u32 conversions noting that they are polyfilled on x86, and that f32 -> i32 -> u32 will be faster if the full range isn't needed.

  • (Maybe?) AVX2-specialized multiply and shift operations on i8x16 and u8x16. These must be polyfilled by widening to 16-bit, performing the operation, and then narrowing back to 8-bit. Currently, the AVX2 backend widens into two separate __m128i vectors just like the SSE4.2 backend does, but it could use a single __m256i instead. I'm not sure if this is actually faster.

  • Everything with "precise" in the name, which currently seems to be ignored in the tests. (Nail down the semantics for min/max and min_precise/max_precise #133, Implement new min/max/min_precise/max_precise semantics #136)
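
To make the float<->int conversion item more concrete, here is a scalar, per-lane sketch of how the unsigned conversions could be polyfilled on top of signed ones. It is not what the codegen does today and not necessarily what the linked StackOverflow post recommends. `cvtt_f32_to_i32` is a hypothetical helper that models one lane of the x86 signed truncating conversion (NaN and out-of-range inputs give 0x8000_0000), and all other function names are made up for illustration. Inputs outside the u32 range are not given any particular meaning here.

```rust
/// Hypothetical model of one lane of the x86 signed truncating conversion:
/// NaN and out-of-range inputs produce 0x8000_0000 (i32::MIN).
fn cvtt_f32_to_i32(x: f32) -> i32 {
    if x.is_nan() || x >= 2147483648.0 || x < -2147483648.0 {
        i32::MIN
    } else {
        x as i32
    }
}

/// The behavior described above: convert as signed, reinterpret as unsigned.
fn f32_to_u32_naive(x: f32) -> u32 {
    cvtt_f32_to_i32(x) as u32
}

/// Assumed polyfill: lanes at or above 2^31 are shifted down into the signed
/// range before converting, then the top bit is restored afterwards.
fn f32_to_u32_polyfill(x: f32) -> u32 {
    const TWO_POW_31: f32 = 2147483648.0;
    if x >= TWO_POW_31 {
        // The subtraction is exact for any f32 in [2^31, 2^32).
        (cvtt_f32_to_i32(x - TWO_POW_31) as u32) ^ 0x8000_0000
    } else {
        cvtt_f32_to_i32(x) as u32
    }
}

/// Assumed polyfill for the other direction: split into 16-bit halves that
/// both fit in the signed range, convert each, and recombine. Both partial
/// values are exact, so only the final addition rounds.
fn u32_to_f32_polyfill(x: u32) -> f32 {
    let hi = (x >> 16) as i32;
    let lo = (x & 0xFFFF) as i32;
    (hi as f32) * 65536.0 + (lo as f32)
}
```

In a SIMD version the branch in `f32_to_u32_polyfill` would presumably become a compare plus a blend or mask, which is the extra cost the proposed documentation note would warn about.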
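
For the 8-bit multiply item, the widen/operate/narrow pattern with two 128-bit halves could look roughly like the sketch below. This is an illustration only, not the generated code: `mul_u8x16_widening` is a hypothetical name, the arrays stand in for the real vector types, and only SSE2 intrinsics are used so that it builds on any x86_64 target. A scalar fallback is included for other architectures.

```rust
/// Hypothetical sketch: wrapping u8 multiply done by widening to 16-bit lanes.
#[cfg(target_arch = "x86_64")]
fn mul_u8x16_widening(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    use std::arch::x86_64::*;
    // SSE2 is part of the x86_64 baseline, so no runtime detection is needed.
    unsafe {
        let a = _mm_loadu_si128(a.as_ptr() as *const __m128i);
        let b = _mm_loadu_si128(b.as_ptr() as *const __m128i);
        let zero = _mm_setzero_si128();
        // Widen: zero-extend the low and high 8 bytes into two 16-bit vectors.
        let a_lo = _mm_unpacklo_epi8(a, zero);
        let a_hi = _mm_unpackhi_epi8(a, zero);
        let b_lo = _mm_unpacklo_epi8(b, zero);
        let b_hi = _mm_unpackhi_epi8(b, zero);
        // Operate: 16-bit multiply, keeping the low 16 bits of each product.
        let lo = _mm_mullo_epi16(a_lo, b_lo);
        let hi = _mm_mullo_epi16(a_hi, b_hi);
        // Narrow: mask to the low byte so the saturating pack cannot clamp.
        let mask = _mm_set1_epi16(0x00FF);
        let packed = _mm_packus_epi16(_mm_and_si128(lo, mask), _mm_and_si128(hi, mask));
        let mut out = [0u8; 16];
        _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, packed);
        out
    }
}

/// Scalar stand-in for non-x86_64 targets with the same per-lane result.
#[cfg(not(target_arch = "x86_64"))]
fn mul_u8x16_widening(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
    core::array::from_fn(|i| ((a[i] as u16).wrapping_mul(b[i] as u16) & 0xFF) as u8)
}
```

The AVX2 idea from the list would replace the two unpack/multiply pairs with one zero-extension into a 256-bit vector (for example via `_mm256_cvtepu8_epi16`) and a single 16-bit multiply. Whether the narrowing step then costs more than it saves is the open question.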
