Dynamic stride support through waveasm #1091

Draft
suryajasper wants to merge 9 commits into iree-org:main from suryajasper:dynamic-strides-waveasm

@suryajasper suryajasper commented Mar 10, 2026

This PR adds support for dynamic strides through the waveasm backend. There are 4 main cases that need to be addressed to ensure complete support.

  • 1. waveasm + dynamic strides

[FIXED] The existing dynamic stride logic in waveasm handles loads correctly but not stores. Stores go to a flat memref with a static stride of [1], and the MLIR pipeline emits an extract_strided_metadata op to compute the linearized store index from the dynamic strides. The ASM backend did not handle this op, so I added a handler that loads the strides and performs the linearized index computation.
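For reference, a minimal sketch of the linearized index computation that the store path needs. This is plain Python for illustration only, not the handler added in this PR; `linearize_index` and its parameters are hypothetical names, and the strides would be runtime values in the dynamic case.

```python
def linearize_index(indices, strides, offset=0):
    """Flat element index into a strided buffer: offset + sum(i_k * stride_k)."""
    if len(indices) != len(strides):
        raise ValueError("indices and strides must have the same rank")
    return offset + sum(i * s for i, s in zip(indices, strides))


# Example: a 2-D buffer whose rows are padded to 40 elements, so the
# strides are (40, 1). Element (3, 5) lands at 3 * 40 + 5 in the flat view.
flat = linearize_index(indices=(3, 5), strides=(40, 1))
```

With a static stride of [1] on the flat memref, this flat index is the only place the dynamic strides enter the store address.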

  • 2. waveasm + dynamic strides + dynamic dims

[IN PROGRESS] This case fails because passing the buffer addresses, dynamic dims, and dynamic strides together exceeds the gfx950 limit on preloaded kernel arguments. For example, a simple GEMM with buffer arguments A = MxK, B = NxK, and C = MxN produces 9 preloaded arguments (3 buffer pointers, 3 dynamic dims, 3 leading strides), which map to 9 * 2 = 18 preloaded SGPRs, exceeding the limit of 15. I'm working on a fix that preloads only the buffer args and loads the scalar args explicitly through s_load_dword. This fixes the waveasm compilation failure but causes GPU faults, which I am debugging.
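The SGPR budget arithmetic can be sketched as follows. This is an illustrative Python calculation, not code from the PR; the constant and function names are hypothetical, and it assumes every argument occupies 2 SGPRs, matching the 9 * 2 count in the description.

```python
SGPRS_PER_ARG = 2        # assumption: each preloaded argument takes 2 SGPRs
PRELOAD_SGPR_LIMIT = 15  # limit quoted in the description


def preloaded_sgprs(num_args, sgprs_per_arg=SGPRS_PER_ARG):
    """Number of SGPRs consumed when num_args arguments are preloaded."""
    return num_args * sgprs_per_arg


num_buffers, num_dyn_dims, num_strides = 3, 3, 3

# Preloading everything: 9 arguments -> 18 SGPRs, over the limit.
full = preloaded_sgprs(num_buffers + num_dyn_dims + num_strides)

# Preloading only the buffer pointers: 3 arguments -> 6 SGPRs, within the limit.
partial = preloaded_sgprs(num_buffers)
```

Under these assumptions, preloading only the buffer pointers leaves the dims and strides to be fetched explicitly, which is the direction the fix takes.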

  • 3. waveasm + dynamic strides + buffer ops
  • 4. waveasm + dynamic strides + dynamic dims + buffer ops

@suryajasper suryajasper force-pushed the dynamic-strides-waveasm branch from b51b0b3 to 2bb545c Compare March 10, 2026 01:35
"--waveasm-loop-address-promotion",
"--waveasm-linear-scan=max-vgprs=512 max-agprs=512",
"--waveasm-insert-waitcnt=ticketed-waitcnt=false",
"--waveasm-insert-waitcnt=ticketed-waitcnt=true",

This should be controlled through a compile option from the test itself.
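One way this could look, as a hypothetical sketch only: the flag list is built from a boolean that the test passes in, instead of hardcoding the waitcnt mode. The function names `waitcnt_flag` and `build_waveasm_flags` are illustrative assumptions and do not exist in the codebase; only the flag strings are taken from the diff above.

```python
def waitcnt_flag(ticketed: bool) -> str:
    """Render the insert-waitcnt pass flag for the requested mode."""
    value = "true" if ticketed else "false"
    return f"--waveasm-insert-waitcnt=ticketed-waitcnt={value}"


def build_waveasm_flags(ticketed_waitcnt: bool) -> list[str]:
    """Assemble the pass flags, with the waitcnt mode chosen by the caller."""
    return [
        "--waveasm-loop-address-promotion",
        "--waveasm-linear-scan=max-vgprs=512 max-agprs=512",
        waitcnt_flag(ticketed_waitcnt),
    ]


# A test would then opt in explicitly:
flags = build_waveasm_flags(ticketed_waitcnt=True)
```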

@suryajasper suryajasper force-pushed the dynamic-strides-waveasm branch 2 times, most recently from 136a9f0 to 432dcaa Compare March 17, 2026 22:01
Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>
…ast to output buffer

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>
@suryajasper suryajasper force-pushed the dynamic-strides-waveasm branch from 47f2ef2 to 1995206 Compare March 25, 2026 21:00
suryajasper added a commit to suryajasper/wave that referenced this pull request Mar 25, 2026
Squashed cherry-pick of suryajasper/dynamic-strides-waveasm onto
4waveasm-256x192x256. Merges partial kernel argument preloading,
extract_strided_metadata handler, and dynamic stride test updates.

Commits included:
- Handle memref.extract_strided_metadata in waveasm backend
- Update dynamic strides test & compile options to include waveasm
- xfail waveasm dynamic strides tests w/ dynamic dims or buffer ops
- Fix dynamic strides + dynamic dims through waveasm & accumulator bitcast
- Fixed dynamic strides with bufops w/ waveasm
- Fix mxfp waveasm example to use (2,2) wave shape
- Fixed waveasm dynamic strides to use partial kernel argument preloading

Made-with: Cursor