Dynamic stride support through waveasm by suryajasper · Pull Request #1091 · iree-org/wave

suryajasper · 2026-03-10T01:29:04Z

This PR adds support for dynamic strides through the waveasm backend. Four cases need to be addressed for complete support.

1. waveasm + dynamic strides

[FIXED] The existing dynamic stride logic in waveasm handles loads correctly but not stores. Stores go to a flat memref with a static stride of [1], and the MLIR pipeline emits an extract_strided_metadata op to compute the linearized store index from the dynamic strides. The ASM backend did not handle this op, so I added a handler that loads the strides and performs the linearized index computation.
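For reference, the linearized index computation described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the handler added in this PR; the function name `linearize` and the example stride values are assumptions made up for the sketch.

```python
def linearize(indices, strides, offset=0):
    """Return the flat element index for a strided access.

    `indices` and `strides` are per-dimension sequences; `strides` stands in
    for the runtime values that extract_strided_metadata would provide.
    (Hypothetical helper, for illustration only.)
    """
    if len(indices) != len(strides):
        raise ValueError("indices and strides must have the same rank")
    # flat = offset + sum_i(indices[i] * strides[i])
    return offset + sum(i * s for i, s in zip(indices, strides))


# Example: a 2-D output C with a runtime leading stride of 100 elements
# and a unit inner stride. Element (2, 3) lands at 2 * 100 + 3 * 1.
flat_index = linearize((2, 3), (100, 1))
```

With a static stride of [1] on the flat memref, this flat index is the only place where the dynamic strides enter the store address.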

2. waveasm + dynamic strides + dynamic dims

[IN PROGRESS] Fails because preloading the buffer addresses, dynamic dims, and dynamic strides together exceeds the gfx950 limit on preloaded kernel arguments. For example, a simple GEMM with buffer arguments A = MxK, B = NxK, and C = MxN produces 9 preloaded arguments (3 buffer pointers, 3 dynamic dims, 3 leading strides), which map to 9 * 2 = 18 preloaded SGPRs, exceeding the limit of 15. I'm working on a fix that preloads only the buffer arguments and loads the scalar arguments explicitly through s_load_dword. This fixes the waveasm compilation issues but causes GPU faults, which I am debugging.
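The SGPR budget arithmetic above can be checked with a small Python sketch. The constants (2 SGPRs per preloaded argument, a limit of 15) are taken from this description; the helper names and the argument names are hypothetical and do not correspond to anything in waveasm.

```python
SGPRS_PER_ARG = 2         # per the description: each preloaded argument takes 2 SGPRs
PRELOAD_SGPR_LIMIT = 15   # limit quoted in the description for gfx950


def preload_sgprs(num_args):
    """Number of SGPRs consumed by preloading `num_args` kernel arguments."""
    return num_args * SGPRS_PER_ARG


def split_args(args):
    """Partial preloading: preload buffers only, load scalars explicitly.

    `args` is a list of (name, kind) pairs with kind "buffer" or "scalar".
    (Hypothetical helper, for illustration only.)
    """
    preloaded = [name for name, kind in args if kind == "buffer"]
    explicit = [name for name, kind in args if kind != "buffer"]
    return preloaded, explicit


# GEMM example from the description: 3 buffers, 3 dynamic dims, 3 leading strides.
gemm_args = (
    [(n, "buffer") for n in ("A", "B", "C")]
    + [(n, "scalar") for n in ("M", "N", "K")]
    + [(n, "scalar") for n in ("stride_A", "stride_B", "stride_C")]
)

all_preloaded = preload_sgprs(len(gemm_args))   # 9 * 2 = 18, over the limit
preloaded, explicit = split_args(gemm_args)
partial_preloaded = preload_sgprs(len(preloaded))  # 3 * 2 = 6, within the limit
```

Under these assumptions, preloading only the three buffer pointers leaves the six scalar arguments to be fetched by explicit loads.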

3. waveasm + dynamic strides + buffer ops
4. waveasm + dynamic strides + dynamic dims + buffer ops

panditsa · 2026-03-10T22:07:05Z

wave_lang/kernel/wave/compile.py

            "--waveasm-loop-address-promotion",
            "--waveasm-linear-scan=max-vgprs=512 max-agprs=512",
-            "--waveasm-insert-waitcnt=ticketed-waitcnt=false",
+            "--waveasm-insert-waitcnt=ticketed-waitcnt=true",


This should be controlled through a compile option from the test itself.
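One way to do this, sketched under assumptions: the three flags are copied from the diff above, while the `WaveasmOptions` class, its `ticketed_waitcnt` field, and the `waveasm_pass_flags` function are hypothetical names and not part of the existing compile.py.

```python
from dataclasses import dataclass


@dataclass
class WaveasmOptions:
    """Hypothetical container for a per-test compile option."""
    ticketed_waitcnt: bool = False  # default keeps the previous behavior


def waveasm_pass_flags(options):
    """Build the waveasm pass flags, taking the waitcnt mode from `options`."""
    ticketed = "true" if options.ticketed_waitcnt else "false"
    return [
        "--waveasm-loop-address-promotion",
        "--waveasm-linear-scan=max-vgprs=512 max-agprs=512",
        f"--waveasm-insert-waitcnt=ticketed-waitcnt={ticketed}",
    ]


# A test that needs ticketed waitcnt opts in explicitly;
# every other caller keeps the default.
default_flags = waveasm_pass_flags(WaveasmOptions())
ticketed_flags = waveasm_pass_flags(WaveasmOptions(ticketed_waitcnt=True))
```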


Squashed cherry-pick of suryajasper/dynamic-strides-waveasm onto 4waveasm-256x192x256. Merges partial kernel argument preloading, extract_strided_metadata handler, and dynamic stride test updates.

Commits included:
- Handle memref.extract_strided_metadata in waveasm backend
- Update dynamic strides test & compile options to include waveasm
- xfail waveasm dynamic strides tests w/ dynamic dims or buffer ops
- Fix dynamic strides + dynamic dims through waveasm & accumulator bitcast
- Fixed dynamic strides with bufops w/ waveasm
- Fix mxfp waveasm example to use (2,2) wave shape
- Fixed waveasm dynamic strides to use partial kernel argument preloading

Made-with: Cursor

suryajasper force-pushed the dynamic-strides-waveasm branch from b51b0b3 to 2bb545c Compare March 10, 2026 01:35

panditsa reviewed Mar 10, 2026


suryajasper force-pushed the dynamic-strides-waveasm branch 2 times, most recently from 136a9f0 to 432dcaa Compare March 17, 2026 22:01

suryajasper added 9 commits March 25, 2026 20:27

Handle memref.extract_strided_metadata in waveasm backend

99be0ed

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

Update dynamic strides test & compile options to include waveasm

5864dbe

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

xfail waveasm dynamic strides tests w/ dynamic dims or buffer ops

9b49bf9

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

Fix dynamic strides + dynamic dims through waveasm & accumulator bitcast to output buffer

026542a

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

Fixed dynamic strides with bufops w/ waveasm

70e73b5

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

Rebase with main

c1555ad

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

Fix mxfp waveasm example to use (2,2) wave shape

a33acad

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

Fixed waveasm dynamic strides to use partial kernel argument preloading

515c039

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

Removed legacy all or nothing preloading strategy

1995206

Signed-off-by: Surya Jasper <45545431+suryajasper@users.noreply.github.com>

suryajasper force-pushed the dynamic-strides-waveasm branch from 47f2ef2 to 1995206 Compare March 25, 2026 21:00

suryajasper wants to merge 9 commits into iree-org:main from suryajasper:dynamic-strides-waveasm


2 participants
