Add schedule for 256x224x256 macro tile by willghatch · Pull Request #1129 · iree-org/wave

willghatch · 2026-03-13T22:43:57Z

This adds a no_unroll schedule that stays under the register budget. It gets the 256x224x256 macro tile functional with the waveasm backend.

For the 7.1 example, it adds:

- `--wave_shape` flag: previously (1,4) was hard-coded, but the 256x224x256 tile needed (2, 2). I think the reason we chose that was that the N dimension was not divisible by 4 after pipelining.
- `--no_unroll` flag to access the new no_unroll schedule.

The particular 7.1 example target for this work was `python examples/python/7.1_schedule.py --block 256,224,256 --shape 1024,896,8192 --wave_shape 2,2 --no-unroll --test test_dbuf_4wave_mxfp_preshuffle_b_gemm_cpp`
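To make the flag handling above concrete, here is a minimal sketch of how comma-separated flags such as `--block` and `--wave_shape` could be parsed, and how the per-wave tile changes with the wave shape. This is not the code in `7.1_schedule.py`: the helper names `parse_int_tuple`, `build_parser`, and `per_wave_tile` are hypothetical, the defaults are assumed, and the parser accepts both `--no_unroll` and `--no-unroll` only because both spellings appear in the description.

```python
import argparse


def parse_int_tuple(text: str) -> tuple:
    """Parse a comma-separated string such as "2,2" into a tuple of ints."""
    return tuple(int(part) for part in text.split(","))


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names follow the PR description,
    # defaults are assumptions made for this sketch.
    parser = argparse.ArgumentParser()
    parser.add_argument("--block", type=parse_int_tuple, default=(256, 224, 256))
    parser.add_argument("--wave_shape", type=parse_int_tuple, default=(1, 4))
    # Accept both spellings seen in the PR text; which one the script
    # actually uses is not confirmed here.
    parser.add_argument(
        "--no_unroll", "--no-unroll", dest="no_unroll", action="store_true"
    )
    return parser


def per_wave_tile(block, wave_shape):
    """Split the M and N block dimensions evenly across the wave grid."""
    m, n, _k = block
    waves_m, waves_n = wave_shape
    if m % waves_m != 0 or n % waves_n != 0:
        raise ValueError(f"block {block} does not divide evenly by {wave_shape}")
    return m // waves_m, n // waves_n


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(per_wave_tile(args.block, args.wave_shape), args.no_unroll)
```

Under these assumptions, a (1, 4) wave shape gives each wave a 256x56 slice of the 256x224 block, while (2, 2) gives 128x112. Whether that difference is what made (2, 2) necessary is not established in the PR.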

This also adds an e2e waveasm test.

At this stage no real effort has been made to make the schedule performant; the goal was just to get it working.

harsh-nod · 2026-03-13T23:40:49Z

examples/python/7.1_schedule.py

-    schedule = get_mxfp4_asymmetric_schedule(
-        eliminate_epilogue=eliminate_epilogue, is_bscale_shuffled=True
-    )
+    if no_unroll:


Do we need a new schedule for this? Can it be an option in get_mxfp4_asymmetric_schedule?

The new schedule is necessary. Without the new schedule it blows the register budget (even if skipping the unrolling for the schedule). The new schedule moves ops within the schedule to reduce memory pressure.

Can this not be an option in get_mxfp4_asymmetric_schedule? If the only change in the asymmetric schedule is disabling the unroll factor, you can do something like this in the asymmetric schedule:

    if not no_unroll:  # passed as an option, default False
        tkw.unroll ...

I've pushed a commit that now “uses the same schedule”, but with conditions in the schedule based on whether it uses unrolling. I don't think it's really an improvement, but it may make the differences easier to see. The new schedule has 3 clusters instead of 2 and uses different interleaving, so it is a fairly different schedule. The original schedule blows the register budget, even with no unrolling.

Signed-off-by: William G Hatch <william@hatch.uno>

…c schedule

The no-unroll path needs a different kernel interleaving strategy than the unrolled path: 2-group interleaving (shared A loads interleaved with MMA) with B loads and G2S prefetches in a separate third cluster, rather than 4-group interleaving that folds B loads and G2S directly into the two MMA clusters. The 4-group pattern was designed for the unrolled kernel where the larger loop body can absorb the extra live values; with unroll_factor=1 the tighter loop needs the third cluster to keep VGPR pressure in check.
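The two layouts described in this commit message can be pictured with a small, purely illustrative sketch. It does not use the wave scheduling API. The `Cluster` type, the `sketch_clusters` function, and the op-group names are all hypothetical. It also assumes that the four groups of the unrolled pattern are shared A loads, MMA, B loads, and G2S prefetches, which the commit message does not spell out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """A named group of op kinds issued together (illustrative only)."""

    name: str
    op_groups: List[str] = field(default_factory=list)


def sketch_clusters(no_unroll: bool) -> List[Cluster]:
    """Return a coarse cluster layout following the commit message's wording."""
    if no_unroll:
        # 2-group interleaving in each MMA cluster, with B loads and
        # G2S prefetches moved into a separate third cluster.
        return [
            Cluster("mma_0", ["shared_load_a", "mma"]),
            Cluster("mma_1", ["shared_load_a", "mma"]),
            Cluster("prefetch", ["load_b", "g2s_prefetch"]),
        ]
    # 4-group interleaving: B loads and G2S are folded directly into
    # the two MMA clusters (assumed grouping, see the lead-in).
    groups = ["shared_load_a", "mma", "load_b", "g2s_prefetch"]
    return [Cluster("mma_0", list(groups)), Cluster("mma_1", list(groups))]


def summarize(clusters: List[Cluster]) -> str:
    """Format a layout as one line per cluster for quick comparison."""
    return "\n".join(f"{c.name}: {', '.join(c.op_groups)}" for c in clusters)
```

Calling `summarize(sketch_clusters(True))` and `summarize(sketch_clusters(False))` side by side shows the structural difference the review thread discusses: three clusters with two groups in each MMA cluster, versus two clusters with four groups each.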

willghatch force-pushed the users/willghatch/mt-256x224x256 branch from 314dad9 to 1c33bc7 Compare March 13, 2026 22:44

willghatch requested a review from harsh-nod March 13, 2026 22:45

harsh-nod reviewed Mar 13, 2026


willghatch added 2 commits March 16, 2026 11:35

willghatch force-pushed the users/willghatch/mt-256x224x256 branch from 1c33bc7 to b782e12 Compare March 16, 2026 23:54

willghatch wants to merge 2 commits into main from users/willghatch/mt-256x224x256

willghatch commented Mar 13, 2026
harsh-nod Mar 13, 2026
willghatch Mar 16, 2026
panditsa Mar 16, 2026
willghatch Mar 16, 2026

3 participants
