merge_contiguous_reads pass #1191

Open
adedespirlet wants to merge 4 commits into iree-org:main from
adedespirlet:move-pass-around

@adedespirlet adedespirlet commented Mar 25, 2026

This PR schedules the merge_contiguous_reads pass earlier in the pipeline (before manual scheduling) so that the schedule can be built from merged read counts rather than only from pre-merge reads. The pass also stays in its original late position, so it now runs twice: for some kernels it cannot prove contiguity that early in the pipeline, and only after simplify_indices has run are the index expressions simple enough for it to do so. This is the case for test_dynamic_preshuffle_b_scale_coalescing in scaled_gemm.py.

Moving the pass ahead of manual scheduling required two fixes:

  1. Pipelining bug in the 4-wave asymmetric schedule:
    After merge_contiguous_reads, ExtractSlice nodes are created between Reads and their consumer Bitcasts. The manual schedule's set_stage only assigns scheduling_parameters to the nodes it is given explicitly, and the ExtractSlice nodes were not among them. This PR adds propagate_scheduling_parameters_to_extract_slices, which copies the scheduling parameters from each source Read to its ExtractSlice nodes.
    With this fix, liveness_analysis, run while constructing the pipelined loop, sees the stage gap between each ExtractSlice and its Bitcast and therefore creates rotating registers to carry the value across pipeline iterations.

  2. The other failure happened in scaled_gemm. The auto-scheduler's create_scheduling_edges skips nodes in ignore_nodes. ExtractSlice was unknown to get_custom_operation_type (it returned None), so it landed in ignore_nodes. This broke the dependency chain: Read → ExtractSlice edges were created, but ExtractSlice → Bitcast edges were not, because ExtractSlice was skipped as a source. Without that edge, Bitcast lost its ordering constraint relative to Read, which broke the stage-transition validation. The fix makes get_custom_operation_type resolve ExtractSlice recursively to the operation type of its source Read, which keeps it out of ignore_nodes and preserves the full dependency chain.
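
For reviewers, here is a minimal toy model of the two fixes. It is not the code in this PR: the Node class, the function names, and the operation-type strings ("global_read", "bitcast") are hypothetical stand-ins, and the sketch only assumes that each node has a single producer. It shows an ExtractSlice inheriting scheduling parameters from its source Read, and the operation-type lookup resolving through ExtractSlice so that the ExtractSlice → Bitcast edge is kept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a graph node (not the real node class)."""
    name: str
    kind: str  # "Read", "ExtractSlice" or "Bitcast"
    source: Optional["Node"] = None  # single producer of this node's input
    scheduling_parameters: Optional[dict] = None


def propagate_to_extract_slices(nodes):
    """Fix 1 (sketch): give each unscheduled ExtractSlice a copy of its
    source's scheduling parameters."""
    for node in nodes:
        if node.kind != "ExtractSlice" or node.scheduling_parameters is not None:
            continue
        src = node.source
        if src is not None and src.scheduling_parameters is not None:
            node.scheduling_parameters = dict(src.scheduling_parameters)


# Hypothetical operation-type table; the real names may differ.
KNOWN_TYPES = {"Read": "global_read", "Bitcast": "bitcast"}


def operation_type(node):
    """Fix 2 (sketch): resolve ExtractSlice recursively to its source's type."""
    if node.kind == "ExtractSlice" and node.source is not None:
        return operation_type(node.source)
    return KNOWN_TYPES.get(node.kind)  # None means "ignore this node"


def scheduling_edges(nodes):
    """Create producer -> consumer edges, skipping ignored producers."""
    ignored = {n.name for n in nodes if operation_type(n) is None}
    return [
        (n.source.name, n.name)
        for n in nodes
        if n.source is not None and n.source.name not in ignored
    ]


# Read (stage 0) -> ExtractSlice (unscheduled) -> Bitcast (stage 1)
read = Node("read", "Read", scheduling_parameters={"stage": 0})
extract = Node("slice", "ExtractSlice", source=read)
bitcast = Node("bitcast", "Bitcast", source=extract,
               scheduling_parameters={"stage": 1})
graph = [read, extract, bitcast]

propagate_to_extract_slices(graph)
edges = scheduling_edges(graph)
```

In this toy graph the ExtractSlice ends up in stage 0, one stage before its Bitcast, which is the gap the liveness analysis needs to see, and both edges of the Read → ExtractSlice → Bitcast chain are present.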

Signed-off-by: Aurore De Spirlet <aurore.despirlet@amd.com>