[BugFix][model_ops] chunk shuffle_weights to reduce load-time memory usage by whx-sjtu · Pull Request #530 · ROCm/ATOM

whx-sjtu · 2026-04-09T07:11:26Z

Summary

Running GLM5 on MI355 with TP4 currently fails with an OOM error. The error is triggered by shuffle_weights, which runs after the model is loaded. For MoE models the GEMM weights have shape [num_experts, k, n], and the original version shuffles the whole tensor at once. This PR fixes the problem by splitting the MoE GEMM weights and shuffling them expert by expert.
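The idea can be sketched as follows. This is a minimal illustration, not the code in this PR: it uses NumPy instead of the real tensors, and `shuffle_tile`, `shuffle_all_at_once`, and `shuffle_per_expert` are hypothetical names. `shuffle_tile` is only a stand-in for the real layout shuffle, assumed here to regroup the last two dimensions into 16x16 tiles. The point it shows is that a shuffle which acts independently on each expert can be applied one expert at a time, so the temporary buffer is the size of one expert rather than the whole [num_experts, k, n] tensor.

```python
import numpy as np


def shuffle_tile(w, tile=(16, 16)):
    """Hypothetical stand-in for the layout shuffle.

    Regroups the last two dims (k, n) into contiguous (tk, tn) tiles.
    Any leading dims (for example num_experts) are left untouched.
    """
    *lead, k, n = w.shape
    tk, tn = tile
    assert k % tk == 0 and n % tn == 0, "k and n must be multiples of the tile"
    x = w.reshape(*lead, k // tk, tk, n // tn, tn)
    x = np.swapaxes(x, -3, -2)  # (..., k/tk, n/tn, tk, tn)
    # ascontiguousarray materializes a copy as large as its input.
    return np.ascontiguousarray(x).reshape(*lead, k, n)


def shuffle_all_at_once(w):
    """One call on the full tensor: the temporary is as large as w."""
    return shuffle_tile(w)


def shuffle_per_expert(w):
    """Shuffle expert by expert, writing each result back in place.

    Only one expert-sized temporary is alive at any time.
    """
    for e in range(w.shape[0]):
        w[e] = shuffle_tile(w[e])
    return w


if __name__ == "__main__":
    num_experts, k, n = 4, 32, 48
    w = np.arange(num_experts * k * n, dtype=np.float32).reshape(num_experts, k, n)
    full = shuffle_all_at_once(w.copy())
    chunked = shuffle_per_expert(w.copy())
    assert np.array_equal(full, chunked)
```

Writing each shuffled expert back into the original buffer is what keeps the peak low in this sketch; collecting the per-expert results in a list and stacking them afterwards would allocate a full-size copy again.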

Test plan

Reproduce OOM with GLM-5 in vLLM startup before fix (shuffle_weight(...).contiguous() path)
Start vLLM with this patch and verify model loads successfully without OOM
Run ATOM unit/integration tests covering model_ops shuffle paths

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

whx-sjtu changed the title ~~fix(model_ops): chunk shuffle_weights to reduce load-time OOM~~ [BugFix][model_ops]: chunk shuffle_weights to reduce load-time OOM Apr 9, 2026

whx-sjtu changed the title ~~[BugFix][model_ops]: chunk shuffle_weights to reduce load-time OOM~~ [BugFix][model_ops]: chunk shuffle_weights to reduce load-time memory usage Apr 9, 2026

whx-sjtu requested review from ganyi1996ppo and zejunchen-zejun April 9, 2026 07:18

whx-sjtu marked this pull request as draft April 9, 2026 08:12

whx-sjtu marked this pull request as ready for review April 9, 2026 08:26

fix(model_ops): handle dense and moe shuffle paths explicitly

4e5b0b7

Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>

whx-sjtu force-pushed the whx/fix-glm5-shuffle-oom branch from 811422e to 4e5b0b7 April 9, 2026 08:28

format atom/model_ops/utils.py

813c84e

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

whx-sjtu changed the title ~~[BugFix][model_ops]: chunk shuffle_weights to reduce load-time memory usage~~ [BugFix][model_ops] chunk shuffle_weights to reduce load-time memory usage Apr 14, 2026

whx-sjtu requested a review from wuhuikx April 14, 2026 07:44

whx-sjtu wants to merge 2 commits into main from whx/fix-glm5-shuffle-oom
