[BugFix][model_ops] chunk shuffle_weights to reduce load-time memory usage#530

Open
whx-sjtu wants to merge 2 commits into main from whx/fix-glm5-shuffle-oom

Conversation

@whx-sjtu whx-sjtu commented Apr 9, 2026

Summary

Running GLM5 on mi355 with TP4 currently hits an OOM error. The error is triggered by shuffle_weights, which runs after the model is loaded. For MoE models the gemm weights have shape [num_experts, k, n], and the original version shuffles this whole weight in a single call. This PR fixes the problem by splitting the MoE gemm weights and shuffling them expert by expert.
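
For reviewers, here is a minimal sketch of the idea, not the code in this PR. It uses NumPy arrays as a stand-in for the device tensors, and `shuffle_2d`, `shuffle_all_at_once`, and `shuffle_per_expert_` are hypothetical names; the tile regrouping is only an assumed placeholder for the real shuffle layout. The point it illustrates is that shuffling one expert at a time needs a temporary of size k * n instead of num_experts * k * n, while producing the same result.

```python
import numpy as np


def shuffle_2d(w, tile=(16, 16)):
    """Hypothetical stand-in for the real layout shuffle of one [k, n] matrix."""
    k, n = w.shape
    tk, tn = tile
    assert k % tk == 0 and n % tn == 0, "shape must be divisible by the tile"
    # Regroup into tiles: [k/tk, tk, n/tn, tn] -> [k/tk, n/tn, tk, tn].
    tiled = w.reshape(k // tk, tk, n // tn, tn).transpose(0, 2, 1, 3)
    # Reshaping the transposed view materializes a new [k, n] buffer.
    return tiled.reshape(k, n)


def shuffle_all_at_once(w, tile=(16, 16)):
    """Shuffle a [num_experts, k, n] weight in one call (full-size temporary)."""
    e, k, n = w.shape
    tk, tn = tile
    tiled = w.reshape(e, k // tk, tk, n // tn, tn).transpose(0, 1, 3, 2, 4)
    return tiled.reshape(e, k, n)


def shuffle_per_expert_(w, tile=(16, 16)):
    """Shuffle expert by expert, writing each result back into `w` in place.

    Only one [k, n] temporary is alive at a time.
    """
    for expert in range(w.shape[0]):
        w[expert] = shuffle_2d(w[expert], tile)
    return w


if __name__ == "__main__":
    weights = np.arange(4 * 32 * 32, dtype=np.float32).reshape(4, 32, 32)
    reference = shuffle_all_at_once(weights.copy())
    chunked = shuffle_per_expert_(weights.copy())
    print("results match:", np.array_equal(reference, chunked))
```

The in-place write-back is the design choice that matters here: the per-expert temporary can be released before the next expert is processed, so peak extra memory no longer scales with num_experts.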

Test plan

  • Reproduce OOM with GLM-5 in vLLM startup before fix (shuffle_weight(...).contiguous() path)
  • Start vLLM with this patch and verify model loads successfully without OOM
  • Run ATOM unit/integration tests covering model_ops shuffle paths

@whx-sjtu whx-sjtu changed the title fix(model_ops): chunk shuffle_weights to reduce load-time OOM [BugFix][model_ops]: chunk shuffle_weights to reduce load-time OOM Apr 9, 2026
@whx-sjtu whx-sjtu changed the title [BugFix][model_ops]: chunk shuffle_weights to reduce load-time OOM [BugFix][model_ops]: chunk shuffle_weights to reduce load-time memory usage Apr 9, 2026
@whx-sjtu whx-sjtu marked this pull request as draft April 9, 2026 08:12
@whx-sjtu whx-sjtu marked this pull request as ready for review April 9, 2026 08:26
Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>
@whx-sjtu whx-sjtu force-pushed the whx/fix-glm5-shuffle-oom branch from 811422e to 4e5b0b7 Compare April 9, 2026 08:28
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@whx-sjtu whx-sjtu changed the title [BugFix][model_ops]: chunk shuffle_weights to reduce load-time memory usage [BugFix][model_ops] chunk shuffle_weights to reduce load-time memory usage Apr 14, 2026
@whx-sjtu whx-sjtu requested a review from wuhuikx April 14, 2026 07:44
