Fix fused indices conversion for non-power-of-two topk/local experts #150

Open
zhujian19891203 wants to merge 1 commit into EvolvingLMMs-Lab:main from zhujian19891203:fix-fused-indices-non-power2

zhujian19891203 commented May 20, 2026

Summary

Port the Megatron-LM fix for fused indices-to-multihot conversion when topk or num_of_local_experts is not a power of two.
This fixes a Triton compilation failure seen with DeepSeek V2 MoE models when using DeepEP/flex MoE token dispatch, for example with topk=7.

Source

Upstream fix: NVIDIA/Megatron-LM@bc70535.

Pad Triton arange ranges for topk and local expert counts to the next power of two, while masking out padded lanes. This avoids Triton compilation failures when DeepEP/flex dispatch uses values such as topk=7.
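
The padding-and-masking idea can be illustrated without a GPU. Triton's `tl.arange` requires a power-of-two range size, so a range such as `topk=7` has to be widened to 8 and the extra lane ignored. The sketch below is a plain-Python emulation of one row of the conversion, not the code in this PR: the helper names `next_power_of_two` and `indices_to_multihot_row` are hypothetical, and treating `-1` as an unused slot is an assumption made for the example.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two >= n, for n >= 1 (hypothetical helper)."""
    return 1 << (n - 1).bit_length()


def indices_to_multihot_row(indices, num_local_experts):
    """Emulate one token's indices-to-multihot conversion with padded lanes.

    `indices` holds the topk expert ids for a single token. A value of -1
    marks an unused slot (an assumption for this sketch).
    """
    topk = len(indices)
    topk_padded = next_power_of_two(topk)                  # e.g. 7 -> 8
    experts_padded = next_power_of_two(num_local_experts)  # e.g. 6 -> 8

    multihot = [False] * experts_padded
    # The loop stands in for a vectorized range of width topk_padded.
    for lane in range(topk_padded):
        lane_valid = lane < topk                   # mask: padded lanes are invalid
        idx = indices[lane] if lane_valid else -1  # masked load with a -1 fill value
        if lane_valid and 0 <= idx < num_local_experts:
            multihot[idx] = True                   # masked store
    # Drop the padded expert lanes before returning.
    return multihot[:num_local_experts]
```

With `topk=7` the loop runs over 8 lanes, and the eighth lane is masked out, so it neither reads past the end of `indices` nor writes to the output.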

Upstream-source: NVIDIA/Megatron-LM@bc70535

Local-port-of: 00b19053b091205316732411e0c5c6dfed355525