Fix fused indices conversion for non-power-of-two topk/local experts by zhujian19891203 · Pull Request #150 · EvolvingLMMs-Lab/LLaVA-OneVision-2

zhujian19891203 · 2026-05-20T03:43:26Z

Summary

Port the Megatron-LM fix for fused indices-to-multihot conversion when topk or num_of_local_experts is not a power of two.
This fixes a Triton compilation failure seen with DeepSeek V2 MoE models when using DeepEP/flex MoE token dispatch, for example with topk=7.

Source

Upstream fix: NVIDIA/Megatron-LM@bc70535.

Pad Triton arange ranges for topk and local expert counts to the next power of two, while masking out padded lanes. This avoids Triton compilation failures when DeepEP/flex dispatch uses values such as topk=7.

Upstream-source: NVIDIA/Megatron-LM@bc70535
Local-port-of: 00b19053b091205316732411e0c5c6dfed355525
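For reviewers unfamiliar with the constraint: Triton's `tl.arange` requires a power-of-two range size, so a range of 7 lanes does not compile. The sketch below emulates the padding-plus-masking idea in plain Python for a single token row. It is not the upstream kernel. The function names, the `-1` sentinel for unused slots, and the list-based "lanes" are assumptions made for illustration only.

```python
def next_power_of_2(n: int) -> int:
    """Smallest power of two >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def indices_to_multihot(indices, num_local_experts):
    """Emulate one row of an indices-to-multihot conversion with padded lanes.

    `indices` holds the topk expert ids chosen for one token.
    Assumption: -1 marks an unused slot.
    """
    topk = len(indices)
    topk_padded = next_power_of_2(topk)  # e.g. 7 -> 8
    experts_padded = next_power_of_2(num_local_experts)

    # Padded lanes, standing in for a power-of-two arange over topk_padded.
    lanes = range(topk_padded)

    # Masked load: lanes at or beyond topk read the neutral value -1
    # instead of touching memory past the end of the row.
    loaded = [indices[i] if i < topk else -1 for i in lanes]

    # Scatter valid expert ids into a padded multihot row.
    multihot = [False] * experts_padded
    for expert in loaded:
        if expert >= 0:
            multihot[expert] = True

    # Masked store: only the first num_local_experts lanes are written back.
    return multihot[:num_local_experts]
```

With `topk=7` the row is processed over 8 lanes, and the eighth lane contributes nothing because its load is masked. The same reasoning applies to a local expert count that is not a power of two.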

zhujian19891203 wants to merge 1 commit into EvolvingLMMs-Lab:main from zhujian19891203:fix-fused-indices-non-power2