Support MLA YaRN RoPE scaling parameters by zhujian19891203 · Pull Request #151 · EvolvingLMMs-Lab/LLaVA-OneVision-2

zhujian19891203 · 2026-05-20T04:03:21Z

Summary

Add explicit MLA YaRN RoPE parameters for beta_fast, beta_slow, and original_max_position_embeddings.
Use mscale_all_dim for MLA softmax scaling and pass original_max_position_embeddings to YarnRotaryEmbedding.
Allow loading checkpoints with different max_position_embeddings, which is needed when YaRN uses an original context length and a scaled context length.
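
To make the role of each parameter concrete, here is a minimal pure-Python sketch of the commonly used YaRN formulation. It is not the code in this PR. The function names and the exact softmax-scale formula are assumptions borrowed from public YaRN reference implementations, and the values in the usage note are illustrative only. beta_fast and beta_slow bound the band of rotary dimensions that is blended between extrapolation and interpolation, original_max_position_embeddings is the pre-scaling context length used to locate that band, and mscale_all_dim feeds the attention softmax scale.

```python
import math


def yarn_find_correction_dim(num_rotations, dim, base, original_max_position_embeddings):
    # Rotary dimension index whose frequency completes `num_rotations` full
    # turns over the original (pre-scaling) context length.
    return (dim * math.log(original_max_position_embeddings / (num_rotations * 2 * math.pi))) / (
        2 * math.log(base)
    )


def yarn_find_correction_range(beta_fast, beta_slow, dim, base, original_max_position_embeddings):
    # beta_fast marks the start of the blended band, beta_slow its end.
    low = math.floor(yarn_find_correction_dim(beta_fast, dim, base, original_max_position_embeddings))
    high = math.ceil(yarn_find_correction_dim(beta_slow, dim, base, original_max_position_embeddings))
    return max(low, 0), min(high, dim - 1)


def yarn_inv_freq(dim, base, scaling_factor, beta_fast, beta_slow, original_max_position_embeddings):
    low, high = yarn_find_correction_range(
        beta_fast, beta_slow, dim, base, original_max_position_embeddings
    )
    if low == high:
        high += 0.001  # avoid division by zero in the ramp
    inv_freq = []
    for i in range(dim // 2):
        extrapolated = 1.0 / base ** (2 * i / dim)   # unscaled RoPE frequency
        interpolated = extrapolated / scaling_factor  # position-interpolated frequency
        # ramp is 0 below `low` (keep high frequencies) and 1 above `high`
        # (fully interpolate low frequencies).
        ramp = min(1.0, max(0.0, (i - low) / (high - low)))
        inv_freq.append(interpolated * ramp + extrapolated * (1.0 - ramp))
    return inv_freq


def yarn_get_mscale(scale, mscale=1.0):
    # Attention magnitude correction; no correction when there is no scaling.
    if scale <= 1:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


def mla_softmax_scale(q_head_dim, scaling_factor, mscale_all_dim):
    # Assumed form: the usual 1/sqrt(d) scale multiplied by mscale squared.
    m = yarn_get_mscale(scaling_factor, mscale_all_dim)
    return m * m / math.sqrt(q_head_dim)
```

For example, with dim=64, base=10000, beta_fast=32, beta_slow=1, and original_max_position_embeddings=4096, the blended band covers rotary dimension indices 10 through 23. Frequencies below the band are left unscaled and those above it are divided by the scaling factor.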

Background

The change follows Megatron-LM MLA/YaRN RoPE behavior; the exact upstream commit was not identified. This is related to NVIDIA/Megatron-LM#1418, which describes the dual meaning of max_position_embeddings for MLA YaRN RoPE.
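
Because max_position_embeddings can denote either the original or the scaled context length, a strict equality check between the checkpoint's arguments and the current run's arguments can reject an otherwise compatible checkpoint. The sketch below illustrates one way to tolerate that single mismatch. The function name, the ALLOWED_MISMATCH set, and the argument objects are hypothetical and are not the actual checkpoint-loading code touched by this PR.

```python
from types import SimpleNamespace

# Hypothetical: argument names that may legitimately differ between the
# checkpoint and the current run when YaRN scaling is in use.
ALLOWED_MISMATCH = {"max_position_embeddings"}


def compare_checkpoint_args(current, checkpoint, names):
    """Raise ValueError if any argument outside ALLOWED_MISMATCH differs."""
    mismatches = [
        name
        for name in names
        if name not in ALLOWED_MISMATCH
        and getattr(current, name) != getattr(checkpoint, name)
    ]
    if mismatches:
        raise ValueError(f"checkpoint argument mismatch: {mismatches}")


# Illustrative values only: the checkpoint was saved at the original context
# length, and the current run uses a longer, scaled context length.
saved = SimpleNamespace(num_layers=4, hidden_size=256, max_position_embeddings=4096)
run = SimpleNamespace(num_layers=4, hidden_size=256, max_position_embeddings=163840)
compare_checkpoint_args(run, saved, ["num_layers", "hidden_size", "max_position_embeddings"])
```

Any other difference, such as a changed num_layers, still raises, so the relaxation stays limited to the one field with a dual meaning.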

Add original_max_position_embeddings and expose beta_fast/beta_slow for MLA YaRN RoPE, use mscale_all_dim for softmax scaling, and avoid rejecting checkpoint loads when only max_position_embeddings changes.

Local-port-of: f1a85bad9e7f05c4fa80c81181e44414e1ac8ab7
Upstream-note: Based on Megatron-LM MLA/YaRN RoPE behavior; exact upstream commit not identified.

zhujian19891203 wants to merge 1 commit into EvolvingLMMs-Lab:main from zhujian19891203:fix-mla-yarn-rope-scaling
