Support MLA YaRN RoPE scaling parameters #151

Open
zhujian19891203 wants to merge 1 commit into EvolvingLMMs-Lab:main from zhujian19891203:fix-mla-yarn-rope-scaling
zhujian19891203 commented May 20, 2026

Summary

  • Add explicit MLA YaRN RoPE parameters for beta_fast, beta_slow, and original_max_position_embeddings.
  • Use mscale_all_dim for MLA softmax scaling and pass original_max_position_embeddings to YarnRotaryEmbedding.
  • Allow loading checkpoints with different max_position_embeddings, which is needed when YaRN uses an original context length and a scaled context length.
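
For reviewers less familiar with these parameters, here is a minimal sketch of how they typically enter the YaRN computation. It follows the commonly used YaRN reference formulation, not the code in this PR: the function names are illustrative assumptions, and the exact way mscale_all_dim feeds the MLA softmax scale is assumed from that formulation rather than confirmed against the diff.

```python
import math


def yarn_get_mscale(scale: float, mscale: float = 1.0) -> float:
    """Attention magnitude correction; 1.0 when no context extension is applied."""
    if scale <= 1.0:
        return 1.0
    return 0.1 * mscale * math.log(scale) + 1.0


def yarn_find_correction_dim(num_rotations: float, dim: int, base: float,
                             original_max_position_embeddings: int) -> float:
    """Rotary dimension index whose frequency completes `num_rotations`
    full rotations over the ORIGINAL (pre-scaling) context length."""
    return (dim * math.log(original_max_position_embeddings
                           / (num_rotations * 2 * math.pi))) / (2 * math.log(base))


def yarn_find_correction_range(beta_fast: float, beta_slow: float, dim: int,
                               base: float,
                               original_max_position_embeddings: int) -> tuple:
    """Dimension range over which YaRN blends interpolation and extrapolation."""
    low = math.floor(yarn_find_correction_dim(
        beta_fast, dim, base, original_max_position_embeddings))
    high = math.ceil(yarn_find_correction_dim(
        beta_slow, dim, base, original_max_position_embeddings))
    # Clamp to valid rotary dimension indices.
    return max(low, 0), min(high, dim - 1)


def mla_softmax_scale(q_head_dim: int, scaling_factor: float,
                      mscale_all_dim: float) -> float:
    """Assumed form: 1/sqrt(d) multiplied by the squared YaRN mscale,
    with the mscale derived from mscale_all_dim."""
    mscale = yarn_get_mscale(scaling_factor, mscale_all_dim)
    return mscale * mscale / math.sqrt(q_head_dim)


# Illustrative values only; none of them are taken from this PR.
low, high = yarn_find_correction_range(
    beta_fast=32, beta_slow=1, dim=64, base=10000.0,
    original_max_position_embeddings=4096)
scale = mla_softmax_scale(q_head_dim=192, scaling_factor=40.0, mscale_all_dim=1.0)
```

The point of the sketch is that the correction range depends on the original context length, so reusing the scaled max_position_embeddings there would shift the blended dimension range.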

Background

The change follows Megatron-LM MLA/YaRN RoPE behavior; the exact upstream commit was not identified. This is related to NVIDIA/Megatron-LM#1418, which describes the dual meaning of max_position_embeddings for MLA YaRN RoPE.
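
To illustrate why the checkpoint check is relaxed, here is a hypothetical sketch of an argument comparison that tolerates a changed max_position_embeddings. The helper name, the key list, and the dictionary-based arguments are assumptions made for illustration; this is not the actual checkpoint-loading code touched by this PR.

```python
# Hypothetical sketch: argument names mirror common config fields, but the
# helper and its signature are illustrative, not an existing Megatron-LM API.
SKIPPED_KEYS = frozenset({"max_position_embeddings"})


def find_arg_mismatches(checkpoint_args: dict, runtime_args: dict,
                        keys, skipped=SKIPPED_KEYS) -> list:
    """Return the keys whose values differ, ignoring any key in `skipped`."""
    mismatches = []
    for key in keys:
        if key in skipped:
            # The context length may legitimately differ between the original
            # and the YaRN-scaled configuration, so it is not compared.
            continue
        if checkpoint_args.get(key) != runtime_args.get(key):
            mismatches.append(key)
    return mismatches


# Illustrative values: a checkpoint saved at the original context length,
# loaded with a scaled context length.
checked_keys = ["num_layers", "hidden_size", "max_position_embeddings"]
saved = {"num_layers": 2, "hidden_size": 8, "max_position_embeddings": 4096}
current = {"num_layers": 2, "hidden_size": 8, "max_position_embeddings": 163840}
mismatches = find_arg_mismatches(saved, current, checked_keys)
```

With a check shaped like this, architecture mismatches such as a different hidden_size would still be reported, while a change in context length alone would not block the load.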

Add original_max_position_embeddings and expose beta_fast/beta_slow for MLA YaRN RoPE, use mscale_all_dim for softmax scaling, and avoid rejecting checkpoint loads when only max_position_embeddings changes.

Local-port-of: f1a85bad9e7f05c4fa80c81181e44414e1ac8ab7

Upstream-note: Based on Megatron-LM MLA/YaRN RoPE behavior; exact upstream commit not identified.