Add GGUF support for MiniMax-M2.1 model #44526
Draft
JoursBleu wants to merge 1 commit into huggingface:main
Conversation
Force-pushed: 42c2be9 → f73a17f → a5f6a94 → 13ae746
[For maintainers] Suggested jobs to run (before merge): run-slow: minimax_m2
What does this PR do?
Add GGUF loading support for MiniMax-M2.1 (456B MoE) model.
MiniMax-M2.1 is a large Mixture-of-Experts model with 456B total parameters (45.9B active), 256 experts, and 8 experts per token. This PR enables loading its GGUF-quantized checkpoints (e.g. unsloth/MiniMax-M2.1-GGUF) via `from_pretrained(..., gguf_file=...)`.
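For example (a minimal sketch; the GGUF filename inside the unsloth repo is illustrative, check the repo for the actual file name):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "unsloth/MiniMax-M2.1-GGUF"
gguf_file = "MiniMax-M2.1-Q8_0.gguf"  # illustrative; pick an actual file from the repo

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
```

Note that at this scale the Q8_0 checkpoint is roughly 227GB, so loading requires correspondingly large hardware (see Testing below).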
Changes
src/transformers/integrations/ggml.py
- Add a `"minimax-m2"` entry to `GGUF_CONFIG_MAPPING` with model-specific config fields, including the MoE fields `expert_count`, `expert_used_count`, and `expert_feed_forward_length`.
- Reuse `GGUFQwen2Converter` for `minimax_m2` in `GGUF_TO_FAST_CONVERTERS` (the tokenizer is compatible with Qwen2).

src/transformers/modeling_gguf_pytorch_utils.py
- Add a `MiniMaxM2TensorProcessor` class following the new `TensorProcessor` API introduced in #42854 (Qwen2/3 MoE + GGUF model support, restored); the core idea is sketched after this list:
  - `preprocess_name()`: strips per-expert indices from HF weight names so that multiple experts can map to one fused GGUF tensor.
  - `perform_fallback_tensor_mapping()`: manually maps MiniMax-M2's `w1`/`w2`/`w3` expert naming to GGUF's `ffn_gate/down/up_exps` tensor names, since gguf-py's `name_map` cannot resolve them.
  - `process()`: matches GGUF MoE expert tensors and splits them per expert.
  - `_split_moe_expert_tensor()`: slices the fused `[num_experts, ...]` tensor into individual expert weights.
- In `TENSOR_PROCESSORS`, add the model type and architecture mappings.
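A minimal sketch of the expert-splitting and name-fallback ideas described above (tensor names, shapes, and the helper name are illustrative, not the exact implementation):

```python
import torch

# GGUF fuses MoE expert weights along a leading expert dimension, e.g. a
# single "blk.3.ffn_gate_exps.weight" tensor of shape [num_experts, out, in],
# while the HF checkpoint expects one tensor per expert, e.g.
# "model.layers.3.block_sparse_moe.experts.7.w1.weight" (names illustrative).

# Illustrative fallback mapping from MiniMax-M2's per-expert projection names
# to GGUF's fused expert tensor names (the case gguf-py's name_map misses):
FALLBACK_EXPERT_NAME_MAP = {
    "w1": "ffn_gate_exps",  # gate projection
    "w3": "ffn_up_exps",    # up projection
    "w2": "ffn_down_exps",  # down projection
}

def split_moe_expert_tensor(fused: torch.Tensor, num_experts: int) -> list[torch.Tensor]:
    """Slice a fused [num_experts, ...] GGUF tensor into per-expert weights."""
    if fused.shape[0] != num_experts:
        raise ValueError(f"expected leading dim {num_experts}, got {fused.shape[0]}")
    return [fused[i].contiguous() for i in range(num_experts)]
```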
Testing
Due to the model size (456B parameters, 227GB for the Q8_0 GGUF), no CI-compatible unit tests are included. This is consistent with other large MoE models (e.g., Qwen3-30B-A3B in #42854).
Verified end-to-end on 8× AMD W7900D GPUs (48GB each) by serving the Q8_0 GGUF checkpoint with vLLM.
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
@SunMarc @MekkCyber @ArthurZucker