[feat](minimax): support minimax-2.5 in atom-vllm mode#545

Open
PerryZhang01 wants to merge 1 commit into main from minimax

Conversation


@PerryZhang01 PerryZhang01 commented Apr 12, 2026

Motivation

This PR adds support for MiniMax-M2.5 in ATOM vLLM plugin mode, following the onboarding flow in add-atom-vllm-model.md. Initial bring-up surfaced gaps between the native ATOM server path and the plugin path that required extra wiring and fixes:

  • Fused QKV in plugin mode: the vLLM plugin attention stack expects a fused qkv buffer, not separate q, k, v tensors.
  • Config propagation: some HF / runtime flags (e.g. trust_remote_code) are handled at the vLLM CLI layer but were not always forwarded into ATOM’s plugin Config.
  • Kernel / layout edge cases: remaining NaN bugs tied to the paged-attention assembly kernels.
  • Server script: the launch script needs to pass the --trust-remote-code flag.
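The fused-QKV gap in the first bullet can be sketched as follows. This is an illustrative sketch only: the function name, shapes, and widths are assumptions for demonstration, not the plugin’s actual API.

```python
import numpy as np

# Illustrative sketch: the plugin attention path consumes a single fused
# qkv buffer, so separate q, k, v projections (as produced on the native
# ATOM path) must first be concatenated along the last dimension.
def fuse_qkv(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Concatenate [num_tokens, width] projections into one fused buffer."""
    return np.concatenate([q, k, v], axis=-1)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))  # 4 tokens, query width 8
k = rng.standard_normal((4, 2))  # smaller kv width, as under GQA
v = rng.standard_normal((4, 2))
qkv = fuse_qkv(q, k, v)          # fused buffer of shape (4, 12)
```

The extra concatenation is pure overhead on every forward pass, which is why removing the fused buffer and consuming q, k, v separately is listed as follow-up work.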

For now, MiniMax-M2.5 runs in eager mode under atom-vllm plugin mode; the remaining issues will be addressed in follow-up PRs.

Test Result

(test result screenshot)

TODO

  • Remove the fused qkv buffer in atom-vllm plugin mode and consume separate q, k, v tensors, so the qkv concatenation can be dropped.
  • Add an eps to the reshape_and_cache_with_pertoken_quant op to avoid division by zero, so the plugin can run in graph mode.
  • Fix the remaining NaN bugs in the paged-attention assembly kernels where possible.
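To illustrate why the eps matters: per-token quantization divides each token by a scale derived from its own max-abs value, which is zero for an all-zero token (e.g. padding), so the division yields NaN/inf and breaks graph capture. A minimal numpy sketch, where the eps value and int8 range are assumptions and not the actual kernel’s constants:

```python
import numpy as np

EPS = 1e-7  # assumed guard value; the real kernel may choose differently

def pertoken_quant(x: np.ndarray, eps: float = EPS):
    """Per-token (per-row) symmetric int8 quantization with an eps guard."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.maximum(scale, eps)  # clamp: avoids div-by-zero on zero tokens
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

x = np.array([[0.5, -1.0, 0.25],
              [0.0,  0.0,  0.0]])  # second token would divide by zero unguarded
q, scale = pertoken_quant(x)       # finite outputs, so graph capture stays valid
```

With the clamp in place the all-zero token simply quantizes to zeros instead of poisoning the cache with NaNs, which is what unblocks graph mode for the plugin.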

@PerryZhang01 force-pushed the minimax branch 2 times, most recently from 9f2d48c to aec43df on April 12, 2026 at 04:40.
