
[Feature] Support GLM-5 MTP for vLLM Plugin. #544

Draft
whx-sjtu wants to merge 2 commits into plugin_sparse_mla from whx/glm5-mtp-vllm-followup


whx-sjtu commented Apr 11, 2026

Summary

  • Wire GLM-5 MTP draft-model registration and the load path in vLLM plugin mode, including draft/target namespace isolation and weight remapping for speculative decoding.
  • Align the sparse MLA/indexer metadata flow for speculative decoding so that the draft model receives decode/prefill metadata in plugin mode.
  • Fix the draft logits path to use the shared lm_head in MTP mode, which removes the zero-acceptance collapse in atom+vllm (run_vllm_offline.sh now reports a non-zero acceptance rate).
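
For reviewers, here is a rough sketch of the idea behind the first and third bullets. It is not the code in this PR. The function name `split_checkpoint`, the `mtp.` prefix, and the weight names are all hypothetical and only illustrate routing checkpoint tensors into separate target and draft namespaces while both models reference the same lm_head tensor.

```python
from typing import Dict, Iterable, Tuple


def split_checkpoint(
    weights: Iterable[Tuple[str, object]],
    mtp_prefix: str = "mtp.",                      # hypothetical draft prefix
    shared: Tuple[str, ...] = ("lm_head.weight",),  # hypothetical shared names
) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Route each checkpoint tensor into the target and/or draft namespace."""
    target: Dict[str, object] = {}
    draft: Dict[str, object] = {}
    for name, tensor in weights:
        if name in shared:
            # Shared weights: both models reference the very same tensor,
            # so the draft never computes logits from an unloaded head.
            target[name] = tensor
            draft[name] = tensor
        elif name.startswith(mtp_prefix):
            # Draft-only weights: strip the prefix so the name matches the
            # draft module's own parameter names.
            draft[name[len(mtp_prefix):]] = tensor
        else:
            # Everything else belongs to the target model only.
            target[name] = tensor
    return target, draft
```

If the draft model ended up with its own uninitialized head instead of the shared one, its proposals would rarely match the target's tokens, which is consistent with the zero-acceptance symptom described above.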

Verification

  • bash /app/scripts/run_vllm_offline.sh
  • Observed "Avg Draft acceptance rate: 4.4%" in the latest run log.
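
For context on the metric, a minimal sketch follows. It assumes the usual definition of draft acceptance rate: accepted draft tokens divided by proposed draft tokens, as a percentage. The helper name and the counts in the usage note are illustrative and are not taken from the run log.

```python
def draft_acceptance_rate(num_accepted: int, num_drafted: int) -> float:
    """Return the percentage of proposed draft tokens accepted by the target."""
    if num_drafted == 0:
        # No proposals were made, so report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * num_accepted / num_drafted
```

With made-up counts, `draft_acceptance_rate(3, 8)` gives 37.5. A rate stuck at exactly 0 means no draft token was ever accepted, which is the collapse this PR targets.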

Merge dependency

whx-sjtu added 2 commits April 9, 2026 08:29
Signed-off-by: whx-sjtu <xiaowang990929@gmail.com>
Ensure draft/target namespaces and metadata wiring stay consistent for GLM-5 sparse MTP so speculative decoding no longer collapses to zero acceptance in atom+vllm.

Made-with: Cursor
whx-sjtu changed the title from "fix(plugin): align GLM-5 MTP draft path with vLLM speculative decode" to "[Feature] Support GLM-5 MTP for vLLM Plugin." on Apr 11, 2026
whx-sjtu marked this pull request as draft April 11, 2026 13:32
wuhuikx requested a review from ganyi1996ppo April 13, 2026 01:33