[feat] Add support for Qwen3.5 and Qwen3-Next to ATOM-plugined SGLang #532
Open
wanzhenchn wants to merge 6 commits into ROCm:main from
Conversation
wuhuikx reviewed on Apr 9, 2026
```diff
  q, k = self.rotary_emb(positions, q, k)
- attn_output = self.attn(q, k, v)
+ attn_output = self.attn(q, k, v, positions=positions, **model_kwargs)
```
Contributor
we should remove the **model_kwargs and refer to DS how to implement the attn part.
```diff
  positions,
  intermediate_tensors,
  inputs_embeds,
+ **kwargs,
```
Contributor
Let's remove the **kwargs here and use wrapper in plugin/sglang to handle.
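As a rough illustration of this suggestion, the engine-specific keyword arguments can be absorbed by a plugin-side adapter so the core model keeps an explicit signature. This is a hypothetical sketch: `CoreModel` and `SglangWrapper` are stand-ins, not the actual ATOM or SGLang classes.

```python
# Hypothetical sketch: absorb SGLang-specific kwargs in a plugin-side
# wrapper instead of threading **kwargs through the model's forward.
class CoreModel:
    """Model whose forward keeps a clean, explicit signature."""

    def forward(self, input_ids, positions):
        return ("forward", tuple(input_ids), tuple(positions))


class SglangWrapper:
    """Plugin-side adapter that filters engine-specific kwargs."""

    def __init__(self, model):
        self.model = model

    def forward(self, input_ids, positions, **sglang_kwargs):
        # Drop (or translate) SGLang-only arguments such as forward_batch
        # here, so the core model never needs a **kwargs catch-all.
        return self.model.forward(input_ids, positions)


wrapped = SglangWrapper(CoreModel())
out = wrapped.forward([1, 2], [0, 1], forward_batch=object())
```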
Force-pushed 0049094 to c856071
…ight shape check failed
…ATOMAttnBackendForSgl
Force-pushed c856071 to a8e9882
Motivation
Background: ROCm/ATOM#355 and ROCm/ATOM#359.
PR #355 integrated ATOM with upstream SGLang through the SGLANG_EXTERNAL_MODEL_PACKAGE out-of-tree mechanism, replacing a fork-based workflow and establishing atom.plugin.sglang.models as the external entry package for ATOM-backed architectures.
Building on that foundation, this PR extends the SGLang plugin path so that two major ATOM model families, Qwen3-Next (Qwen3NextForCausalLM) and Qwen3.5 (Qwen3_5ForConditionalGeneration / Qwen3_5MoeForConditionalGeneration), can run as first-class external models inside SGLang. The goal is parity with prior ATOM-in-SGLang accuracy while improving end-to-end inference performance on the supported paths (e.g. ATOM's fused kernels, quantization, and MLA / MoE handling tuned for ROCm), without requiring a patched SGLang tree: users continue to point SGLANG_EXTERNAL_MODEL_PACKAGE at atom.plugin.sglang.models and launch with the standard upstream sglang.launch_server.
Technical Details
Qwen3-next
Qwen3NextForCausalLM is registered under atom.plugin.sglang.models and subclasses _AtomCausalLMBaseForSglang, reusing the same SGLang-facing contract as other OOT entry points: the wrapper calls prepare_model(..., engine="sglang") to build the ATOM weight stack, runs the language-model forward with pipeline-parallel state mapped from pp_proxy_tensors, applies LogitsProcessor on the last PP rank, and loads weights via load_model_in_plugin_mode. Qwen3NextSglangModel plus sglang_gdn_bridge give GDN layers the SGLang forward_batch context they expect. At prepare time, apply_qwen3_next_sglang_model_patch swaps atom.models.qwen3_next.Qwen3NextModel to that bridged implementation; the shared prepare hook defaults ATOM_SGLANG_USE_NATIVE_AITER_ATTN_BACKEND for Qwen3NextForCausalLM before register_ops_to_sglang.
Qwen3.5
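The prepare-time class swap used by these wrappers can be sketched generically. All names here are stand-ins for illustration; the real patch replaces atom.models.qwen3_next.Qwen3NextModel with the bridged implementation.

```python
# Generic sketch of the prepare-time class-swap pattern, using stand-in
# classes and a SimpleNamespace in place of the real ATOM module.
import types

# Stand-in for the atom.models.qwen3_next module.
qwen3_next = types.SimpleNamespace()


class Qwen3NextModel:
    """Stock model: no SGLang forward_batch plumbing."""
    bridged = False


class BridgedQwen3NextModel(Qwen3NextModel):
    """Bridged model: layers see the SGLang forward_batch context."""
    bridged = True


qwen3_next.Qwen3NextModel = Qwen3NextModel


def apply_qwen3_next_sglang_model_patch():
    # Swap the symbol so any later lookup through the module constructs
    # the bridged class instead of the stock one.
    qwen3_next.Qwen3NextModel = BridgedQwen3NextModel


apply_qwen3_next_sglang_model_patch()
model = qwen3_next.Qwen3NextModel()
```

Because construction goes through the module attribute rather than a direct class reference, code that instantiates the model after the patch picks up the bridged class without further changes.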
How to Run
The following models are supported:
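A minimal launch sketch, assuming ATOM and upstream SGLang are installed; the model path and flag values below are illustrative placeholders, not part of this PR.

```shell
# Point SGLang's out-of-tree model mechanism at the ATOM plugin package,
# then launch the stock upstream server (no patched SGLang tree needed).
export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models
python3 -m sglang.launch_server \
  --model-path Qwen/Qwen3-Next-80B-A3B-Instruct \
  --tp-size 8
```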
Accuracy
Qwen3.5-397B-A17B-FP8
Qwen3-Next-80B-A3B-Instruct
Inference Performance