
[feat] Add support for Qwen3.5 and Qwen3-Next to ATOM-plugined SGLang #532

Open
wanzhenchn wants to merge 6 commits into ROCm:main from wanzhenchn:feat/qwen3.5-sgl-plugin-final

Conversation

@wanzhenchn wanzhenchn commented Apr 9, 2026

Motivation

Background: ROCm/ATOM#355 and ROCm/ATOM#359.

PR #355 integrated ATOM with upstream SGLang through the SGLANG_EXTERNAL_MODEL_PACKAGE out-of-tree mechanism, replacing a fork-based workflow and establishing atom.plugin.sglang.models as the external entry package for ATOM-backed architectures.

Building on that foundation, this PR extends the SGLang plugin path so that two major ATOM model families, Qwen3-next (Qwen3NextForCausalLM) and Qwen3.5 (Qwen3_5ForConditionalGeneration / Qwen3_5MoeForConditionalGeneration), can run as first-class external models inside SGLang. The goal is parity with prior ATOM-in-SGLang accuracy while improving end-to-end inference performance on the supported paths (e.g. ATOM's fused kernels, quantization, and MLA / MoE handling tuned for ROCm), all without requiring a patched SGLang tree: users continue to point SGLANG_EXTERNAL_MODEL_PACKAGE at atom.plugin.sglang.models and launch with the standard upstream sglang.launch_server.

Technical Details

  • Qwen3-next

    • Qwen3NextForCausalLM is registered under atom.plugin.sglang.models and subclasses _AtomCausalLMBaseForSglang, reusing the same SGLang-facing contract as other OOT entry points: the wrapper calls prepare_model(..., engine="sglang") to build the ATOM weight stack, runs the language model forward with pipeline-parallel state mapped from pp_proxy_tensors, applies LogitsProcessor on the last PP rank, and loads weights via load_model_in_plugin_mode.
    • The linear-attention (GDN) path is wrapped by Qwen3NextSglangModel plus sglang_gdn_bridge so GDN layers see the SGLang forward_batch context they expect. At prepare time, apply_qwen3_next_sglang_model_patch swaps atom.models.qwen3_next.Qwen3NextModel to that bridged implementation; the shared prepare hook defaults ATOM_SGLANG_USE_NATIVE_AITER_ATTN_BACKEND for Qwen3NextForCausalLM before register_ops_to_sglang.
  • Qwen3.5

    • Qwen3.5 text / MoE / multimodal stacks reuse SGLang’s in-tree Qwen3_5* container classes while the language tower is still constructed through ATOM prepare_model. apply_prepare_model_adaptations applies Qwen3.5-only config fixes (e.g. MoE text fields, quant remaps for ROCm) and _apply_qwen35_sglang_model_patch rebinds atom.models.qwen3_5.Qwen3_5Model to the SGLang-specific implementation.
    • Weights are not applied through SGLang’s default load_weights iterator; instead load_model_in_plugin_mode drives ATOM’s loader, with _QWEN35_OOT_PACKED_MODULES_MAPPING (and related mapping) kept aligned to SGLang’s packed/fused parameter layout so FP8 and fused checkpoints behave like the in-tree model.
    • The same prepare hook defaults native Aiter attention backend selection for Qwen3_5* (and Qwen3-next) unless the user has already set the env var.
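The wrapper contract described above can be sketched roughly as follows. This is an illustrative stand-in, not ATOM's actual implementation: only the class names Qwen3NextForCausalLM, _AtomCausalLMBaseForSglang, and the EntryClass export convention come from the PR; the method bodies are placeholders for the real prepare_model / load_model_in_plugin_mode calls.

```python
# Minimal sketch of the plugin entry-point pattern (method bodies are
# stand-ins, NOT ATOM's real code).

class _AtomCausalLMBaseForSglang:
    """Shared SGLang-facing wrapper base for ATOM-backed architectures."""

    def __init__(self, config):
        self.config = config
        # Real code: prepare_model(..., engine="sglang") builds the
        # ATOM weight stack here.
        self.model = {"arch": type(self).__name__, "engine": "sglang"}

    def load_weights(self, weights):
        # Real code: load_model_in_plugin_mode drives ATOM's loader
        # instead of SGLang's default load_weights iterator.
        self.loaded = dict(weights)


class Qwen3NextForCausalLM(_AtomCausalLMBaseForSglang):
    pass


# SGLang's external-model mechanism imports the package named in
# SGLANG_EXTERNAL_MODEL_PACKAGE and collects the exported entry classes.
EntryClass = [Qwen3NextForCausalLM]
```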

How to Run

export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models
python3 -m sglang.launch_server --model-path <model_path> ...
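Before launching, a small pre-flight check can confirm the plugin package is importable and the env var is set. The helper below is a hypothetical convenience, not part of the plugin:

```python
import importlib.util
import os

PLUGIN_PKG = "atom.plugin.sglang.models"

def plugin_available(pkg: str = PLUGIN_PKG) -> bool:
    """Return True if the external model package can be imported."""
    try:
        return importlib.util.find_spec(pkg) is not None
    except ModuleNotFoundError:
        # A missing parent package (e.g. no `atom` installed at all)
        # also means the plugin is unavailable.
        return False

# Mirror the `export` from the shell example so a server launched from
# this environment picks the plugin up.
os.environ.setdefault("SGLANG_EXTERNAL_MODEL_PACKAGE", PLUGIN_PKG)
```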

The following models are now supported: Qwen3NextForCausalLM, Qwen3_5ForConditionalGeneration, and Qwen3_5MoeForConditionalGeneration.

Accuracy

#!/usr/bin/bash
# launch server

set -euxo pipefail

export PYTHONPATH=/opt/sglang/python
export SGLANG_DISABLE_CUDNN_CHECK=1

if [ $# -lt 4 ]; then
  echo "Usage: $0 <port> <model_path> <device_id> <enable_atom>"
  exit 1
fi

port=$1
model_path=$2
device_id=$3
enable_atom=$4 # true or false

if [ "${enable_atom}" == true ]; then
  export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models
fi

# derive TP size by counting digits in the device list, e.g. "0,1,2,3" -> 4
# (assumes single-digit device IDs)
tp=$(echo "$device_id" | grep -o "[0-9]" | grep -c "")

HIP_VISIBLE_DEVICES=${device_id} python3 -m sglang.launch_server \
  --model-path ${model_path} --port ${port} \
  --tensor-parallel-size ${tp} --mem-fraction-static 0.9 \
  --reasoning-parser qwen3 --disable-radix-cache

# run accuracy evaluation with lm_eval (separate script)
model_path=$1
port=$2
task=$3

models_url="http://localhost:${port}/v1/models"
echo "Waiting for OpenAI-compatible server at ${models_url} ..." >&2
until curl -sf --connect-timeout 2 --max-time 10 "${models_url}" >/dev/null; do
  sleep 2
done
echo "Server is up; starting lm_eval." >&2

lm_eval --model local-completions \
        --model_args model=${model_path},base_url=http://localhost:${port}/v1/completions,num_concurrent=64,max_retries=3,max_gen_toks=2048,tokenized_requests=False \
        --tasks ${task} \
        --batch_size auto \
        --trust_remote_code
  • Qwen3.5-397B-A17B-FP8

    • ATOM + SGLang: (accuracy screenshot)
    • SGLang only: (accuracy screenshot)
  • Qwen3-Next-80B-A3B-Instruct

    • ATOM + SGLang: (accuracy screenshot)
    • SGLang only: (accuracy screenshot)

Inference Perf.

  • Qwen3.5-397B-A17B-FP8 on MI355X: (performance screenshot)
  • Qwen3.5-397B-A17B-FP8 on MI308X: (performance screenshot)

q, k = self.rotary_emb(positions, q, k)

- attn_output = self.attn(q, k, v)
+ attn_output = self.attn(q, k, v, positions=positions, **model_kwargs)
Contributor

We should remove the **model_kwargs and refer to DS for how to implement the attn part.

positions,
intermediate_tensors,
inputs_embeds,
**kwargs,
Contributor

Let's remove the **kwargs here and handle it with a wrapper in plugin/sglang.
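The suggestion could be sketched as follows. This is a hypothetical illustration of the wrapper approach (all names are made up for the example, not the PR's actual code):

```python
# Hypothetical sketch: keep the core attention signature free of
# **kwargs and let a plugin/sglang-side wrapper own the engine-specific
# state (e.g. positions).

class CoreAttention:
    """Core model attention with the minimal signature."""
    def __call__(self, q, k, v):
        return ("attn", q, k, v)


class SglangAttnWrapper:
    """Plugin-side wrapper that injects SGLang-specific extras."""
    def __init__(self, attn):
        self.attn = attn
        self.positions = None  # set by the plugin per forward batch

    def set_positions(self, positions):
        self.positions = positions

    def __call__(self, q, k, v):
        # Engine-specific state is consumed here rather than being
        # threaded through the core model via **model_kwargs.
        assert self.positions is not None, "positions must be set first"
        return self.attn(q, k, v)
```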

@wuhuikx wuhuikx requested a review from ZhiweiYan-96 April 9, 2026 08:04
@wanzhenchn wanzhenchn force-pushed the feat/qwen3.5-sgl-plugin-final branch 2 times, most recently from 0049094 to c856071 Compare April 13, 2026 23:51
@wanzhenchn wanzhenchn force-pushed the feat/qwen3.5-sgl-plugin-final branch from c856071 to a8e9882 Compare April 13, 2026 23:55
