
[Plugin] [Feature] Support MLA q/k norm-quant fusion with SGLang + ATOM plugin for Deepseek#528

Open
qichu-yun wants to merge 1 commit into ROCm:main from qichu-yun:fuse_norm_quant_sgl

Conversation


@qichu-yun qichu-yun commented Apr 9, 2026

Motivation

DeepSeek MLA preprocessing in the SGLang + ATOM plugin was still doing q/k RMSNorm and q quantization in separate steps, leaving unnecessary kernel and memory overhead in a hot path. Since ATOM already provides a gated fused norm-quant implementation for DeepSeek, this PR integrates that path into the plugin so supported workloads can benefit from the fusion while unsupported cases continue to use the existing fallback path.
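For intuition, the fusion can be sketched as follows: the unfused path runs RMSNorm and quantization as two separate passes over the activations, while the fused path computes both in one step. Below is a minimal pure-Python sketch of the math only (scalar lists, simulated per-tensor scaling with an e4m3-style max of 448); the function names and the quantization granularity are illustrative assumptions, not the actual ATOM kernel API.

```python
import math

def rmsnorm(x, weight, eps=1e-6):
    # Classic RMSNorm: scale by reciprocal RMS, then apply learned weight.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]

def quantize_sim(x, max_repr=448.0):
    # Simulated per-tensor quantization: scale so the max magnitude maps
    # to the largest representable value (448 for FP8 e4m3).
    amax = max(abs(v) for v in x) or 1e-8
    scale = amax / max_repr
    return [v / scale for v in x], scale

def norm_quant_unfused(x, weight):
    # Baseline: two separate passes (what the old plugin path did).
    return quantize_sim(rmsnorm(x, weight))

def norm_quant_fused(x, weight, eps=1e-6, max_repr=448.0):
    # Fused path: same math in one logical step. A real fused kernel does
    # both in a single pass over the data, saving a kernel launch and a
    # round trip through memory.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    y = [v / rms * w for v, w in zip(x, weight)]
    amax = max(abs(v) for v in y) or 1e-8
    scale = amax / max_repr
    return [v / scale for v in y], scale
```

The two functions are numerically equivalent; the win from the real fused kernel is in memory traffic and launch overhead, not in the arithmetic.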

before: (image)

after: (image)

Test Plan

launch server:

export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export SGLANG_AITER_FP8_PREFILL_ATTN=0
export SGLANG_USE_AITER=1
export ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1
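The last export is the opt-in switch for this PR's fused path. A hedged sketch of how a plugin can gate on such a flag is below; the helper and dispatch names are hypothetical, and only the environment variable itself comes from the PR.

```python
import os

def ds_qknorm_quant_fusion_enabled() -> bool:
    # Hypothetical helper: treat "1" as opt-in to the fused norm-quant
    # path; any other value (or the variable being unset) keeps the
    # existing unfused fallback.
    return os.environ.get("ATOM_ENABLE_DS_QKNORM_QUANT_FUSION", "0") == "1"

def qk_norm_quant(q, use_fused=None):
    # Hypothetical dispatch point: supported workloads take the fused
    # path, everything else falls back. Both branches are placeholders.
    if use_fused is None:
        use_fused = ds_qknorm_quant_fusion_enabled()
    return ("fused" if use_fused else "fallback"), q
```

Gating behind an environment flag keeps unsupported workloads on the proven fallback path by default.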

model_path=/shared/data/amd_int/models/DeepSeek-R1-0528

export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models

TORCHINDUCTOR_COMPILE_THREADS=128 python3 -m sglang.launch_server \
    --model-path $model_path \
    --host localhost \
    --port 9000 \
    --trust-remote-code \
    --tensor-parallel-size 4 \
    --kv-cache-dtype fp8_e4m3 \
    --mem-fraction-static 0.9 \
    --page-size 1 \
    --disable-radix-cache

client:

model_path=/shared/data/amd_int/models/DeepSeek-R1-0528-MXFP4

ISL=8000
OSL=1000
CON=4
NUM=$(( CON * 2 ))
RANGE_RATIO=1.0

PYTHONDONTWRITEBYTECODE=1 python "/home/qichu_qle/my_sgl/bench_serving/benchmark_serving.py" \
  --model=$model_path \
  --backend=sglang \
  --base-url=http://127.0.0.1:9000 \
  --dataset-name=random \
  --random-input-len="${ISL}" \
  --random-output-len="${OSL}" \
  --random-range-ratio "${RANGE_RATIO}" \
  --num-prompts="${NUM}" \
  --max-concurrency="${CON}" \
  --trust-remote-code \
  --request-rate=inf \
  --num-warmups="$(( 2 * CON ))" \
  --ignore-eos \
  --save-result \
  --percentile-metrics="ttft,tpot,itl,e2el" \
  --result-dir="./tmp/oot-benchmark-results" \
  --result-filename="${ISL}_${OSL}_${CON}.json" \
  --profile

Test Result

============ Serving Benchmark Result ============
Successful requests:                     8         
Benchmark duration (s):                  97.66     
Total input tokens:                      64000     
Total generated tokens:                  8000      
Request throughput (req/s):              0.08      
Output token throughput (tok/s):         81.92     
Total Token throughput (tok/s):          737.26    
---------------Time to First Token----------------
Mean TTFT (ms):                          1330.96   
Median TTFT (ms):                        1457.61   
P99 TTFT (ms):                           1891.41   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          20.20     
Median TPOT (ms):                        20.08     
P99 TPOT (ms):                           21.02     
---------------Inter-token Latency----------------
Mean ITL (ms):                           20.20     
Median ITL (ms):                         19.68     
P99 ITL (ms):                            20.15     
----------------End-to-end Latency----------------
Mean E2EL (ms):                          21514.63  
Median E2EL (ms):                        21514.56  
P99 E2EL (ms):                           21516.52  
==================================================

Submission Checklist

