
[BugFix] enable deepseek r1 fp4 #527

Open

ZLkanyo009 wants to merge 1 commit into main from lingzha/enable-dpsk-fp4
Conversation

@ZLkanyo009 ZLkanyo009 commented Apr 9, 2026

Motivation

For FP4 DeepSeek, the attention part is fully BF16 while the MoE part is FP4. Therefore, the scale for the attention part is None. For regular DeepSeek FP8, the attention part is also FP8 and requires quantization. However, the case where scale is None was previously ignored. This PR fixes that bug.

Command

export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export SGLANG_AITER_FP8_PREFILL_ATTN=0
export SGLANG_USE_AITER=1
export ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1
 
model_path=/workspace/model/DeepSeek-R1-0528-MXFP4/
export PYTHONPATH=/workspace/dpsk-r1-fp4/sglang/python:/workspace/dpsk-r1-fp4/ATOM_oot/ATOM
 
 
export SGLANG_PROFILE_RECORD_SHAPES=1
export SGLANG_PROFILE_WITH_STACK=1
export SGLANG_TORCH_PROFILER_DIR=/workspace/dpsk-r1-fp4/sglang/profile_log
 
# export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.model_wrapper
export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models

export ATOM_PROFILE_MLA_ABSORBED_BMM=1

TORCHINDUCTOR_COMPILE_THREADS=128 python3 -m sglang.launch_server \
    --model-path $model_path \
    --host localhost \
    --port 8000 \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --kv-cache-dtype fp8_e4m3 \
    --mem-fraction-static 0.9 \
    --page-size 1 \
    --disable-radix-cache \
    --skip-server-warmup \
    --disable-cuda-graph > log.serve.atom.oot.fp4.log 2>&1

@ZLkanyo009 ZLkanyo009 force-pushed the lingzha/enable-dpsk-fp4 branch from d649906 to b3dd131 Compare April 13, 2026 03:01
@ZLkanyo009 ZLkanyo009 force-pushed the lingzha/enable-dpsk-fp4 branch from b3dd131 to 06eedc6 Compare April 13, 2026 03:06
@ZLkanyo009 ZLkanyo009 requested a review from zhuyuhua-v April 13, 2026 03:06
Contributor

@zhuyuhua-v zhuyuhua-v left a comment


LGTM

