
[WIP] [AMD/ROCM] atom qwen fp8/bf16 on mi355x#1040

Open
seungrokj wants to merge 2 commits into `main` from `srok/atom_qwen3.5_fp8_bf16`

Conversation

@seungrokj
Collaborator

@seungrokj commented Apr 16, 2026

Hi,

This PR is a work in progress: it has been tested internally and will ship soon.

cc. @ChangLiu0709 @andyluo7 @chunfangamd @ajith-sirra-amd

Regards,
Seungrok

Signed-off-by: seungrokj <seungrok.jung@amd.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

1 similar comment

Signed-off-by: seungrokj <seungrok.jung@amd.com>
Comment on lines +44 to +52

```shell
python3 -m atom.entrypoints.openai_server \
    --model $MODEL \
    --server-port $PORT \
    -tp $TP \
    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN $EP \
    --trust-remote-code \
    > $SERVER_LOG 2>&1 &
```


🔴 The bf16 atom script (qwen3.5_bf16_mi355x_atom.sh) is a byte-for-byte copy of the fp8 script and incorrectly applies --kv_cache_dtype fp8 (line 49) to a native BF16 model, so the supposed BF16 benchmark results are actually fp8-KV-quantized runs. Additionally, the bf16 atom script has no corresponding entry in .github/configs/amd-master.yaml (only fp8 and fp4 atom YAML entries were added), so the benchmark pipeline cannot invoke it at all. Both issues should be fixed before merging.

Extended reasoning...

Issue 1: --kv_cache_dtype fp8 in the bf16 atom script

The two scripts qwen3.5_bf16_mi355x_atom.sh and qwen3.5_fp8_mi355x_atom.sh are byte-for-byte identical (verified by diff in the PR). Both pass --kv_cache_dtype fp8 on line 49 of the server launch command. For an fp8-weight model (Qwen/Qwen3.5-397B-A17B-FP8), applying fp8 KV cache is appropriate and expected. For a native BF16 model (Qwen/Qwen3.5-397B-A17B), however, this forces KV cache compression that would not normally be applied in a "plain BF16" scenario, altering both memory utilization and potentially output quality.

Addressing the refutation (intentional pattern)

One verifier argued this is intentional because dsr1_fp8_mi355x_atom.sh and dsr1_fp4_mi355x_atom.sh are identical and both use --kv_cache_dtype fp8, with precision differentiation done via the YAML model field. This is partially correct for the DSR1 case. Crucially, however, there is a qwen3.5-bf16-mi355x-sglang config using model Qwen/Qwen3.5-397B-A17B, and the corresponding sglang script (qwen3.5_bf16_mi355x.sh) does NOT use --kv_cache_dtype fp8. Every other BF16 benchmark script in this codebase for this model follows the same pattern of omitting KV quantization. If the intent for atom-framework BF16 benchmarking is to also compress the KV cache, that should be an explicit and documented decision, not an accidental copy-paste from the fp8 variant.

Issue 2: Missing YAML config entry for qwen3.5-bf16-mi355x-atom

The PR adds qwen3.5-fp8-mi355x-atom and qwen3.5-fp4-mi355x-atom entries to .github/configs/amd-master.yaml, but no qwen3.5-bf16-mi355x-atom entry. The benchmark pipeline discovers which benchmarks to run from this YAML; without an entry, qwen3.5_bf16_mi355x_atom.sh is an orphaned script that can never be triggered. This is confirmed by grep returning no results for 'qwen3.5-bf16-mi355x-atom' anywhere in the YAML. The PR title explicitly says 'fp8/bf16 on mi355x', so bf16 atom was clearly intended to be integrated.
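A minimal sketch of the missing YAML entry, assuming the field names mirror those described for the fp8/fp4 atom entries (the real amd-master.yaml schema is not shown in this thread, so every key below is an assumption):

```yaml
# Hypothetical sketch only: keys mirror the described fp8/fp4 atom entries;
# the actual amd-master.yaml schema may differ.
qwen3.5-bf16-mi355x-atom:
  model: Qwen/Qwen3.5-397B-A17B   # native BF16 checkpoint, not the FP8 variant
  precision: bf16
  framework: atom
  # search-space parameters would follow the fp8/fp4 atom entries
```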

Concrete proof of the dual problem

1. The YAML entries qwen3.5-fp8-mi355x-atom and qwen3.5-fp4-mi355x-atom both reference their respective model checkpoints (FP8 and MXFP4 variants) and have corresponding benchmark scripts.
2. A matching qwen3.5-bf16-mi355x-atom YAML entry with model: Qwen/Qwen3.5-397B-A17B is absent.
3. Even if a bf16 atom YAML entry were added now and pointed the pipeline at qwen3.5_bf16_mi355x_atom.sh, the script would still launch the server with --kv_cache_dtype fp8, so the resulting numbers would reflect BF16 model weights plus an FP8 KV cache, not a clean BF16 baseline.
4. Compare to qwen3.5-bf16-mi355x-sglang: that config uses Qwen/Qwen3.5-397B-A17B with no --kv_cache_dtype flag, which is the expected BF16 baseline behavior.

How to fix

1. In qwen3.5_bf16_mi355x_atom.sh, remove the --kv_cache_dtype fp8 flag (or make it conditional) to match the behavior of other BF16 benchmark scripts.
2. Add a qwen3.5-bf16-mi355x-atom entry to .github/configs/amd-master.yaml with model: Qwen/Qwen3.5-397B-A17B, precision: bf16, framework: atom, and appropriate search-space parameters (similar to the fp8/fp4 atom entries but with the correct BF16 model checkpoint).
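The "make it conditional" option from step 1 could look like the following sketch, which derives the KV-cache flag from a precision variable so one launcher covers both variants. PRECISION and KV_CACHE_ARGS are assumed names, not taken from the PR scripts:

```shell
# Hypothetical sketch: pick the KV-cache flag from a PRECISION variable.
# PRECISION and KV_CACHE_ARGS are illustrative names, not from the PR.
PRECISION="${PRECISION:-bf16}"
KV_CACHE_ARGS=""
if [ "$PRECISION" = "fp8" ]; then
  # Only fp8 runs quantize the KV cache; bf16 keeps the server default.
  KV_CACHE_ARGS="--kv_cache_dtype fp8"
fi
# The launch command would then expand $KV_CACHE_ARGS in place of the
# hard-coded "--kv_cache_dtype fp8" on line 49.
echo "kv cache args for $PRECISION: $KV_CACHE_ARGS"
```

With this in place, the bf16 script and the fp8 script could genuinely share the same launcher body while still producing a clean BF16 baseline.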

@seungrokj seungrokj added AMD and removed AMD labels Apr 16, 2026