
[WIP] [AMD/ROCM] atom qwen fp8/bf16 on mi355x#1040

Open
seungrokj wants to merge 2 commits into `main` from `srok/atom_qwen3.5_fp8_bf16`

Conversation

@seungrokj
Collaborator

@seungrokj commented Apr 16, 2026

Hi,

This PR is a work in progress: it has been tested internally and will ship soon.

cc. @ChangLiu0709 @andyluo7 @chunfangamd @ajith-sirra-amd

Regards,
Seungrok

Signed-off-by: seungrokj <seungrok.jung@amd.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

1 similar comment

Signed-off-by: seungrokj <seungrok.jung@amd.com>
Comment on lines +44 to +52

```shell
python3 -m atom.entrypoints.openai_server \
    --model $MODEL \
    --server-port $PORT \
    -tp $TP \
    --kv_cache_dtype fp8 $CALCULATED_MAX_MODEL_LEN $EP \
    --trust-remote-code \
    > $SERVER_LOG 2>&1 &
```


🔴 The bf16 atom script (qwen3.5_bf16_mi355x_atom.sh) is a byte-for-byte copy of the fp8 script and incorrectly applies --kv_cache_dtype fp8 (line 49) to a native BF16 model, so the supposed BF16 benchmark results are actually fp8-KV-quantized runs. Additionally, the bf16 atom script has no corresponding entry in .github/configs/amd-master.yaml (only fp8 and fp4 atom YAML entries were added), so the benchmark pipeline cannot invoke it at all. Both issues should be fixed before merging.

Extended reasoning...

Issue 1: --kv_cache_dtype fp8 in the bf16 atom script

The two scripts qwen3.5_bf16_mi355x_atom.sh and qwen3.5_fp8_mi355x_atom.sh are byte-for-byte identical (verified by diff in the PR). Both pass --kv_cache_dtype fp8 on line 49 of the server launch command. For an fp8-weight model (Qwen/Qwen3.5-397B-A17B-FP8), applying fp8 KV cache is appropriate and expected. For a native BF16 model (Qwen/Qwen3.5-397B-A17B), however, this forces KV cache compression that would not normally be applied in a "plain BF16" scenario, altering both memory utilization and potentially output quality.

Addressing the refutation (intentional pattern)

One verifier argued this is intentional because dsr1_fp8_mi355x_atom.sh and dsr1_fp4_mi355x_atom.sh are identical and both use --kv_cache_dtype fp8, with precision differentiation done via the YAML model field. This is partially correct for the DSR1 case. Crucially, however, there is a qwen3.5-bf16-mi355x-sglang config using model Qwen/Qwen3.5-397B-A17B, and the corresponding sglang script (qwen3.5_bf16_mi355x.sh) does NOT use --kv_cache_dtype fp8. Every other BF16 benchmark script in this codebase for this model follows the same pattern of omitting KV quantization. If the intent for atom-framework BF16 benchmarking is to also compress the KV cache, that should be an explicit and documented decision, not an accidental copy-paste from the fp8 variant.

Issue 2: Missing YAML config entry for qwen3.5-bf16-mi355x-atom

The PR adds qwen3.5-fp8-mi355x-atom and qwen3.5-fp4-mi355x-atom entries to .github/configs/amd-master.yaml, but no qwen3.5-bf16-mi355x-atom entry. The benchmark pipeline discovers which benchmarks to run from this YAML; without an entry, qwen3.5_bf16_mi355x_atom.sh is an orphaned script that can never be triggered. This is confirmed by grep returning no results for 'qwen3.5-bf16-mi355x-atom' anywhere in the YAML. The PR title explicitly says 'fp8/bf16 on mi355x', so bf16 atom was clearly intended to be integrated.
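A minimal sketch of the missing YAML entry, assuming the field names mirror those described for the fp8/fp4 atom entries (the real amd-master.yaml schema is not shown in this thread, so every key below is an assumption):

```yaml
# Hypothetical sketch only: keys mirror the described fp8/fp4 atom entries;
# the actual amd-master.yaml schema may differ.
qwen3.5-bf16-mi355x-atom:
  model: Qwen/Qwen3.5-397B-A17B   # native BF16 checkpoint, not the FP8 variant
  precision: bf16
  framework: atom
  # search-space parameters would follow the fp8/fp4 atom entries
```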

Concrete proof of the dual problem

1. The YAML entries qwen3.5-fp8-mi355x-atom and qwen3.5-fp4-mi355x-atom both reference their respective model checkpoints (FP8 and MXFP4 variants) and have corresponding benchmark scripts.
2. A matching qwen3.5-bf16-mi355x-atom YAML entry with model: Qwen/Qwen3.5-397B-A17B is absent.
3. Even if a bf16 atom YAML entry were added now and pointed the pipeline at qwen3.5_bf16_mi355x_atom.sh, the script would still launch the server with --kv_cache_dtype fp8, so the resulting numbers would reflect BF16 model weights plus an FP8 KV cache, not a clean BF16 baseline.
4. Compare to qwen3.5-bf16-mi355x-sglang: that config uses Qwen/Qwen3.5-397B-A17B with no --kv_cache_dtype flag, which is the expected BF16 baseline behavior.

How to fix

1. In qwen3.5_bf16_mi355x_atom.sh, remove the --kv_cache_dtype fp8 flag (or make it conditional) to match the behavior of other BF16 benchmark scripts.
2. Add a qwen3.5-bf16-mi355x-atom entry to .github/configs/amd-master.yaml with model: Qwen/Qwen3.5-397B-A17B, precision: bf16, framework: atom, and appropriate search-space parameters (similar to the fp8/fp4 atom entries but with the correct BF16 model checkpoint).
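The "make it conditional" option from step 1 could look like the following sketch, which derives the KV-cache flag from a precision variable so one launcher covers both variants. PRECISION and KV_CACHE_ARGS are assumed names, not taken from the PR scripts:

```shell
# Hypothetical sketch: pick the KV-cache flag from a PRECISION variable.
# PRECISION and KV_CACHE_ARGS are illustrative names, not from the PR.
PRECISION="${PRECISION:-bf16}"
KV_CACHE_ARGS=""
if [ "$PRECISION" = "fp8" ]; then
  # Only fp8 runs quantize the KV cache; bf16 keeps the server default.
  KV_CACHE_ARGS="--kv_cache_dtype fp8"
fi
# The launch command would then expand $KV_CACHE_ARGS in place of the
# hard-coded "--kv_cache_dtype fp8" on line 49.
echo "kv cache args for $PRECISION: $KV_CACHE_ARGS"
```

With this in place, the bf16 script and the fp8 script could genuinely share the same launcher body while still producing a clean BF16 baseline.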

@seungrokj seungrokj added AMD and removed AMD labels Apr 16, 2026