[Klaud Cold] dsv4-fp4-mi355x-sglang-disagg: DeepSeek-V4-Pro SGLang disagg (8k1k conc=1 smoke test) by functionstackx · Pull Request #1708 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-11T04:39:14Z

Summary

Adds dsv4-fp4-mi355x-sglang-disagg — a DeepSeek-V4-Pro FP4 prefill/decode-disaggregated benchmark on MI355X via SGLang + MoRI. It combines the two references this work was scoped against:

- the validated single-node DSv4 SGLang recipe (dsv4-fp4-mi355x-sglang and its MTP variant), and
- the SGLang-disagg framework (MoRI KV transfer + sglang_router) introduced for the dsr1 / qwen3.5 / glm5 MI355X recipes (#1570, #1572, #1579).

Files

| File | Change |
| --- | --- |
| `benchmarks/multi_node/dsv4_fp4_mi355x_sglang-disagg.sh` | New model-agnostic launcher (same shape as the qwen3.5/glm5 `sglang-disagg` wrappers, with `NODE_LIST` for local smoke). `launch_mi355x-amds.sh` resolves `dsv4`+`fp4`+`sglang-disagg` → this filename. |
| `benchmarks/multi_node/amd_utils/models.yaml` | New `DeepSeek-V4-Pro` entry (base/dp flags + prefill/decode profiles). |
| `benchmarks/multi_node/amd_utils/env.sh` | DSv4 FP4-experts `SGLANG_*` env block + deep_gemm-absence fallback, gated on `MODEL_NAME`. |
| `benchmarks/multi_node/amd_utils/setup_deps.sh` | Idempotent, atomic `config.json` `model_type` patch, gated on `MODEL_NAME`. |
| `.github/configs/amd-master.yaml` | New `dsv4-fp4-mi355x-sglang-disagg` block. |
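
The filename resolution attributed to `launch_mi355x-amds.sh` can be sketched as below. This is an illustration only: the helper name `resolve_launcher` and the pattern `<model>_<precision>_mi355x_<framework>.sh` are inferred from the single filename in the table, not taken from the launcher's source.

```python
def resolve_launcher(model: str, precision: str, framework: str,
                     hardware: str = "mi355x") -> str:
    """Hypothetical helper: build the per-recipe launcher filename.

    The naming pattern is an assumption inferred from one example filename.
    """
    return f"{model}_{precision}_{hardware}_{framework}.sh"


print(resolve_launcher("dsv4", "fp4", "sglang-disagg"))
# dsv4_fp4_mi355x_sglang-disagg.sh
```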

Serving config (`models.yaml`)

base_flags mirror the validated single-node DSv4 SGLang recipe so the per-worker engine config matches the known-good aggregated run:

- `--attention-backend dsv4`, `--swa-full-tokens-ratio 0.15`, `--page-size 256`, `--disable-shared-experts-fusion`
- `--tool-call-parser deepseekv4`, `--reasoning-parser deepseek-v4`, and the DSv4 thinking chat template
- Disagg essentials: `--disaggregation-transfer-backend mori`, `--load-balance-method round_robin`, `--watchdog-timeout 3600`
- `--context-length 9472` is pinned because the model default is very long and would over-reserve KV; 9472 covers the 8k/1k smoke point.
- `--kv-cache-dtype` is left at the model default (the single-node DSv4 recipe sets none), unlike the fp8_e4m3 DeepSeek-R1 disagg entries.
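
The pinned context length is numerically consistent with the smoke point plus 256 extra tokens: 8192 + 1024 + 256 = 9472. Reading those 256 tokens as one `--page-size` page of headroom is an assumption made here for illustration; the description does not state the rationale. A minimal sketch of the arithmetic:

```python
ISL = 8192       # input sequence length of the smoke point
OSL = 1024       # output sequence length of the smoke point
PAGE_SIZE = 256  # --page-size from base_flags


def pinned_context_length(isl: int, osl: int, page_size: int) -> int:
    """ISL + OSL plus one page of headroom (the headroom reading is assumed)."""
    return isl + osl + page_size


print(pinned_context_length(ISL, OSL, PAGE_SIZE))  # 9472
```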

Env / flags track the validated 0610 single-node recipe (PR #1701, "[AMD][MI35X] 0610 DSV4", successful run): mainline …-20260610 image, --attention-backend dsv4 (not compressed), unified_kv_triton FlashMLA, the aiter indexer, the mainline fp8 wo_a / topk-v2 fallbacks hardcoded (so no deep_gemm-presence detect), and the branch-only SGLANG_DSV4_FP4_EXPERTS / SGLANG_FORCE_TRITON_MOE_FP8 flags dropped. The prefill delayer (--enable-prefill-delayer) is intentionally not used.

The DSv4 SGLANG_* env block (unified_kv_triton FlashMLA, the aiter indexer, the hardcoded fp8 wo_a / topk-v2 fallbacks, multi-stream overlap off, …) is copied verbatim from the single-node recipe (PR #1701) into env.sh, gated on MODEL_NAME.

Image: `lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260610`

This is the mainline ROCm nightly the DSv4 MTP single-node recipe (dsv4-fp4-mi355x-sglang-mtp) already runs on. It is the right image for disagg because it carries both:

- DSv4 model support: sgl#26383 ([AMD][DSV4], merged to mainline 2026-05-27), and
- the MoRI disaggregation transfer backend: it is on the same …-mi35x-… image line as the dsr1/qwen3.5/glm5 disagg recipes.

The aggregated dsv4-fp4-mi355x-sglang entry uses rocm/sgl-dev:*-DSv4, cut from the amd/deepseek_v4 branch, which lacks #26383 and has unverified MoRI support — so it's not suitable for disagg. Mainline omits deep_gemm; env.sh detects that and routes the DSv4 fp8 wo_a / topk paths to torch fallbacks (same logic as the MTP single-node recipe), so it runs on both image lines. The v0.5.12.post1 tag also auto-applies the MoRI conn.py overlay (job.slurm) that fixes the KV wire format for hybrid/sparse-attention models.

Topology

1P1D, TP8/EP1, dp-attn false — the same conservative starting point the qwen3.5 and glm5 sglang-disagg recipes launched with.
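
As a reading aid for the `1P1D, TP8/EP1` shorthand, the sketch below models the topology as a small record and counts GPUs. The `Topology` class and its field names are hypothetical, and the GPU count assumes each worker occupies exactly `tp` GPUs, which the description does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    prefill_workers: int        # the "1P" in 1P1D
    decode_workers: int         # the "1D" in 1P1D
    tp: int                     # tensor-parallel size per worker
    ep: int                     # expert-parallel size per worker (1 = EP off)
    dp_attention: bool = False  # "dp-attn false"

    def total_gpus(self) -> int:
        # Assumption: every worker uses exactly `tp` GPUs.
        return (self.prefill_workers + self.decode_workers) * self.tp


smoke = Topology(prefill_workers=1, decode_workers=1, tp=8, ep=1)
print(smoke.total_gpus())  # 16
```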

Scope: smoke test first

Per request, this starts minimal: a single ISL/OSL (8k/1k) at conc=1, to validate end-to-end that DSv4 + MoRI disaggregation comes up and transfers KV at all on this image, before expanding to the full conc sweep (and DEP / 1P2D). At conc=1 the generator emits one config and skips eval.

Validated locally:

$ generate_sweep_configs.py full-sweep --config-files .github/configs/amd-master.yaml \
    --framework sglang-disagg --runner-type mi355x-disagg
# 1 dsv4_8k1k config: isl=8192 osl=1024 conc=[1], 1P1D TP8/EP1, image …-20260601
$ validate_master_config(amd-master.yaml)   # all 75 entries valid
# config.json patch unit-tested: deepseek_v4 -> deepseek_v3, architectures preserved, idempotent
# bash -n on launcher / env.sh / setup_deps.sh: clean

Test plan

- Apply the sweep label so run-sweep.yml exercises dsv4-fp4-mi355x-sglang-disagg at 8k1k/conc=1.
- Confirm the …-mi35x-20260610 image imports on a fresh MI355X-disagg runner and supports --disaggregation-mode + --disaggregation-transfer-backend mori with the DSv4 model class.
- Confirm DeepSeek-V4-Pro is staged at $MODEL_DIR/DeepSeek-V4-Pro; verify the setup_deps.sh config.json patch fires (and is a safe no-op on re-runs / concurrent nodes).
- Verify --attention-backend dsv4 + --page-size 256 interoperate with MoRI KV transfer (the highest-risk unknown: DSv4 sparse MLA vs the MLA shapes MoRI was validated against).
- On green: expand conc, then enable mori-EP decode (which will require carrying the sglang#27855 aiter fix), and evaluate DEP / 1P2D and MTP.

Risks / open questions

- Highest risk: DSv4's compressed/sparse-MLA attention + --page-size 256 over the MoRI KV transport is an unvalidated combination (MoRI disagg was validated against DeepSeek-R1 MLA and Qwen3.5/GLM-5). The smoke test exists to surface exactly this.
- Decode keeps radix cache enabled (the framework only disables it on prefill); harmless for random-token throughput, revisit if it affects correctness.

🤖 Generated with Claude Code

Note

Medium Risk
Touches multi-node disagg launch paths and mutates shared NFS config.json for DSv4; main unknown is DSv4 sparse MLA KV over MoRI at the chosen page size.

Overview
Adds dsv4-fp4-mi355x-sglang-disagg, a DeepSeek-V4-Pro FP4 prefill/decode-disaggregated MI355X benchmark on SGLang + MoRI, wired like the existing dsr1/qwen3.5/glm5 disagg recipes.

amd-master.yaml registers the recipe on lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260610 with a 1P1D TP8/EP1 smoke point at 8k/1k, conc=1 (dp-attn off). benchmarks/multi_node/dsv4_fp4_mi355x_sglang-disagg.sh is a thin launcher that maps topology from the master config into amd_utils/submit.sh.

amd_utils/models.yaml gains DeepSeek-V4-Pro with MoRI disagg flags aligned to the validated single-node DSv4 recipe (dsv4 attention, SWA, page-size 256, parsers/template, pinned context-length, no forced kv-cache-dtype). env.sh adds a MODEL_NAME-gated SGLANG_* block (FlashMLA/indexer/fp8 fallbacks from PR #1701). setup_deps.sh adds an idempotent atomic config.json model_type patch (deepseek_v4 → deepseek_v3) on shared NFS weights.

perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit 316dd21. Bugbot is set up for automated code reviews on this repo.

…sagg (8k1k conc=1 smoke test)

Adds a DeepSeek-V4-Pro FP4 prefill/decode-disaggregated recipe on MI355X via SGLang + MoRI, combining the validated single-node DSv4 SGLang recipe with the sglang-disagg framework used by the dsr1 / qwen3.5 / glm5 mi355x recipes (#1570, #1572, #1579).

- benchmarks/multi_node/dsv4_fp4_mi355x_sglang-disagg.sh: model-agnostic launcher (same shape as the qwen3.5/glm5 wrappers, with NODE_LIST support).
- amd_utils/models.yaml: DeepSeek-V4-Pro entry. Serving flags mirror the single-node recipe (compressed attention, SWA, page-size 256, deepseekv4/deepseek-v4 parsers, DSv4 thinking chat template, shared-experts-fusion off); context-length pinned; kv-cache-dtype left at model default.
- amd_utils/env.sh: DSv4 FP4-experts SGLANG_* env block + deep_gemm-absence fallback, gated on MODEL_NAME.
- amd_utils/setup_deps.sh: idempotent, atomic config.json model_type patch (deepseek_v4 -> deepseek_v3, architectures preserved), gated on MODEL_NAME.
- amd-master.yaml: dsv4-fp4-mi355x-sglang-disagg, 1P1D TP8/EP1 dp-attn false, image v0.5.12.post1-rocm720-mi35x-20260601 (mainline w/ DSv4 #26383 + MoRI disagg; auto-applies the MoRI conn.py overlay).

Starts at a single ISL/OSL (8k/1k) conc=1 to smoke-test that DSv4 + MoRI disagg comes up and transfers KV on this image before expanding the sweep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-11T04:39:21Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


…sh fix)

DeepSeek-V4-Pro + MoRI expert-parallel aborts at warmup with "dynamic_per_group_scaled_quant_kernel not implemented for dtype fp4x2" on the clamped-SwiGLU/INTERLEAVE path. sgl-project/sglang#27855 fixes it in moe_runner/aiter.py:_pre_permute_deepep_to_aiter (W4A4 + FP4-dispatch branch that dequants the FP4 activation to BF16 via upscale_mxfp4) but is unmerged and absent from the pinned image.

setup_deps.sh now source-patches aiter.py at container start, gated on MODEL_NAME == DeepSeek-V4-Pro: idempotent, atomic write, warn+skip if the image's aiter.py predates the anchored structure. Verified byte-identical to the PR head against current sglang main.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
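
The patching behaviour this commit describes (idempotent, warn+skip when the anchored structure is absent) can be sketched as a pure string transform. This is not the code in setup_deps.sh and it does not reproduce the sglang#27855 change; the function name, the `MARKER` comment, and the three status strings are hypothetical.

```python
MARKER = "# patched: dsv4-fp4-swiglu"  # hypothetical marker left by a prior run


def apply_anchored_patch(source: str, anchor: str, replacement: str) -> tuple[str, str]:
    """Return (new_source, status), status in {'already', 'skipped', 'patched'}."""
    if MARKER in source:
        return source, "already"   # idempotent: a second run is a no-op
    if source.count(anchor) != 1:
        return source, "skipped"   # anchor missing or ambiguous: warn and skip
    patched = source.replace(anchor, f"{MARKER}\n{replacement}")
    return patched, "patched"
```

A caller would write `new_source` back to disk (atomically) only when the status is `patched`, and log a warning on `skipped`.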

github-actions · 2026-06-11T05:33:57Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27324471640
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27324471640

…ention backend

Realigns the DSv4 sglang-disagg recipe with the validated 0610 single-node recipe (PR #1701, "[AMD][MI35X] 0610 DSV4", successful run):

- image -> lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260610
- env.sh DSv4 block replaced with #1701's: unified_kv_triton FlashMLA, aiter indexer (not tilelang), mainline fp8 wo_a / topk-v2 fallbacks hardcoded (SGLANG_OPT_FP8_WO_A_GEMM=false, SGLANG_OPT_USE_TOPK_V2=false) instead of the deep_gemm-presence detect; SGLANG_DEFAULT_THINKING / SGLANG_DSV4_REASONING_EFFORT; multi-stream overlap off. Branch-only SGLANG_DSV4_FP4_EXPERTS / SGLANG_FORCE_TRITON_MOE_FP8 dropped (DSv4 main no longer needs them).
- models.yaml base_flags: --attention-backend compressed -> dsv4; dp_flags add --enable-prefill-delayer --prefill-delayer-max-delay-ms 5000 (the #1701 DP path).

Still a v0.5.12.post1 tag, so the MoRI conn.py overlay auto-applies; the #27855 aiter monkey-patch is unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-11T05:35:33Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27326105979
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27326105979

Per request, do not use --enable-prefill-delayer / --prefill-delayer-max-delay-ms in the DSv4 sglang-disagg recipe. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-11T05:37:15Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27326175156
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27326175156

…, no EP)

The #27855 fix only matters on the DSv4 + MoRI expert-parallel path. This recipe is TP8/EP1 for the smoke test, so that crash isn't reachable. Remove the patch_aiter_dsv4_fp4_swiglu source-patch from setup_deps.sh; a comment in amd-master.yaml records that it's needed only when EP/DEP decode is enabled.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-11T05:41:11Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27326242579
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27326242579

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.


cursor · 2026-06-11T05:49:38Z

+        # multi-stream
+        export SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=false
+        export SGLANG_ROCM_USE_MULTI_STREAM=false
+    fi


Missing mainline deep_gemm JIT off

Medium Severity

The DeepSeek-V4-Pro block hardcodes SGLANG_OPT_FP8_WO_A_GEMM and SGLANG_OPT_USE_TOPK_V2 for the mainline …-20260610 image, but omits SGLANG_ENABLE_JIT_DEEPGEMM=0 (and SGLANG_TOPK_TRANSFORM_512_TORCH=1) that the in-repo mainline DSv4 recipe sets when deep_gemm is absent. That image line has no deep_gemm, so startup can still hit JIT or top-k paths that expect it.
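
The gap Bugbot describes can be expressed as a small check over the exported variables. The sketch below is illustrative: the helper name is hypothetical, the `block` dict lists only the exports named in this thread, and whether the two extra settings are actually required on this image is Bugbot's claim, not something verified here.

```python
# Settings Bugbot reports as missing from the DeepSeek-V4-Pro block.
EXPECTED_WITHOUT_DEEP_GEMM = {
    "SGLANG_ENABLE_JIT_DEEPGEMM": "0",
    "SGLANG_TOPK_TRANSFORM_512_TORCH": "1",
}


def missing_deep_gemm_fallbacks(env: dict[str, str]) -> list[str]:
    """Names whose value is unset or differs from the expected fallback."""
    return sorted(k for k, v in EXPECTED_WITHOUT_DEEP_GEMM.items() if env.get(k) != v)


# Exports mentioned for the DSv4 block in this PR (not the full env.sh block).
block = {
    "SGLANG_OPT_FP8_WO_A_GEMM": "false",
    "SGLANG_OPT_USE_TOPK_V2": "false",
    "SGLANG_OPT_USE_MULTI_STREAM_OVERLAP": "false",
    "SGLANG_ROCM_USE_MULTI_STREAM": "false",
}

print(missing_deep_gemm_fallbacks(block))
# ['SGLANG_ENABLE_JIT_DEEPGEMM', 'SGLANG_TOPK_TRANSFORM_512_TORCH']
```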


cursor · 2026-06-11T05:49:38Z

+    os.path.exists(tmp) and os.remove(tmp)
+    raise
+PYEOF
+    _SETUP_INSTALLED+=("dsv4-config-model-type")


Setup logs false config patch

Low Severity

patch_dsv4_config always appends dsv4-config-model-type to _SETUP_INSTALLED after the Python helper returns, including when the helper exits early because model_type is already deepseek_v3. Setup summary then reports an install/patch that did not run.

Additional Locations (1)

benchmarks/multi_node/amd_utils/setup_deps.sh#L767-L770
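
One way to address this finding is to have the helper report whether it changed anything, so the caller records the patch only on a real write. The sketch below is a stand-alone illustration, not the helper in setup_deps.sh: the function name and signature are hypothetical, and the atomicity relies on `os.replace` within one directory on a POSIX filesystem (NFS behaviour is not verified here).

```python
import json
import os
import shutil
import tempfile


def patch_model_type(config_path: str, old: str = "deepseek_v4",
                     new: str = "deepseek_v3") -> bool:
    """Rewrite model_type atomically; return True only if the file changed."""
    with open(config_path) as f:
        config = json.load(f)
    if config.get("model_type") != old:
        return False  # already patched or a different model: safe no-op
    config["model_type"] = new  # other keys (e.g. architectures) are untouched
    directory = os.path.dirname(os.path.abspath(config_path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same dir, so rename stays atomic
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(config, f, indent=2)
        shutil.copymode(config_path, tmp)  # keep the original file permissions
        os.replace(tmp, config_path)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return True
```

The shell caller could then append `dsv4-config-model-type` to `_SETUP_INSTALLED` only when the helper signals a change, for example through its exit status.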


github-actions · 2026-06-11T06:52:27Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27326392188
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27326392188

functionstackx requested a review from a team June 11, 2026 04:39

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 11, 2026 04:39

github-project-automation Bot added this to InferenceMAX Board Jun 11, 2026

perf-changelog: add dsv4-fp4-mi355x-sglang-disagg entry (#1708)

42c97e6

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx added the full-sweep-enabled label Jun 11, 2026

dsv4 sglang-disagg: drop the prefill delayer from dp_flags

9648a81


cursor Bot reviewed Jun 11, 2026

functionstackx wants to merge 6 commits into main from dsv4-fp4-mi355x-sglang-disagg
