[Klaud Cold] dsv4-fp4-mi355x-vllm-disagg: DeepSeek-V4-Pro vLLM disagg (8k1k conc=1 smoke test) by functionstackx · Pull Request #1707 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-11T04:24:32Z

Summary

Adds dsv4-fp4-mi355x-vllm-disagg — a DeepSeek-V4-Pro disaggregated prefill/decode benchmark on MI355X via vLLM + MoRI-IO. It combines the two pieces this work was scoped against:

the validated single-node DSv4 vLLM serving recipe (dsv4-fp4-mi355x-vllm, from vllm-project/recipes#433, landed here in #1374, "dsv4-fp4-mi355x-vllm and adopt recipes#433"), and
the vLLM-disagg framework (MoRI-IO P/D, standalone router) introduced for the kimi / minimax MI355X recipes (#1141, #1569).

Disaggregation only adds the MoRIIO kv-transfer role to each worker; the per-node engine config is otherwise identical to the known-good aggregated run.

Files

| File | Change |
|---|---|
| `benchmarks/multi_node/dsv4_fp4_mi355x_vllm-disagg.sh` | New model-agnostic launcher; identical in shape to the kimi/minimax `vllm-disagg` wrappers (`launch_mi355x-amds.sh` resolves `dsv4`+`fp4`+`vllm-disagg` → this filename). |
| `benchmarks/multi_node/amd_utils/models_vllm.yaml` | New `DeepSeek-V4-Pro` entry: prefill/decode flags + env keyed on `MODEL_NAME`. |
| `.github/configs/amd-master.yaml` | New `dsv4-fp4-mi355x-vllm-disagg` block. |
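
For readers unfamiliar with the launcher convention, the filename resolution described above can be sketched as follows. This is an illustration only: the function name and the exact join rule are assumptions inferred from the one filename in this PR, and the real logic lives in `launch_mi355x-amds.sh`.

```python
def launcher_filename(model: str, precision: str, framework: str,
                      hardware: str = "mi355x") -> str:
    """Hypothetical reconstruction of the naming rule.

    Assumption: the parts are joined with underscores in the order
    model, precision, hardware, framework, followed by ".sh".
    """
    return f"{model}_{precision}_{hardware}_{framework}.sh"


resolved = launcher_filename("dsv4", "fp4", "vllm-disagg")
print(resolved)
```

With the inputs named in the table, this yields the launcher added by this PR.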

Serving config (`models_vllm.yaml`)

Per-node flags reuse the aggregated recipe verbatim, so the engine config matches the known-good single-node run:

--moe-backend triton_unfused — required for the FP4 MoE expert weight format (auto backend doesn't register the FP4 scale params → safetensors KeyError).
--tokenizer-mode deepseek_v4 --reasoning-parser deepseek_v4 --kv-cache-dtype fp8 --no-enable-prefix-caching --distributed-executor-backend mp --gpu-memory-utilization 0.9 --max-num-batched-tokens 8192.
--enforce-eager — no CUDA graphs, to keep the first disagg recipe robust against cudagraph/MoRIIO-hook interactions (FULL/PIECEWISE capture is a follow-up).
--async-scheduling intentionally omitted (not used by the kimi/minimax vllm-disagg recipes).
env: VLLM_USE_V1=1 VLLM_ROCM_USE_AITER=1 VLLM_ENGINE_READY_TIMEOUT_S=3600.
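
As a rough illustration of how the flags above compose into one per-node command, here is a minimal sketch. The flag and env values are copied from the list above; the `build_command` helper, the model identifier, and the use of `--tensor-parallel-size 8` (taken from the Topology section) are assumptions for illustration, not the contents of `models_vllm.yaml`.

```python
import shlex

# Flags listed in this PR description for the DeepSeek-V4-Pro entry.
SHARED_FLAGS = [
    "--moe-backend", "triton_unfused",
    "--tokenizer-mode", "deepseek_v4",
    "--reasoning-parser", "deepseek_v4",
    "--kv-cache-dtype", "fp8",
    "--no-enable-prefix-caching",
    "--distributed-executor-backend", "mp",
    "--gpu-memory-utilization", "0.9",
    "--max-num-batched-tokens", "8192",
    "--enforce-eager",
    # --async-scheduling is intentionally absent.
]

# Environment variables listed in this PR description.
SHARED_ENV = {
    "VLLM_USE_V1": "1",
    "VLLM_ROCM_USE_AITER": "1",
    "VLLM_ENGINE_READY_TIMEOUT_S": "3600",
}


def build_command(model: str, tp: int = 8) -> list[str]:
    """Hypothetical helper: assemble the argv for one prefill or decode worker."""
    return ["vllm", "serve", model, "--tensor-parallel-size", str(tp), *SHARED_FLAGS]


# Assumed model identifier, inferred from the staged directory name in the test plan.
cmd = build_command("deepseek-ai/DeepSeek-V4-Pro")
env_prefix = " ".join(f"{k}={v}" for k, v in SHARED_ENV.items())
print(env_prefix, shlex.join(cmd))
```

Prefill and decode workers would share this command; only the kv-transfer role added by the disagg framework differs.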

Image: `vllm/vllm-openai-rocm:nightly-3f0a91bb…` (patch-free, via #1585)

This PR also folds in #1585 ("Remove MoRI-IO patches from vLLM Disagg benchmarks"), so all three vllm-disagg recipes (kimi, minimax, dsv4) run patch-free on the same nightly:

setup_deps.sh drops ~557 lines of runtime MoRIIO Python patches — they were upstreamed in vllm#40344 (merged 2026-05-28).
a2a backend mori → mori_low_latency; read-mode is now set via read_mode: true in kv_connector_extra_config (server_vllm.sh) instead of the VLLM_MORIIO_CONNECTOR_READ_MODE env var.
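
To make the env-var-to-config move concrete, the sketch below builds a kv-transfer config with `read_mode: true` under `kv_connector_extra_config`. Only that key and value come from this PR. The connector name `MoRIIOConnector`, the role names, and the helper itself are assumptions; check them against `server_vllm.sh` and the pinned nightly before relying on them.

```python
import json

# Assumed role names; verify against the vLLM version in the pinned image.
VALID_ROLES = {"kv_producer", "kv_consumer"}


def kv_transfer_config(role: str) -> str:
    """Hypothetical helper: JSON payload for one worker's kv-transfer settings."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown kv role: {role!r}")
    config = {
        "kv_connector": "MoRIIOConnector",  # assumed connector name
        "kv_role": role,
        # Replaces the removed VLLM_MORIIO_CONNECTOR_READ_MODE env var.
        "kv_connector_extra_config": {"read_mode": True},
    }
    return json.dumps(config)


prefill_cfg = kv_transfer_config("kv_producer")
decode_cfg = kv_transfer_config("kv_consumer")
print(prefill_cfg)
print(decode_cfg)
```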

Why the nightly and not a release tag: vllm#40344 is not in v0.22.0 or v0.22.1 — it landed ~1 day before the v0.22.0 cut and wasn't backported (both release trees have zero read_mode). So the patch-free path requires the nightly. nightly-3f0a91bb96f8d72e0498b95c166e817deae14d62 (2026-06-03) carries #40344 and DeepseekV4ForCausalLM (vllm#40871) and the MoRIIO connector (vllm#29304); it's confirmed live on Docker Hub. (Note: the GC'd-tag risk I'd flagged for the old disagg nightlies applies here too, but this is the maintained image the kimi/minimax recipes now share, so the disagg cluster caches it.)

Topology

1P1D — 1 prefill node + 1 decode node (2 nodes total), each a full TP=8, EP=1 worker. This matches the aggregated recipe, which runs DSv4 on TP=8 without expert parallelism (--moe-backend triton_unfused handles the FP4 sharding at TP=8, so EP is not required to load; at EP=1 there is no all2all backend, so the mori_low_latency rename from #1585 doesn't touch this recipe). DEP decode and multi-node 1P2D are follow-ups once the base path validates.
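
The 1P1D layout can be written down as a small data structure to sanity-check the resource footprint. This is a sketch under stated assumptions: the `Worker` class is hypothetical, and it assumes one GPU per TP rank, so a "full TP=8" worker occupies 8 GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    """Hypothetical description of one disagg worker (one node each)."""
    role: str  # "prefill" or "decode"
    tp: int    # tensor-parallel degree
    ep: int    # expert-parallel degree; 1 means no expert parallelism

    @property
    def gpus(self) -> int:
        # Assumption: one GPU per TP rank; EP=1 adds no extra GPUs.
        return self.tp

    @property
    def needs_all2all(self) -> bool:
        # Per the PR text, there is no all2all backend at EP=1.
        return self.ep > 1


topology = [Worker("prefill", tp=8, ep=1), Worker("decode", tp=8, ep=1)]
total_gpus = sum(w.gpus for w in topology)
print(len(topology), "nodes,", total_gpus, "GPUs")
```

Under these assumptions the smoke test uses 2 nodes and 16 GPUs, and neither worker needs an all2all backend.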

Scope: smoke test first

Per request, this starts minimal: a single ISL/OSL (8k/1k) at a single conc=1, to validate the path end-to-end (image pull, MoRIIO transport, serving flags, model staging on the disagg cluster) before expanding to the full 1k1k + 8k1k, conc 8-512 sweep the kimi/minimax recipes run. At conc=1 the generator emits one config and skips eval.

Validated locally:

$ generate_sweep_configs.py full-sweep --config-files .github/configs/amd-master.yaml \
    --framework vllm-disagg --runner-type mi355x-disagg
# 1 dsv4_8k1k config: isl=8192 osl=1024 conc=[1], 1P1D TP8/EP1, image v0.22.0
$ validate_master_config(amd-master.yaml)  # all 75 entries valid
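
To show why the smoke-test scope yields exactly one config while the follow-up sweep is much larger, here is a minimal sketch of the expansion. It is not `generate_sweep_configs.py`: the `expand` helper is hypothetical, 1k is assumed to mean 1024 tokens, and the "conc 8-512" range is assumed to step in powers of two.

```python
from itertools import product


def expand(seq_lens: list[tuple[int, int]], concurrencies: list[int]) -> list[dict]:
    """Hypothetical expansion: one config per (ISL/OSL pair, concurrency)."""
    return [
        {"isl": isl, "osl": osl, "conc": conc}
        for (isl, osl), conc in product(seq_lens, concurrencies)
    ]


# Scope of this PR: a single 8k/1k point at conc=1.
smoke = expand([(8192, 1024)], [1])

# Planned follow-up, assuming powers of two from 8 to 512.
full_conc = [8 << i for i in range(7)]
full = expand([(1024, 1024), (8192, 1024)], full_conc)

print(len(smoke), len(full))
```

Under those assumptions the smoke test is 1 config and the full sweep would be 14.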

Test plan

Apply the sweep label so run-sweep.yml exercises dsv4-fp4-mi355x-vllm-disagg at 8k1k/conc=1.
Confirm nightly-3f0a91bb… imports on a fresh MI355X-disagg runner and exposes DeepseekV4ForCausalLM, the MoRIIO connector, and the native read_mode flag (no setup_deps patches).
Confirm DeepSeek-V4-Pro is staged on the disagg cluster (models--deepseek-ai--DeepSeek-V4-Pro or DeepSeek-V4-Pro under MODEL_DIR).
On green: expand to 1k1k + 8k1k, conc 8-512 (and evaluate DEP decode / 1P2D).

🤖 Generated with Claude Code

Note

Medium Risk
Touches shared multi-node vLLM-disagg plumbing (images, MoRI-IO config, large setup_deps removal) for kimi/minimax as well as the new DSv4 path; benchmark/infra risk rather than app auth or data handling.

Overview
Adds dsv4-fp4-mi355x-vllm-disagg, a DeepSeek-V4-Pro disaggregated prefill/decode benchmark on MI355X (vLLM + MoRI-IO), with a new launcher script, DeepSeek-V4-Pro serving flags in models_vllm.yaml (aligned with the single-node DSv4 recipe), and an amd-master.yaml entry scoped as an 8k/1k, conc=1 smoke test on 1P1D TP8/EP1.

MoRI-IO patch-free path (#1585) is folded in for all MI355X vllm-disagg recipes: kimi/minimax move to nightly-3f0a91bb…, setup_deps.sh drops ~550 lines of runtime MoRIIO Python patches, VLLM_MORIIO_CONNECTOR_READ_MODE is removed from Slurm/submit/docker env, and read_mode: true is set in server_vllm.sh kv_connector_extra_config. Kimi/MiniMax decode all2all backend is renamed mori → mori_low_latency; the vLLM router default image is bumped.

perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit 46ffe59. Bugbot is set up for automated code reviews on this repo.

… (1k1k conc=1 smoke test)

Adds a DeepSeek-V4-Pro disaggregated prefill/decode recipe on MI355X via vLLM + MoRI-IO, combining the validated single-node DSv4 vLLM serving recipe (dsv4-fp4-mi355x-vllm, vllm-project/recipes#433) with the vLLM-disagg framework introduced for the kimi / minimax mi355x recipes (#1141, #1569).

- benchmarks/multi_node/dsv4_fp4_mi355x_vllm-disagg.sh: model-agnostic launcher (identical in shape to the kimi/minimax wrappers).
- amd_utils/models_vllm.yaml: DeepSeek-V4-Pro entry. Per-node serving flags reuse the aggregated recipe verbatim (--moe-backend triton_unfused required for the FP4 expert format, deepseek_v4 tokenizer/reasoning parser, fp8 KV, --enforce-eager); only the MoRIIO kv-transfer role is added by the framework.
- amd-master.yaml: dsv4-fp4-mi355x-vllm-disagg, 1P1D (TP8/EP1 prefill+decode), image v0.22.0 (carries both DeepseekV4ForCausalLM and the MoRIIO connector, and stays pullable unlike the GC'd nightly tags).

Starts with a single ISL/OSL (1k/1k) at conc=1 to smoke-test the path end-to-end before expanding to the full 1k1k + 8k1k, conc 8-512 sweep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-11T04:24:39Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


github-actions · 2026-06-11T04:28:23Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27323783591
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27323783591

…ch-free nightly

Brings the vLLM-disagg infra onto the upstream-MoRIIO nightly so the large setup_deps.sh runtime patches are dropped (vllm#40344), and migrates the new dsv4-fp4-mi355x-vllm-disagg recipe to match:

- image -> vllm/vllm-openai-rocm:nightly-3f0a91bb (carries #40344 + DeepseekV4); not available in v0.22.0/v0.22.1 release tags
- drop VLLM_MORIIO_CONNECTOR_READ_MODE env setting (read_mode now set via kv_connector_extra_config in server_vllm.sh)
- dsv4 is TP8/EP1 so no all2all backend / mori_low_latency rename needed

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-11T06:49:51Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27324475132
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27324475132

Remove the kimik2.5/minimaxm2.5 vllm-disagg changelog entry (that change is documented in #1585) and scrub kimi/minimax references from the dsv4-fp4-mi355x-vllm-disagg entry descriptions. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-06-12T04:54:41Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27395381813
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27395381813


github-actions · 2026-06-12T05:46:27Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27395515912
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27395515912

github-actions · 2026-06-12T06:44:31Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27395515912
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27395515912

simondanielsson and others added 9 commits May 29, 2026 09:33

fix: remove moriio connector patches after bumping to new nightly vll…

89b9243

…m image Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

fix: move read mode envvar to flag

d2aadee

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

fix: unpin vllm-router image

24555d3

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

fix: update to mori_low_latency backend after rename in nightly

dfdbc7d

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

fix: pin nightlyies

0c16e44

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

fix: comments and add perf-changelog.yml

3c94a6f

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

Merge remote-tracking branch 'upstream' into fix/remove-vllm-disagg-p…

3412624

…atches

fix: pint router iamge as well

f3b4132

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>

functionstackx requested a review from a team June 11, 2026 04:24

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 11, 2026 04:24

github-project-automation Bot added this to InferenceMAX Board Jun 11, 2026

functionstackx and others added 2 commits June 11, 2026 00:24

perf-changelog: add dsv4-fp4-mi355x-vllm-disagg entry (#1707)

75cc33e

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

dsv4-fp4-mi355x-vllm-disagg: switch smoke test from 1k1k to 8k1k

a60d592

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

functionstackx changed the title ~~[Klaud Cold] dsv4-fp4-mi355x-vllm-disagg: DeepSeek-V4-Pro vLLM disagg (1k1k conc=1 smoke test)~~ [Klaud Cold] dsv4-fp4-mi355x-vllm-disagg: DeepSeek-V4-Pro vLLM disagg (8k1k conc=1 smoke test) Jun 11, 2026

functionstackx added full-sweep-enabled and removed full-sweep-enabled labels Jun 11, 2026

functionstackx added the full-sweep-enabled label Jun 11, 2026

Merge remote-tracking branch 'origin/main' into dsv4-fp4-mi355x-vllm-…

46ffe59

…disagg

functionstackx wants to merge 14 commits into main from dsv4-fp4-mi355x-vllm-disagg


2 participants
