
[Klaud Cold] dsv4-fp4-mi355x-vllm-disagg: DeepSeek-V4-Pro vLLM disagg (8k1k conc=1 smoke test)#1707

Open
functionstackx wants to merge 14 commits into main from dsv4-fp4-mi355x-vllm-disagg


@functionstackx functionstackx commented Jun 11, 2026

Collaborator

Summary

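Adds dsv4-fp4-mi355x-vllm-disagg, a DeepSeek-V4-Pro disaggregated prefill/decode benchmark on MI355X via vLLM + MoRI-IO. It combines the two pieces this work was scoped against:

  • the validated single-node DSv4 vLLM serving recipe (dsv4-fp4-mi355x-vllm, vllm-project/recipes#433);
  • the vLLM-disagg framework introduced for the kimi / minimax mi355x recipes (#1141, #1569).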

Disaggregation only adds the MoRIIO kv-transfer role to each worker; the per-node engine config is otherwise identical to the known-good aggregated run.

Files

| File | Change |
| --- | --- |
| benchmarks/multi_node/dsv4_fp4_mi355x_vllm-disagg.sh | New model-agnostic launcher; identical in shape to the kimi/minimax vllm-disagg wrappers (launch_mi355x-amds.sh resolves dsv4+fp4+vllm-disagg → this filename). |
| benchmarks/multi_node/amd_utils/models_vllm.yaml | New DeepSeek-V4-Pro entry: prefill/decode flags + env keyed on MODEL_NAME. |
| .github/configs/amd-master.yaml | New dsv4-fp4-mi355x-vllm-disagg block. |

Serving config (models_vllm.yaml)

Per-node flags reuse the aggregated recipe verbatim, so the engine config matches the known-good single-node run:

  • --moe-backend triton_unfused: required for the FP4 MoE expert weight format (the auto backend doesn't register the FP4 scale params, which leads to a safetensors KeyError).
  • --tokenizer-mode deepseek_v4 --reasoning-parser deepseek_v4 --kv-cache-dtype fp8 --no-enable-prefix-caching --distributed-executor-backend mp --gpu-memory-utilization 0.9 --max-num-batched-tokens 8192.
  • --enforce-eager — no CUDA graphs, to keep the first disagg recipe robust against cudagraph/MoRIIO-hook interactions (FULL/PIECEWISE capture is a follow-up).
  • --async-scheduling intentionally omitted (not used by the kimi/minimax vllm-disagg recipes).
  • env: VLLM_USE_V1=1 VLLM_ROCM_USE_AITER=1 VLLM_ENGINE_READY_TIMEOUT_S=3600.
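
For reference, the flags and env above can be assembled into a single per-node command line. This is a minimal sketch, not the launcher's actual code: the helper name, the `vllm serve` entry point, and the `deepseek-ai/DeepSeek-V4-Pro` model identifier are assumptions for illustration, and the MoRIIO kv-transfer role that the framework adds is deliberately left out.

```python
import shlex

# Assumed model identifier; the real launcher resolves it from MODEL_NAME.
MODEL = "deepseek-ai/DeepSeek-V4-Pro"

# Per-node flags exactly as listed in the PR description.
# --async-scheduling is intentionally absent.
SERVE_FLAGS = [
    "--moe-backend", "triton_unfused",
    "--tokenizer-mode", "deepseek_v4",
    "--reasoning-parser", "deepseek_v4",
    "--kv-cache-dtype", "fp8",
    "--no-enable-prefix-caching",
    "--distributed-executor-backend", "mp",
    "--gpu-memory-utilization", "0.9",
    "--max-num-batched-tokens", "8192",
    "--enforce-eager",
]

# Environment variables as listed in the PR description.
SERVE_ENV = {
    "VLLM_USE_V1": "1",
    "VLLM_ROCM_USE_AITER": "1",
    "VLLM_ENGINE_READY_TIMEOUT_S": "3600",
}


def build_serve_command(model: str = MODEL) -> str:
    """Return a shell-ready command string (hypothetical helper)."""
    env_prefix = " ".join(f"{key}={value}" for key, value in SERVE_ENV.items())
    argv = ["vllm", "serve", model] + SERVE_FLAGS
    return f"{env_prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    print(build_serve_command())
```

Keeping the flags in one list makes it easy to diff the disagg entry against the aggregated recipe it is supposed to match.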

Image: vllm/vllm-openai-rocm:nightly-3f0a91bb… (patch-free, via #1585)

This PR also folds in #1585 ("Remove MoRI-IO patches from vLLM Disagg benchmarks"), so all three vllm-disagg recipes (kimi, minimax, dsv4) run patch-free on the same nightly:

  • setup_deps.sh drops ~557 lines of runtime MoRIIO Python patches — they were upstreamed in vllm#40344 (merged 2026-05-28).
  • a2a backend renamed mori → mori_low_latency; read-mode is now set via read_mode: true in kv_connector_extra_config (server_vllm.sh) instead of the VLLM_MORIIO_CONNECTOR_READ_MODE env var.
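
As a rough illustration of the read-mode change, the sketch below builds the kind of JSON payload that vLLM's kv-transfer config takes, with read_mode carried inside kv_connector_extra_config. The connector name `MoRIIOConnector`, the mapping of prefill/decode to the `kv_producer`/`kv_consumer` roles, and the helper name are assumptions; the actual contents of server_vllm.sh are not shown in this PR description.

```python
import json

# Assumed role mapping: prefill produces KV cache, decode consumes it.
ROLE_BY_WORKER = {"prefill": "kv_producer", "decode": "kv_consumer"}


def kv_transfer_config(worker: str) -> str:
    """Return a JSON string for one worker's kv-transfer config (hypothetical helper)."""
    if worker not in ROLE_BY_WORKER:
        raise ValueError(f"unknown worker type: {worker!r}")
    config = {
        "kv_connector": "MoRIIOConnector",  # assumed connector name
        "kv_role": ROLE_BY_WORKER[worker],
        # read_mode now lives here instead of VLLM_MORIIO_CONNECTOR_READ_MODE.
        "kv_connector_extra_config": {"read_mode": True},
    }
    return json.dumps(config)


if __name__ == "__main__":
    for worker in ("prefill", "decode"):
        print(worker, kv_transfer_config(worker))
```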

Why the nightly and not a release tag: vllm#40344 is not in v0.22.0 or v0.22.1 — it landed ~1 day before the v0.22.0 cut and wasn't backported (both release trees have zero read_mode). So the patch-free path requires the nightly. nightly-3f0a91bb96f8d72e0498b95c166e817deae14d62 (2026-06-03) carries #40344 and DeepseekV4ForCausalLM (vllm#40871) and the MoRIIO connector (vllm#29304); it's confirmed live on Docker Hub. (Note: the GC'd-tag risk I'd flagged for the old disagg nightlies applies here too, but this is the maintained image the kimi/minimax recipes now share, so the disagg cluster caches it.)

Topology

1P1D — 1 prefill node + 1 decode node (2 nodes total), each a full TP=8, EP=1 worker. This matches the aggregated recipe, which runs DSv4 on TP=8 without expert parallelism (--moe-backend triton_unfused handles the FP4 sharding at TP=8, so EP is not required to load; at EP=1 there is no all2all backend, so the mori_low_latency rename from #1585 doesn't touch this recipe). DEP decode and multi-node 1P2D are follow-ups once the base path validates.
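
The 1P1D layout can be written down as a small data model to make the GPU count and the "no all2all at EP=1" point explicit. This is only a sketch: the class and field names are hypothetical, and it assumes each worker occupies one node with TP GPUs, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerGroup:
    """One side of the disaggregated deployment (hypothetical model)."""
    nodes: int  # number of nodes, one worker per node
    tp: int     # tensor-parallel size per worker
    ep: int     # expert-parallel size per worker

    @property
    def gpus(self) -> int:
        # Assumes each worker spans exactly `tp` GPUs on its node.
        return self.nodes * self.tp

    @property
    def needs_all2all(self) -> bool:
        # An all2all backend only comes into play with expert parallelism.
        return self.ep > 1


# 1P1D: one prefill node and one decode node, both TP=8 / EP=1.
PREFILL = WorkerGroup(nodes=1, tp=8, ep=1)
DECODE = WorkerGroup(nodes=1, tp=8, ep=1)

if __name__ == "__main__":
    print("total nodes:", PREFILL.nodes + DECODE.nodes)
    print("total GPUs:", PREFILL.gpus + DECODE.gpus)
    print("all2all needed:", PREFILL.needs_all2all or DECODE.needs_all2all)
```

Under these assumptions the smoke test uses 2 nodes and 16 GPUs, and neither side needs an all2all backend.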

Scope: smoke test first

Per request, this starts minimal: a single ISL/OSL (8k/1k) at a single conc=1, to validate the path end-to-end (image pull, MoRIIO transport, serving flags, model staging on the disagg cluster) before expanding to the full 1k1k + 8k1k, conc 8-512 sweep the kimi/minimax recipes run. At conc=1 the generator emits one config and skips eval.

Validated locally:

$ generate_sweep_configs.py full-sweep --config-files .github/configs/amd-master.yaml \
    --framework vllm-disagg --runner-type mi355x-disagg
# 1 dsv4_8k1k config: isl=8192 osl=1024 conc=[1], 1P1D TP8/EP1, image v0.22.0
$ validate_master_config(amd-master.yaml)  # all 75 entries valid
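
To make the scope difference concrete, here is a hedged sketch of the smoke-test spec next to the planned full sweep. It does not reproduce generate_sweep_configs.py: the dictionary layout and function name are hypothetical, 1k is assumed to mean 1024 tokens (matching isl=8192 osl=1024 for 8k1k), and the concurrency steps between 8 and 512 are assumed to be powers of two.

```python
# Smoke test: a single ISL/OSL pair at a single concurrency.
SMOKE = {"seq_lens": [(8192, 1024)], "conc": [1]}

# Planned follow-up sweep; the concurrency steps are an assumption.
FULL = {
    "seq_lens": [(1024, 1024), (8192, 1024)],
    "conc": [8, 16, 32, 64, 128, 256, 512],
}


def expand(spec: dict) -> list[dict]:
    """Emit one config per (ISL, OSL) pair, each carrying its concurrency list."""
    return [
        {"isl": isl, "osl": osl, "conc": list(spec["conc"])}
        for isl, osl in spec["seq_lens"]
    ]


if __name__ == "__main__":
    for name, spec in (("smoke", SMOKE), ("full", FULL)):
        configs = expand(spec)
        print(name, len(configs), configs)
```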

Test plan

  • Apply the sweep label so run-sweep.yml exercises dsv4-fp4-mi355x-vllm-disagg at 8k1k/conc=1.
  • Confirm nightly-3f0a91bb… imports on a fresh MI355X-disagg runner and exposes DeepseekV4ForCausalLM, the MoRIIO connector, and the native read_mode flag (no setup_deps patches).
  • Confirm DeepSeek-V4-Pro is staged on the disagg cluster (models--deepseek-ai--DeepSeek-V4-Pro or DeepSeek-V4-Pro under MODEL_DIR).
  • On green: expand to 1k1k + 8k1k, conc 8-512 (and evaluate DEP decode / 1P2D).
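
The model-staging item in the test plan could be checked with something like the following. It is a sketch only: the function name is hypothetical, and it assumes MODEL_DIR is exported in the environment and that a directory with either of the two names above counts as "staged".

```python
import os
from pathlib import Path
from typing import Optional

# The two directory names mentioned in the test plan.
CANDIDATES = ("models--deepseek-ai--DeepSeek-V4-Pro", "DeepSeek-V4-Pro")


def find_staged_model(model_dir: str) -> Optional[Path]:
    """Return the first candidate directory found under model_dir, else None."""
    root = Path(model_dir)
    for name in CANDIDATES:
        candidate = root / name
        if candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    model_dir = os.environ.get("MODEL_DIR", "")
    staged = find_staged_model(model_dir) if model_dir else None
    if staged is None:
        raise SystemExit("DeepSeek-V4-Pro not found under MODEL_DIR")
    print(f"found staged model at {staged}")
```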

🤖 Generated with Claude Code


Note

Medium Risk
Touches shared multi-node vLLM-disagg plumbing (images, MoRI-IO config, large setup_deps removal) for kimi/minimax as well as the new DSv4 path; benchmark/infra risk rather than app auth or data handling.

Overview
Adds dsv4-fp4-mi355x-vllm-disagg, a DeepSeek-V4-Pro disaggregated prefill/decode benchmark on MI355X (vLLM + MoRI-IO), with a new launcher script, DeepSeek-V4-Pro serving flags in models_vllm.yaml (aligned with the single-node DSv4 recipe), and an amd-master.yaml entry scoped as an 8k/1k, conc=1 smoke test on 1P1D TP8/EP1.

MoRI-IO patch-free path (#1585) is folded in for all MI355X vllm-disagg recipes: kimi/minimax move to nightly-3f0a91bb…, setup_deps.sh drops ~550 lines of runtime MoRIIO Python patches, VLLM_MORIIO_CONNECTOR_READ_MODE is removed from Slurm/submit/docker env, and read_mode: true is set in server_vllm.sh kv_connector_extra_config. Kimi/MiniMax decode all2all backend is renamed mori → mori_low_latency; the vLLM router default image is bumped.

perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit 46ffe59.

simondanielsson and others added 9 commits May 29, 2026 09:33
…m image

Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
… (1k1k conc=1 smoke test)

Adds a DeepSeek-V4-Pro disaggregated prefill/decode recipe on MI355X via
vLLM + MoRI-IO, combining the validated single-node DSv4 vLLM serving recipe
(dsv4-fp4-mi355x-vllm, vllm-project/recipes#433) with the vLLM-disagg framework
introduced for the kimi / minimax mi355x recipes (#1141, #1569).

- benchmarks/multi_node/dsv4_fp4_mi355x_vllm-disagg.sh: model-agnostic launcher
  (identical in shape to the kimi/minimax wrappers).
- amd_utils/models_vllm.yaml: DeepSeek-V4-Pro entry. Per-node serving flags
  reuse the aggregated recipe verbatim (--moe-backend triton_unfused required
  for the FP4 expert format, deepseek_v4 tokenizer/reasoning parser, fp8 KV,
  --enforce-eager); only the MoRIIO kv-transfer role is added by the framework.
- amd-master.yaml: dsv4-fp4-mi355x-vllm-disagg, 1P1D (TP8/EP1 prefill+decode),
  image v0.22.0 (carries both DeepseekV4ForCausalLM and the MoRIIO connector,
  and stays pullable unlike the GC'd nightly tags).

Starts with a single ISL/OSL (1k/1k) at conc=1 to smoke-test the path end-to-end
before expanding to the full 1k1k + 8k1k, conc 8-512 sweep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

functionstackx and others added 2 commits June 11, 2026 00:24
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@functionstackx functionstackx changed the title from "[Klaud Cold] dsv4-fp4-mi355x-vllm-disagg: DeepSeek-V4-Pro vLLM disagg (1k1k conc=1 smoke test)" to "[Klaud Cold] dsv4-fp4-mi355x-vllm-disagg: DeepSeek-V4-Pro vLLM disagg (8k1k conc=1 smoke test)" on Jun 11, 2026
…ch-free nightly

Brings the vLLM-disagg infra onto the upstream-MoRIIO nightly so the large
setup_deps.sh runtime patches are dropped (vllm#40344), and migrates the new
dsv4-fp4-mi355x-vllm-disagg recipe to match:
- image -> vllm/vllm-openai-rocm:nightly-3f0a91bb (carries #40344 + DeepseekV4);
  not available in v0.22.0/v0.22.1 release tags
- drop VLLM_MORIIO_CONNECTOR_READ_MODE env setting (read_mode now set via
  kv_connector_extra_config in server_vllm.sh)
- dsv4 is TP8/EP1 so no all2all backend / mori_low_latency rename needed

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Remove the kimik2.5/minimaxm2.5 vllm-disagg changelog entry (that
change is documented in #1585) and scrub kimi/minimax references from
the dsv4-fp4-mi355x-vllm-disagg entry descriptions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
