[NV] DeepSeek-V4-Pro trtllm disagg recipes for STP and MTP by richardhuo-nv · Pull Request #1687 · SemiAnalysisAI/InferenceX

richardhuo-nv · 2026-06-08T19:22:23Z

Summary

Adds disaggregated TRT-LLM inference benchmarks for DeepSeek-V4-Pro in MXFP4 precision
on GB300 via the Dynamo frontend, covering both Standard Token Processing (STP) and
Multi-Token Prediction (MTP) configurations.

Changes

.github/configs/nvidia-master.yaml — 819 lines added

New config key dsv4-fp4-gb300-dynamo-trt (STP): 27 scenarios across two sequence length regimes
- ISL 1024 / OSL 1024: 14 concurrency points (4 → 16384)
- ISL 8192 / OSL 1024: 13 concurrency points (4 → 4301)
New config key dsv4-fp4-gb300-dynamo-trt-mtp (MTP): 27 scenarios across two sequence length regimes
- ISL 1024 / OSL 1024: 14 concurrency points (8 → 8192)
- ISL 8192 / OSL 1024: 13 concurrency points (8 → 4301)
Container: nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-deepseek-v4-dev.1
Recipes sourced from NVIDIA/srt-slurm branch sa-submission-q2-2026
Disaggregated prefill/decode topology with varying num-worker, tp, ep, and
dp-attn per scenario
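
The container reference above uses `#` rather than `/` between the registry host and the image path. Assuming this follows the enroot/Pyxis image URI convention (`REGISTRY#IMAGE:TAG`), which the PR description does not state explicitly, the parts can be separated with plain shell parameter expansion. A minimal sketch; the variable names are illustrative and not taken from the PR:

```shell
#!/usr/bin/env bash
# Sketch only: assumes '#' separates the registry host from the image path
# (enroot/Pyxis-style URI). Variable names are illustrative, not from the PR.
image="nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-deepseek-v4-dev.1"

registry="${image%%#*}"     # text before the first '#'
path_and_tag="${image#*#}"  # text after the first '#'
repo="${path_and_tag%:*}"   # drop the trailing ':tag'
tag="${path_and_tag##*:}"   # keep only the tag

echo "registry=${registry}"
echo "repo=${repo}"
echo "tag=${tag}"

# Slash-separated form, in case a Docker-style reference is needed.
docker_ref="${registry}/${repo}:${tag}"
echo "docker_ref=${docker_ref}"
```

Splitting on the first `#` and the last `:` keeps the snippet safe for tags that contain dots or dashes, as this one does.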

runners/launch_gb300-nv.sh — 7 lines added

New branch for framework=dynamo-trt + model_prefix=dsv4
Overrides SRT_SLURM_MODEL_PREFIX to deepseek-ai/DeepSeek-V4-Pro to match the
recipe's model.path key (HuggingFace ID format rather than short prefix)
Clones NVIDIA/srt-slurm and checks out sa-submission-q2-2026
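
The launcher branch described above might look roughly like the following. This is a sketch under stated assumptions, not the code from the PR: the function names, the argument order, the fallback to the short prefix, and the repository URL are hypothetical. Only `SRT_SLURM_MODEL_PREFIX`, the `dynamo-trt` and `dsv4` values, the HuggingFace ID, and the branch name come from the description. The clone is wrapped in a function that is defined but not called, so the snippet runs without network access.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the dynamo-trt + dsv4 branch; not the PR's actual code.
set -euo pipefail

SRT_SLURM_BRANCH="sa-submission-q2-2026"
# Assumed URL for the NVIDIA/srt-slurm repository named in the PR.
SRT_SLURM_REPO="https://github.com/NVIDIA/srt-slurm.git"

# Sets SRT_SLURM_MODEL_PREFIX for a framework/model pair.
# $1 = framework, $2 = model prefix (argument order is an assumption).
select_model_prefix() {
  local framework="$1" model_prefix="$2"
  if [[ "$framework" == "dynamo-trt" && "$model_prefix" == "dsv4" ]]; then
    # The recipes key model.path by HuggingFace ID, not by the short prefix.
    SRT_SLURM_MODEL_PREFIX="deepseek-ai/DeepSeek-V4-Pro"
  else
    # Assumed fallback: other paths keep the short prefix unchanged.
    SRT_SLURM_MODEL_PREFIX="$model_prefix"
  fi
  export SRT_SLURM_MODEL_PREFIX
}

# Clones the recipe repository at the submission branch (needs network access).
# $1 = optional target directory.
fetch_recipes() {
  git clone --branch "$SRT_SLURM_BRANCH" --depth 1 "$SRT_SLURM_REPO" "${1:-srt-slurm}"
}

select_model_prefix "dynamo-trt" "dsv4"
echo "SRT_SLURM_MODEL_PREFIX=${SRT_SLURM_MODEL_PREFIX}"
```

Keeping the override in its own function makes the mapping easy to check in isolation before any Slurm job is submitted.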

perf-changelog.yaml — 12 lines added

Documents new config keys, concurrency coverage, container, and recipe source

Note

Low Risk
Changes are benchmark YAML, changelog, and CI runner branching only; no application runtime or auth logic.

Overview
Adds GB300 disaggregated TRT-LLM (Dynamo) benchmark coverage for DeepSeek-V4-Pro MXFP4 via two new keys in nvidia-master.yaml: dsv4-fp4-gb300-dynamo-trt (STP) and dsv4-fp4-gb300-dynamo-trt-mtp (MTP with spec-decoding: mtp). Each key defines fixed-seq-len scenarios for ISL 1024/OSL 1024 and ISL 8192/OSL 1024, with concurrency sweeps and per-point prefill/decode topology (num-worker, tp, ep, dp-attn) wired to CONFIG_FILE recipes on NVIDIA/srt-slurm branch sa-submission-q2-2026, using container nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-deepseek-v4-dev.1.

launch_gb300-nv.sh gains a dynamo-trt + dsv4 path that clones that branch and overrides SRT_SLURM_MODEL_PREFIX to deepseek-ai/DeepSeek-V4-Pro so srt-slurm model aliases match the recipe’s HuggingFace-style model.path (instead of the short dsv4 prefix used elsewhere). perf-changelog.yaml documents the new config keys and coverage.

Reviewed by Cursor Bugbot for commit 4ebf43a. Bugbot is set up for automated code reviews on this repo.

DeepSeek V4 Pro trtllm disagg recipes for STP and MTP

4ebf43a

richardhuo-nv requested a review from a team June 8, 2026 19:22

richardhuo-nv requested review from jgangani and kedarpotdar-nv as code owners June 8, 2026 19:22

github-project-automation Bot added this to InferenceMAX Board Jun 8, 2026

richardhuo-nv wants to merge 1 commit into SemiAnalysisAI:main from richardhuo-nv:rihuo/submit_trtllm_disagg_dsv4
