[NV] DeepSeek-V4-Pro trtllm disagg recipes for STP and MTP #1687
Open
richardhuo-nv wants to merge 1 commit into
Summary
Adds disaggregated TRT-LLM inference benchmarks for DeepSeek-V4-Pro in MXFP4 precision
on GB300 via the Dynamo frontend, covering both Standard Token Processing (STP) and
Multi-Token Prediction (MTP) configurations.
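To make the disaggregated layout concrete, here is a minimal Python sketch of a single benchmark point with separate prefill and decode worker pools. The field names mirror the `num-worker`, `tp`, `ep`, `dp-attn` and `spec-decoding` keys this PR describes. The classes themselves, the GPU-count rule (each worker assumed to span `tp` GPUs), and every numeric value other than the ISL/OSL pair are hypothetical and are not taken from `nvidia-master.yaml`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerPool:
    """One side (prefill or decode) of a disaggregated deployment (hypothetical model)."""
    num_worker: int  # mirrors the `num-worker` key
    tp: int          # mirrors `tp`
    ep: int          # mirrors `ep`
    dp_attn: bool    # mirrors `dp-attn`

    def gpus(self) -> int:
        # Assumption: each worker spans `tp` GPUs, so the pool uses num_worker * tp GPUs.
        return self.num_worker * self.tp


@dataclass(frozen=True)
class Scenario:
    """One point of a concurrency sweep at a fixed ISL/OSL (hypothetical model)."""
    isl: int
    osl: int
    conc: int
    prefill: WorkerPool
    decode: WorkerPool
    spec_decoding: Optional[str] = None  # "mtp" for the -mtp key, None for STP

    def total_gpus(self) -> int:
        return self.prefill.gpus() + self.decode.gpus()


# ISL 8192 / OSL 1024 is one of the PR's regimes; topology and concurrency are made up.
point = Scenario(
    isl=8192,
    osl=1024,
    conc=256,
    prefill=WorkerPool(num_worker=2, tp=4, ep=4, dp_attn=True),
    decode=WorkerPool(num_worker=1, tp=8, ep=8, dp_attn=True),
    spec_decoding="mtp",
)
print(point.total_gpus())  # 16
```

Keeping prefill and decode as independent pools reflects the point of disaggregation: each phase can be sized and parallelized separately per sweep point.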
Changes
- .github/configs/nvidia-master.yaml (819 lines added)
  - dsv4-fp4-gb300-dynamo-trt (STP): 27 scenarios across two sequence-length regimes
  - dsv4-fp4-gb300-dynamo-trt-mtp (MTP): 27 scenarios across two sequence-length regimes
  - Container: nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-deepseek-v4-dev.1
  - Recipes: NVIDIA/srt-slurm branch sa-submission-q2-2026
  - num-worker, tp, ep, and dp-attn set per scenario
- runners/launch_gb300-nv.sh (7 lines added)
  - Adds a path for framework=dynamo-trt + model_prefix=dsv4
  - Sets SRT_SLURM_MODEL_PREFIX to deepseek-ai/DeepSeek-V4-Pro to match the recipe's model.path key (HuggingFace ID format rather than the short prefix)
  - Clones NVIDIA/srt-slurm and checks out sa-submission-q2-2026
- perf-changelog.yaml (12 lines added)

Note
Low Risk
Changes are benchmark YAML, changelog, and CI runner branching only; no application runtime or auth logic.
Overview
Adds GB300 disaggregated TRT-LLM (Dynamo) benchmark coverage for DeepSeek-V4-Pro MXFP4 via two new keys in nvidia-master.yaml: dsv4-fp4-gb300-dynamo-trt (STP) and dsv4-fp4-gb300-dynamo-trt-mtp (MTP with spec-decoding: mtp). Each key defines fixed-seq-len scenarios for ISL 1024/OSL 1024 and ISL 8192/OSL 1024, with concurrency sweeps and per-point prefill/decode topology (num-worker, tp, ep, dp-attn) wired to CONFIG_FILE recipes on the NVIDIA/srt-slurm branch sa-submission-q2-2026, using container nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-deepseek-v4-dev.1.

launch_gb300-nv.sh gains a dynamo-trt + dsv4 path that clones that branch and overrides SRT_SLURM_MODEL_PREFIX to deepseek-ai/DeepSeek-V4-Pro, so srt-slurm model aliases match the recipe's HuggingFace-style model.path (instead of the short dsv4 prefix used elsewhere).

perf-changelog.yaml documents the new config keys and coverage.

Reviewed by Cursor Bugbot for commit 4ebf43a.
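The launcher change is a small conditional. The following is a hypothetical Python restatement of what the dynamo-trt + dsv4 path in launch_gb300-nv.sh is described as doing. It is not the script itself: the function name, the returned structure, and the GitHub clone URL are assumptions, while the branch name and the SRT_SLURM_MODEL_PREFIX value come from this PR.

```python
from typing import Dict, List, Optional

# Assumption: the GitHub clone URL for NVIDIA/srt-slurm.
SRT_SLURM_REPO = "https://github.com/NVIDIA/srt-slurm.git"
SRT_SLURM_BRANCH = "sa-submission-q2-2026"


def resolve_srt_slurm(framework: str, model_prefix: str) -> Optional[Dict[str, object]]:
    """Hypothetical sketch of the dynamo-trt + dsv4 branch; returns None for other combinations."""
    if framework == "dynamo-trt" and model_prefix == "dsv4":
        commands: List[List[str]] = [
            ["git", "clone", SRT_SLURM_REPO],  # clones into ./srt-slurm by default
            ["git", "-C", "srt-slurm", "checkout", SRT_SLURM_BRANCH],
        ]
        return {
            # The recipes reference model.path by HuggingFace ID, so the short
            # "dsv4" prefix is replaced with the full repository ID.
            "env": {"SRT_SLURM_MODEL_PREFIX": "deepseek-ai/DeepSeek-V4-Pro"},
            "commands": commands,
        }
    return None  # other framework/model pairs keep their existing handling


plan = resolve_srt_slurm("dynamo-trt", "dsv4")
print(plan["env"]["SRT_SLURM_MODEL_PREFIX"])  # deepseek-ai/DeepSeek-V4-Pro
```

Scoping the override to the exact framework/model pair leaves the short-prefix convention untouched for every other runner path.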