
[NV] DeepSeek-V4-Pro trtllm disagg recipes for STP and MTP #1687

Open
richardhuo-nv wants to merge 1 commit into SemiAnalysisAI:main from richardhuo-nv:rihuo/submit_trtllm_disagg_dsv4


richardhuo-nv commented Jun 8, 2026


Summary

Adds disaggregated TRT-LLM inference benchmarks for DeepSeek-V4-Pro in MXFP4 precision
on GB300 via the Dynamo frontend, covering both Standard Token Processing (STP) and
Multi-Token Prediction (MTP) configurations.

Changes

.github/configs/nvidia-master.yaml — 819 lines added

  • New config key dsv4-fp4-gb300-dynamo-trt (STP): 27 scenarios across two sequence length regimes
    • ISL 1024 / OSL 1024: 14 concurrency points (4 → 16384)
    • ISL 8192 / OSL 1024: 13 concurrency points (4 → 4301)
  • New config key dsv4-fp4-gb300-dynamo-trt-mtp (MTP): 27 scenarios across two sequence length regimes
    • ISL 1024 / OSL 1024: 14 concurrency points (8 → 8192)
    • ISL 8192 / OSL 1024: 13 concurrency points (8 → 4301)
  • Container: nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-deepseek-v4-dev.1
  • Recipes sourced from NVIDIA/srt-slurm branch sa-submission-q2-2026
  • Disaggregated prefill/decode topology, with num-worker, tp, ep, and dp-attn
    set per scenario
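The scenario totals above can be cross-checked with a small sketch. It models each config key as a list of sequence-length regimes using only the ISL/OSL values, point counts, and concurrency endpoints listed above (the intermediate concurrency values are not listed, so they are not modeled). The `Regime` class, `CONFIGS` table, and `total_scenarios` helper are hypothetical names for illustration; they are not the nvidia-master.yaml schema or part of any runner script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regime:
    """One fixed-seq-len regime of a config key (hypothetical model, not the YAML schema)."""
    isl: int         # input sequence length
    osl: int         # output sequence length
    num_points: int  # number of concurrency points in the sweep
    min_conc: int    # lowest concurrency in the sweep
    max_conc: int    # highest concurrency in the sweep


# Values transcribed from the Changes list above.
CONFIGS = {
    "dsv4-fp4-gb300-dynamo-trt": [
        Regime(isl=1024, osl=1024, num_points=14, min_conc=4, max_conc=16384),
        Regime(isl=8192, osl=1024, num_points=13, min_conc=4, max_conc=4301),
    ],
    "dsv4-fp4-gb300-dynamo-trt-mtp": [
        Regime(isl=1024, osl=1024, num_points=14, min_conc=8, max_conc=8192),
        Regime(isl=8192, osl=1024, num_points=13, min_conc=8, max_conc=4301),
    ],
}


def total_scenarios(regimes: list[Regime]) -> int:
    """Each concurrency point in each regime counts as one scenario."""
    return sum(r.num_points for r in regimes)


if __name__ == "__main__":
    for key, regimes in CONFIGS.items():
        print(f"{key}: {total_scenarios(regimes)} scenarios")
```

Running it prints 27 scenarios for each key, matching the 14 + 13 split per regime.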

runners/launch_gb300-nv.sh — 7 lines added

  • New branch for framework=dynamo-trt + model_prefix=dsv4
  • Overrides SRT_SLURM_MODEL_PREFIX to deepseek-ai/DeepSeek-V4-Pro to match the
    recipe's model.path key (HuggingFace ID format rather than short prefix)
  • Clones NVIDIA/srt-slurm and checks out sa-submission-q2-2026
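The runner branch described above can be sketched in Python as follows. This is an illustration of the selection logic only, not the actual Bash in runners/launch_gb300-nv.sh. The function name `select_srt_slurm_setup` and the keys of the returned dictionary are hypothetical, and the fallback assumes other paths keep the short model prefix unchanged (consistent with "the short dsv4 prefix used elsewhere" in the overview); how other paths pick a repo or branch is not modeled.

```python
def select_srt_slurm_setup(framework: str, model_prefix: str) -> dict[str, str]:
    """Sketch of the dynamo-trt + dsv4 branch (hypothetical helper, not the real script)."""
    if framework == "dynamo-trt" and model_prefix == "dsv4":
        # The DSv4 recipes key model.path on the HuggingFace ID, so the short
        # "dsv4" prefix is replaced to keep srt-slurm model aliases aligned.
        return {
            "SRT_SLURM_MODEL_PREFIX": "deepseek-ai/DeepSeek-V4-Pro",
            "srt_slurm_repo": "NVIDIA/srt-slurm",
            "srt_slurm_branch": "sa-submission-q2-2026",
        }
    # Assumption: every other framework/model pair keeps its short prefix as-is.
    return {"SRT_SLURM_MODEL_PREFIX": model_prefix}


if __name__ == "__main__":
    print(select_srt_slurm_setup("dynamo-trt", "dsv4"))
    # "other-model" is a hypothetical non-dsv4 prefix used only to show the fallback.
    print(select_srt_slurm_setup("dynamo-trt", "other-model"))
```

Keying the override on both framework and model prefix keeps the change scoped: other frameworks serving dsv4, and other models under dynamo-trt, fall through untouched.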

perf-changelog.yaml — 12 lines added

  • Documents new config keys, concurrency coverage, container, and recipe source

Note

Low Risk
Changes are benchmark YAML, changelog, and CI runner branching only; no application runtime or auth logic.

Overview
Adds GB300 disaggregated TRT-LLM (Dynamo) benchmark coverage for DeepSeek-V4-Pro MXFP4 via two new keys in nvidia-master.yaml: dsv4-fp4-gb300-dynamo-trt (STP) and dsv4-fp4-gb300-dynamo-trt-mtp (MTP, with spec-decoding: mtp). Each key defines fixed-seq-len scenarios for ISL 1024/OSL 1024 and ISL 8192/OSL 1024 with concurrency sweeps. Each concurrency point sets its own prefill/decode topology (num-worker, tp, ep, dp-attn) and is wired to a CONFIG_FILE recipe on the NVIDIA/srt-slurm branch sa-submission-q2-2026. All scenarios use the container nvcr.io#nvidia/ai-dynamo/tensorrtllm-runtime:1.3.0-deepseek-v4-dev.1.

launch_gb300-nv.sh gains a dynamo-trt + dsv4 path that clones that branch and overrides SRT_SLURM_MODEL_PREFIX to deepseek-ai/DeepSeek-V4-Pro so srt-slurm model aliases match the recipe’s HuggingFace-style model.path (instead of the short dsv4 prefix used elsewhere). perf-changelog.yaml documents the new config keys and coverage.

Reviewed by Cursor Bugbot for commit 4ebf43a.
