Add Kimi K2.5 NVFP4 GB200 disaggregated TRT-LLM benchmarks via Dynamo #1026
yeswanthk-lab wants to merge 7 commits into main
Add kimik2.5-fp4-gb200-dynamo-trt configuration with 29 search-space entries covering ISL 1024/OSL 1024 (8 MTP + 7 STP) and ISL 8192/OSL 1024 (7 MTP + 7 STP) disaggregated configs. Update launch_gb200-nv.sh to support kimik2.5 model prefix with dynamo-trt framework and clone srt-slurm from NVIDIA/srt-slurm@sa-submission-q2-2026.
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.
    export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
    export SERVED_MODEL_NAME="deepseek-r1-fp8"
    export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
elif [[ $MODEL_PREFIX == "kimik2.5" && $PRECISION == "fp4" ]]; then
have we already done this for kimi dynamo vllm submission? @nlevin-ui
LGTM — the duplicate clone block is a nit already flagged inline; configs and runner changes are correct and follow established patterns.
Extended reasoning...
Overview
The PR adds a new benchmark config (kimik2.5-fp4-gb200-dynamo-trt) with 29 search-space entries in nvidia-master.yaml, and extends launch_gb200-nv.sh to handle the kimik2.5 model prefix under dynamo-trt.
Security risks
No security concerns. Changes are benchmark config data and a CI runner shell script cloning a known NVIDIA GitHub repo.
Level of scrutiny
Benchmark/CI infrastructure code, not production logic. YAML configs mirror adjacent entries and link to upstream recipe files for traceability. Runner script changes are additive elif arms following existing patterns.
Other factors
The one bug found (duplicate git clone blocks) is a maintenance nit; code is functionally correct today and already flagged inline. No outstanding reviewer comments beyond the auto-bot recipe reminder.
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026
elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026
else
    git clone https://github.com/ishandhanani/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
🟡 The new dynamo-trt+kimik2.5 git clone block (lines 131-134) is byte-for-byte identical to the existing dynamo-vllm block above it: both clone NVIDIA/srt-slurm and check out sa-submission-q2-2026. This is a maintenance hazard — a future change to the repo URL or branch name that is applied to only one block would cause silent behavioral divergence. The two conditions should be merged into a single branch: [[ $FRAMEWORK == "dynamo-vllm" || ($FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5") ]].
Extended reasoning...
What the bug is and how it manifests
In runners/launch_gb200-nv.sh the script clones one of two different srt-slurm forks depending on the framework. The new PR adds a second elif arm for dynamo-trt+kimik2.5 that performs the exact same three operations as the existing dynamo-vllm arm: git clone https://github.com/NVIDIA/srt-slurm.git, cd "$SRT_REPO_DIR", and git checkout sa-submission-q2-2026. The two blocks are byte-for-byte identical.
The specific code path that triggers it
Lines 128-134 of the modified file:
if [[ $FRAMEWORK == "dynamo-vllm" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026
elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026
There is no technical reason for two separate branches: they do exactly the same thing.
Why existing code doesn't prevent it
There is no deduplication mechanism in a shell script; the interpreter executes whatever branch matches. Nothing prevents the two arms from silently diverging if only one is updated in a future commit.
What the impact would be
The code is functionally correct today. The risk is future maintenance: when (not if) the sa-submission-q2-2026 branch is retired or the NVIDIA repo URL changes, a developer updating the script may only find and update one of the two identical blocks. The other would silently continue using the stale URL or branch, causing hard-to-diagnose failures only for one of the two frameworks.
How to fix it
Merge the two arms into a single condition:
if [[ $FRAMEWORK == "dynamo-vllm" || ($FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5") ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026
Step-by-step proof of the duplication
1. Locate line 128: if [[ $FRAMEWORK == "dynamo-vllm" ]]. Its body clones NVIDIA/srt-slurm and checks out sa-submission-q2-2026.
2. Locate line 131: elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]]. Its body is character-for-character identical to step 1.
3. Run diff on the two blocks: zero differences.
4. Conclusion: the bodies are redundant and should share a single conditional.
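One way to keep the repository URL and branch in a single place is sketched below. This is only an illustration of the merged condition suggested above, not the script's actual code: the helper names uses_nvidia_srt_slurm and clone_nvidia_srt_slurm and the variables NVIDIA_SRT_URL and NVIDIA_SRT_REF are hypothetical. The predicate uses POSIX [ ] tests, which should behave the same as the [[ ... ]] form for these plain string comparisons.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout; only the URL, branch, framework and
# model-prefix strings come from the PR discussion.
NVIDIA_SRT_URL="https://github.com/NVIDIA/srt-slurm.git"
NVIDIA_SRT_REF="sa-submission-q2-2026"

# Returns 0 when the run should use NVIDIA/srt-slurm:
# dynamo-vllm always, dynamo-trt only for the kimik2.5 prefix.
uses_nvidia_srt_slurm() {
    framework="$1"
    model_prefix="$2"
    [ "$framework" = "dynamo-vllm" ] ||
        { [ "$framework" = "dynamo-trt" ] && [ "$model_prefix" = "kimik2.5" ]; }
}

# Clones the NVIDIA repo into the given directory and checks out the branch.
clone_nvidia_srt_slurm() {
    dest="$1"
    git clone "$NVIDIA_SRT_URL" "$dest" || return 1
    cd "$dest" || return 1
    git checkout "$NVIDIA_SRT_REF"
}

# Possible call site (commented out so that sourcing this file clones nothing):
# if uses_nvidia_srt_slurm "$FRAMEWORK" "$MODEL_PREFIX"; then
#     clone_nvidia_srt_slurm "$SRT_REPO_DIR"
# fi
```

With this shape, a future change to the URL or branch touches one assignment, and the question of which framework and model combinations use the NVIDIA repo is answered in one predicate.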
functionstackx left a comment
Remove MTP for the time being; we haven't thought enough about how to do spec decode for models that don't have native support for it.
# MTP configurations (spec_decoding="mtp")
- spec-decoding: "mtp"
  conc-list: [ 8, 48, 92, 192, 336 ]
  prefill:
    num-worker: 1
    tp: 4
    ep: 4
    dp-attn: true
    additional-settings:
    # https://github.com/NVIDIA/srt-slurm/blob/sa-submission-q2-2026/recipes/kimi2.5/trtllm_dynamo/disagg/gb200Nvfp4/ISL1K_OSL1K/MTP/ctx1dep4_gen4tep8_batch64_allconc_eplb0_mtp3.yaml
Remove speculative decoding (MTP) search-space entries from kimik2.5-fp4-gb200-dynamo-trt for both ISL 1024/OSL 1024 and ISL 8192/OSL 1024. Retain all 14 STP (non-MTP) configurations.
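A quick way to confirm that no MTP entries remain after a change like this is to count the lines that still request it. The sketch below assumes entries are marked with the key spec-decoding: "mtp", as in the diff snippet above; the function name count_mtp_entries is hypothetical, and the config file path is passed in rather than assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many lines in the given YAML file
# still set spec-decoding to "mtp".
count_mtp_entries() {
    # grep -c prints the match count but exits 1 when the count is 0,
    # so its exit status is deliberately ignored.
    grep -c 'spec-decoding:[[:space:]]*"mtp"' "$1" || true
}

# Possible use: count_mtp_entries path/to/nvidia-master.yaml
# A result of 0 would indicate the MTP entries are gone.
```

Because this counts matches across the whole file, it also counts MTP entries belonging to other configurations, so the result is only meaningful when checked against the relevant section.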
@claude can you ensure that the added config parameters in nvidia-master.yaml match the parameters in the linked recipes (i.e., tp, ep, etc.)?
Claude finished @cquil11's task in 1m 59s
Verifying Config Parameters Against Linked Recipes
Result: All 14 configs match their linked recipes
I fetched each of the 14 STP recipe YAML files from
ISL 1024 / OSL 1024: 7 STP configs
ISL 8192 / OSL 1024: 7 STP configs
Summary
All parallelism parameters (
@claude add perf changelog entry
Claude finished @cquil11's task in 53s
Adding Perf Changelog Entry
Added perf changelog entry for
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
…odel.path Made-with: Cursor
addressed
nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2