
Add Kimi K2.5 NVFP4 GB200 disaggregated TRT-LLM benchmarks via Dynamo#1026

Open
yeswanthk-lab wants to merge 7 commits into main from yeswanthk/kimi2.5_Gb200_trt_llm

Conversation

@yeswanthk-lab
Collaborator

Add kimik2.5-fp4-gb200-dynamo-trt configuration with 29 search-space entries covering ISL 1024/OSL 1024 (8 MTP + 7 STP) and ISL 8192/OSL 1024 (7 MTP + 7 STP) disaggregated configs. Update launch_gb200-nv.sh to support kimik2.5 model prefix with dynamo-trt framework and clone srt-slurm from NVIDIA/srt-slurm@sa-submission-q2-2026.

  • New framework: dynamo-trt (Dynamo frontend + TensorRT-LLM backend)
  • 29 configs: 15x ISL 1024/OSL 1024 (8 MTP + 7 STP) + 14x ISL 8192/OSL 1024 (7 MTP + 7 STP)
  • Recipes sourced from NVIDIA/srt-slurm branch sa-submission-q2-2026
  • Runner script updated to clone NVIDIA/srt-slurm and support kimik2.5 model prefix with dynamo-trt
  • Container: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a documentation PR first so we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
export SERVED_MODEL_NAME="deepseek-r1-fp8"
export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
elif [[ $MODEL_PREFIX == "kimik2.5" && $PRECISION == "fp4" ]]; then
Collaborator


Have we already done this for the Kimi dynamo-vllm submission? @nlevin-ui

Contributor

@claude bot left a comment


LGTM — the duplicate clone block is a nit already flagged inline; configs and runner changes are correct and follow established patterns.

Extended reasoning...

Overview

The PR adds a new benchmark config (kimik2.5-fp4-gb200-dynamo-trt) with 29 search-space entries in nvidia-master.yaml, and extends launch_gb200-nv.sh to handle the kimik2.5 model prefix under dynamo-trt.

Security risks

No security concerns. Changes are benchmark config data and a CI runner shell script cloning a known NVIDIA GitHub repo.

Level of scrutiny

Benchmark/CI infrastructure code, not production logic. YAML configs mirror adjacent entries and link to upstream recipe files for traceability. Runner script changes are additive elif arms following existing patterns.

Other factors

The one bug found (duplicate git clone blocks) is a maintenance nit; code is functionally correct today and already flagged inline. No outstanding reviewer comments beyond the auto-bot recipe reminder.

Comment on lines 128 to 137
git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
cd "$SRT_REPO_DIR"
git checkout sa-submission-q2-2026
elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]]; then
git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
cd "$SRT_REPO_DIR"
git checkout sa-submission-q2-2026
else
git clone https://github.com/ishandhanani/srt-slurm.git "$SRT_REPO_DIR"
cd "$SRT_REPO_DIR"
Contributor


🟡 The new dynamo-trt+kimik2.5 git clone block (lines 131-134) is byte-for-byte identical to the existing dynamo-vllm block above it: both clone NVIDIA/srt-slurm and check out sa-submission-q2-2026. This is a maintenance hazard — a future change to the repo URL or branch name that is applied to only one block would cause silent behavioral divergence. The two conditions should be merged into a single branch: [[ $FRAMEWORK == "dynamo-vllm" || ($FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5") ]].

Extended reasoning...

What the bug is and how it manifests

In runners/launch_gb200-nv.sh the script clones one of two different srt-slurm forks depending on the framework. The new PR adds a second elif arm for dynamo-trt+kimik2.5 that performs the exact same three operations as the existing dynamo-vllm arm: git clone https://github.com/NVIDIA/srt-slurm.git, cd "$SRT_REPO_DIR", and git checkout sa-submission-q2-2026. The two blocks are byte-for-byte identical.

The specific code path that triggers it

Lines 128-134 of the modified file:

if [[ $FRAMEWORK == "dynamo-vllm" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026
elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026

There is no technical reason for two separate branches — they do exactly the same thing.

Why existing code doesn't prevent it

There is no deduplication mechanism in a shell script; the interpreter executes whatever branch matches. Nothing prevents the two arms from silently diverging if only one is updated in a future commit.

What the impact would be

The code is functionally correct today. The risk is future maintenance: when (not if) the sa-submission-q2-2026 branch is retired or the NVIDIA repo URL changes, a developer updating the script may only find and update one of the two identical blocks. The other would silently continue using the stale URL or branch, causing hard-to-diagnose failures only for one of the two frameworks.

How to fix it

Merge the two arms into a single condition:

if [[ $FRAMEWORK == "dynamo-vllm" || ($FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5") ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026

Step-by-step proof of the duplication

  1. Locate line 128: if [[ $FRAMEWORK == "dynamo-vllm" ]] — body clones NVIDIA/srt-slurm and checks out sa-submission-q2-2026.
  2. Locate line 131: elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]] — body is character-for-character identical to step 1.
  3. Run diff on the two blocks: zero differences.
  4. Conclusion: the bodies are redundant and should share a single conditional.
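The merged-condition fix suggested above can be exercised in isolation. This is a minimal sketch, not the real runner script: `pick_repo` is a hypothetical stand-in for the clone logic, and it only returns the repo URL rather than cloning anything.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: check that the single merged condition selects the
# NVIDIA fork for exactly the same (FRAMEWORK, MODEL_PREFIX) pairs that
# the two original, duplicated arms handled.
pick_repo() {
  local FRAMEWORK="$1" MODEL_PREFIX="$2"
  if [[ $FRAMEWORK == "dynamo-vllm" || ($FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5") ]]; then
    echo "https://github.com/NVIDIA/srt-slurm.git"   # branch sa-submission-q2-2026
  else
    echo "https://github.com/ishandhanani/srt-slurm.git"
  fi
}

pick_repo dynamo-vllm dsr1-fp8    # NVIDIA fork, as before
pick_repo dynamo-trt  kimik2.5    # NVIDIA fork, new arm folded in
pick_repo dynamo-trt  dsr1-fp8    # fallback fork, unchanged
```

With one conditional body, a future change to the URL or branch name can only be applied in one place, which removes the silent-divergence risk described above.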

Contributor

@functionstackx functionstackx left a comment


Remove MTP for the time being; we haven't thought enough about how to do spec decode for models that don't have native support for it.

Comment thread .github/configs/nvidia-master.yaml Outdated
Comment on lines +6712 to +6721
# MTP configurations (spec_decoding="mtp")
- spec-decoding: "mtp"
conc-list: [ 8, 48, 92, 192, 336 ]
prefill:
num-worker: 1
tp: 4
ep: 4
dp-attn: true
additional-settings:
# https://github.com/NVIDIA/srt-slurm/blob/sa-submission-q2-2026/recipes/kimi2.5/trtllm_dynamo/disagg/gb200Nvfp4/ISL1K_OSL1K/MTP/ctx1dep4_gen4tep8_batch64_allconc_eplb0_mtp3.yaml
Contributor


Can you remove MTP?

Remove speculative decoding (MTP) search-space entries from
kimik2.5-fp4-gb200-dynamo-trt for both ISL 1024/OSL 1024 and
ISL 8192/OSL 1024. Retain all 14 STP (non-MTP) configurations.
@cquil11
Collaborator

cquil11 commented Apr 14, 2026

@claude can you ensure that the added config parameters in nvidia-master.yaml match the parameters in the linked recipes? (I.e., tp, ep etc)

@cquil11 cquil11 requested a review from functionstackx April 14, 2026 19:47
@cquil11 cquil11 dismissed functionstackx’s stale review April 14, 2026 19:47

this has been fixed

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 14, 2026

Claude finished @cquil11's task in 1m 59s.


Verifying Config Parameters Against Linked Recipes

  • Read the added configs in nvidia-master.yaml
  • Fetch all 14 linked recipe files from NVIDIA/srt-slurm (sa-submission-q2-2026 branch)
  • Compare parameters (tp, ep, num-worker, dp-attn, conc-list) between configs and recipes
  • Report findings

Result: All 14 configs match their linked recipes

I fetched each of the 14 STP recipe YAML files from NVIDIA/srt-slurm@sa-submission-q2-2026 and compared tp, ep, num-worker (prefill/decode), dp-attn, and conc-list against the values in nvidia-master.yaml. Every entry matches.

ISL 1024 / OSL 1024 — 7 STP configs

| # | Recipe file | Prefill (workers, tp, ep, dp-attn) | Decode (workers, tp, ep, dp-attn) | conc-list | Match? |
|---|-------------|------------------------------------|-----------------------------------|-----------|--------|
| 1 | ctx1dep4_gen4tep8_batch128_allconc_eplb0_mtp0 | 1, 4, 4, true | 4, 8, 8, false | 4, 192, 360, 668 | Yes |
| 2 | ctx1dep4_gen5tep4_batch8_allconc_eplb0_mtp0 | 1, 4, 4, true | 5, 4, 4, false | 5, 15, 30, 55 | Yes |
| 3 | ctx1dep4_gen1dep16_batch32_eplb0_mtp0 | 1, 4, 4, true | 1, 16, 16, true | 666 | Yes |
| 4 | ctx1dep4_gen1dep32_batch64_eplb0_mtp0 | 1, 4, 4, true | 1, 32, 32, true | 2253 | Yes |
| 5 | ctx1dep4_gen1dep8_batch768_allconc_eplb0_mtp0 | 1, 4, 4, true | 1, 8, 8, true | 4301, 6452 | Yes |
| 6 | ctx2dep4_gen1dep16_batch256_eplb0_mtp0 | 2, 4, 4, true | 1, 16, 16, true | 4301 | Yes |
| 7 | ctx2dep4_gen1dep32_batch128_eplb0_mtp0 | 2, 4, 4, true | 1, 32, 32, true | 4301 | Yes |
ISL 8192 / OSL 1024 — 7 STP configs

| # | Recipe file | Prefill (workers, tp, ep, dp-attn) | Decode (workers, tp, ep, dp-attn) | conc-list | Match? |
|---|-------------|------------------------------------|-----------------------------------|-----------|--------|
| 8 | ctx1dep4_gen4tep8_batch1_allconc_eplb0_mtp0 | 1, 4, 4, true | 4, 8, 8, false | 4 | Yes |
| 9 | ctx1dep4_gen4tep4_batch32_allconc_eplb0_mtp0 | 1, 4, 4, true | 4, 4, 4, false | 156 | Yes |
| 10 | ctx1dep4_gen5tep4_batch16_allconc_eplb0_mtp0 | 1, 4, 4, true | 5, 4, 4, false | 5, 15, 30, 60, 105 | Yes |
| 11 | ctx2dep4_gen1dep16_batch16_eplb0_mtp0 | 2, 4, 4, true | 1, 16, 16, true | 333 | Yes |
| 12 | ctx3dep4_gen1dep16_batch32_eplb0_mtp0 | 3, 4, 4, true | 1, 16, 16, true | 615 | Yes |
| 13 | ctx5dep4_gen1dep8_batch256_allconc_eplb0_mtp0 | 5, 4, 4, true | 1, 8, 8, true | 2151 | Yes |
| 14 | ctx7dep4_gen1dep16_batch128_eplb0_mtp0 | 7, 4, 4, true | 1, 16, 16, true | 2253 | Yes |

Summary

All parallelism parameters (tp, ep, dp-attn), worker counts (num-worker for prefill and decode), and concurrency lists in nvidia-master.yaml are consistent with their corresponding upstream recipe files. No discrepancies found.
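The field-by-field comparison described above can be sketched in shell. This is a minimal illustration under assumptions: the file names, the flattened `tp`/`ep`/`dp-attn` layout, and the `extract` helper are all hypothetical, not the real layout of nvidia-master.yaml or the srt-slurm recipes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: extract the parallelism fields from a config entry
# and its linked recipe, then diff the two sets. File names and field
# layout are illustrative assumptions only.
set -euo pipefail

cat > /tmp/master_entry.yaml <<'EOF'
tp: 4
ep: 4
dp-attn: true
EOF

cat > /tmp/recipe.yaml <<'EOF'
tp: 4
ep: 4
dp-attn: true
EOF

# Keep only the fields under comparison, in a stable order.
extract() { grep -E '^(tp|ep|dp-attn):' "$1" | sort; }

if diff <(extract /tmp/master_entry.yaml) <(extract /tmp/recipe.yaml) >/dev/null; then
  echo "match"
else
  echo "mismatch"
fi
```

Normalizing both sides to the same sorted field list before diffing is what makes the check order-insensitive; any value drift between config and recipe shows up as a nonzero `diff` exit.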


@cquil11
Collaborator

cquil11 commented Apr 14, 2026

@claude add perf changelog entry

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 14, 2026

Claude finished @cquil11's task in 53s.


Adding Perf Changelog Entry

  • Read current perf-changelog.yaml format
  • Read nvidia-master.yaml changes for config keys
  • Add perf changelog entry
  • Commit and push changes

Added perf changelog entry for kimik2.5-fp4-gb200-dynamo-trt covering:

  • 14 STP disaggregated configs (7x ISL 1024/OSL 1024 + 7x ISL 8192/OSL 1024)
  • New dynamo-trt framework
  • Container image and recipe source
  • Runner script update


Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
@cquil11
Collaborator

cquil11 commented Apr 14, 2026

@yeswanthk-lab

Error: -14 20:24:28 [ERROR] Fatal error: Model path does not exist: /mnt/lustre01/users/sa-shared/gharunners/gharunner2/actions-runner/_work/InferenceX/InferenceX/srt-slurm/nvidia/Kimi-K2.5-NVFP4

@yeswanthk-lab
Collaborator Author

> @yeswanthk-lab
>
> Error: -14 20:24:28 [ERROR] Fatal error: Model path does not exist: /mnt/lustre01/users/sa-shared/gharunners/gharunner2/actions-runner/_work/InferenceX/InferenceX/srt-slurm/nvidia/Kimi-K2.5-NVFP4

addressed

