
Add Kimi K2.5 NVFP4 GB200 disaggregated TRT-LLM benchmarks via Dynamo#1026

Open
yeswanthk-lab wants to merge 7 commits into main from yeswanthk/kimi2.5_Gb200_trt_llm

Conversation

@yeswanthk-lab
Collaborator

Add kimik2.5-fp4-gb200-dynamo-trt configuration with 29 search-space entries covering ISL 1024/OSL 1024 (8 MTP + 7 STP) and ISL 8192/OSL 1024 (7 MTP + 7 STP) disaggregated configs. Update launch_gb200-nv.sh to support kimik2.5 model prefix with dynamo-trt framework and clone srt-slurm from NVIDIA/srt-slurm@sa-submission-q2-2026.

  • New framework: dynamo-trt (Dynamo frontend + TensorRT-LLM backend)
  • 29 configs: 15x ISL 1024/OSL 1024 (8 MTP + 7 STP) + 14x ISL 8192/OSL 1024 (7 MTP + 7 STP)
  • Recipes sourced from NVIDIA/srt-slurm branch sa-submission-q2-2026
  • Runner script updated to clone NVIDIA/srt-slurm and support kimik2.5 model prefix with dynamo-trt
  • Container: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:1.1.0-dev.2

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a documentation PR first so we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

export MODEL_PATH="/mnt/numa1/groups/sa-shared/models/deepseek-r1-0528/"
export SERVED_MODEL_NAME="deepseek-r1-fp8"
export SRT_SLURM_MODEL_PREFIX="dsr1-fp8"
elif [[ $MODEL_PREFIX == "kimik2.5" && $PRECISION == "fp4" ]]; then
Collaborator


Have we already done this for the Kimi dynamo-vllm submission? @nlevin-ui

Contributor

@claude bot left a comment


LGTM — the duplicate clone block is a nit already flagged inline; configs and runner changes are correct and follow established patterns.

Extended reasoning...

Overview

The PR adds a new benchmark config (kimik2.5-fp4-gb200-dynamo-trt) with 29 search-space entries in nvidia-master.yaml, and extends launch_gb200-nv.sh to handle the kimik2.5 model prefix under dynamo-trt.

Security risks

No security concerns. Changes are benchmark config data and a CI runner shell script cloning a known NVIDIA GitHub repo.

Level of scrutiny

Benchmark/CI infrastructure code, not production logic. YAML configs mirror adjacent entries and link to upstream recipe files for traceability. Runner script changes are additive elif arms following existing patterns.

Other factors

The one bug found (duplicate git clone blocks) is a maintenance nit; code is functionally correct today and already flagged inline. No outstanding reviewer comments beyond the auto-bot recipe reminder.

Comment on lines 128 to 137
git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
cd "$SRT_REPO_DIR"
git checkout sa-submission-q2-2026
elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]]; then
git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
cd "$SRT_REPO_DIR"
git checkout sa-submission-q2-2026
else
git clone https://github.com/ishandhanani/srt-slurm.git "$SRT_REPO_DIR"
cd "$SRT_REPO_DIR"
Contributor


🟡 The new dynamo-trt+kimik2.5 git clone block (lines 131-134) is byte-for-byte identical to the existing dynamo-vllm block above it: both clone NVIDIA/srt-slurm and check out sa-submission-q2-2026. This is a maintenance hazard — a future change to the repo URL or branch name that is applied to only one block would cause silent behavioral divergence. The two conditions should be merged into a single branch: [[ $FRAMEWORK == "dynamo-vllm" || ($FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5") ]].

Extended reasoning...

What the bug is and how it manifests

In runners/launch_gb200-nv.sh the script clones one of two different srt-slurm forks depending on the framework. The new PR adds a second elif arm for dynamo-trt+kimik2.5 that performs the exact same three operations as the existing dynamo-vllm arm: git clone https://github.com/NVIDIA/srt-slurm.git, cd "$SRT_REPO_DIR", and git checkout sa-submission-q2-2026. The two blocks are byte-for-byte identical.

The specific code path that triggers it

Lines 128-134 of the modified file:

if [[ $FRAMEWORK == "dynamo-vllm" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026
elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026

There is no technical reason for two separate branches — they do exactly the same thing.

Why existing code doesn't prevent it

There is no deduplication mechanism in a shell script; the interpreter executes whatever branch matches. Nothing prevents the two arms from silently diverging if only one is updated in a future commit.

What the impact would be

The code is functionally correct today. The risk is future maintenance: when (not if) the sa-submission-q2-2026 branch is retired or the NVIDIA repo URL changes, a developer updating the script may only find and update one of the two identical blocks. The other would silently continue using the stale URL or branch, causing hard-to-diagnose failures only for one of the two frameworks.

How to fix it

Merge the two arms into a single condition:

if [[ $FRAMEWORK == "dynamo-vllm" || ($FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5") ]]; then
    git clone https://github.com/NVIDIA/srt-slurm.git "$SRT_REPO_DIR"
    cd "$SRT_REPO_DIR"
    git checkout sa-submission-q2-2026

Step-by-step proof of the duplication

  1. Locate line 128: if [[ $FRAMEWORK == "dynamo-vllm" ]] — body clones NVIDIA/srt-slurm and checks out sa-submission-q2-2026.
  2. Locate line 131: elif [[ $FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5" ]] — body is character-for-character identical to step 1.
  3. Run diff on the two blocks: zero differences.
  4. Conclusion: the bodies are redundant and should share a single conditional.
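The merged-condition fix suggested above can be exercised in isolation. This is a minimal sketch, not the real runner script: `pick_repo` is a hypothetical stand-in for the clone logic, and it only returns the repo URL rather than cloning anything.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: check that the single merged condition selects the
# NVIDIA fork for exactly the same (FRAMEWORK, MODEL_PREFIX) pairs that
# the two original, duplicated arms handled.
pick_repo() {
  local FRAMEWORK="$1" MODEL_PREFIX="$2"
  if [[ $FRAMEWORK == "dynamo-vllm" || ($FRAMEWORK == "dynamo-trt" && $MODEL_PREFIX == "kimik2.5") ]]; then
    echo "https://github.com/NVIDIA/srt-slurm.git"   # branch sa-submission-q2-2026
  else
    echo "https://github.com/ishandhanani/srt-slurm.git"
  fi
}

pick_repo dynamo-vllm dsr1-fp8    # NVIDIA fork, as before
pick_repo dynamo-trt  kimik2.5    # NVIDIA fork, new arm folded in
pick_repo dynamo-trt  dsr1-fp8    # fallback fork, unchanged
```

With one conditional body, a future change to the URL or branch name can only be applied in one place, which removes the silent-divergence risk described above.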

Contributor

@functionstackx functionstackx left a comment


Remove MTP for the time being; we haven't thought enough about how to do spec decode for models that don't have native support for it.

Comment thread .github/configs/nvidia-master.yaml Outdated
Comment on lines +6712 to +6721
# MTP configurations (spec_decoding="mtp")
- spec-decoding: "mtp"
conc-list: [ 8, 48, 92, 192, 336 ]
prefill:
num-worker: 1
tp: 4
ep: 4
dp-attn: true
additional-settings:
# https://github.com/NVIDIA/srt-slurm/blob/sa-submission-q2-2026/recipes/kimi2.5/trtllm_dynamo/disagg/gb200Nvfp4/ISL1K_OSL1K/MTP/ctx1dep4_gen4tep8_batch64_allconc_eplb0_mtp3.yaml
Contributor


Can you remove MTP?

Remove speculative decoding (MTP) search-space entries from
kimik2.5-fp4-gb200-dynamo-trt for both ISL 1024/OSL 1024 and
ISL 8192/OSL 1024. Retain all 14 STP (non-MTP) configurations.
@cquil11
Collaborator

cquil11 commented Apr 14, 2026

@claude can you ensure that the added config parameters in nvidia-master.yaml match the parameters in the linked recipes? (I.e., tp, ep etc)

@cquil11 cquil11 requested a review from functionstackx April 14, 2026 19:47
@cquil11 cquil11 dismissed functionstackx’s stale review April 14, 2026 19:47

this has been fixed

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 14, 2026

Claude finished @cquil11's task in 1m 59s.


Verifying Config Parameters Against Linked Recipes

  • Read the added configs in nvidia-master.yaml
  • Fetch all 14 linked recipe files from NVIDIA/srt-slurm (sa-submission-q2-2026 branch)
  • Compare parameters (tp, ep, num-worker, dp-attn, conc-list) between configs and recipes
  • Report findings

Result: All 14 configs match their linked recipes

I fetched each of the 14 STP recipe YAML files from NVIDIA/srt-slurm@sa-submission-q2-2026 and compared tp, ep, num-worker (prefill/decode), dp-attn, and conc-list against the values in nvidia-master.yaml. Every entry matches.

ISL 1024 / OSL 1024 — 7 STP configs

| # | Recipe file | Prefill (workers, tp, ep, dp-attn) | Decode (workers, tp, ep, dp-attn) | conc-list | Match? |
|---|-------------|------------------------------------|-----------------------------------|-----------|--------|
| 1 | ctx1dep4_gen4tep8_batch128_allconc_eplb0_mtp0 | 1, 4, 4, true | 4, 8, 8, false | 4, 192, 360, 668 | Yes |
| 2 | ctx1dep4_gen5tep4_batch8_allconc_eplb0_mtp0 | 1, 4, 4, true | 5, 4, 4, false | 5, 15, 30, 55 | Yes |
| 3 | ctx1dep4_gen1dep16_batch32_eplb0_mtp0 | 1, 4, 4, true | 1, 16, 16, true | 666 | Yes |
| 4 | ctx1dep4_gen1dep32_batch64_eplb0_mtp0 | 1, 4, 4, true | 1, 32, 32, true | 2253 | Yes |
| 5 | ctx1dep4_gen1dep8_batch768_allconc_eplb0_mtp0 | 1, 4, 4, true | 1, 8, 8, true | 4301, 6452 | Yes |
| 6 | ctx2dep4_gen1dep16_batch256_eplb0_mtp0 | 2, 4, 4, true | 1, 16, 16, true | 4301 | Yes |
| 7 | ctx2dep4_gen1dep32_batch128_eplb0_mtp0 | 2, 4, 4, true | 1, 32, 32, true | 4301 | Yes |
ISL 8192 / OSL 1024 — 7 STP configs

| # | Recipe file | Prefill (workers, tp, ep, dp-attn) | Decode (workers, tp, ep, dp-attn) | conc-list | Match? |
|---|-------------|------------------------------------|-----------------------------------|-----------|--------|
| 8 | ctx1dep4_gen4tep8_batch1_allconc_eplb0_mtp0 | 1, 4, 4, true | 4, 8, 8, false | 4 | Yes |
| 9 | ctx1dep4_gen4tep4_batch32_allconc_eplb0_mtp0 | 1, 4, 4, true | 4, 4, 4, false | 156 | Yes |
| 10 | ctx1dep4_gen5tep4_batch16_allconc_eplb0_mtp0 | 1, 4, 4, true | 5, 4, 4, false | 5, 15, 30, 60, 105 | Yes |
| 11 | ctx2dep4_gen1dep16_batch16_eplb0_mtp0 | 2, 4, 4, true | 1, 16, 16, true | 333 | Yes |
| 12 | ctx3dep4_gen1dep16_batch32_eplb0_mtp0 | 3, 4, 4, true | 1, 16, 16, true | 615 | Yes |
| 13 | ctx5dep4_gen1dep8_batch256_allconc_eplb0_mtp0 | 5, 4, 4, true | 1, 8, 8, true | 2151 | Yes |
| 14 | ctx7dep4_gen1dep16_batch128_eplb0_mtp0 | 7, 4, 4, true | 1, 16, 16, true | 2253 | Yes |

Summary

All parallelism parameters (tp, ep, dp-attn), worker counts (num-worker for prefill and decode), and concurrency lists in nvidia-master.yaml are consistent with their corresponding upstream recipe files. No discrepancies found.
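The field-by-field comparison described above can be sketched in shell. This is a minimal illustration under assumptions: the file names, the flattened `tp`/`ep`/`dp-attn` layout, and the `extract` helper are all hypothetical, not the real layout of nvidia-master.yaml or the srt-slurm recipes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: extract the parallelism fields from a config entry
# and its linked recipe, then diff the two sets. File names and field
# layout are illustrative assumptions only.
set -euo pipefail

cat > /tmp/master_entry.yaml <<'EOF'
tp: 4
ep: 4
dp-attn: true
EOF

cat > /tmp/recipe.yaml <<'EOF'
tp: 4
ep: 4
dp-attn: true
EOF

# Keep only the fields under comparison, in a stable order.
extract() { grep -E '^(tp|ep|dp-attn):' "$1" | sort; }

if diff <(extract /tmp/master_entry.yaml) <(extract /tmp/recipe.yaml) >/dev/null; then
  echo "match"
else
  echo "mismatch"
fi
```

Normalizing both sides to the same sorted field list before diffing is what makes the check order-insensitive; any value drift between config and recipe shows up as a nonzero `diff` exit.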


@cquil11
Collaborator

cquil11 commented Apr 14, 2026

@claude add perf changelog entry

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 14, 2026

Claude finished @cquil11's task in 53s.


Adding Perf Changelog Entry

  • Read current perf-changelog.yaml format
  • Read nvidia-master.yaml changes for config keys
  • Add perf changelog entry
  • Commit and push changes

Added perf changelog entry for kimik2.5-fp4-gb200-dynamo-trt covering:

  • 14 STP disaggregated configs (7x ISL 1024/OSL 1024 + 7x ISL 8192/OSL 1024)
  • New dynamo-trt framework
  • Container image and recipe source
  • Runner script update


Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
@cquil11
Collaborator

cquil11 commented Apr 14, 2026

@yeswanthk-lab

Error: -14 20:24:28 [ERROR] Fatal error: Model path does not exist: /mnt/lustre01/users/sa-shared/gharunners/gharunner2/actions-runner/_work/InferenceX/InferenceX/srt-slurm/nvidia/Kimi-K2.5-NVFP4

@yeswanthk-lab
Collaborator Author

> @yeswanthk-lab
>
> Error: -14 20:24:28 [ERROR] Fatal error: Model path does not exist: /mnt/lustre01/users/sa-shared/gharunners/gharunner2/actions-runner/_work/InferenceX/InferenceX/srt-slurm/nvidia/Kimi-K2.5-NVFP4

addressed

