[Reproducibility] dsr1 0528 dynamo + mtp on b200

Hey,

I've been trying to reproduce the results for the following run:
ISL = 1k
OSL = 1k

B200 (Dynamo TRT, MTP)
Date: 2026-01-29
Image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:0.8.1.post1
Interactivity (tok/s/user): 21.338126345946833
Output Token Throughput per GPU (tok/s/gpu): 10,012.214
Total GPUs: 32
Prefill: 12 GPUs, TP: 4, EP: 4, DPA: True, Workers: 3
Decode: 20 GPUs, TP: 4, EP: 4, DPA: True, Workers: 5
Concurrency: 10860
Precision: FP4
[GitHub Actions Run](https://github.com/SemiAnalysisAI/InferenceX/actions/runs/21484975323/attempts/1)

I have 4 nodes of B200 SXM, and I am using K8s to deploy the same configuration you used here:
https://github.com/NVIDIA/srt-slurm/blob/sa-submission-q2-2026/recipes/trtllm/b200-fp4/1k1k/mtp/ctx3_gen5_dep4_batch512_eplb0_mtp1.yaml
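As a sanity check that the recipe's layout fits my hardware, here is a small sketch. It assumes standard 8-GPU nodes and that each worker occupies TP GPUs, with EP sharding experts over those same GPUs. That is my reading of the recipe, not something it states explicitly, and the `Stage` class is a hypothetical helper.

```python
from dataclasses import dataclass

GPUS_PER_NODE = 8  # assumption: standard 8-GPU B200 SXM nodes


@dataclass
class Stage:
    name: str
    workers: int
    tp: int
    ep: int  # kept for reference; assumed to reuse the same GPUs as TP

    @property
    def gpus(self) -> int:
        # Assumption: each worker spans `tp` GPUs and EP does not add GPUs.
        return self.workers * self.tp


# Values taken from the reference run (ctx3_gen5_dep4).
prefill = Stage("prefill", workers=3, tp=4, ep=4)
decode = Stage("decode", workers=5, tp=4, ep=4)

total = prefill.gpus + decode.gpus
nodes = total / GPUS_PER_NODE
print(f"prefill={prefill.gpus} decode={decode.gpus} total={total} nodes={nodes}")
```

Under these assumptions the layout needs exactly 4 nodes, which matches my cluster.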

No matter what I try, my results stay below 10K tok/s per GPU. My current best is ~8.3K tok/s per decode GPU.
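The reference figure is reported per GPU, while mine is per decode GPU. I have not confirmed which denominator the reference run uses, so here is a quick sketch (hypothetical helper names) that converts between the two, assuming in one case that the reference divides by all 32 GPUs:

```python
# Hypothetical helpers: re-express a per-GPU throughput figure with a
# different GPU denominator. Which denominator the reference run uses
# is an assumption, not something confirmed.

TOTAL_GPUS = 32   # 12 prefill + 20 decode, from the reference run
DECODE_GPUS = 20


def aggregate(per_gpu: float, gpus: int) -> float:
    """Total output tok/s implied by a per-GPU figure over `gpus` GPUs."""
    return per_gpu * gpus


def renormalize(per_gpu: float, from_gpus: int, to_gpus: int) -> float:
    """Convert a per-GPU figure from one GPU denominator to another."""
    return aggregate(per_gpu, from_gpus) / to_gpus


reference = 10_012.214  # reported tok/s/gpu for the reference run
measured = 8_300.0      # my best result, tok/s per decode GPU

# If the reference divides by all 32 GPUs, its per-decode-GPU equivalent:
print(f"reference per decode GPU: {renormalize(reference, TOTAL_GPUS, DECODE_GPUS):,.0f}")
# My result re-expressed over all 32 GPUs, for a like-for-like comparison:
print(f"measured per GPU (all 32): {renormalize(measured, DECODE_GPUS, TOTAL_GPUS):,.0f}")
```

If the reference is normalized over all 32 GPUs, it corresponds to roughly 16K tok/s per decode GPU, about twice my 8.3K. If it is already per decode GPU, the gap is about 1.7K tok/s.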

I have verified that the KV-cache transfer goes over GPUDirect.

The logs for that run have already expired, so I'd like to ask whether there is a way to retrieve them. If not, could you share more details on how to reproduce the results? For example, how many frontends were deployed, and how was the system configured for performance?

Thanks

[Reproducibility] dsr1 0528 dynamo + mtp on b200 #1691