Add qwen3.5-fp4-b200-trt single-node TensorRT-LLM benchmark by RohitNagraj · Pull Request #1711 · SemiAnalysisAI/InferenceX

RohitNagraj · 2026-06-11T18:01:01Z

Adds the qwen3.5-fp4-b200-trt config — Qwen3.5-397B-A17B-NVFP4 on B200, single-node TensorRT-LLM — for the 1k/1k and 8k/1k cells with a TP/TEP/DEP parallelism sweep.

nvidia-master.yaml: new config entry + search space.
qwen3.5_fp4_b200_trt.sh: trtllm-serve benchmark script; generates the extra-llm-api config (MoE backend, attention-DP / batch-wait settings) per parallelism mode.
perf-changelog entry.

Note

Low Risk
Benchmark-only additions (YAML config, shell runner, changelog) with no changes to application or serving logic in-repo.

Overview
Adds qwen3.5-fp4-b200-trt so Qwen3.5-397B-A17B-NVFP4 on B200 can be measured with single-node TensorRT-LLM (trtllm-serve on nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc18).

nvidia-master.yaml defines fixed-seq-len cells at 1k/1k and 8k/1k with a parallelism sweep over TP, EP, and optional dp-attn, each with explicit concurrency lists.
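To make the TP/TEP/DEP naming concrete, here is a minimal sketch that labels search-space entries by parallelism mode. The entries are the four lines quoted in the Bugbot review of this PR. The mapping (dp-attn means DEP, ep > 1 without dp-attn means TEP, otherwise TP) is an assumption about the naming convention, and `parallelism_mode` is a hypothetical helper, not something from the repo.

```python
from typing import Any


def parallelism_mode(entry: dict[str, Any]) -> str:
    """Label one search-space entry as TP, TEP, or DEP.

    Assumed convention (not confirmed by the PR):
      - dp-attn enabled          -> DEP (attention data parallel + expert parallel)
      - ep > 1, dp-attn disabled -> TEP (tensor parallel + expert parallel)
      - otherwise                -> TP  (tensor parallel only)
    """
    if entry.get("dp-attn", False):
        return "DEP"
    if entry.get("ep", 1) > 1:
        return "TEP"
    return "TP"


# Entries copied from the diff quoted in the review comment.
entries = [
    {"tp": 4, "ep": 1, "conc-list": [4]},
    {"tp": 2, "ep": 2, "conc-list": [8, 32]},
    {"tp": 8, "ep": 8, "conc-list": [4]},
    {"tp": 8, "ep": 8, "dp-attn": True, "conc-list": [256, 512, 1024]},
]

modes = [parallelism_mode(e) for e in entries]
for entry, mode in zip(entries, modes):
    print(mode, entry["tp"], entry["ep"], entry["conc-list"])
```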

qwen3.5_fp4_b200_trt.sh wires the run: it emits qwen3.5-fp4-trt.yml (MoE backend CUTEDSL vs TRTLLM, attention-DP vs batch-wait tuning), starts the server, runs serving benchmarks, and optional lm-eval—same shape as other B200 TRT fixed-seq-len scripts.

perf-changelog.yaml documents the new config key.

Reviewed by Cursor Bugbot for commit df2bb3c. Bugbot is set up for automated code reviews on this repo.

Add the qwen3.5-fp4-b200-trt config (Qwen3.5-397B-A17B-NVFP4, B200, 1k/1k and 8k/1k) with a TP/TEP/DEP parallelism sweep, the qwen3.5_fp4_b200_trt.sh benchmark script (trtllm-serve with an extra-llm-api config generated per parallelism mode), and a perf-changelog entry.

github-actions · 2026-06-11T18:01:14Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 57c34e4.

cursor · 2026-06-11T18:03:00Z

+      - { tp: 4, ep: 1, conc-list: [4] }
+      - { tp: 2, ep: 2, conc-list: [8, 32] }
+      - { tp: 8, ep: 8, conc-list: [4] }
+      - { tp: 8, ep: 8, dp-attn: true, conc-list: [256, 512, 1024] }


conc-list breaks full-sweep

High Severity

The new qwen3.5-fp4-b200-trt fixed-seq-len search space uses conc-list, but single-node full-sweep generation in utils/matrix_logic/generate_sweep_configs.py only reads conc-start and conc-end for that scenario. Matrix generation will raise a missing-key error and no benchmark jobs will be emitted for this config.
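One way to picture the failure and a possible fix is a lookup that accepts either key style. This is a sketch only: `expand_concurrencies` is a hypothetical helper, not the actual code in utils/matrix_logic/generate_sweep_configs.py, and the doubling between conc-start and conc-end is an assumption about how a range is expanded.

```python
from typing import Any


def expand_concurrencies(entry: dict[str, Any]) -> list[int]:
    """Return the concurrency values for one search-space entry.

    Accepts either an explicit `conc-list` or a `conc-start`/`conc-end`
    range. The range is expanded by doubling, which is an assumption
    made for this sketch.
    """
    if "conc-list" in entry:
        return list(entry["conc-list"])
    if "conc-start" in entry and "conc-end" in entry:
        conc, end = entry["conc-start"], entry["conc-end"]
        if conc < 1:
            raise ValueError("conc-start must be a positive integer")
        values = []
        while conc <= end:
            values.append(conc)
            conc *= 2  # assumed step between range points
        return values
    # Reading only conc-start/conc-end would fail here for conc-list entries.
    raise KeyError("entry needs conc-list or conc-start/conc-end")


from_list = expand_concurrencies({"tp": 2, "ep": 2, "conc-list": [8, 32]})
from_range = expand_concurrencies({"tp": 4, "ep": 1, "conc-start": 4, "conc-end": 32})
print(from_list, from_range)
```

A generator that reads `entry["conc-start"]` unconditionally would hit the missing-key path for every entry in this config, which matches the behavior the report describes.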

github-actions · 2026-06-11T23:23:52Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27367214669
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27367214669

github-actions · 2026-06-12T04:33:26Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27367214669
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27367214669

github-actions · 2026-06-12T07:33:44Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27367214669
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27367214669

github-actions · 2026-06-12T16:33:08Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27428901315
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27428901315

RohitNagraj requested a review from a team June 11, 2026 18:01

RohitNagraj requested review from jgangani and kedarpotdar-nv as code owners June 11, 2026 18:01

github-project-automation Bot added this to InferenceMAX Board Jun 11, 2026

Update perf-changelog pr-link for #1711

57c34e4

RohitNagraj added the full-sweep-enabled label Jun 11, 2026

cursor Bot reviewed Jun 11, 2026

RohitNagraj added 2 commits June 12, 2026 09:30

Merge branch 'main' into qwen3.5-fp4-b200-trt

c543c4b

Update perf-changelog.yaml

df2bb3c

RohitNagraj wants to merge 4 commits into main from qwen3.5-fp4-b200-trt

1 participant