[WIP] Update Qwen3.5 FP8 B200 SGLang by Ankur-singh · Pull Request #1027 · SemiAnalysisAI/InferenceX

Ankur-singh · 2026-04-13T18:50:34Z

Summary

Search-space: tp8 conc=4, tp4 conc=4-256 (both seq-lens)
Updated server launch args to match latest recipe

…r args Update search-space to tp8 conc=4 and tp4 conc=4-256. Update benchmark script with new server launch: enable-symm-mem, prefill/chunked 16384, mem-fraction-static 0.8, stream-interval 50, tokenizer-path.
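As a rough illustration of the launch arguments named in the commit message above, the sketch below assembles them into a bash array and prints the resulting command instead of starting a server. The model path, the tokenizer path value, and the `TP` default are placeholders assumed for the example; the PR text does not show them. A later commit in this PR raises the prefill values to 81920, so the value is kept in a variable.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholders: the PR text does not show the model path or TP wiring.
MODEL="${MODEL:-/path/to/model}"
TP="${TP:-4}"
# 16384 per the commit message; a later commit in this PR changes it to 81920.
PREFILL_TOKENS="${PREFILL_TOKENS:-16384}"

SERVER_ARGS=(
  --model-path "$MODEL"
  --tokenizer-path "$MODEL"          # assumed to point at the model directory
  --tp-size "$TP"
  --enable-symm-mem
  --chunked-prefill-size "$PREFILL_TOKENS"
  --max-prefill-tokens "$PREFILL_TOKENS"
  --mem-fraction-static 0.8
  --stream-interval 50
)

# Print the command rather than running it, so the sketch is safe to execute anywhere.
printf 'python3 -m sglang.launch_server'
printf ' %q' "${SERVER_ARGS[@]}"
printf '\n'
```

Keeping the arguments in an array makes it easy to diff one recipe revision against another and avoids quoting problems when a path contains spaces.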

github-actions · 2026-04-13T18:50:44Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

Ankur-singh · 2026-04-13T18:52:22Z

@claude update perf-changelog.yaml file

Klaud-Cold · 2026-04-13T18:52:48Z

Claude finished @Ankur-singh's task in 56s

Updated `perf-changelog.yaml`

Gather context on PR changes
Read current perf-changelog.yaml
Add changelog entry
Commit and push

Added a perf-changelog.yaml entry for qwen3.5-fp8-b200-sglang covering:

Search-space update (tp8 conc=4, tp4 conc=4-256; ep removed)
New server args (--enable-symm-mem, --disable-radix-cache, --tokenizer-path)
Tuning changes (prefill 32768→16384, mem-fraction 0.82→0.8, stream-interval 30→50)
Removed env vars and conditional --enable-flashinfer-allreduce-fusion

Commit: 2b15d3d
Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>

claude

This PR is still marked [WIP] and also removes the ep field from the search-space YAML while the benchmark script still requires EP_SIZE as a mandatory env var — worth a human look before merging.

Extended reasoning...

Overview

Two files changed: the nvidia-master.yaml CI config (search-space parameters for qwen3.5-fp8-b200-sglang) and benchmarks/single_node/qwen3.5_fp8_b200.sh (server launch args). The changes simplify the server invocation and narrow the tp8 concurrency sweep to a single point (4→4) while widening tp4 (4–256).
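To make the sweep sizes in the overview concrete, here is a minimal sketch that expands a concurrency range into individual points. It assumes the harness doubles concurrency from the start value to the end value; the PR text does not state the step, and `expand_conc` is a hypothetical helper, not a function from the repository.

```shell
#!/usr/bin/env bash

# Hypothetical helper. Assumption: concurrency doubles from start to end (inclusive).
expand_conc() {
  local c="$1" end="$2" out=()
  while (( c <= end )); do
    out+=("$c")
    (( c *= 2 ))
  done
  echo "${out[*]}"
}

tp8_points="$(expand_conc 4 4)"     # single point
tp4_points="$(expand_conc 4 256)"   # widened sweep

echo "tp8: $tp8_points"
echo "tp4: $tp4_points"
```

Under that doubling assumption the tp8 entry collapses to one run while the tp4 entry produces seven, which is the trade-off the review describes.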

Security Risks

No security-sensitive code is touched. No auth, crypto, or permissions changes.

Level of Scrutiny

Moderate. This is benchmark/CI config code, not production logic, but changes to the search-space and server launch args affect the validity of benchmark results and CI correctness. The [WIP] title explicitly signals the author considers it unfinished.

Other Factors

The ep key is removed from both search-space entries in the YAML, yet the benchmark script still lists EP_SIZE in its check_env_vars call (mandatory) and passes it as --expert-parallel-size=$EP_SIZE. If EP_SIZE is not injected by the CI harness through another path, jobs will fail at startup. This potential misalignment, combined with the [WIP] designation, warrants a human review before merging.
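The failure mode described above can be sketched as follows. `require_env_vars` is a hypothetical stand-in for the repository's `check_env_vars` helper, whose actual implementation is not shown in this PR, and the defaulting step is only one possible mitigation. A later commit in this PR is titled "update config to include ep4", which suggests the ep value was restored in the config instead.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for check_env_vars: fails if any named variable is unset or empty.
require_env_vars() {
  local missing=0 name
  for name in "$@"; do
    if [[ -z "${!name:-}" ]]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Simulate a CI job whose search-space entry no longer carries an ep field.
unset EP_SIZE
TP=4
if require_env_vars TP EP_SIZE 2>/dev/null; then
  status="ok"
else
  status="would fail at startup"
fi
echo "without ep: $status"

# One possible mitigation (an assumption, not what the PR does): default EP_SIZE to 1.
EP_SIZE="${EP_SIZE:-1}"
if require_env_vars TP EP_SIZE; then
  echo "with default: --expert-parallel-size=$EP_SIZE"
fi
```

Whether a default is acceptable depends on the benchmark: silently running with a different expert-parallel size than intended would change what the results measure, so restoring the field in the YAML is the more explicit fix.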

Update Qwen3.5 FP8 B200 SGLang: tp8 conc=4, tp4 conc=4-256, new serve…

92985b8

…r args Update search-space to tp8 conc=4 and tp4 conc=4-256. Update benchmark script with new server launch: enable-symm-mem, prefill/chunked 16384, mem-fraction-static 0.8, stream-interval 50, tokenizer-path.

Ankur-singh requested a review from a team April 13, 2026 18:50

Ankur-singh requested review from jgangani and kedarpotdar-nv as code owners April 13, 2026 18:50

github-project-automation bot added this to InferenceMAX Board Apr 13, 2026

Ankur-singh changed the title ~~[WIP] Update Qwen3.5 FP8 B200 SGLang: tp8 conc=4, tp4 conc=4-256~~ [WIP] Update Qwen3.5 FP8 B200 SGLang Apr 13, 2026

github-actions bot and others added 2 commits April 13, 2026 18:53

Add perf-changelog entry for Qwen3.5 FP8 B200 SGLang config update

2b15d3d

Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>

Update perf-changelog.yaml

82d3275

Ankur-singh added the sweep-enabled label Apr 13, 2026

claude bot reviewed Apr 13, 2026

View reviewed changes

Update Qwen3.5 FP8 B200 SGLang image to v0.5.10.post1-cu130

5d6a8e4

Ankur-singh force-pushed the nv/qwen3.5-fp8-b200-sglang-update branch from a645df8 to 5d6a8e4 Compare April 13, 2026 21:53

Ankur-singh added 4 commits April 14, 2026 12:00

Update chunked_prefill_size and max_prefill_tokens to 81920

bf116b2

Merge branch 'main' into nv/qwen3.5-fp8-b200-sglang-update

8322426

Merge remote-tracking branch 'origin/main' into nv/qwen3.5-fp8-b200-s…

2c16262

…glang-update # Conflicts: # perf-changelog.yaml

update config to include ep4

8c79915

Ankur-singh force-pushed the nv/qwen3.5-fp8-b200-sglang-update branch from ae3066b to 8c79915 Compare April 15, 2026 19:11

[WIP] Update Qwen3.5 FP8 B200 SGLang #1027
Ankur-singh wants to merge 8 commits into main from nv/qwen3.5-fp8-b200-sglang-update
