[WIP] Update Qwen3.5 FP8 B200 SGLang #1027

Open
Ankur-singh wants to merge 8 commits into main from nv/qwen3.5-fp8-b200-sglang-update
Conversation

@Ankur-singh
Collaborator

Summary

  • Search-space: tp8 conc=4, tp4 conc=4-256 (both seq-lens)
  • Updated server launch args to match latest recipe

…r args

Update search-space to tp8 conc=4 and tp4 conc=4-256. Update benchmark
script with new server launch: enable-symm-mem, prefill/chunked 16384,
mem-fraction-static 0.8, stream-interval 50, tokenizer-path.
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

@Ankur-singh Ankur-singh changed the title from "[WIP] Update Qwen3.5 FP8 B200 SGLang: tp8 conc=4, tp4 conc=4-256" to "[WIP] Update Qwen3.5 FP8 B200 SGLang" on Apr 13, 2026
@Ankur-singh
Collaborator Author

@claude update perf-changelog.yaml file

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 13, 2026

Claude finished @Ankur-singh's task in 56s


Updated perf-changelog.yaml

  • Gather context on PR changes
  • Read current perf-changelog.yaml
  • Add changelog entry
  • Commit and push

Added a perf-changelog.yaml entry for qwen3.5-fp8-b200-sglang covering:

  • Search-space update (tp8 conc=4, tp4 conc=4-256; ep removed)
  • New server args (--enable-symm-mem, --disable-radix-cache, --tokenizer-path)
  • Tuning changes (prefill 32768→16384, mem-fraction 0.82→0.8, stream-interval 30→50)
  • Removed env vars and conditional --enable-flashinfer-allreduce-fusion

Commit: 2b15d3d
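
As a rough illustration of the server-argument changes summarized above, the launch command might be assembled along the following lines. This is a sketch under stated assumptions, not the contents of the benchmark script: `MODEL_PATH` and `TP_SIZE` are hypothetical placeholders, reusing the model path for `--tokenizer-path` is a guess, and mapping "prefill 16384" onto both `--chunked-prefill-size` and `--max-prefill-tokens` is an assumption based on the commit message's "prefill/chunked 16384". The command is printed rather than executed.

```shell
#!/bin/sh
# Sketch only: prints a launch command built from the flags named in this PR.
# MODEL_PATH and TP_SIZE are hypothetical placeholders, not taken from the script.
set -eu

MODEL_PATH="${MODEL_PATH:-/path/to/model}"
TP_SIZE="${TP_SIZE:-4}"

build_server_cmd() {
  # Assumption: "prefill/chunked 16384" maps to the two prefill flags below,
  # and the tokenizer is loaded from the same location as the model.
  echo "python3 -m sglang.launch_server" \
    "--model-path $MODEL_PATH" \
    "--tokenizer-path $MODEL_PATH" \
    "--tp-size $TP_SIZE" \
    "--enable-symm-mem" \
    "--disable-radix-cache" \
    "--chunked-prefill-size 16384" \
    "--max-prefill-tokens 16384" \
    "--mem-fraction-static 0.8" \
    "--stream-interval 50"
}

cmd="$(build_server_cmd)"
echo "$cmd"
```

Printing the command first makes it easy to diff the old and new flag sets before spending GPU time on a run.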

github-actions bot and others added 2 commits April 13, 2026 18:53
Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>
Contributor

@claude claude bot left a comment

This PR is still marked [WIP]. It also removes the ep field from the search-space YAML while the benchmark script still requires EP_SIZE as a mandatory env var, so it is worth a human look before merging.

Extended reasoning...

Overview

Two files changed: the nvidia-master.yaml CI config (search-space parameters for qwen3.5-fp8-b200-sglang) and benchmarks/single_node/qwen3.5_fp8_b200.sh (server launch args). The changes simplify the server invocation and narrow the tp8 concurrency sweep to a single point (conc=4) while widening the tp4 sweep (conc=4-256).

Security Risks

No security-sensitive code is touched. No auth, crypto, or permissions changes.

Level of Scrutiny

Moderate. This is benchmark/CI config code, not production logic, but changes to the search-space and server launch args affect the validity of benchmark results and CI correctness. The [WIP] title explicitly signals the author considers it unfinished.

Other Factors

The ep key is removed from both search-space entries in the YAML, yet the benchmark script still lists EP_SIZE in its check_env_vars call (mandatory) and passes it as --expert-parallel-size=$EP_SIZE. If EP_SIZE is not injected by the CI harness through another path, jobs will fail at startup. This potential misalignment, combined with the [WIP] designation, warrants a human review before merging.
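
The failure mode described above can be reproduced in isolation with a small sketch. The `check_env_vars` function here is a hypothetical stand-in (the real helper is not shown in this PR), and `TP` and `CONC` are placeholder variable names. The fallback to `EP_SIZE=1` is only one possible resolution, and it assumes that a missing ep entry should mean no expert parallelism; dropping `EP_SIZE` from the mandatory list would be another.

```shell
#!/bin/sh
# Sketch only. check_env_vars below is a hypothetical stand-in for the helper
# the benchmark script calls; TP and CONC are placeholder variable names.
set -u

check_env_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing required env vars:$missing" >&2
    return 1
  fi
  return 0
}

TP=4
CONC=4
unset EP_SIZE

# Without a fallback, the check fails once nothing exports EP_SIZE.
status_without_default=ok
check_env_vars TP CONC EP_SIZE 2>/dev/null || status_without_default=failed

# Assumed fallback: treat a missing ep entry as expert-parallel size 1.
: "${EP_SIZE:=1}"
status_with_default=failed
check_env_vars TP CONC EP_SIZE && status_with_default=ok

echo "without default: $status_without_default"
echo "with default: $status_with_default (EP_SIZE=$EP_SIZE)"
```

Whether a default or a removal is appropriate depends on how the CI harness derives `EP_SIZE`, which this conversation does not show.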

@Ankur-singh Ankur-singh force-pushed the nv/qwen3.5-fp8-b200-sglang-update branch from a645df8 to 5d6a8e4 on April 13, 2026 21:53
@Ankur-singh Ankur-singh force-pushed the nv/qwen3.5-fp8-b200-sglang-update branch from ae3066b to 8c79915 on April 15, 2026 19:11