Conversation
…r args Update search-space to tp8 conc=4 and tp4 conc=4-256. Update benchmark script with new server launch: enable-symm-mem, prefill/chunked 16384, mem-fraction-static 0.8, stream-interval 50, tokenizer-path.
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.
@claude update perf-changelog.yaml file
Claude finished @Ankur-singh's task in 56s. Updated
Co-authored-by: Ankur Singh <Ankur-singh@users.noreply.github.com>
This PR is still marked [WIP] and also removes the ep field from the search-space YAML while the benchmark script still requires EP_SIZE as a mandatory env var — worth a human look before merging.
Extended reasoning...
Overview
Two files changed: the nvidia-master.yaml CI config (search-space parameters for qwen3.5-fp8-b200-sglang) and benchmarks/single_node/qwen3.5_fp8_b200.sh (server launch args). The changes simplify the server invocation and narrow the tp8 concurrency sweep to a single point (4→4) while widening tp4 (4–256).
Security Risks
No security-sensitive code is touched. No auth, crypto, or permissions changes.
Level of Scrutiny
Moderate. This is benchmark/CI config code, not production logic, but changes to the search-space and server launch args affect the validity of benchmark results and CI correctness. The [WIP] title explicitly signals the author considers it unfinished.
Other Factors
The ep key is removed from both search-space entries in the YAML, yet the benchmark script still lists EP_SIZE in its check_env_vars call (mandatory) and passes it as --expert-parallel-size=$EP_SIZE. If EP_SIZE is not injected by the CI harness through another path, jobs will fail at startup. This potential misalignment, combined with the [WIP] designation, warrants a human review before merging.
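The failure mode described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the `check_env_vars` function below is a hypothetical stand-in (the repository's real helper is not shown in this thread), and the `TP` and `CONC` variable names are invented for the example. Only `EP_SIZE` and the call name `check_env_vars` come from the review.

```shell
# Hypothetical stand-in for the repository's check_env_vars helper.
# It returns non-zero if any named variable is unset or empty.
check_env_vars() {
  missing=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Simulate a search-space entry that supplies tp and conc but no ep.
# TP and CONC are assumed names, used here only for illustration.
unset EP_SIZE
TP=4
CONC=4

if check_env_vars TP CONC EP_SIZE; then
  status="ok"
else
  status="missing"
fi
echo "without ep: $status"

# Once the config supplies ep again, the same check passes.
EP_SIZE=4
if check_env_vars TP CONC EP_SIZE; then
  status_with_ep="ok"
else
  status_with_ep="missing"
fi
echo "with ep: $status_with_ep"
```

If the real helper behaves like this sketch, a job whose search-space entry has no `ep` value would stop at the environment check unless the CI harness injects `EP_SIZE` some other way.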
a645df8 to 5d6a8e4
…glang-update # Conflicts: # perf-changelog.yaml
update config to include ep4
ae3066b to 8c79915
Summary