[NV] Add SpeedBench AL collectors for DSR1 / GLM-5 / Qwen3.5 (B300 vL…#1706
qiching wants to merge 2 commits into
…LM MTP) Mirror the merged dsv4 collector (#1650) for three more models, reusing the same speedbench-al.yml workflow (model/model-prefix are inputs; no workflow or launcher change — all three are already in launch_b300-nv.sh STAGED_MODELS). Per-model serve args match the locally-validated scripts; glm5+qwen3.5 apply the #1695 CHAT_TEMPLATE_KWARGS quoting fix; dsr1 is thinking-on only.
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Add `--save-detailed` to the SpeedBench AL collectors (dsv4/dsr1/glm5/qwen3.5) so `vllm bench serve` records each request's response text, and upload speedbench_results/speedbench_* as a workflow artifact. This lets each cell's measured AL be audited for output correctness (sensible text + correct thinking mode), per review feedback. No behavior change: the flag only adds fields to the already-saved result JSON.
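For reviewers who want to spot-check the uploaded artifact, the audit described above could look roughly like the sketch below. It is not part of this PR. It assumes the per-request response text lands in a list field named `generated_texts` in each result JSON, that the downloaded artifact is a directory of `.json` files, and that thinking-on output can be recognized by a `</think>` marker in the text; all three are assumptions to adjust against the actual files, and `audit_result` / `audit_dir` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Optional


def audit_result(result: dict,
                 text_key: str = "generated_texts",   # assumed field name
                 expect_thinking: Optional[bool] = None,
                 think_marker: str = "</think>") -> list:
    """Return a list of human-readable problems found in one result dict."""
    texts = result.get(text_key)
    if not isinstance(texts, list) or not texts:
        return [f"missing or empty '{text_key}' (was --save-detailed passed?)"]

    problems = []
    for i, text in enumerate(texts):
        if not isinstance(text, str) or not text.strip():
            problems.append(f"request {i}: empty response text")
            continue
        # Optional thinking-mode check; the marker is an assumption and may
        # not appear if the server strips reasoning from the response text.
        if expect_thinking is not None:
            has_marker = think_marker in text
            if has_marker != expect_thinking:
                state = "present" if has_marker else "absent"
                problems.append(f"request {i}: thinking marker {state}")
    return problems


def audit_dir(root: str = "speedbench_results",
              expect_thinking: Optional[bool] = None) -> dict:
    """Audit every .json file under root; map file path -> list of problems."""
    report = {}
    for path in sorted(Path(root).rglob("*.json")):
        with open(path, encoding="utf-8") as fh:
            result = json.load(fh)
        problems = audit_result(result, expect_thinking=expect_thinking)
        if problems:
            report[str(path)] = problems
    return report


if __name__ == "__main__":
    for name, problems in audit_dir().items():
        print(name)
        for p in problems:
            print("  -", p)
```

Running the script from the directory that contains the downloaded `speedbench_results` folder prints only the files with problems. For a thinking-on-only collector such as dsr1, calling `audit_dir(expect_thinking=True)` would additionally flag responses without the assumed marker.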