
[NV] Add SpeedBench AL collectors for DSR1 / GLM-5 / Qwen3.5 (B300 vL…#1706

Draft
qiching wants to merge 2 commits into main from albecheng/speedbench-al-dsr1-glm5-qwen


qiching (Collaborator) commented Jun 11, 2026

…LM MTP)

Mirror the merged dsv4 collector (#1650) for three more models, reusing the same `speedbench-al.yml` workflow. `model` and `model-prefix` are workflow inputs, so no workflow or launcher change is needed: all three models are already in `STAGED_MODELS` in `launch_b300-nv.sh`. Per-model serve args match the locally validated scripts; glm5 and qwen3.5 apply the #1695 `CHAT_TEMPLATE_KWARGS` quoting fix; dsr1 runs thinking-on only.

github-actions (Contributor)

Thanks for the contribution! For vLLM and SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please open a PR there first, before we merge your single-node PR into the main branch. Let's make sure the documentation is first-class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs pass after merging. Failures are often just flakes, and re-running the failed jobs fixes them. If you re-run failed jobs, you are responsible for making sure they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Add `--save-detailed` to the SpeedBench AL collectors (dsv4/dsr1/glm5/qwen3.5) so that `vllm bench serve` records each request's response text, and upload `speedbench_results/speedbench_*` as a workflow artifact. This lets each cell's measured AL be audited for output correctness (sensible text and the correct thinking mode), per review feedback. No behavior change: the flag only adds fields to the already-saved result JSON.
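With the detailed results uploaded as an artifact, part of the correctness audit can be scripted. The following is a minimal Python sketch, not part of this PR. It assumes each downloaded file is a `vllm bench serve --save-detailed` result JSON that stores per-request response texts under a `generated_texts` key, and it uses a hypothetical heuristic (presence of a closing `</think>` tag) to flag responses in the wrong thinking mode. The function name `audit_detailed_results` and its parameters are illustrative only.

```python
import glob
import json
import os


def audit_detailed_results(pattern, expect_thinking, think_close="</think>"):
    """Summarize per-request responses in detailed benchmark result files.

    Assumption: each file matching `pattern` is a JSON object whose
    per-request response texts live in a "generated_texts" list.
    The `think_close` check is a hypothetical thinking-mode heuristic.
    """
    report = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            result = json.load(f)
        texts = result.get("generated_texts", [])
        # Responses that are empty or whitespace-only are suspicious.
        empty = sum(1 for t in texts if not t.strip())
        # Treat a response as thinking-on if it contains the closing tag.
        thinking = sum(1 for t in texts if think_close in t)
        # Count responses whose apparent mode disagrees with the expected mode.
        wrong_mode = len(texts) - thinking if expect_thinking else thinking
        report[os.path.basename(path)] = {
            "requests": len(texts),
            "empty": empty,
            "wrong_mode": wrong_mode,
        }
    return report
```

For a thinking-on cell such as dsr1, one might call `audit_detailed_results("speedbench_results/speedbench_*.json", expect_thinking=True)` and inspect any file with non-zero `empty` or `wrong_mode` counts. The tag check is deliberately crude: depending on the chat template, the opening `<think>` tag may be part of the prompt rather than the response, which is why the sketch looks only for the closing tag.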