[NV] Add SpeedBench AL collectors for DSR1 / GLM-5 / Qwen3.5 (B300 vL… by qiching · Pull Request #1706 · SemiAnalysisAI/InferenceX

qiching · 2026-06-11T01:08:19Z

…LM MTP)

Mirror the merged dsv4 collector (#1650) for three more models, reusing the same speedbench-al.yml workflow (model/model-prefix are inputs; no workflow or launcher change — all three are already in launch_b300-nv.sh STAGED_MODELS). Per-model serve args match the locally-validated scripts; glm5+qwen3.5 apply the #1695 CHAT_TEMPLATE_KWARGS quoting fix; dsr1 is thinking-on only.
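The description does not show what the #1695 CHAT_TEMPLATE_KWARGS quoting fix changes. As a rough illustration of the failure class such a fix usually targets, the sketch below shows a JSON value losing its double quotes when a POSIX shell splits an unquoted assignment, and surviving once it is quoted as a single word. The `enable_thinking` key and the helper name are hypothetical and are not taken from the collectors.

```python
import json
import shlex


def quote_chat_template_kwargs(kwargs: dict) -> str:
    """Serialize kwargs to compact JSON and quote it as one shell word."""
    return shlex.quote(json.dumps(kwargs, separators=(",", ":")))


# Hypothetical payload; the real keys used by the collectors are not shown here.
kwargs = {"enable_thinking": True}
raw = json.dumps(kwargs, separators=(",", ":"))
quoted = quote_chat_template_kwargs(kwargs)

# Unquoted: POSIX word splitting strips the inner double quotes,
# so the value that reaches the program is no longer valid JSON.
broken_value = shlex.split(f"CHAT_TEMPLATE_KWARGS={raw}")[0].split("=", 1)[1]
try:
    json.loads(broken_value)
    unquoted_is_valid = True
except json.JSONDecodeError:
    unquoted_is_valid = False

# Quoted: the JSON arrives intact and round-trips back to the original dict.
intact_value = shlex.split(f"CHAT_TEMPLATE_KWARGS={quoted}")[0].split("=", 1)[1]
round_tripped = json.loads(intact_value)
```

Here `shlex.split` stands in for the shell's word splitting. In an actual bash script the equivalent is wrapping the JSON in single quotes at the point of assignment.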


github-actions · 2026-06-11T01:08:30Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Add `--save-detailed` to the SpeedBench AL collectors (dsv4/dsr1/glm5/qwen3.5) so that `vllm bench serve` records each request's response text, and upload speedbench_results/speedbench_* as a workflow artifact. Per review feedback, this lets each cell's measured AL be audited for output correctness (sensible text and the correct thinking mode). No behavior change: the flag only adds fields to the already-saved result JSON.
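One way the uploaded artifact could be audited is sketched below. It rests on assumptions that this PR does not confirm: that the detailed result JSON stores response texts in a top-level `generated_texts` list, that the result files end in `.json`, and that thinking-on output can be recognized by a literal `</think>` marker in the text (this may not hold if a reasoning parser strips it). The function names are hypothetical.

```python
import json
from pathlib import Path


def audit_result(result: dict, expect_thinking: bool, marker: str = "</think>") -> list:
    """Return human-readable problems found in one result JSON."""
    texts = result.get("generated_texts")  # assumed field name
    if not texts:
        return ["no generated_texts field; was --save-detailed set?"]
    problems = []
    for i, text in enumerate(texts):
        if not text.strip():
            problems.append(f"request {i}: empty response")
        elif (marker in text) != expect_thinking:
            problems.append(f"request {i}: thinking marker mismatch")
    return problems


def audit_dir(root: str, expect_thinking: bool) -> dict:
    """Audit every speedbench_*.json file directly under root."""
    report = {}
    for path in sorted(Path(root).glob("speedbench_*.json")):
        report[path.name] = audit_result(json.loads(path.read_text()), expect_thinking)
    return report
```

For example, `audit_dir("speedbench_results", expect_thinking=True)` would return a mapping from file name to a list of problems, with an empty list for files that pass both checks.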

github-project-automation Bot added this to InferenceMAX Board Jun 11, 2026

qiching mentioned this pull request Jun 11, 2026

[Tracking Issue] Synthetic Acceptance for MTP Benchmarks #1651 (Open, 3 tasks)

qiching wants to merge 2 commits into main from albecheng/speedbench-al-dsr1-glm5-qwen
