dsr1 disagg 8k1k mtp: nightly 20260609 + conc-64 dispatch-bug validation by Oseltamivir · Pull Request #1696 · SemiAnalysisAI/InferenceX

Oseltamivir · 2026-06-09T17:30:27Z

Summary

Bump dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp to SGLang ROCm nightly v0.5.12.post1-rocm720-mi35x-20260609 and narrow its 8k1k MTP search space to the single conc-64 DEP8 + 1×DEP8 (MTP3) point.

Diff is intentionally just two things: the image tag and the search-space narrowing. No harness / env-var changes.

Why

At low concurrency, the MoRI EP per-rank dispatch buffer num_max_dispatch_tokens_per_rank is sized as max(CONC_LIST)/TP*(MTP+1). At conc-64/TP8/MTP3 that collapses to 64/8*4 = 32 (< 256), a size that silently corrupted decode output on the -20260529 nightly: decoding completes without error and acceptance length stays high, but gsm8k drops to 0. Collapsing the sweep to conc-64 forces max(CONC_LIST)=64, so the eval runs squarely in that regime.
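The sizing arithmetic above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the BUG_REGIME_LIMIT constant are hypothetical, and integer floor division is assumed for the "/" in the formula; the real value is computed inside the serving stack.

```python
# Hypothetical helper; not the actual SGLang/MoRI code.
BUG_REGIME_LIMIT = 256  # sizes below this were reported to corrupt output pre-fix


def dispatch_tokens_per_rank(conc_list, tp, mtp):
    """Per-rank dispatch size: floor(max(CONC_LIST) / TP) * (MTP + 1)."""
    return (max(conc_list) // tp) * (mtp + 1)


# The single conc-64 point at TP8 / MTP3.
size = dispatch_tokens_per_rank([64], tp=8, mtp=3)
print(size, size < BUG_REGIME_LIMIT)  # 32 True

# Smallest max-concurrency whose dispatch size reaches 256 at TP8 / MTP3.
min_conc_at_limit = BUG_REGIME_LIMIT // (3 + 1) * 8
print(min_conc_at_limit)  # 512
```

Under these assumptions, any sweep whose largest concurrency is below 512 at TP8/MTP3 stays under the 256 mark, which is why a conc-64-only sweep lands in the affected regime.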

This nightly is reported to carry the upstream fix (sgl-project/sglang#27194, ROCm/mori#356). Because the dispatch formula is left at its main value (no env clamp, no patcher), a green conc-64 gsm8k here demonstrates the nightly itself fixes the kernel; a red one means it does not.

Pre-fix MI355X reference: dispatch=32 → 0.00, 64 → 0.00, ≥256 → 0.94.

Note

Search space is narrowed for this validation; restore the full sweep once confirmed.

Note

Medium Risk
Temporary global changes to multinode eval marking (MIN_EVAL_CONC and eval-all-entries-in-group) can affect gsm8k coverage for other disagg configs until reverted; benchmark matrix scope is intentionally reduced for one config key.

Overview
Updates dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp to SGLang ROCm nightly 20260609 (MoRI EP dispatch-buffer fix) and replaces the broad 8k1k MTP disagg sweep with four DEP8 + 1×DEP8 (MTP3) points at conc 64, 32, 16, and 8—one concurrency per matrix entry so per-rank dispatch sizes 32→4 stay in the sub-256 bug regime.

Temporary validation harness (documented as revert when the full search space returns): MIN_EVAL_CONC 16→8 so conc-8 is eval-eligible; mark_eval_entries runs gsm8k on every eligible multinode entry in a group instead of only the highest-concurrency one. perf-changelog.yaml records the image bump, narrowed sweep, and harness deltas.

Reviewed by Cursor Bugbot for commit 38370f4.

… conc-64 Bump dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp to SGLang ROCm nightly v0.5.12.post1-rocm720-mi35x-20260609 and collapse the 8k1k MTP search space to the single conc-64 DEP8 + 1xDEP8 (MTP3) point so max(CONC_LIST)=64 -> the decode server sizes the MoRI per-rank dispatch buffer at 64/8*(MTP+1)=32 (<256), the regime that silently corrupted output (gsm8k=0) on -20260529. Validates the upstream fix (sgl-project/sglang#27194, ROCm/mori#356) reported in this nightly. Harness/env-var settings left unchanged so the result is an honest test.

github-actions · 2026-06-09T17:30:37Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

…g validation

Expand the conc-64 point into separate DEP8+MTP3 entries for conc 64, 32, 16, 8 so each launches its own server and exercises a distinct sub-256 dispatch size (32/16/8/4). conc<=4 omitted (floor(conc/8)*4=0).

To get a gsm8k eval at every point (the harness otherwise evals only the highest-conc entry per topology group, and ignores conc<16):
- mark_eval_entries: eval every eligible multinode entry per group, each at its own concurrency, instead of just max-conc.
- MIN_EVAL_CONC 16 -> 8 so conc-8 (dispatch=4) is eval-eligible.

Both are validation-only; revert with the full search-space restore.

Verified locally: generator emits 4 eval entries (conc 64/32/16/8, each run-eval=true, eval-conc = own conc) and 4 benchmark entries.
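The entry expansion and the two eval-marking changes described in this commit message can be sketched as follows. This is a simplified stand-in, not the repository's generator: the dict layout, build_entries, dispatch_size, and the eval_all flag are assumptions for illustration, and only a single topology group is modelled.

```python
# Hypothetical sketch; names mirror the commit message, not the real harness code.
TP, MTP = 8, 3


def dispatch_size(conc, tp=TP, mtp=MTP):
    """Per-rank dispatch size: floor(conc / TP) * (MTP + 1)."""
    return (conc // tp) * (mtp + 1)


def build_entries(concs):
    """One single-concurrency entry per point; drop points whose size floors to 0."""
    return [{"conc": c, "dispatch": dispatch_size(c)}
            for c in concs if dispatch_size(c) > 0]


def mark_eval_entries(entries, min_eval_conc, eval_all):
    """Flag which entries in one topology group run gsm8k."""
    eligible = [e["conc"] for e in entries if e["conc"] >= min_eval_conc]
    if not eval_all:
        # Default behaviour as described: only the highest-conc entry is evaluated.
        eligible = [max(eligible)] if eligible else []
    chosen = set(eligible)
    return [{**e,
             "run-eval": e["conc"] in chosen,
             "eval-conc": e["conc"] if e["conc"] in chosen else None}
            for e in entries]


entries = build_entries([64, 32, 16, 8, 4])  # conc 4 floors to dispatch 0
validation = mark_eval_entries(entries, min_eval_conc=8, eval_all=True)
default = mark_eval_entries(entries, min_eval_conc=16, eval_all=False)

print([(e["conc"], e["dispatch"]) for e in validation])
# [(64, 32), (32, 16), (16, 8), (8, 4)]
print(sum(e["run-eval"] for e in validation), sum(e["run-eval"] for e in default))
# 4 1
```

With the validation settings all four points are marked for eval at their own concurrency; with the default settings only the conc-64 entry is.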

github-actions · 2026-06-09T17:47:11Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27224150488
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27224150488

The test hardcoded conc [8,16,32]/[8] and expected eval-conc=32, which broke when MIN_EVAL_CONC was lowered 16->8 (eligible median shifted to 16). Rebuild the conc lists from MIN_EVAL_CONC (below/at/above) so the test asserts the floor behavior for any value of the constant -- passes under both 8 and 16.
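A test built this way might look like the sketch below. It assumes, from the numbers in the message (eval-conc 32 under a floor of 16, shifting to 16 under a floor of 8), that the harness picks the upper median of the eligible concurrencies; pick_eval_conc and check_floor_behavior are hypothetical names, not the repository's functions.

```python
def pick_eval_conc(conc_list, min_eval_conc):
    """Hypothetical selector: upper median of concurrencies at or above the floor."""
    eligible = sorted(c for c in conc_list if c >= min_eval_conc)
    if not eligible:
        return None  # nothing in this entry is eval-eligible
    return eligible[len(eligible) // 2]


def check_floor_behavior(min_eval_conc):
    """Derive the conc lists from the constant instead of hardcoding them."""
    below, at, above = min_eval_conc // 2, min_eval_conc, min_eval_conc * 2
    # The below-floor value must not take part in the selection.
    assert pick_eval_conc([below, at, above], min_eval_conc) == above
    # An entry with only below-floor concurrency gets no eval.
    assert pick_eval_conc([below], min_eval_conc) is None


# The hardcoded lists give different answers depending on the constant ...
print(pick_eval_conc([8, 16, 32], 16), pick_eval_conc([8, 16, 32], 8))  # 32 16

# ... while the derived lists hold for either value.
for floor in (8, 16):
    check_floor_behavior(floor)
```

Deriving the inputs from the constant keeps the assertion about the floor itself rather than about one particular value of it.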

github-actions · 2026-06-09T17:50:18Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27224813462
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27224813462

github-actions · 2026-06-09T20:24:11Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27225030090
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27225030090

github-actions · 2026-06-10T08:10:52Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27225030090
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27225030090

github-actions · 2026-06-10T15:13:53Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27225030090
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=27225030090

Oseltamivir requested a review from a team June 9, 2026 17:30

Oseltamivir requested review from billishyahao, chunfangamd and seungrokj as code owners June 9, 2026 17:30

github-project-automation Bot added this to InferenceMAX Board Jun 9, 2026

Oseltamivir requested review from 1am9trash and yctseng0211 as code owners June 9, 2026 17:30

Oseltamivir added the non-canary-full-sweep-enabled Run the full sweep without the canary gate (full search space, no trim) label Jun 9, 2026

Oseltamivir added 2 commits June 9, 2026 10:32

perf-changelog: record dsr1 8k1k mtp nightly 20260609 + conc-64 validation entry

e316738

Oseltamivir wants to merge 4 commits into main from dsr1-fp4-disagg-8k1k-mtp-nightly-conc64
