[AMD] remove accuracy wrong sweep point, bump image to sglang-rocm 20260609 #1714
base: main
Changes from all commits
15ab4c7
69dd3d8
8dec29c
b37a18d
4b531d9
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -193,8 +193,12 @@ if [[ "$PREFILL_ENABLE_DP" == "true" ]] && [[ "$PREFILL_ENABLE_EP" == "true" ]]; | |
| prefill_max_running_requests=$BENCH_MAX_CONC_VALUE | ||
| prefill_dp_ranks=$PREFILL_TP_SIZE | ||
| # MORI_MAX_DISPATCH_TOKENS_PREFILL stays at 8192 (no change) | ||
| - MORI_MOE_MAX_INPUT_TOKENS_PREFILL=$((MORI_MAX_DISPATCH_TOKENS_PREFILL * prefill_dp_ranks / 2)) | ||
| - echo "[DP+EP override] Prefill: max-running-requests=$prefill_max_running_requests, MOE_MAX_INPUT=$MORI_MOE_MAX_INPUT_TOKENS_PREFILL" | ||
| + if [[ "$prefill_max_running_requests" -gt 128 ]]; then | ||
| + MORI_MOE_MAX_INPUT_TOKENS_PREFILL=$((MORI_MAX_DISPATCH_TOKENS_PREFILL * prefill_dp_ranks / 2)) | ||
| + echo "[DP+EP override] Prefill: max-running-requests=$prefill_max_running_requests, MOE_MAX_INPUT=$MORI_MOE_MAX_INPUT_TOKENS_PREFILL" | ||
| + else | ||
| + unset MORI_MOE_MAX_INPUT_TOKENS_PREFILL | ||
| + fi | ||
| fi | ||
|
|
||
| # Compute DP-dependent decode parameters (3-way: DP > EP-only > no_dp) | ||
|
|
@@ -214,7 +218,11 @@ if [[ "$DECODE_ENABLE_DP" == "true" ]] && [[ "$DECODE_ENABLE_EP" == "true" ]]; t | |
| decode_max_running_requests=$BENCH_MAX_CONC_VALUE | ||
| decode_dp_ranks=$DECODE_TP_SIZE | ||
| MORI_MAX_DISPATCH_TOKENS_DECODE=$((BENCH_MAX_CONC_VALUE / decode_dp_ranks)) | ||
| - MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MAX_DISPATCH_TOKENS_DECODE * decode_dp_ranks * 7 / 10)) | ||
| + if [[ "decode_max_running_requests" -gt 128 ]]; | ||
| + MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MAX_DISPATCH_TOKENS_DECODE * decode_dp_ranks * 7 / 10)) | ||
| + else | ||
| + unset MORI_MOE_MAX_INPUT_TOKENS_DECODE | ||
| + fi | ||
|
Review comment (Cursor Bugbot, commit 4b531d9): Decode concurrency test wrong. High Severity. The decode guard compares the literal string […] Review comment (Cursor Bugbot, commit 4b531d9): MTP scales unset decode MOE. Medium Severity. When decode concurrency is at most 128, the new branch unsets […] Additional Locations (1). |
||
| # Update derived variable | ||
| SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD=$((MORI_MAX_DISPATCH_TOKENS_DECODE * 2)) | ||
| export SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD | ||
|
|
||
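The decode guard flagged in the review comment above is missing both the `$` expansion and the `then` keyword that the prefill guard has. A minimal sketch of a guard that mirrors the prefill branch follows. The function name `set_decode_moe_limit` and its positional arguments are hypothetical and exist only to make the snippet self-contained; the variable names and the 128 threshold are taken from the diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR: wraps the decode branch so it can be
# exercised in isolation.
#   $1 = decode max running requests (BENCH_MAX_CONC_VALUE in the script)
#   $2 = decode DP ranks (DECODE_TP_SIZE in the script)
set_decode_moe_limit() {
    local decode_max_running_requests=$1
    local decode_dp_ranks=$2

    MORI_MAX_DISPATCH_TOKENS_DECODE=$((decode_max_running_requests / decode_dp_ranks))

    # Same shape as the prefill guard: explicit "$" expansion and a closing "then".
    if [[ "$decode_max_running_requests" -gt 128 ]]; then
        MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MAX_DISPATCH_TOKENS_DECODE * decode_dp_ranks * 7 / 10))
    else
        unset MORI_MOE_MAX_INPUT_TOKENS_DECODE
    fi
}

# Above the threshold the cap is computed; at or below it the variable is unset.
set_decode_moe_limit 256 8
echo "conc=256: ${MORI_MOE_MAX_INPUT_TOKENS_DECODE:-unset}"
set_decode_moe_limit 64 8
echo "conc=64: ${MORI_MOE_MAX_INPUT_TOKENS_DECODE:-unset}"
```

With these sample inputs the script prints `conc=256: 179` and `conc=64: unset`. Any later code that reads `MORI_MOE_MAX_INPUT_TOKENS_DECODE` would need to tolerate the unset case, which is the concern raised in the second review comment.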


hi @billishyahao can u fix the accuracy eval issue instead of just deleting it?
sgl-project/sglang#27194

We are investigating and trying to resolve the accuracy issue in parallel (sgl-project/sglang#27194). In the meantime, we need this cleanup to remove the incorrect sweep points.