266 changes: 0 additions & 266 deletions .github/configs/amd-master.yaml
@@ -2077,45 +2077,6 @@ dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=3"


# 1*DEP8 + 1*DEP8
- spec-decoding: "mtp"
conc-list: [ 128 ]
prefill:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=3"

# 1*DEP8 + 1*DEP8
- spec-decoding: "mtp"
conc-list: [ 64 ]
prefill:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=3"

# 2*DEP8 + 1*DEP8
- spec-decoding: "mtp"
conc-list: [ 1024, 2048, 4096 ]
@@ -2437,233 +2398,6 @@ dsv4-fp4-mi355x-vllm-agentic:
- { tp: 4, offloading: none, conc-list: [1, 2, 4, 8, 10, 12, 16] }
- { tp: 4, ep: 4, dp-attn: true, offloading: none, conc-list: [16, 24, 32, 40, 48] }

dsr1-fp4-mi355x-sglang-disagg-mtp:
image: lmsysorg/sglang-rocm:v0.5.12-rocm720-mi35x-20260519
model: amd/DeepSeek-R1-0528-MXFP4-v2
model-prefix: dsr1
runner: mi355x-disagg
precision: fp4
framework: sglang-disagg
multinode: true
disagg: true
scenarios:
fixed-seq-len:
- isl: 1024
osl: 1024
search-space:
Comment on lines -2440 to -2453

@functionstackx functionstackx Jun 12, 2026

Collaborator


hi @billishyahao can u fix the accuracy eval issue instead of just deleting it?

sgl-project/sglang#27194

Collaborator Author


We are investigating and trying to resolve the accuracy issue in parallel (sgl-project/sglang#27194). In the meantime, we need this cleanup to remove the incorrect data points.

# MTP configurations
# 1P1D TP8
- spec-decoding: "mtp"
conc-list: [ 1, 2, 4, 8 ]
prefill:
num-worker: 1
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 1
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=3"

# 1P2D TP8
- spec-decoding: "mtp"
conc-list: [ 2, 4, 8, 16, 32 ]
prefill:
num-worker: 1
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 2
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "DECODE_NODES=2"
- "DECODE_MTP_SIZE=3"

# 1P2D TP8
- spec-decoding: "mtp"
conc-list: [ 64, 128, 256 ]
prefill:
num-worker: 1
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 2
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "DECODE_NODES=2"
- "DECODE_MTP_SIZE=2"

# 1P2D TP4
- spec-decoding: "mtp"
conc-list: [ 64, 128, 256 ]
prefill:
num-worker: 1
tp: 4
ep: 1
dp-attn: false
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 2
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "DECODE_NODES=2"
- "DECODE_MTP_SIZE=2"

# 1*DEP4+ 1*DEP8
- spec-decoding: "mtp"
conc-list: [ 1024, 2048, 4096 ]
prefill:
num-worker: 1
tp: 4
ep: 4
dp-attn: true
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=1"

- isl: 8192
osl: 1024
search-space:
# MTP configurations
# 1P1D pure TP8
- spec-decoding: "mtp"
conc-list: [ 1, 2, 4, 8 ]
prefill:
num-worker: 1
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 1
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=3"

# 1P2D TP8
- spec-decoding: "mtp"
conc-list: [ 2, 4, 8, 16, 32 ]
prefill:
num-worker: 1
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 2
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "DECODE_NODES=2"
- "DECODE_MTP_SIZE=3"

# 1P2D TP8
- spec-decoding: "mtp"
conc-list: [ 64, 128, 256 ]
prefill:
num-worker: 1
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 2
tp: 8
ep: 1
dp-attn: false
additional-settings:
- "DECODE_NODES=2"
- "DECODE_MTP_SIZE=2"

# 1*DEP8 + 1*DEP8
- spec-decoding: "mtp"
conc-list: [ 128, 512 ]
prefill:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=1"

# 1*DEP8 + 1*DEP8
- spec-decoding: "mtp"
conc-list: [ 64, 256 ]
prefill:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "PREFILL_NODES=1"
decode:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=1"

# 2*DEP8 + 1*DEP8
- spec-decoding: "mtp"
conc-list: [ 1024, 2048, 4096 ]
prefill:
num-worker: 2
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "PREFILL_NODES=2"
decode:
num-worker: 1
tp: 8
ep: 8
dp-attn: true
additional-settings:
- "DECODE_NODES=1"
- "DECODE_MTP_SIZE=1"


# DSv4-Pro FP4 on MI355X via SGLang. Uses a rocm720 mi35x image built off the
14 changes: 11 additions & 3 deletions benchmarks/multi_node/amd_utils/server_sglang.sh
@@ -193,8 +193,12 @@ if [[ "$PREFILL_ENABLE_DP" == "true" ]] && [[ "$PREFILL_ENABLE_EP" == "true" ]];
     prefill_max_running_requests=$BENCH_MAX_CONC_VALUE
     prefill_dp_ranks=$PREFILL_TP_SIZE
     # MORI_MAX_DISPATCH_TOKENS_PREFILL stays at 8192 (no change)
-    MORI_MOE_MAX_INPUT_TOKENS_PREFILL=$((MORI_MAX_DISPATCH_TOKENS_PREFILL * prefill_dp_ranks / 2))
-    echo "[DP+EP override] Prefill: max-running-requests=$prefill_max_running_requests, MOE_MAX_INPUT=$MORI_MOE_MAX_INPUT_TOKENS_PREFILL"
+    if [[ "$prefill_max_running_requests" -gt 128 ]]; then
+        MORI_MOE_MAX_INPUT_TOKENS_PREFILL=$((MORI_MAX_DISPATCH_TOKENS_PREFILL * prefill_dp_ranks / 2))
+        echo "[DP+EP override] Prefill: max-running-requests=$prefill_max_running_requests, MOE_MAX_INPUT=$MORI_MOE_MAX_INPUT_TOKENS_PREFILL"
+    else
+        unset MORI_MOE_MAX_INPUT_TOKENS_PREFILL
+    fi
 fi

# Compute DP-dependent decode parameters (3-way: DP > EP-only > no_dp)
@@ -214,7 +218,11 @@ if [[ "$DECODE_ENABLE_DP" == "true" ]] && [[ "$DECODE_ENABLE_EP" == "true" ]]; t
     decode_max_running_requests=$BENCH_MAX_CONC_VALUE
     decode_dp_ranks=$DECODE_TP_SIZE
     MORI_MAX_DISPATCH_TOKENS_DECODE=$((BENCH_MAX_CONC_VALUE / decode_dp_ranks))
-    MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MAX_DISPATCH_TOKENS_DECODE * decode_dp_ranks * 7 / 10))
+    if [[ "decode_max_running_requests" -gt 128 ]];
+        MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MAX_DISPATCH_TOKENS_DECODE * decode_dp_ranks * 7 / 10))
+    else
+        unset MORI_MOE_MAX_INPUT_TOKENS_DECODE
+    fi


Decode concurrency test wrong

High Severity

The decode guard compares the literal string decode_max_running_requests to 128 instead of "$decode_max_running_requests". The prefill side correctly uses "$prefill_max_running_requests", so the decode MOE override never runs for high bench concurrency and MORI_MOE_MAX_INPUT_TOKENS_DECODE is always unset in this block.


Reviewed by Cursor Bugbot for commit 4b531d9. Configure here.
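
A minimal sketch of the decode guard with the variable expansion written explicitly, mirroring the prefill branch. The variable names come from the hunk above; the sample values (`BENCH_MAX_CONC_VALUE=1024`, `DECODE_TP_SIZE=8`) are assumptions chosen for illustration. The `if` line as displayed in this diff also has no `then`, which the sketch adds. Writing `"$decode_max_running_requests"` keeps the test from depending on how `[[ ... -gt ... ]]` evaluates a bare name.

```shell
#!/usr/bin/env bash
# Illustrative inputs (assumed values, not taken from the PR).
BENCH_MAX_CONC_VALUE=1024
DECODE_TP_SIZE=8

decode_max_running_requests=$BENCH_MAX_CONC_VALUE
decode_dp_ranks=$DECODE_TP_SIZE
MORI_MAX_DISPATCH_TOKENS_DECODE=$((BENCH_MAX_CONC_VALUE / decode_dp_ranks))

# Explicit "$var" expansion plus the "then" keyword, matching the prefill guard.
if [[ "$decode_max_running_requests" -gt 128 ]]; then
    MORI_MOE_MAX_INPUT_TOKENS_DECODE=$((MORI_MAX_DISPATCH_TOKENS_DECODE * decode_dp_ranks * 7 / 10))
else
    unset MORI_MOE_MAX_INPUT_TOKENS_DECODE
fi

# Print the computed limit, or a placeholder when the override was skipped.
echo "MOE_MAX_INPUT(decode)=${MORI_MOE_MAX_INPUT_TOKENS_DECODE:-<unset>}"
```

With these assumed inputs the override branch is taken; setting `BENCH_MAX_CONC_VALUE` to 128 or less takes the `unset` branch instead.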


MTP scales unset decode MOE

Medium Severity

When decode concurrency is at most 128, the new branch unsets MORI_MOE_MAX_INPUT_TOKENS_DECODE, but the later MTP block still multiplies that variable. Unset operands in arithmetic expansion become zero, so MTP runs export SGLANG_MORI_MOE_MAX_INPUT_TOKENS=0 instead of omitting the env var like the prefill unset path.


Reviewed by Cursor Bugbot for commit 4b531d9. Configure here.
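
The zero-valued export described above can be reproduced in isolation. The later MTP block is not shown in this diff, so the sketch below is a hypothetical stand-in: the scaling factor `DECODE_MTP_SIZE + 1` is an assumption made only to have something to multiply by, and the guard is one possible way to skip the export, not necessarily the fix the authors will choose.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the later MTP block, which this diff does not show.
DECODE_MTP_SIZE=3                        # value used by several config entries
unset MORI_MOE_MAX_INPUT_TOKENS_DECODE   # state after the "conc <= 128" branch

# An unset name inside $(( )) evaluates to 0, so unconditional scaling gives 0.
# The factor (DECODE_MTP_SIZE + 1) is an assumed placeholder.
unguarded=$((MORI_MOE_MAX_INPUT_TOKENS_DECODE * (DECODE_MTP_SIZE + 1)))
echo "unguarded result: $unguarded"

# Guarded variant: scale and export only when the limit was actually computed.
if [ -n "${MORI_MOE_MAX_INPUT_TOKENS_DECODE:-}" ]; then
    SGLANG_MORI_MOE_MAX_INPUT_TOKENS=$((MORI_MOE_MAX_INPUT_TOKENS_DECODE * (DECODE_MTP_SIZE + 1)))
    export SGLANG_MORI_MOE_MAX_INPUT_TOKENS
else
    unset SGLANG_MORI_MOE_MAX_INPUT_TOKENS
fi
echo "exported: ${SGLANG_MORI_MOE_MAX_INPUT_TOKENS:-<not set>}"
```

The guard keeps the low-concurrency path consistent with the prefill side, where the variable is simply absent rather than set to zero.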

# Update derived variable
SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD=$((MORI_MAX_DISPATCH_TOKENS_DECODE * 2))
export SGLANG_MORI_DISPATCH_INTER_KERNEL_SWITCH_THRESHOLD
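
To make the integer arithmetic in these hunks concrete, here is a small worked sketch. The prefill dispatch limit of 8192 is taken from the comment in the prefill hunk; the TP sizes and the benchmark concurrency are assumed sample values, and the lowercase result names are introduced only for this example.

```shell
#!/usr/bin/env bash
# 8192 comes from the hunk comment; the other inputs are assumed sample values.
MORI_MAX_DISPATCH_TOKENS_PREFILL=8192
PREFILL_TP_SIZE=8
DECODE_TP_SIZE=8
BENCH_MAX_CONC_VALUE=1024

# Prefill MoE input limit: dispatch limit * DP ranks / 2.
prefill_moe=$((MORI_MAX_DISPATCH_TOKENS_PREFILL * PREFILL_TP_SIZE / 2))

# Decode dispatch limit: benchmark concurrency split across DP ranks.
decode_dispatch=$((BENCH_MAX_CONC_VALUE / DECODE_TP_SIZE))

# Derived switch threshold: twice the decode dispatch limit.
threshold=$((decode_dispatch * 2))

# Shell arithmetic truncates, so a concurrency that is not a multiple of the
# rank count loses its remainder (100 / 8 gives 12, not 12.5).
truncated=$((100 / DECODE_TP_SIZE))

echo "prefill_moe=$prefill_moe decode_dispatch=$decode_dispatch threshold=$threshold truncated=$truncated"
```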
6 changes: 6 additions & 0 deletions perf-changelog.yaml
@@ -3600,3 +3600,9 @@
- "MI355x DSR1-FP4: Include TP4 configurations for 8k1k"
- "Expand the TP sweep (included TP=4) for 8k/1k configuration for conc=4 to 64"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1692

- config-keys:
- dsr1-fp4-mi355x-sglang-disagg-8k1k-mtp
description:
- "Remove conc128,64 for dep8 case"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1714