dsv4-fp4-b300-sglang: enable piecewise cuda graph and mixed chunk #1693
Open
yhyang201 wants to merge 21 commits into main from dsv4-fp4-b300-piecewise-cuda-graph
+48
−37
Commits (21)
c444e42  dsv4-fp4-b300-sglang: align env vars to GB300 and add fp4-indexer flag (yhyang201)
3ab8dc9  dsv4-fp4-b300-sglang: bump image to nightly-dev-cu13-20260608-303757cc (yhyang201)
250358c  Add perf-changelog entry for dsv4-fp4-b300-sglang env var alignment (yhyang201)
96b2462  dsv4-fp4-b300-sglang: switch to nightly-dev-cu13-20260606-b3e4c204 (yhyang201)
9d8e10d  dsv4-fp4-b300-sglang: switch to nightly-dev-cu13-20260604-14ed9b44 (yhyang201)
50d1c91  dsv4-fp4-b300-sglang: switch to nightly-dev-cu13-20260601-373cadc9 (yhyang201)
1025c7f  dsv4-fp4-b300-sglang: remove --enable-deepseek-v4-fp4-indexer (yhyang201)
0e02411  dsv4-fp4-b300-sglang: revert image to original nightly-dev-cu13-20260… (yhyang201)
c65520a  dsv4-fp4-b300-sglang: switch to nightly-dev-cu13-20260609-317fc6a9 (yhyang201)
6beee8c  dsv4-fp4-b300-sglang: kill stale processes on server ports before launch (yhyang201)
5bd18ad  benchmark: kill stale server processes before launch (yhyang201)
c27f53f  benchmark: use python3+psutil to kill stale server by port (yhyang201)
cd2f55f  fix stale process cleanup: run pkill on host, not inside container (yhyang201)
276647b  b300: fix pkill self-kill and change PORT to 30000 (yhyang201)
27d5fe9  dsv4-fp4-b300-sglang: remove kill logic, add --enable-deepseek-v4-fp4… (yhyang201)
30211e4  dsv4-fp4-b300-sglang: add --enforce-piecewise-cuda-graph --enable-mix… (yhyang201)
48af571  perf-changelog: add entry for piecewise cuda graph PR #1693 (yhyang201)
38653b9  benchmark_lib: remove stale kill_port_users/cleanup_server_ports (yhyang201)
abf4349  dsv4-fp4-b300-sglang: replace piecewise-cuda-graph/mixed-chunk with f… (yhyang201)
f2724b9  Replace --enable-flashinfer-allreduce-fusion with --enable-mixed-chun… (yhyang201)
1bdee89  Remove CONC=4096 profile from dsv4-fp4-b300-sglang (yhyang201)
Missing piecewise CUDA graph flag
Medium Severity
The PR adds --enable-mixed-chunk to every concurrency profile but never adds --enforce-piecewise-cuda-graph, even though the title and description call for both on all profiles. Runs therefore omit the intended piecewise CUDA graph enforcement.
Additional Locations (2)
benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_sglang.sh#L97-L99
benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_sglang.sh#L115-L117
Reviewed by Cursor Bugbot for commit f2724b9.
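A mismatch like the one flagged above could be caught before review with a small presence check. The following is a minimal sketch, not part of the PR: the helper name check_flags is hypothetical, and only the two flag names and the script path come from the comment above. It tests whether each flag appears anywhere in the file, so it would not detect a flag that is present in one concurrency profile but missing from another.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): report which of the given flags
# do not appear anywhere in a launch script.
# Usage: check_flags <script> <flag> [<flag> ...]
# Returns 0 if every flag is found, 1 if at least one is missing.
check_flags() {
  local script="$1"
  shift
  local flag
  local missing=0
  for flag in "$@"; do
    # -F treats the flag as a fixed string, -q suppresses output,
    # and -- stops option parsing so a leading "--" is read as the pattern.
    if ! grep -Fq -- "$flag" "$script"; then
      echo "missing: $flag"
      missing=1
    fi
  done
  return "$missing"
}
```

Assuming the path from the review comment, it might be invoked as: check_flags benchmarks/single_node/fixed_seq_len/dsv4_fp4_b300_sglang.sh --enable-mixed-chunk --enforce-piecewise-cuda-graph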