Add vLLM dynamic scheduler reconfigure for single-server sweeps #1029

JordanNanos wants to merge 8 commits into `main` from `feature/reconfigure-scheduler`
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
…t script

- Fix `reconfigure_vllm_scheduler()` to use `POST /reconfigure` with a JSON body instead of query params to the non-existent `/reconfigure_scheduler`
- Remove `max_num_scheduled_tokens` (internal name, not exposed by API)
- Use `mode=abort&clear_cache=true` on `/pause` for clean reconfigure cycles
- Add `benchmarks/test_reconfigure_sweep.sh` for standalone A/B testing on a cluster: runs N cold starts (baseline) vs 1 start + N reconfigure cycles and prints wall-clock comparison
- Update docs to match actual API surface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
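The JSON-body construction this commit describes could look like the sketch below. The env var names come from the PR; the function name and exact JSON field names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch (assumed field names): build the /reconfigure JSON body from the
# optional env vars, emitting only the keys that were actually set.
build_reconfigure_body() {
  local json="{" sep=""
  if [[ -n "${VLLM_MAX_NUM_BATCHED_TOKENS:-}" ]]; then
    json+="${sep}\"max_num_batched_tokens\": ${VLLM_MAX_NUM_BATCHED_TOKENS}"
    sep=", "
  fi
  if [[ -n "${VLLM_MAX_NUM_SEQS:-}" ]]; then
    json+="${sep}\"max_num_seqs\": ${VLLM_MAX_NUM_SEQS}"
  fi
  json+="}"
  printf '%s' "$json"
}
```

Unset vars are simply omitted from the body, so a sweep can vary one knob at a time.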
Added the requested patched-vLLM distribution paths.

Recommended for cluster sweeps: use a custom image or pinned wheel, then enable `VLLM_DYNAMIC_RECONFIGURE=1`.
The vLLM `/reconfigure` endpoint requires PAUSED_ALL state, which maps to pause `mode="keep"`. Using `mode="abort"` would leave the scheduler in PAUSED_NEW state, causing `/reconfigure` to reject the request. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
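Under the corrected semantics, a full cycle might look like the following sketch. The endpoint paths and `mode=keep` behavior follow this PR's description; the function name, server address, and body are illustrative.

```shell
#!/usr/bin/env bash
# Sketch (assumed endpoints): pause with mode=keep so the scheduler reaches
# PAUSED_ALL (required by /reconfigure), then chain with && so any failed
# step short-circuits the rest and its non-zero exit code propagates.
reconfigure_cycle() {
  local base_url="$1" body="$2"
  curl -fsS -X POST "$base_url/pause?mode=keep" \
    && curl -fsS -X POST -H 'Content-Type: application/json' \
         -d "$body" "$base_url/reconfigure" \
    && curl -fsS -X POST "$base_url/resume"
}
```

Example call (port is hypothetical): `reconfigure_cycle "http://localhost:8000" '{"max_num_seqs": 256}'`.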
🔴 The three curl calls in reconfigure_vllm_scheduler() are not chained with &&, so a failure of /reconfigure_scheduler is masked by the subsequent /resume success — the function returns 0 even when reconfiguration failed. Additionally, the call site in run_benchmark_serving() ignores the return value entirely, so the benchmark always proceeds regardless of reconfiguration outcome, silently producing incorrect results with stale scheduler settings.
Extended reasoning...
Bug 1 — internal error masking in `reconfigure_vllm_scheduler()` (lines 38-40):

In bash, a function's return code is the exit code of its last executed command. The three curl calls run unconditionally with no chaining:

```shell
curl -fsS -X POST "$base_url/pause?mode=keep"
curl -fsS -X POST -G "$base_url/reconfigure_scheduler" "${params[@]}"
curl -fsS -X POST "$base_url/resume"
```

The `-f` flag in `-fsS` makes curl exit with code 22 on HTTP 4xx/5xx responses. If `/reconfigure_scheduler` returns an error (e.g., HTTP 400 for an invalid parameter value), curl exits 22 — but execution continues unconditionally to `/resume`. If `/resume` succeeds (exit 0), the function returns 0, masking the reconfiguration failure completely.
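This masking behavior can be reproduced in isolation with two stand-in commands:

```shell
#!/usr/bin/env bash
# A bash function's return status is that of its LAST command, so an earlier
# failure is invisible: `false` stands in for the failed /reconfigure_scheduler
# call and `true` for the successful /resume call.
masked() {
  false   # simulated reconfigure failure (exit 1)
  true    # simulated resume success (exit 0)
}
if masked; then
  echo "masked returned 0: the failure was swallowed"
fi
```

The caller has no way to tell that anything inside the function went wrong.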
Bug 2 — return value ignored at the call site in `run_benchmark_serving()` (~line 361):

```shell
if [[ "${VLLM_DYNAMIC_RECONFIGURE:-0}" == "1" && "$backend" == "vllm" ]]; then
  reconfigure_vllm_scheduler "$port"
fi
```

There is no `|| return 1` or any check on the return value. There is no `set -e` in the script (only `set +x`/`set -x`). Even if the function were fixed to propagate errors, the benchmark would still proceed unconditionally.
Combined effect — step-by-step proof:

1. User sets `VLLM_DYNAMIC_RECONFIGURE=1`, `VLLM_MAX_NUM_SEQS=999999` (invalid, exceeds server capacity)
2. `run_benchmark_serving` calls `reconfigure_vllm_scheduler "$port"`
3. Inside the function: `curl -fsS -X POST .../pause` succeeds (exit 0)
4. `curl -fsS -X POST -G .../reconfigure_scheduler ...` → server returns HTTP 400 → curl exits 22
5. Execution continues (no `&&`, no error check): `curl -fsS -X POST .../resume` → succeeds (exit 0)
6. Function returns 0 (last command's exit code) — failure masked
7. Back in `run_benchmark_serving`: return value not checked, benchmark proceeds
8. vLLM server is still running with its original scheduler limits
9. Benchmark results are recorded as if they were obtained with the requested settings — silently incorrect
Fix: Chain the curl calls with `&&` inside the function, and add `|| return 1` at the call site:

```shell
# Inside reconfigure_vllm_scheduler():
curl -fsS -X POST "$base_url/pause?mode=keep" \
  && curl -fsS -X POST -G "$base_url/reconfigure_scheduler" "${params[@]}" \
  && curl -fsS -X POST "$base_url/resume"

# At the call site in run_benchmark_serving():
reconfigure_vllm_scheduler "$port" || return 1
```

1. Double reconfigure in `test_reconfigure_sweep.sh`: Phase B called `reconfigure_vllm_scheduler` manually, then `run_benchmark_serving` called it again via the `VLLM_DYNAMIC_RECONFIGURE` hook. Remove the manual call and let the hook handle it.
2. Doc listed `mode=abort` but vLLM `/reconfigure` requires PAUSED_ALL, which maps to `mode=keep`. Fix the Requirements section.
3. No error recovery in `reconfigure_vllm_scheduler`: if `/reconfigure` failed, curl exited non-zero, `set -e` killed the function, and the server stayed paused forever. Now capture the exit code, always call `/resume`, then propagate the error.
4. `--force-reinstall` in wheel mode reinstalls all dependencies. Use `--no-deps --force-reinstall` to only replace the vllm package.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
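The "capture the exit code, always call /resume, then propagate the error" recovery described above could be sketched as follows. Endpoint paths follow this PR's description; the function signature and base URL are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: never leave the server paused, even when /reconfigure fails.
# The `|| rc=$?` form also keeps `set -e` from aborting mid-function.
reconfigure_vllm_scheduler() {
  local port="$1" body="$2" rc=0
  local base_url="http://localhost:${port}"   # hypothetical address scheme
  curl -fsS -X POST "$base_url/pause?mode=keep" || return 1
  curl -fsS -X POST -H 'Content-Type: application/json' \
       -d "$body" "$base_url/reconfigure" || rc=$?
  # Resume even after a failed reconfigure, so the server is never left
  # paused; then propagate the failure to the caller.
  curl -fsS -X POST "$base_url/resume" || rc=$?
  return "$rc"
}
```

Callers can then use `reconfigure_vllm_scheduler "$port" "$body" || return 1` to abort the benchmark on failure.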
functionstackx left a comment
does this work with cudagraphs, aiter, amd, flashinfer, torch compile or any other
Image: `semianalysiswork/vllm-reconfigure:latest`, based on `vllm/vllm-openai:v0.18.0` with the reconfigure API overlay. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone workflow_dispatch workflow that runs benchmarks/test_reconfigure_sweep.sh on any GPU runner using the semianalysiswork/vllm-reconfigure:latest image. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@functionstackx unlikely, only vLLM has `/pause` and `/resume` from what I can tell
…ntain permissions' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Summary

- `VLLM_DYNAMIC_RECONFIGURE=1` hook in `run_benchmark_serving` that calls vLLM `/pause` → `/reconfigure` → `/resume` before each benchmark run
- Reads `VLLM_MAX_NUM_BATCHED_TOKENS` and `VLLM_MAX_NUM_SEQS` env vars and sends them as a JSON body to `POST /reconfigure`
- `benchmarks/test_reconfigure_sweep.sh` — standalone A/B test script that compares N cold starts (baseline) vs 1 cold start + N reconfigure cycles
- `docs/vllm-dynamic-scheduler-reconfigure.md`

Pre-built image

Based on `vllm/vllm-openai:v0.18.0` with the reconfigure API overlaid. Source: `JordanNanos/vllm`, branch `feature/reconfigure-scheduler`.

Single-node test

Sweeps 3 `max_num_batched_tokens` × 2 `max_num_seqs` = 6 configs. Phase A: 6 cold starts. Phase B: 1 cold start + 5 reconfigure cycles (~1s each).
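The 3 × 2 sweep grid can be sketched as a pair of nested loops; the concrete values below are illustrative stand-ins, not the script's actual sweep values.

```shell
#!/usr/bin/env bash
# Illustrative sweep values (the real script's values may differ).
batched_tokens=(2048 4096 8192)
num_seqs=(128 256)
config=0
for bt in "${batched_tokens[@]}"; do
  for ns in "${num_seqs[@]}"; do
    config=$((config + 1))
    # Phase A would cold-start the server here; Phase B would instead run a
    # single pause/reconfigure/resume cycle against the running server.
    echo "config ${config}: max_num_batched_tokens=${bt} max_num_seqs=${ns}"
  done
done
```

With 6 configs, Phase B needs only the first cold start plus 5 reconfigure cycles, which is where the wall-clock savings come from.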
Test plan

- `bash -n benchmarks/benchmark_lib.sh` — syntax check
- `bash -n benchmarks/test_reconfigure_sweep.sh` — syntax check
- End-to-end GPU run via the workflow (image: `semianalysiswork/vllm-reconfigure:latest`)

AI assistance was used to prepare this change.