
Add vLLM dynamic scheduler reconfigure for single-server sweeps #1029

Open
JordanNanos wants to merge 8 commits into main from vllm-dynamic-scheduler-reconfigure

Conversation

@JordanNanos (Collaborator) commented Apr 14, 2026:

Summary

  • Opt-in VLLM_DYNAMIC_RECONFIGURE=1 hook in run_benchmark_serving that
    calls vLLM /pause/reconfigure/resume before each benchmark run
  • Reads VLLM_MAX_NUM_BATCHED_TOKENS and VLLM_MAX_NUM_SEQS env vars and
    sends them as a JSON body to POST /reconfigure
  • benchmarks/test_reconfigure_sweep.sh — standalone A/B test script that
    compares N cold starts (baseline) vs 1 cold start + N reconfigure cycles
  • Documentation in docs/vllm-dynamic-scheduler-reconfigure.md
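The env-var-to-JSON step can be sketched as follows. This is a hypothetical helper mirroring the summary above: the function name `build_reconfigure_body` and the JSON key names (taken from the env var names) are assumptions, not the actual script internals.

```shell
# Hypothetical sketch: assemble the /reconfigure JSON body from the two
# env vars named in this PR. Only set vars are included in the body.
build_reconfigure_body() {
  local json="{" sep=""
  if [[ -n "${VLLM_MAX_NUM_BATCHED_TOKENS:-}" ]]; then
    json+="\"max_num_batched_tokens\": ${VLLM_MAX_NUM_BATCHED_TOKENS}"
    sep=", "
  fi
  if [[ -n "${VLLM_MAX_NUM_SEQS:-}" ]]; then
    json+="${sep}\"max_num_seqs\": ${VLLM_MAX_NUM_SEQS}"
  fi
  json+="}"
  printf '%s\n' "$json"
}

# Example: the body that would be POSTed to /reconfigure.
VLLM_MAX_NUM_BATCHED_TOKENS=8192 VLLM_MAX_NUM_SEQS=256 build_reconfigure_body
```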

Pre-built image

docker pull semianalysiswork/vllm-reconfigure:latest

Based on vllm/vllm-openai:v0.18.0 with the reconfigure API overlaid.
Source: JordanNanos/vllm feature/reconfigure-scheduler

Single-node test

docker run --rm --init --network host \
  --runtime nvidia --gpus all --ipc host --privileged \
  --shm-size=16g --ulimit memlock=-1 --ulimit stack=67108864 \
  -v "$HF_HUB_CACHE":/root/.cache/huggingface \
  -v "$(pwd)":/workspace -w /workspace \
  -e HF_TOKEN -e PORT=8888 \
  -e MODEL=openai/gpt-oss-120b \
  -e TP=8 -e CONC=32 \
  -e ISL=1024 -e OSL=1024 \
  semianalysiswork/vllm-reconfigure:latest \
  bash benchmarks/test_reconfigure_sweep.sh

Sweeps 3 max_num_batched_tokens × 2 max_num_seqs = 6 configs.
Phase A: 6 cold starts. Phase B: 1 cold start + 5 reconfigure cycles (~1s each).
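The 3 × 2 grid can be enumerated with a nested loop; the token and seq values below are illustrative placeholders, not the values the script actually sweeps:

```shell
# Enumerate the sweep grid: 3 batched-token limits x 2 seq limits = 6 configs.
# Values are placeholders for illustration only.
configs=()
for tokens in 2048 8192 16384; do
  for seqs in 128 256; do
    configs+=("tokens=${tokens},seqs=${seqs}")
  done
done
printf '%s\n' "${configs[@]}"
echo "total: ${#configs[@]}"
```

Phase B would cold-start once for the first config, then reconfigure for the remaining five.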

Test plan

  • bash -n benchmarks/benchmark_lib.sh — syntax check
  • bash -n benchmarks/test_reconfigure_sweep.sh — syntax check
  • Build overlay image (semianalysiswork/vllm-reconfigure:latest)
  • Run A/B test on a single GPU node
  • Verify benchmark metrics match between baseline and reconfigure phases
  • Compare total wall-clock time (expect ~5× reduction in startup overhead)

AI assistance was used to prepare this change.

@github-actions (Contributor) commented:

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

…t script

- Fix reconfigure_vllm_scheduler() to use POST /reconfigure with a JSON
  body instead of query params to the non-existent /reconfigure_scheduler
- Remove max_num_scheduled_tokens (internal name, not exposed by API)
- Use mode=abort&clear_cache=true on /pause for clean reconfigure cycles
- Add benchmarks/test_reconfigure_sweep.sh for standalone A/B testing on
  a cluster: runs N cold starts (baseline) vs 1 start + N reconfigure
  cycles and prints wall-clock comparison
- Update docs to match actual API surface

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@JordanNanos changed the title from "Add vLLM dynamic scheduler reconfigure hook" to "Add vLLM dynamic scheduler reconfigure for single-server sweeps" on Apr 14, 2026
@JordanNanos (Collaborator, Author) commented:

Added the requested patched-vLLM distribution paths:

  • install_patched_vllm helper in benchmarks/benchmark_lib.sh supporting wheel, git ref, and editable checkout installs.
  • docs/vllm-patched-distribution.md with custom image, pinned wheel, mounted editable checkout, and pinned git-ref workflows.

Recommended for cluster sweeps: use a custom image or pinned wheel, then enable VLLM_DYNAMIC_RECONFIGURE=1 only for jobs running a vLLM build with the runtime reconfiguration API.
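The three install modes could dispatch roughly like this. This is a sketch: the helper name `install_patched_vllm` comes from this PR, but the dispatch logic, flags, and the `DRY_RUN` switch are assumptions for illustration.

```shell
# Hypothetical sketch of install_patched_vllm's three modes. DRY_RUN=1
# prints the pip command instead of running it (illustrative only).
install_patched_vllm() {
  local mode="$1" src="$2" cmd
  case "$mode" in
    # Wheel mode: replace only the vllm package, keep resolved deps.
    wheel)    cmd="pip install --no-deps --force-reinstall $src" ;;
    # Git-ref mode: pinned branch/tag/commit from a fork.
    git)      cmd="pip install git+$src" ;;
    # Editable mode: mounted checkout for fast iteration.
    editable) cmd="pip install --no-deps -e $src" ;;
    *)        echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  if [[ "${DRY_RUN:-0}" == "1" ]]; then echo "$cmd"; else $cmd; fi
}

DRY_RUN=1 install_patched_vllm wheel ./dist/vllm-0.18.0-py3-none-any.whl
```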

The vLLM /reconfigure endpoint requires PAUSED_ALL state, which maps to
pause mode="keep". Using mode="abort" would leave the scheduler in
PAUSED_NEW state, causing reconfigure to reject the request.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines +38 to +40
fi
json+="}"

Contributor commented:

🔴 The three curl calls in reconfigure_vllm_scheduler() are not chained with &&, so a failure of /reconfigure_scheduler is masked by the subsequent /resume success — the function returns 0 even when reconfiguration failed. Additionally, the call site in run_benchmark_serving() ignores the return value entirely, so the benchmark always proceeds regardless of reconfiguration outcome, silently producing incorrect results with stale scheduler settings.

Extended reasoning...

Bug 1 — internal error masking in reconfigure_vllm_scheduler() (lines 38-40):

In bash, a function's return code is the exit code of its last executed command. The three curl calls run unconditionally with no chaining:

curl -fsS -X POST "$base_url/pause?mode=keep"
curl -fsS -X POST -G "$base_url/reconfigure_scheduler" "${params[@]}"
curl -fsS -X POST "$base_url/resume"

The -f flag in -fsS makes curl exit with code 22 on HTTP 4xx/5xx responses. If /reconfigure_scheduler returns an error (e.g., HTTP 400 for an invalid parameter value), curl exits 22 — but execution continues unconditionally to /resume. If /resume succeeds (exit 0), the function returns 0, masking the reconfiguration failure completely.

Bug 2 — return value ignored at the call site in run_benchmark_serving() (~line 361):

if [[ "${VLLM_DYNAMIC_RECONFIGURE:-0}" == "1" && "$backend" == "vllm" ]]; then
    reconfigure_vllm_scheduler "$port"
fi

There is no || return 1 or any check on the return value. There is no set -e in the script (only set +x/set -x). Even if the function were fixed to propagate errors, the benchmark would still proceed unconditionally.

Combined effect — step-by-step proof:

  1. User sets VLLM_DYNAMIC_RECONFIGURE=1, VLLM_MAX_NUM_SEQS=999999 (invalid, exceeds server capacity)
  2. run_benchmark_serving calls reconfigure_vllm_scheduler "$port"
  3. Inside the function: curl -fsS -X POST .../pause succeeds (exit 0)
  4. curl -fsS -X POST -G .../reconfigure_scheduler ... → server returns HTTP 400 → curl exits 22
  5. Execution continues (no &&, no error check): curl -fsS -X POST .../resume → succeeds (exit 0)
  6. Function returns 0 (last command's exit code) — failure masked
  7. Back in run_benchmark_serving: return value not checked, benchmark proceeds
  8. vLLM server is still running with its original scheduler limits
  9. Benchmark results are recorded as if they were obtained with the requested settings — silently incorrect

Fix: Chain the curl calls with && inside the function, and add || return 1 at the call site:

# Inside reconfigure_vllm_scheduler():
curl -fsS -X POST "$base_url/pause?mode=keep" && \
  curl -fsS -X POST -G "$base_url/reconfigure_scheduler" "${params[@]}" && \
  curl -fsS -X POST "$base_url/resume"

# At the call site in run_benchmark_serving():
reconfigure_vllm_scheduler "$port" || return 1

1. Double reconfigure in test_reconfigure_sweep.sh: Phase B called
   reconfigure_vllm_scheduler manually then run_benchmark_serving
   called it again via the VLLM_DYNAMIC_RECONFIGURE hook. Remove the
   manual call and let the hook handle it.

2. Doc listed mode=abort but vLLM /reconfigure requires PAUSED_ALL
   which maps to mode=keep. Fix the Requirements section.

3. No error recovery in reconfigure_vllm_scheduler: if /reconfigure
   failed, curl exited non-zero, set -e killed the function, and the
   server stayed paused forever. Now capture the exit code, always
   call /resume, then propagate the error.

4. --force-reinstall in wheel mode reinstalls all dependencies. Use
   --no-deps --force-reinstall to only replace the vllm package.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
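The error-recovery pattern from item 3 above (capture the exit code, always resume, then propagate) can be demonstrated with stub functions standing in for the curl calls. Everything here is illustrative: the stubs simulate `curl -f` exiting 22 on an HTTP error.

```shell
# Stubs simulating the three HTTP calls; reconfigure fails the way
# curl -f does on an HTTP 4xx (exit code 22).
pause()       { return 0; }
reconfigure() { return 22; }
resume()      { echo "resumed"; return 0; }

reconfigure_cycle() {
  pause || return 1          # nothing to undo yet, bail immediately
  local rc=0
  reconfigure || rc=$?       # capture the failure but keep going
  resume || rc=$?            # always resume so the server is never left paused
  return "$rc"               # propagate the first failure to the caller
}

reconfigure_cycle || echo "reconfigure failed with rc=$?"
```

Here the server is resumed despite the failed reconfigure, and the caller still sees the nonzero exit code instead of a masked success.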
@functionstackx (Contributor) left a comment:
does this work with cudagraphs, aiter, amd, flashinfer, torch compile or any other

JordanNanos and others added 2 commits April 14, 2026 15:08
Image: semianalysiswork/vllm-reconfigure:latest
Based on vllm/vllm-openai:v0.18.0 with reconfigure API overlay.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standalone workflow_dispatch workflow that runs
benchmarks/test_reconfigure_sweep.sh on any GPU runner using the
semianalysiswork/vllm-reconfigure:latest image.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread on .github/workflows/test-reconfigure.yml: Fixed
@JordanNanos (Collaborator, Author) commented:

does this work with cudagraphs, aiter, amd, flashinfer, torch compile or any other

@functionstackx unlikely, only vLLM has /pause and /resume from what I can tell

…ntain permissions'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
