Conversation

@MengqingCao (Collaborator) commented Dec 4, 2025

What this PR does / why we need it?

Fix the DP padding logic in `dummy_run`. After vllm-project/vllm#28579, `num_tokens` is padded in `CudagraphDispatcher`, so we also need to apply the same padding in `dummy_run`.
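For illustration, here is a minimal sketch of the padding idea, using hypothetical helper names (`pad_to_capture_size` and `dummy_run_num_tokens` are illustrative only, not the actual vLLM or vllm-ascend API):

```python
# Hypothetical sketch (not the actual vllm-ascend code): pad num_tokens in the
# dummy run the same way the cudagraph dispatcher pads the real run, so that
# both paths agree on tensor shapes and on num_tokens_across_dp for DP ranks.
from bisect import bisect_left


def pad_to_capture_size(num_tokens: int, capture_sizes: list[int]) -> int:
    """Round num_tokens up to the smallest configured capture size that fits.

    If num_tokens exceeds every capture size, fall back to the unpadded value
    (i.e. run without a captured graph).
    """
    sizes = sorted(capture_sizes)
    idx = bisect_left(sizes, num_tokens)
    return sizes[idx] if idx < len(sizes) else num_tokens


def dummy_run_num_tokens(num_tokens: int, capture_sizes: list[int]) -> int:
    # After the dispatcher-side change, the real run sees the padded size,
    # so the dummy run must pad too; otherwise the two paths disagree.
    return pad_to_capture_size(num_tokens, capture_sizes)


# Example: with cudagraph_capture_sizes=[96], a batch of 48 decode tokens
# pads up to 96.
assert dummy_run_num_tokens(48, [96]) == 96
```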

How was this patch tested?

Tested locally with the following commands:

VLLM_USE_MODELSCOPE=true python3 -m vllm.entrypoints.openai.api_server \
         --model wemaster/deepseek_mtp_main_random_bf16 \
         --trust-remote-code \
         --data-parallel-size 4 \
         --tensor-parallel-size 1 \
         --compilation-config '{"cudagraph_capture_sizes":[96],"cudagraph_mode":"FULL_DECODE_ONLY"}' \
         --enable-expert-parallel
vllm bench serve --model wemaster/deepseek_mtp_main_random_bf16 --endpoint /v1/completions --dataset-name random --random-input 512 --random-output 100 --num-prompts 48 --request-rate 1 --ready-check-timeout-sec 0

@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly applies padding logic to dummy_run to align with changes in CudagraphDispatcher. The use of num_tokens_padded for tensor slicing and subsequent function calls is consistent. However, I've found a few issues: a critical bug where num_reqs_padded is used instead of num_tokens_padded when updating num_tokens_across_dp, a potential issue with MoE communication method selection using unpadded token counts, and a leftover debug print statement. Addressing these will ensure correctness and clean up the code.
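To make the first point concrete, here is a minimal hypothetical sketch (the function name `build_num_tokens_across_dp` is made up; only `num_tokens_across_dp`, `num_tokens_padded`, and `num_reqs_padded` come from the review above):

```python
# Illustrative sketch only (hypothetical helper, not the reviewed code): in a
# DP dummy run, the tensor that tells every rank how many tokens its peers
# will execute must be filled with the padded *token* count, not the padded
# *request* count, or downstream buffer sizes and collectives go out of sync.
import torch


def build_num_tokens_across_dp(dp_size: int,
                               num_tokens_padded: int) -> torch.Tensor:
    # Correct: each DP rank reports the padded token count for this dummy run.
    return torch.full((dp_size,), num_tokens_padded, dtype=torch.int32)


# The bug class flagged above would be the equivalent of passing
# num_reqs_padded as the fill value instead of num_tokens_padded.
```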

@github-actions (bot) commented Dec 4, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@wangxiyuan added the ready (read for review) and ready-for-test (start test by label for PR) labels on Dec 4, 2025
@whx-sjtu (Collaborator) left a comment


This bugfix is really imperative for DP scenarios. Can you fix ut and merge this ASAP? @MengqingCao

Signed-off-by: MengqingCao <cmq0113@163.com>
@MengqingCao (Collaborator, Author) commented:

@GDzhu01 I've tested this PR on the deepseek-r1-w8a8 model with the latest code. Could you help test it again? Thanks!

@MengqingCao (Collaborator, Author) commented:

> This bugfix is really imperative for DP scenarios. Can you fix ut and merge this ASAP? @MengqingCao

Yes, I think an approval from @GDzhu01 is needed after his test.

@MengqingCao (Collaborator, Author) commented:

After an offline discussion with @GDzhu01, I confirmed that there is no problem with this PR. Let's merge it!

@MengqingCao merged commit 58db21f into vllm-project:main on Dec 8, 2025
15 checks passed
@MengqingCao deleted the fixdp branch on December 8, 2025 at 12:33
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Dec 9, 2025

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: weijinqian_v1 <weijinqian@huawei.com>