
Conversation

@GeoffreyWang1117

Summary

  • Add a helpful hint to the DCP error message suggesting users try setting VLLM_ATTENTION_BACKEND to a compatible backend
  • Before: AssertionError: DCP requires attention impls to return the softmax lse for decode, but the impl FlashInferImpl does not return the softmax lse for decode.
  • After: Same message + Try setting VLLM_ATTENTION_BACKEND to a compatible backend such as FLASH_ATTN or FLASHINFER.

Test plan

  • Verified error message contains the new hint text
  • Verified pre-commit checks pass

Fixes #28407
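
For context, the change described above boils down to appending a hint to an existing assertion message. A minimal sketch, assuming the check is a plain assertion in vLLM's attention-backend validation path; the function and parameter names below (check_dcp_attention_support, impl_cls, returns_lse_for_decode) are illustrative assumptions, not vLLM's actual identifiers:

# Illustrative sketch only, not the actual vLLM source.
def check_dcp_attention_support(impl_cls: type, returns_lse_for_decode: bool) -> None:
    # DCP (decode context parallelism) needs the attention impl to hand back the
    # softmax LSE during decode; if it cannot, fail with the hinted message.
    assert returns_lse_for_decode, (
        "DCP requires attention impls to return the softmax lse for decode, "
        f"but the impl {impl_cls.__name__} does not return the softmax lse "
        "for decode. Try setting VLLM_ATTENTION_BACKEND to a compatible "
        "backend such as FLASH_ATTN or FLASHINFER."
    )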

Signed-off-by: GeoffreyWang1117 <173976389+GeoffreyWang1117@users.noreply.github.com>

Add a helpful hint to the DCP error message suggesting users try
setting VLLM_ATTENTION_BACKEND to a compatible backend such as
FLASH_ATTN or FLASHINFER when the current backend doesn't support
returning softmax LSE for decode.

Fixes vllm-project#28407

Signed-off-by: GeoffreyWang1117 <173976389+GeoffreyWang1117@users.noreply.github.com>

@github-actions

github-actions bot commented Dec 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify bot added the v1 label Dec 6, 2025

@gemini-code-assist bot left a comment


Code Review

This pull request improves an error message for Distributed Context Parallelism (DCP) by adding a hint about compatible attention backends. The change is helpful, but the list of suggested backends is incomplete. I've suggested an improvement to make the hint more comprehensive for users on different hardware platforms.

"does not return the softmax lse for decode."
"does not return the softmax lse for decode. "
"Try setting VLLM_ATTENTION_BACKEND to a compatible "
"backend such as FLASH_ATTN or FLASHINFER."


Severity: high

The list of suggested compatible backends is incomplete. For instance, it omits ROCM_FLASH, which is the primary compatible backend for ROCm users. To make this hint more helpful for users on different platforms, I suggest including more compatible backends in the message.

Suggested change:
- "backend such as FLASH_ATTN or FLASHINFER."
+ "backend such as FLASH_ATTN, FLASHINFER, or ROCM_FLASH."



Successfully merging this pull request may close these issues.

[Feature]: Improve DCP error messages
