@LucasWilkinson LucasWilkinson commented Dec 5, 2025

There's an edge case where the dummy_run runs FULL cudagraphs and, as a result, we mistakenly disable cudagraphs for the drafter. This means the dummy_run doesn't call pad_for_cudagraph after _pad_batch_across_dp, resulting in a hang.

This PR is a temporary hack; longer term we should rework the drafter similar to #28579, where we pad for cudagraphs before padding for DP.
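To make the failure mode concrete, here is a toy sketch of how skipping the cudagraph padding step on one DP rank could leave ranks with different batch sizes. It is not vLLM code: `CAPTURE_SIZES`, `pad_to_capture_size`, and `sizes_after_padding` are hypothetical stand-ins for the real helpers, and the assumption that mismatched sizes across ranks are what stalls the run is my reading of the description, not something the PR states explicitly.

```python
import bisect

# Hypothetical capture sizes; the real list comes from the engine config.
CAPTURE_SIZES = [1, 2, 4, 8, 16]


def pad_to_capture_size(num_tokens, sizes=CAPTURE_SIZES):
    """Round up to the nearest captured size (toy stand-in for pad_for_cudagraph)."""
    idx = bisect.bisect_left(sizes, num_tokens)
    # Beyond the largest captured size there is nothing to pad to in this toy model.
    return sizes[idx] if idx < len(sizes) else num_tokens


def sizes_after_padding(tokens_per_rank, pads_for_graph):
    """Pad across DP first (toy stand-in for _pad_batch_across_dp), then
    optionally pad for cudagraphs on each rank."""
    dp_size = max(tokens_per_rank)
    return [
        pad_to_capture_size(dp_size) if pads else dp_size
        for pads in pads_for_graph
    ]


# Rank 1 is the dummy run whose drafter fell back to eager and skipped the
# cudagraph padding, so the two ranks disagree on the batch size.
buggy = sizes_after_padding([3, 5], pads_for_graph=[True, False])

# With cudagraphs kept on for the drafter, every rank pads the same way.
fixed = sizes_after_padding([3, 5], pads_for_graph=[True, True])
```

In this toy setup `buggy` is `[8, 5]` and `fixed` is `[8, 8]`; under the stated assumption, ranks that disagree on size would wait on each other indefinitely in any collective op.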

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request addresses a hang related to CUDA graphs when running DeepSeek-R1 with Data Parallelism (DP) and Multi-Token Prediction (MTP). The fix aims to ensure that the drafter model uses piecewise CUDA graphs whenever the main model is running in any CUDA graph mode. The fix appears to achieve this, but the condition that determines use_cudagraphs for the drafter is logically redundant and could be simplified for readability and maintainability.

@mergify mergify bot added the deepseek (Related to DeepSeek models) and v1 labels Dec 5, 2025
@LucasWilkinson LucasWilkinson marked this pull request as ready for review December 5, 2025 15:32
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
@LucasWilkinson LucasWilkinson added the ready (ONLY add when PR is ready to merge/full CI is needed) label Dec 5, 2025
Comment on lines +4112 to +4116
    (
        is_graph_capturing
        and cudagraph_runtime_mode == CUDAGraphMode.PIECEWISE
    )
    or (cudagraph_runtime_mode != CUDAGraphMode.NONE)
@LucasWilkinson could you double check this logic?

The basic idea here is that during a DP dummy run we may end up using FULL CGs for the target model. The drafter was set to use PIECEWISE CGs only when the target model was using PIECEWISE CGs (mostly for CG capture), meaning that if the target model used FULL CGs, the drafter would revert to eager. This didn't match the behavior of non-dummy runs, where the drafter still uses PIECEWISE CGs when the main model uses FULL CGs.
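For anyone double-checking the logic, here is a small self-contained sketch that enumerates every input. `CUDAGraphMode` is a local stand-in enum rather than the real class, and the "old" predicate is only my paraphrase of the previous behavior described above, not the exact code that was replaced. It also shows that the condition under review reduces to a plain `!= NONE` check, which is the redundancy flagged in the bot review.

```python
from enum import Enum
from itertools import product


class CUDAGraphMode(Enum):
    # Local stand-in for illustration only, not the real enum.
    NONE = 0
    PIECEWISE = 1
    FULL = 2


def drafter_cg_old(mode):
    # Paraphrase of the previous behavior: drafter graphs only when the
    # target model runs PIECEWISE.
    return mode == CUDAGraphMode.PIECEWISE


def drafter_cg_reviewed(is_graph_capturing, mode):
    # The condition from the lines under review.
    return (
        is_graph_capturing and mode == CUDAGraphMode.PIECEWISE
    ) or (mode != CUDAGraphMode.NONE)


def drafter_cg_simplified(mode):
    # The first clause above is implied by the second, so this is equivalent.
    return mode != CUDAGraphMode.NONE


# Exhaustively compare the reviewed and simplified predicates.
equivalent = all(
    drafter_cg_reviewed(capturing, mode) == drafter_cg_simplified(mode)
    for capturing, mode in product((False, True), CUDAGraphMode)
)
```

In this sketch `equivalent` is `True`, `drafter_cg_old(CUDAGraphMode.FULL)` is `False` (the eager fallback described above), and `drafter_cg_simplified(CUDAGraphMode.FULL)` is `True`.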

Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>