[BugFix] Fix DeepSeek-R1 hang with DP and MTP #30119
base: main
Conversation
Code Review
The pull request addresses a hang related to CUDA graphs when running DeepSeek-R1 with Data Parallelism (DP) and Multi-Token Prediction (MTP). The fix ensures that the drafter model uses piecewise CUDA graphs whenever the main model is running with any CUDA graph mode. While the fix appears to achieve its intent, the condition that determines use_cudagraphs for the drafter contains a logical redundancy and could be simplified for readability and maintainability.
(
    is_graph_capturing
    and cudagraph_runtime_mode == CUDAGraphMode.PIECEWISE
)
or (cudagraph_runtime_mode != CUDAGraphMode.NONE)
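The redundancy the review points out can be checked exhaustively: since PIECEWISE is itself not NONE, the left disjunct is subsumed by the right one, and the whole expression reduces to `cudagraph_runtime_mode != CUDAGraphMode.NONE`. A minimal sketch, using a stand-in enum rather than vLLM's actual `CUDAGraphMode` definition:

```python
from enum import Enum

# Stand-in for vLLM's CUDAGraphMode enum (member values are illustrative).
class CUDAGraphMode(Enum):
    NONE = 0
    PIECEWISE = 1
    FULL = 2

def original_condition(is_graph_capturing: bool, mode: CUDAGraphMode) -> bool:
    # The condition as written in the diff.
    return (
        (is_graph_capturing and mode == CUDAGraphMode.PIECEWISE)
        or (mode != CUDAGraphMode.NONE)
    )

def simplified_condition(mode: CUDAGraphMode) -> bool:
    # PIECEWISE != NONE, so the left disjunct never changes the result.
    return mode != CUDAGraphMode.NONE

# Exhaustive check over both inputs confirms the two are equivalent.
for capturing in (True, False):
    for mode in CUDAGraphMode:
        assert original_condition(capturing, mode) == simplified_condition(mode)
```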
@LucasWilkinson could you double check this logic?
The basic idea here is that during a DP dummy run we may end up using FULL CGs for the target model. Previously, the drafter used PIECEWISE CGs only when the target model was also using PIECEWISE CGs (mostly for CG capture), so if the target model used FULL CGs the drafter would fall back to eager. That didn't match the behavior of non-dummy runs, where the drafter still uses PIECEWISE CGs even when the main model uses FULL CGs.
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
There's an edge case where the dummy_run uses FULL cudagraphs and, as a result, we mistakenly disable cudagraphs for the drafter; this means the dummy_run doesn't call pad_for_cudagraph after _pad_batch_across_dp, resulting in a hang. This PR is a temporary hack; we should rework the drafter similar to #28579, where we pad for cudagraphs before padding for DP.
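The proposed rework (pad for cudagraphs before padding across DP) can be sketched as follows. This is a conceptual illustration only: the function names, signatures, and padding policy here are hypothetical, not vLLM's actual API, but they show why doing cudagraph padding first lets every DP rank converge on the same batch size so collective ops don't mismatch.

```python
def pad_for_cudagraph(num_tokens: int, capture_sizes: list[int]) -> int:
    # Round up to the smallest captured cudagraph batch size that fits;
    # fall back to the raw count if the batch exceeds every captured size.
    # (Hypothetical helper, not vLLM's real function.)
    for size in sorted(capture_sizes):
        if num_tokens <= size:
            return size
    return num_tokens

def padded_batch_size(dp_rank_tokens: list[int],
                      capture_sizes: list[int]) -> int:
    # Pad for cudagraphs first on every rank, THEN pad across DP by taking
    # the max, so all ranks arrive at the same size. If one rank skipped the
    # cudagraph padding step (the bug this PR works around), ranks would
    # disagree on the batch size and the collective would hang.
    padded = [pad_for_cudagraph(t, capture_sizes) for t in dp_rank_tokens]
    return max(padded)

# Every rank computes the same final size regardless of its local count.
sizes = [padded_batch_size([3, 5, 9], [1, 2, 4, 8, 16]) for _ in range(3)]
assert len(set(sizes)) == 1 and sizes[0] == 16
```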