
Power issues with dual NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition #987

@shahizat

Description


NVIDIA Open GPU Kernel Modules Version

NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 580.95.05 Release Build (dvs-builder@U22-I3-B17-02-5) Tue Sep 23 09:55:41 UTC 2025

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 24.04.3 LTS

Kernel Release

6.14.0-37-generic

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

GPU 0: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition (UUID: GPU-ef5135f3-1177-e4c8-dd47-3818ddbe9182)
GPU 1: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition (UUID: GPU-9d278fa4-f945-cee7-9cb4-2626f3fc5

Describe the bug

Hello,

We've encountered an issue when running LLMs with inference frameworks such as vLLM or SGLang in a multi-GPU configuration. When I attempt to shut down the machine, either via sudo shutdown now or the desktop UI, it occasionally reboots instead of powering off. After it has rebooted once, I am usually able to shut it down normally. The issue is non-deterministic: sometimes the machine shuts down correctly, and other times it restarts. We tested four machines with the configuration below, and all of them show the same issue. Please help us fix it.

  • Motherboard: Gigabyte TRX50 AI TOP
  • CPU: AMD Ryzen Threadripper 9960X 24-Core
  • GPU: 2x NVIDIA RTX PRO 6000 Blackwell Max-Q
  • PSU: FSP2500-57APB
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 6000 Blac...    Off |   00000000:21:00.0 Off |                  Off |
| 30%   33C    P8              5W /  300W |     276MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX PRO 6000 Blac...    Off |   00000000:C1:00.0 Off |                  Off |
| 30%   34C    P8             15W /  300W |      15MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2126      G   /usr/lib/xorg/Xorg                      118MiB |
|    0   N/A  N/A            2276      G   /usr/bin/gnome-shell                     24MiB |
|    1   N/A  N/A            2126      G   /usr/lib/xorg/Xorg                        4MiB |
+-----------------------------------------------------------------------------------------+


cat /proc/driver/nvidia/params | grep DynamicPowerManagement
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
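
The same check can be scripted so the driver parameters can be compared across the four affected machines. This is a minimal sketch, assuming Python 3.9+ and that every line of /proc/driver/nvidia/params has the "Name: value" form shown above; parse_params and read_params are hypothetical helper names. Unlike grep, which matches anywhere in the line, the prefix filter here matches only the start of the parameter name.

```python
def parse_params(text: str) -> dict[str, str]:
    """Parse "Name: value" lines from /proc/driver/nvidia/params into a dict."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon (e.g. blank lines)
            params[name.strip()] = value.strip()
    return params


def read_params(path: str = "/proc/driver/nvidia/params", prefix: str = "") -> dict[str, str]:
    """Read the driver parameters, keeping only names that start with `prefix`."""
    with open(path, encoding="utf-8") as f:
        params = parse_params(f.read())
    return {k: v for k, v in params.items() if k.startswith(prefix)}
```

On these machines, read_params(prefix="DynamicPowerManagement") should return the same two entries as the grep above, as strings.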



cat /proc/driver/nvidia/gpus/0000\:21\:00.0/power
Runtime D3 status:          Disabled by default
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Not Supported
 Video Memory Off:          Supported

S0ix Power Management:
 Platform Support:          Not Supported
 Status:                    Disabled

Notebook Dynamic Boost:     Not Supported



cat /proc/driver/nvidia/gpus/0000\:c1\:00.0/power
Runtime D3 status:          Disabled by default
Video Memory:               Active

GPU Hardware Support:
 Video Memory Self Refresh: Not Supported
 Video Memory Off:          Supported

S0ix Power Management:
 Platform Support:          Not Supported
 Status:                    Disabled

Notebook Dynamic Boost:     Not Supported


To Reproduce

  1. vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct --tensor-parallel-size 2 --gpu-memory-utilization 0.9
  2. sudo shutdown now
  3. The machine restarts instead of powering off.
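
To narrow this down, one could check whether vLLM worker processes still hold the GPUs when the shutdown begins. This is a diagnostic sketch only, not a fix, and whether leftover GPU processes matter for this bug is unconfirmed. It assumes Python 3.9+ and nvidia-smi on PATH; the query fields pid, process_name and used_memory are standard --query-compute-apps fields, while the helper names and the timeout defaults are hypothetical.

```python
import subprocess
import time


def parse_compute_apps(text: str) -> list[tuple[int, str, str]]:
    """Parse `--query-compute-apps=pid,process_name,used_memory` CSV rows.

    Returns (pid, process_name, used_memory) tuples; used_memory stays a
    string because it can be non-numeric (e.g. "[N/A]").
    """
    apps = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip blank or informational lines
        apps.append((int(fields[0]), fields[1], fields[2]))
    return apps


def list_compute_apps() -> list[tuple[int, str, str]]:
    """Return the compute processes nvidia-smi currently reports."""
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)


def wait_until_no_compute_apps(timeout_s: float = 60.0, poll_s: float = 2.0) -> bool:
    """Poll until no compute processes remain; True if that happened before the timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if not list_compute_apps():
            return True
        time.sleep(poll_s)
    return False
```

One possible experiment: stop the vllm serve process, call wait_until_no_compute_apps(), and only then run sudo shutdown now, to see whether the unexpected reboot still occurs.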

Bug Incidence

Always

nvidia-bug-report.log.gz


Running nvidia-bug-report.sh... complete.


Summary of Skipped Sections:

Skipped Component                   | Details
================================================================================
ldd output                          | glxinfo not found
--------------------------------------------------------------------------------
vulkaninfo output                   | vulkaninfo not found
--------------------------------------------------------------------------------
ibstat output                       | ibstat not found
--------------------------------------------------------------------------------
acpidump output                     | acpidump not found
--------------------------------------------------------------------------------
mst output                          | mst not found
--------------------------------------------------------------------------------
nvlsm-bug-report.sh output          | nvlsm-bug-report.sh not found
--------------------------------------------------------------------------------

Summary of Errors:

Error Component                     | Details                                                      | Resolution
=========================================================================================================================


Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests