Description
NVIDIA Open GPU Kernel Modules Version
NVRM version: NVIDIA UNIX Open Kernel Module for x86_64 580.95.05 Release Build (dvs-builder@U22-I3-B17-02-5) Tue Sep 23 09:55:41 UTC 2025
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Ubuntu 24.04.3 LTS
Kernel Release
6.14.0-37-generic
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
GPU 0: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition (UUID: GPU-ef5135f3-1177-e4c8-dd47-3818ddbe9182)
GPU 1: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition (UUID: GPU-9d278fa4-f945-cee7-9cb4-2626f3fc5
Describe the bug
Hello,
We've encountered an issue when running LLMs with inference frameworks such as vLLM or SGLang in a multi-GPU configuration. When I attempt to shut down the machine, either via sudo shutdown now or through the desktop UI, it occasionally reboots instead of powering off. After it has rebooted once, I can usually shut it down normally. The issue is non-deterministic: sometimes the machine shuts down correctly, and other times it restarts. We tested four machines with the configuration below and saw the same issue on all of them. Please help fix it.
- Motherboard: Gigabyte TRX50 AI TOP
- CPU: AMD Ryzen Threadripper 9960X 24-Cores
- GPU: 2x NVIDIA RTX PRO 6000 Blackwell Max-Q
- PSU: FSP2500-57APB
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX PRO 6000 Blac...    Off |   00000000:21:00.0 Off |                  Off |
| 30%   33C    P8              5W /  300W |     276MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX PRO 6000 Blac...    Off |   00000000:C1:00.0 Off |                  Off |
| 30%   34C    P8             15W /  300W |      15MiB /  97887MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            2126      G   /usr/lib/xorg/Xorg                      118MiB |
|    0   N/A  N/A            2276      G   /usr/bin/gnome-shell                     24MiB |
|    1   N/A  N/A            2126      G   /usr/lib/xorg/Xorg                        4MiB |
cat /proc/driver/nvidia/params | grep DynamicPowerManagement
DynamicPowerManagement: 3
DynamicPowerManagementVideoMemoryThreshold: 200
cat /proc/driver/nvidia/gpus/0000\:21\:00.0/power
Runtime D3 status: Disabled by default
Video Memory: Active
GPU Hardware Support:
Video Memory Self Refresh: Not Supported
Video Memory Off: Supported
S0ix Power Management:
Platform Support: Not Supported
Status: Disabled
Notebook Dynamic Boost: Not Supported
cat /proc/driver/nvidia/gpus/0000\:c1\:00.0/power
Runtime D3 status: Disabled by default
Video Memory: Active
GPU Hardware Support:
Video Memory Self Refresh: Not Supported
Video Memory Off: Supported
S0ix Power Management:
Platform Support: Not Supported
Status: Disabled
Notebook Dynamic Boost: Not Supported
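The per-GPU power checks above can be gathered in a single pass, which may help when comparing several affected machines. This is only a minimal sketch: it assumes the driver exposes one power file per GPU under /proc/driver/nvidia/gpus/, as in the output above. The function name dump_gpu_power and its optional directory argument are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the NVIDIA driver tooling): print the
# "power" status file of every GPU directory found under the given base
# directory. With no argument it reads the driver's procfs tree.
dump_gpu_power() {
    base="${1:-/proc/driver/nvidia/gpus}"
    for f in "$base"/*/power; do
        # Skip the unexpanded glob (no GPU directories) or unreadable files.
        [ -r "$f" ] || continue
        printf '== %s ==\n' "$f"
        cat "$f"
    done
}

# Dump the status for all GPUs on the live system; prints nothing if the
# NVIDIA driver is not loaded.
dump_gpu_power
```

Running it before each shutdown attempt and saving the output next to nvidia-bug-report.log.gz would give one record per attempt that can be compared across runs.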
To Reproduce
- vllm serve --model Qwen/Qwen3-VL-30B-A3B-Instruct --tensor-parallel-size 2 --gpu-memory-utilization 0.9
- sudo shutdown now
- It restarts instead of shutting down
Bug Incidence
Always
nvidia-bug-report.log.gz
Running nvidia-bug-report.sh... complete.
Summary of Skipped Sections:
Skipped Component | Details
================================================================================
ldd output | glxinfo not found
--------------------------------------------------------------------------------
vulkaninfo output | vulkaninfo not found
--------------------------------------------------------------------------------
ibstat output | ibstat not found
--------------------------------------------------------------------------------
acpidump output | acpidump not found
--------------------------------------------------------------------------------
mst output | mst not found
--------------------------------------------------------------------------------
nvlsm-bug-report.sh output | nvlsm-bug-report.sh not found
--------------------------------------------------------------------------------
Summary of Errors:
Error Component | Details | Resolution
=========================================================================================================================
More Info
No response