
Commit b94de78

Fix cuda_toolkit_check driver version and add cuda-bindings dep (#148)
## Summary

Two related fixes surfaced while smoke-testing #137 on a fresh Brev box:

- **`cuda_toolkit_check` was reading the kernel driver, not the CUDA driver.** `get_driver_version(kernel_mode=True)` returns the NVIDIA kernel module version (e.g. `580` from `580.126.09`), not the CUDA Driver API version (e.g. `13` from CUDA 13.0). The verbose message also printed `Driver supports CUDA 580`, which is what tipped this off. Because 580 is larger than any CUDA major version, the toolkit-versus-driver comparison could never trigger. Dropping `kernel_mode=True` makes `get_driver_version()` use its default CUDA Driver API mode, so the comparison logic now actually fires.
- **`cuda-bindings` is now declared as a runtime dependency, and the conda recipe gets the `cuda-core` dependency it should have had since #141.** `cuda-core` imports `cuda.bindings.driver` lazily, so without `cuda-bindings` installed, `cuda_toolkit_check` raises `ImportError: cuda.bindings 12.x or 13.x must be installed` on a fresh `pip install rapids-cli`. The pin `>=12.9.6,!=13.0.*,!=13.1.*` excludes the cuda-bindings 13.0/13.1 wheels and is compatible with both CUDA 12 and CUDA 13 driver hosts (verified with cuda-bindings 12.9.6 against a CUDA 13 environment and cuda-bindings 13.2 against a CUDA 12 environment).

A regression test (`test_gather_toolkit_info_driver_major_is_cuda_major`) exercises `_gather_toolkit_info()` end-to-end and asserts `driver_major < 100`, to ensure that we get the CUDA major version rather than the kernel driver version.

Closes #145.

---------

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
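To make the 580-versus-13 mix-up concrete, the sketch below decodes both encodings side by side. It does not use `cuda.core`: it parses a kernel module version string, decodes the integer that the CUDA Driver API's `cuDriverGetVersion` reports (`1000 * major + 10 * minor`, so CUDA 13.0 is `13000`), and optionally asks `libcuda` directly through `ctypes`. The helper names are hypothetical, and loading the Linux soname `libcuda.so.1` is an assumption; on other platforms the query simply returns `None`.

```python
import ctypes
from typing import Optional


def kernel_driver_major(kernel_version: str) -> int:
    """Major part of an NVIDIA kernel module version, e.g. '580.126.09' -> 580."""
    return int(kernel_version.split(".")[0])


def cuda_driver_major(encoded: int) -> int:
    """Major part of a cuDriverGetVersion() value (1000 * major + 10 * minor)."""
    return encoded // 1000


def looks_like_cuda_major(major: int) -> bool:
    """Same sanity bound the regression test uses: CUDA majors are far below 100."""
    return major < 100


def query_cuda_driver_version() -> Optional[int]:
    """Call cuDriverGetVersion from libcuda; None if the library is unavailable.

    Assumes a Linux host where the driver library is named libcuda.so.1.
    """
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    version = ctypes.c_int(0)
    # CUresult cuDriverGetVersion(int *driverVersion); 0 means CUDA_SUCCESS.
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return version.value


if __name__ == "__main__":
    print(kernel_driver_major("580.126.09"))  # what kernel_mode=True effectively compared
    print(cuda_driver_major(13000))           # what the toolkit check needs
    raw = query_cuda_driver_version()
    if raw is not None:
        major = cuda_driver_major(raw)
        print(major, looks_like_cuda_major(major))
```

The `< 100` bound is deliberately loose: it cannot pin down the exact CUDA version, but it cleanly separates the two encodings, which is all the regression test needs.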
1 parent a64f0b0 commit b94de78

5 files changed

Lines changed: 22 additions & 2 deletions


conda/recipes/rapids-cli/recipe.yaml

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@ requirements:
   run:
     - python
     - importlib-metadata >=4.13.0
+    - cuda-bindings >=12.9.6,!=13.0.*,!=13.1.*
+    - cuda-core >=0.6.0
     - cuda-pathfinder >=1.2.3
     - nvidia-ml-py >=12.0
     - packaging
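The recipe had drifted from `pyproject.toml`: it lacked `cuda-core` since #141 and never listed `cuda-bindings`. A name-level comparison is one way such drift could be caught. The sketch below is hypothetical (not part of rapids-cli): it compares project names only, ignores version pins, and works on the entries visible in these diff hunks rather than parsing the real files. It also assumes conda and PyPI names coincide, which holds for these packages but not in general.

```python
import re


def requirement_name(spec: str) -> str:
    """Normalized project name from a requirement string (PEP 503 style)."""
    # Drop any environment marker, then cut at the first space, version
    # operator, or extras bracket.
    head = spec.split(";", 1)[0].strip()
    name = re.split(r"[\s<>=!~\[]", head, maxsplit=1)[0]
    return re.sub(r"[-_.]+", "-", name).lower()


def missing_from_recipe(pyproject_deps: list[str], recipe_run: list[str]) -> set[str]:
    """pyproject dependencies with no same-named entry in the conda run requirements."""
    recipe_names = {requirement_name(r) for r in recipe_run}
    return {requirement_name(d) for d in pyproject_deps} - recipe_names


# Only the entries visible in this commit's hunks, not the full files.
PYPROJECT_AFTER = [
    "cuda-bindings>=12.9.6,!=13.0.*,!=13.1.*",
    "cuda-core >=0.6.0",
    "cuda-pathfinder >=1.2.3",
    "importlib-metadata >= 4.13.0; python_version < '3.12'",
]
RECIPE_BEFORE = [
    "python",
    "importlib-metadata >=4.13.0",
    "cuda-pathfinder >=1.2.3",
    "nvidia-ml-py >=12.0",
    "packaging",
]
RECIPE_AFTER = RECIPE_BEFORE + [
    "cuda-bindings >=12.9.6,!=13.0.*,!=13.1.*",
    "cuda-core >=0.6.0",
]

if __name__ == "__main__":
    print(sorted(missing_from_recipe(PYPROJECT_AFTER, RECIPE_BEFORE)))
    print(sorted(missing_from_recipe(PYPROJECT_AFTER, RECIPE_AFTER)))
```

Before this commit the check reports `cuda-bindings` and `cuda-core`; after it, nothing is missing.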

dependencies.yaml

Lines changed: 4 additions & 0 deletions
@@ -62,6 +62,10 @@ dependencies:
       - output_types: [conda, requirements, pyproject]
         packages:
           - cuda-core >=0.6.0
+          # NVML APIs we use via cuda.core.system landed in cuda-bindings
+          # 12.9.6 (CUDA 12) and 13.2.0 (CUDA 13). The 13.0/13.1
+          # wheels pre-date the 13.x landing and are excluded.
+          - cuda-bindings>=12.9.6,!=13.0.*,!=13.1.*
           - nvidia-ml-py>=12.0
           - cuda-pathfinder >=1.2.3
           - packaging
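The specifier `>=12.9.6,!=13.0.*,!=13.1.*` reads as: at least 12.9.6, and not any 13.0.x or 13.1.x release. The stdlib-only sketch below spells that rule out by hand for plain release versions (no pre-release, post-release, or local segments). The function names are hypothetical; the release is zero-padded to three components so that, as in PEP 440 prefix matching, a bare `13` counts as `13.0.0` and is excluded.

```python
def release_tuple(version: str) -> tuple[int, ...]:
    """Parse a plain release version such as '12.9.6', zero-padded to 3 parts."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (3 - len(parts))


def satisfies_cuda_bindings_pin(version: str) -> bool:
    """Hand-rolled equivalent of '>=12.9.6,!=13.0.*,!=13.1.*' for plain releases."""
    release = release_tuple(version)
    if release < (12, 9, 6):
        return False
    # '!=13.0.*' and '!=13.1.*' reject every release whose first two parts match.
    return release[:2] not in {(13, 0), (13, 1)}


if __name__ == "__main__":
    for candidate in ["12.9.5", "12.9.6", "13.0.0", "13.1.2", "13.2.0"]:
        print(candidate, satisfies_cuda_bindings_pin(candidate))
```

In real code, `packaging.specifiers.SpecifierSet(">=12.9.6,!=13.0.*,!=13.1.*")` evaluates the same specifier with full PEP 440 rules, and `packaging` is already a runtime dependency of this project.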

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ license-files = ["LICENSE"]
 readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
+    "cuda-bindings>=12.9.6,!=13.0.*,!=13.1.*",
     "cuda-core >=0.6.0",
     "cuda-pathfinder >=1.2.3",
     "importlib-metadata >= 4.13.0; python_version < '3.12'",
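Because `cuda-core` only imports `cuda.bindings.driver` lazily, a missing `cuda-bindings` surfaced late, as an `ImportError` at check time. A hypothetical pre-flight helper built on the standard library's `importlib.metadata` could report absent distributions up front. Querying distribution metadata, rather than calling `importlib.util.find_spec("cuda.bindings")`, avoids importing the parent `cuda` package, which itself raises if nothing under `cuda` is installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(dists: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: dict[str, Optional[str]] = {}
    for dist in dists:
        try:
            found[dist] = version(dist)
        except PackageNotFoundError:
            found[dist] = None
    return found


if __name__ == "__main__":
    report = installed_versions(["cuda-core", "cuda-bindings"])
    missing = [name for name, ver in report.items() if ver is None]
    if missing:
        print("missing runtime dependencies:", ", ".join(missing))
    else:
        print("all present:", report)
```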

rapids_cli/doctor/checks/cuda_toolkit.py

Lines changed: 1 addition & 2 deletions
@@ -173,9 +173,8 @@ def _gather_toolkit_info() -> CudaToolkitInfo: # pragma: no cover
         except (DynamicLibNotFoundError, RuntimeError):
             info.missing_libs.append(soname)
 
-    # Get driver version
     try:
-        info.driver_major = get_driver_version(kernel_mode=True)[0]
+        info.driver_major = get_driver_version()[0]
     except Exception:
         info.driver_major = None
 
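Why the one-argument change matters is easiest to see in the comparison itself. The sketch below is a hypothetical stand-in for the "toolkit newer than driver" test (the real comparison in `cuda_toolkit_check` is not shown in this diff). It treats `None`, which `_gather_toolkit_info()` stores when the driver lookup raises, as "cannot tell"; that policy is an assumption.

```python
from typing import Optional


def toolkit_newer_than_driver(
    toolkit_major: Optional[int], driver_major: Optional[int]
) -> bool:
    """True when the toolkit's CUDA major exceeds what the driver supports.

    Unknown values (None) are treated as "cannot tell" and never flag.
    """
    if toolkit_major is None or driver_major is None:
        return False
    return toolkit_major > driver_major


if __name__ == "__main__":
    # CUDA 13 toolkit on a host whose driver only supports CUDA 12:
    print(toolkit_newer_than_driver(13, 580))  # old value (kernel module major): never flags
    print(toolkit_newer_than_driver(13, 12))   # fixed value (CUDA Driver API major): flags
```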
rapids_cli/tests/test_cuda_toolkit.py

Lines changed: 14 additions & 0 deletions
@@ -8,6 +8,7 @@
 from rapids_cli.doctor.checks.cuda_toolkit import (
     CudaToolkitInfo,
     _ctypes_cuda_version,
+    _gather_toolkit_info,
     _get_toolkit_cuda_major,
     cuda_toolkit_check,
 )
@@ -176,3 +177,16 @@ def test_check_cuda_home_newer_than_driver(set_toolkit_info):
     ):
         with pytest.raises(ValueError, match="CUDA_HOME"):
             cuda_toolkit_check()
+
+
+def test_gather_toolkit_info_driver_major_is_cuda_major():
+    """Regression: driver_major must be the CUDA Driver API major, not the kernel driver major."""
+    try:
+        info = _gather_toolkit_info()
+    except Exception as e:
+        pytest.skip(f"_gather_toolkit_info unavailable on this platform: {e}")
+    if info.driver_major is not None:
+        assert info.driver_major < 100, (
+            f"driver_major={info.driver_major} looks like a kernel driver "
+            f"version, not a CUDA Driver API major"
+        )
