ci: auto-detect GPU arch, add bench workflow for RTX 5060 Ti #1
robtaylor wants to merge 2 commits into
gpu-ci.yml: detect the runner's compute capability via nvidia-smi and export INFERRS_WMMA_ARCH + CUDA_COMPUTE_CAP so candle-kernels/build.rs compiles WMMA SASS matching the actual GPU. The previous default (sm_80) only runs on compute capability 8.x GPUs (Ampere, Ada); the RTX 5060 Ti is sm_120 (Blackwell). Adds an early check that nvcc >= 12.8 when targeting sm_120+.

gpu-bench.yml: new workflow_dispatch workflow for running `inferrs bench` and optional nsys profiling on the self-hosted GPU runner. Produces timing stats and uploads nsys traces as artifacts.

Co-developed-by: Claude Code v2.1.104 (claude-opus-4-6)
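The detection and guard described in this commit message could be sketched roughly as below. This is a minimal illustration, not the PR's actual script: the function names (`cap_to_arch`, `version_ge`, `check_toolkit_for_arch`, `detect_and_export`) are hypothetical, and the value format written to `CUDA_COMPUTE_CAP` is an assumption. Only `INFERRS_WMMA_ARCH=120` is confirmed by the test plan.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from the PR.

# Convert a compute capability such as "12.0" into an arch number such as "120".
cap_to_arch() {
  printf '%s\n' "$1" | tr -d '.'
}

# Succeed when version $1 is >= version $2.
# Both arguments are assumed to be plain "major.minor" strings, e.g. "12.8".
version_ge() {
  local have_major=${1%%.*} have_minor=${1#*.}
  local need_major=${2%%.*} need_minor=${2#*.}
  if [ "$have_major" -ne "$need_major" ]; then
    [ "$have_major" -gt "$need_major" ]
  else
    [ "$have_minor" -ge "$need_minor" ]
  fi
}

# Fail when the target arch is sm_120 or newer and nvcc is older than 12.8.
check_toolkit_for_arch() {
  local arch=$1 nvcc_version=$2
  if [ "$arch" -ge 120 ] && ! version_ge "$nvcc_version" "12.8"; then
    echo "sm_${arch} requires nvcc >= 12.8, found ${nvcc_version}" >&2
    return 1
  fi
}

# Query the first GPU and export the arch for later workflow steps.
# Defined but not invoked here, so the helpers above can be exercised
# on a machine without a GPU. $1 is the nvcc version as "major.minor".
detect_and_export() {
  local cap arch
  cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
  arch=$(cap_to_arch "$cap")
  check_toolkit_for_arch "$arch" "$1" || return 1
  {
    echo "INFERRS_WMMA_ARCH=${arch}"
    echo "CUDA_COMPUTE_CAP=${arch}"   # assumed format; check what build.rs expects
  } >> "${GITHUB_ENV:-/dev/null}"
}
```

On the runner, a workflow step would source this file and call `detect_and_export` with the installed nvcc version; writing to `$GITHUB_ENV` is what makes the variables visible to later steps in the same job.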
Blacksmith Account Suspended: This Blacksmith account requires additional verification. Jobs targeting Blacksmith runners will not be picked up and will remain queued until they timeout. Please contact Blacksmith Support for assistance.
- Extract CUDA path detection, CC detection, and Blackwell nvcc version check into .github/scripts/cuda-setup.sh. This eliminates duplication between gpu-ci.yml and gpu-bench.yml, and closes the gap where gpu-bench.yml was missing the Blackwell guard.
- Fail fast when no CUDA installation is found (was silent fallback).
- Replace grep -oP (Perl regex, non-portable) with grep -o + awk.
- Make model input required in gpu-bench.yml (was silently skipping bench when empty default was used).
- Use pre-built binary for nsys profiling (avoids redundant link).
- Merge redundant nvidia-smi calls into the shared setup script.
- Guard apt-get install with dpkg -s check for self-hosted runner.

Co-developed-by: Claude Code v2.1.104 (claude-opus-4-6)
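Three of the changes above (fail-fast CUDA discovery, the portable `grep -o` + awk version parse, and the `dpkg -s` guard) might look something like the following. This is a hedged sketch rather than the contents of cuda-setup.sh: the function names and the candidate install paths are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: names and search paths are hypothetical, not from the PR.

# Print the first directory that contains an executable bin/nvcc,
# or fail loudly instead of silently falling back to a default.
find_cuda_home() {
  local dir
  for dir in "${CUDA_HOME:-}" /usr/local/cuda /opt/cuda; do
    if [ -n "$dir" ] && [ -x "$dir/bin/nvcc" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  echo "no CUDA installation found" >&2
  return 1
}

# Read `nvcc --version` output on stdin and print "major.minor".
# Uses basic regex with grep -o plus awk, so it does not need grep -P.
parse_nvcc_version() {
  grep -o 'release [0-9][0-9]*\.[0-9][0-9]*' | awk 'NR == 1 { print $2 }'
}

# Install a package only when dpkg does not already report it as installed,
# which avoids needless apt-get runs on a long-lived self-hosted runner.
ensure_apt_package() {
  local pkg=$1
  if dpkg -s "$pkg" >/dev/null 2>&1; then
    return 0
  fi
  sudo apt-get install -y "$pkg"
}
```

A typical call chain would be `cuda_home=$(find_cuda_home)` followed by `"$cuda_home/bin/nvcc" --version | parse_nvcc_version`. The parse relies on nvcc printing a line of the form `release 12.8, V12.8.93`.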
Summary
- gpu-ci.yml: detect the runner's compute capability via `nvidia-smi` and export `INFERRS_WMMA_ARCH` + `CUDA_COMPUTE_CAP` so WMMA kernels compile to matching SASS. Adds early nvcc version check for Blackwell (sm_120 requires CUDA >= 12.8).
- gpu-bench.yml: new `workflow_dispatch` workflow for `inferrs bench` + optional `nsys` profiling on the self-hosted runner. Uploads traces as artifacts.

Motivated by nvidia-runner-1 switching from Pascal (sm_61) to RTX 5060 Ti (sm_120, Blackwell). The previous hardcoded sm_80 SASS won't load on Blackwell.
Test plan
INFERRS_WMMA_ARCH=120

🤖 Generated with Claude Code