
ci: auto-detect GPU arch, add bench workflow for RTX 5060 Ti #1

Open
robtaylor wants to merge 2 commits into main from ci/5060-workflows

@robtaylor


Summary

  • gpu-ci.yml: auto-detect the compute capability from nvidia-smi and export INFERRS_WMMA_ARCH + CUDA_COMPUTE_CAP so WMMA kernels compile to matching SASS. Adds an early nvcc version check for Blackwell (sm_120 requires CUDA >= 12.8).
  • gpu-bench.yml: new workflow_dispatch workflow for running inferrs bench, with optional nsys profiling, on the self-hosted runner. Uploads traces as artifacts.
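The detection step in the first bullet can be sketched in shell roughly as follows. This is a minimal sketch, not the code in gpu-ci.yml: the helper names `cc_to_arch` and `detect_and_export_arch` are hypothetical, and it assumes both variables take the dotless integer form shown in the test plan (`INFERRS_WMMA_ARCH=120`). It relies on nvidia-smi's `compute_cap` query field and on the `$GITHUB_ENV` file that GitHub Actions reads to pass variables to later steps.

```shell
#!/usr/bin/env bash
set -euo pipefail

# cc_to_arch (hypothetical helper): convert a compute capability as printed by
# `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (e.g. "12.0")
# into the dotless form used for sm_XX targets (e.g. "120").
cc_to_arch() {
  local cc="${1//[[:space:]]/}"   # drop stray spaces / CR from the CSV output
  if [[ ! "$cc" =~ ^[0-9]+\.[0-9]+$ ]]; then
    echo "unexpected compute capability: '$1'" >&2
    return 1
  fi
  echo "${cc/./}"                  # "8.6" -> "86", "12.0" -> "120"
}

# detect_and_export_arch (hypothetical helper): read the first GPU's compute
# capability and append both variables to $GITHUB_ENV so that later steps in
# the same job see them. Assumes both variables use the dotless form.
detect_and_export_arch() {
  local cc arch
  cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | sed -n '1p')"
  arch="$(cc_to_arch "$cc")"
  {
    echo "INFERRS_WMMA_ARCH=${arch}"
    echo "CUDA_COMPUTE_CAP=${arch}"
  } >> "${GITHUB_ENV:?GITHUB_ENV is only set inside GitHub Actions}"
  echo "Detected compute capability ${cc} -> sm_${arch}"
}
```

A workflow step could source a file like this and call `detect_and_export_arch`; on nvidia-runner-1 that would be expected to yield `INFERRS_WMMA_ARCH=120`, and on the old Pascal runner `61`. Keeping the string conversion in a separate function lets it be checked without a GPU.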

Motivated by nvidia-runner-1 switching from Pascal (sm_61) to RTX 5060 Ti (sm_120, Blackwell). The previous hardcoded sm_80 SASS won't load on Blackwell.

Test plan

  • Merge to main and verify that GPU CI triggers, detects sm_120, and builds with INFERRS_WMMA_ARCH=120
  • Verify CUDA toolkit on runner is >= 12.8 (required for sm_120)
  • Manually trigger GPU Bench workflow with a small MoE model

🤖 Generated with Claude Code

gpu-ci.yml: detect the runner's compute capability via nvidia-smi and
export INFERRS_WMMA_ARCH + CUDA_COMPUTE_CAP so candle-kernels/build.rs
compiles WMMA SASS matching the actual GPU. The previous default
(sm_80) only runs on compute capability 8.x GPUs (Ampere/Ada); the
RTX 5060 Ti is sm_120 (Blackwell).
Adds an early check that nvcc >= 12.8 when targeting sm_120+.

gpu-bench.yml: new workflow_dispatch workflow for running `inferrs
bench` and optional nsys profiling on the self-hosted GPU runner.
Produces timing stats and uploads nsys traces as artifacts.

Co-developed-by: Claude Code v2.1.104 (claude-opus-4-6)
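The early "nvcc >= 12.8 when targeting sm_120+" check from the first commit could look roughly like the sketch below. It is an illustration under assumptions, not the script in the PR: `nvcc_release`, `version_ge`, and `check_nvcc_for_arch` are hypothetical names, and the parser assumes nvcc prints a line of the form `Cuda compilation tools, release 12.8, V12.8.61`. It uses `grep -o` + `awk` rather than `grep -oP`, in line with the portability change in the second commit.

```shell
#!/usr/bin/env bash
set -euo pipefail

# nvcc_release (hypothetical): read `nvcc --version` output on stdin and print
# the MAJOR.MINOR release, e.g. "12.8". grep -o + awk avoids PCRE (grep -P).
nvcc_release() {
  grep -o 'release [0-9][0-9]*\.[0-9][0-9]*' | awk 'NR == 1 { print $2 }'
}

# version_ge A B (hypothetical): succeed if MAJOR.MINOR version A >= B.
version_ge() {
  local a_major a_minor b_major b_minor
  IFS=. read -r a_major a_minor <<< "$1"
  IFS=. read -r b_major b_minor <<< "$2"
  (( a_major > b_major || (a_major == b_major && a_minor >= b_minor) ))
}

# check_nvcc_for_arch ARCH RELEASE (hypothetical): fail early when targeting
# sm_120 or newer with an nvcc older than 12.8, rather than failing later
# inside the kernel build.
check_nvcc_for_arch() {
  local arch="$1" release="$2"
  if (( arch >= 120 )) && ! version_ge "$release" 12.8; then
    echo "error: sm_${arch} requires CUDA >= 12.8, found nvcc ${release}" >&2
    return 1
  fi
}
```

In a job this might be invoked as `check_nvcc_for_arch "$INFERRS_WMMA_ARCH" "$(nvcc --version | nvcc_release)"`. Comparing major and minor numerically avoids the string-comparison trap where "12.10" would sort before "12.8".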
@blacksmith-sh

blacksmith-sh Bot commented Apr 15, 2026


Blacksmith Account Suspended

This Blacksmith account requires additional verification. Jobs targeting Blacksmith runners will not be picked up and will remain queued until they time out.

Please contact Blacksmith Support for assistance.

- Extract CUDA path detection, CC detection, and the Blackwell nvcc
  version check into .github/scripts/cuda-setup.sh. This eliminates
  duplication between gpu-ci.yml and gpu-bench.yml and closes the
  gap where gpu-bench.yml was missing the Blackwell guard.
- Fail fast when no CUDA installation is found (previously fell back
  silently).
- Replace grep -oP (Perl regex, non-portable) with grep -o + awk.
- Make the model input required in gpu-bench.yml (previously the bench
  was silently skipped when the empty default was used).
- Use the pre-built binary for nsys profiling (avoids a redundant link).
- Merge redundant nvidia-smi calls into the shared setup script.
- Guard apt-get install with a dpkg -s check on the self-hosted runner.

Co-developed-by: Claude Code v2.1.104 (claude-opus-4-6)
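Two of the changes in this commit, failing fast when no CUDA installation is found and guarding `apt-get install` with `dpkg -s`, can be sketched as below. This is a sketch under assumptions, not the contents of cuda-setup.sh: `find_cuda_home` and `ensure_pkg` are hypothetical names, the candidate directories are whatever the caller passes in, and no specific package name is implied.

```shell
#!/usr/bin/env bash
set -euo pipefail

# find_cuda_home DIR... (hypothetical): print the first candidate directory
# containing an executable bin/nvcc. If none matches, report the directories
# that were checked and fail, rather than silently falling back.
find_cuda_home() {
  local dir
  for dir in "$@"; do
    if [[ -x "$dir/bin/nvcc" ]]; then
      echo "$dir"
      return 0
    fi
  done
  echo "error: no CUDA installation found (checked: $*)" >&2
  return 1
}

# ensure_pkg PKG (hypothetical): install a Debian package only when dpkg
# reports it missing, so reruns on a persistent self-hosted runner skip
# apt-get (and its sudo/lock contention) entirely.
ensure_pkg() {
  local pkg="$1"
  if dpkg -s "$pkg" >/dev/null 2>&1; then
    echo "$pkg already installed"
  else
    sudo apt-get install -y "$pkg"
  fi
}
```

Used as `CUDA_HOME="$(find_cuda_home /usr/local/cuda /opt/cuda)"` under `set -e`, a failed lookup aborts the step immediately, which is the fail-fast behaviour described above. The `dpkg -s` guard matters mainly because a self-hosted runner keeps its state between jobs, unlike a fresh hosted VM.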