ci: use T4 + xlarge runners — re-enable CUDA/HIP, de-serialize Metal by robtaylor · Pull Request #112 · gpu-eda/Jacquard

robtaylor · 2026-06-05T01:25:51Z

Summary

Puts the two new gpu-eda team-plan runners to work alongside the free self-hosted macos-runner-1:

tesla4-runner — 4 vCPU + 1 NVIDIA T4 (GitHub-hosted)
macos-runner-xlarge — M2 Pro, 5-core CPU / 8-core GPU, 14 GB RAM/storage (GitHub-hosted)

What changes

CUDA + HIP CI back online (on the T4, every push). Both jobs had been disabled with if: ${{ false }} and pinned to the offline nvidia-runner-1 since 2026-05-01, so CUDA has had zero CI coverage since then. They now run on tesla4-runner:

CUDA Tests: native on the T4.
HIP Tests (NVIDIA backend): the job builds with hipcc + HIP_PLATFORM=nvidia, so the T4 validates the HIP code path too. Native AMD/HIP still needs an AMD/ROCm runner (future).

This directly restores backend coverage and unblocks #104 (CUDA/HIP sim-timing parity).
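A minimal sketch of what the two re-enabled jobs might look like, assuming job ids `cuda-tests` and `hip-tests` and placeholder sanity-check steps (the real build and test steps are not shown in this description, and setting HIP_PLATFORM as job-level `env` is an assumption):

```yaml
jobs:
  cuda-tests:                       # hypothetical job id
    name: CUDA Tests
    # Previously: `if: ${{ false }}` + `runs-on: nvidia-runner-1` (offline).
    runs-on: tesla4-runner          # must match the runner name registered in org settings
    steps:
      - uses: actions/checkout@v4
      - name: Show GPU              # illustrative sanity check, not taken from the PR
        run: nvidia-smi

  hip-tests:                        # hypothetical job id
    name: HIP Tests (NVIDIA backend)
    runs-on: tesla4-runner
    env:
      HIP_PLATFORM: nvidia          # hipcc compiles for the NVIDIA backend instead of ROCm
    steps:
      - uses: actions/checkout@v4
      - name: Show HIP compiler     # illustrative sanity check, not taken from the PR
        run: hipcc --version
```

With no `if:` guard, both jobs run on every event that triggers the workflow.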

Metal de-serialized (xlarge, gated to main/label). The three Metal jobs used to serialize on the single self-hosted runner (the reason we stacked #91→#110→#111). The two light jobs (Metal Tests, JTAG Minimal Cosim) now have a conditional runs-on:

runs-on: ${{ (github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'ci:metal-xl')) && 'macos-runner-xlarge' || 'macos-runner-1' }}

Routine PR pushes: stay on free macos-runner-1 (full coverage, no cost).
main / a ci:metal-xl-labelled PR: offload to the billed macos-runner-xlarge, running in parallel with the disk-heavy MCU SoC Metal Simulation job — which stays pinned to macos-runner-1 because xlarge has only 14 GB storage (big designs won't fit).

Concurrency group added (cancel-in-progress per ref) so rapid pushes don't pile up on the self-hosted / billed runners.
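For reference, a workflow-level concurrency block of the kind described usually looks like the sketch below. The group key shown is an assumption (a common convention); the actual key used in this PR may differ.

```yaml
# Top level of the workflow file, alongside `on:` and `jobs:`.
concurrency:
  # One group per workflow + ref: pushes to the same branch or PR share a group,
  # different branches/PRs do not interfere with each other.
  group: ${{ github.workflow }}-${{ github.ref }}
  # A newer run in the same group cancels the older in-progress run.
  cancel-in-progress: true
```

On pull_request events `github.ref` is the PR merge ref (`refs/pull/<number>/merge`), so each PR gets its own group.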

Cost posture

T4 runs on every push (GPU coverage was the big gap); the billed macOS xlarge is gated to main/label so routine PRs cost nothing extra on macOS. Added the ci:metal-xl label for opting a PR into xlarge.

⚠️ Confirm before merge

Runner label strings. I used the names exactly as given: tesla4-runner and macos-runner-xlarge. These must match the runner names registered in org settings. Note GitHub's standard Apple-Silicon larger-runner labels are macos-latest-xlarge / macos-15-xlarge — if macos-runner-xlarge isn't a custom larger-runner you named, the gated jobs won't schedule. Easy to adjust if so.
First CUDA/HIP run may be red. These haven't built in CI since May and main has moved ~45 commits; the first run is diagnostic, not a regression from this PR. They aren't required status checks, so they won't block merges.

Stacking

Stacked on #111 (base feat/bidir-tristate-readback). Cascade: #91 → #110 → #111 → #112. Each auto-retargets toward main as the one below merges.

Registered both labels in .github/actionlint.yaml; actionlint clean (remaining findings are pre-existing shellcheck).
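A sketch of the corresponding .github/actionlint.yaml entry, assuming the file uses actionlint's standard `self-hosted-runner.labels` list; any labels already registered there would stay alongside the two new ones:

```yaml
self-hosted-runner:
  # Labels actionlint should accept in `runs-on:` without flagging them as unknown.
  labels:
    - tesla4-runner
    - macos-runner-xlarge
```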

The gpu-eda team plan adds two GitHub-hosted larger runners: tesla4-runner (4 vCPU + 1 NVIDIA T4) and macos-runner-xlarge (M2 Pro, 14 GB). Put them to work alongside the free self-hosted macos-runner-1.

- CUDA Tests + HIP Tests (NVIDIA backend): un-gate (`if: false` removed) and move from the offline nvidia-runner-1 to tesla4-runner, every push. CUDA has had no CI coverage since 2026-05-01; the T4 runs CUDA natively and the HIP-on-NVIDIA codepath (HIP_PLATFORM=nvidia). Native AMD/HIP still needs an AMD runner.
- Metal Tests + JTAG Minimal Cosim: conditional runs-on: free self-hosted macos-runner-1 on routine PR pushes (full coverage, no cost), offloading to the billed macos-runner-xlarge on `main` or a `ci:metal-xl`-labelled PR so they run in parallel with the disk-heavy MCU SoC Metal job. That job stays pinned to macos-runner-1 (xlarge has only 14 GB storage).
- Add a workflow-level concurrency group (cancel-in-progress per ref) so rapid pushes don't pile up on the self-hosted / billed runners.
- Register tesla4-runner + macos-runner-xlarge in .github/actionlint.yaml.

Co-developed-by: Claude Code v2.1.162 (claude-opus-4-8)

Drop the `branches: [main, staged-aig-release]` filter on the pull_request trigger. It filtered by *base* branch, so PRs stacked on other feature branches got no CI until they cascaded down to a main base. Plain `pull_request:` runs CI on every PR regardless of base. The push trigger keeps its branch filter (we only want push-CI on main/staged-aig-release, not on every feature-branch push; PRs cover those).

Co-developed-by: Claude Code v2.1.162 (claude-opus-4-8)
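The resulting trigger block would look roughly like this sketch, reconstructed from the commit message above rather than copied from the workflow file:

```yaml
on:
  push:
    # Push-triggered CI only on the long-lived branches.
    branches: [main, staged-aig-release]
  # No `branches:` filter: runs for every PR regardless of its base branch,
  # so PRs stacked on feature branches get CI too.
  pull_request:
```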

robtaylor force-pushed the feat/ci-runner-allocation branch from a72c2e1 to d3c0b46 Compare June 5, 2026 01:32

robtaylor force-pushed the feat/bidir-tristate-readback branch from 3d63663 to df3fcc4 Compare June 5, 2026 11:01

robtaylor force-pushed the feat/ci-runner-allocation branch from 8a744d5 to 0e4849a Compare June 5, 2026 11:01

robtaylor force-pushed the feat/bidir-tristate-readback branch from df3fcc4 to 66fd50c Compare June 5, 2026 13:00

robtaylor force-pushed the feat/ci-runner-allocation branch from 0e4849a to f7642db Compare June 5, 2026 13:00

Base automatically changed from feat/bidir-tristate-readback to main June 5, 2026 15:46

robtaylor added 2 commits June 5, 2026 16:48

robtaylor force-pushed the feat/ci-runner-allocation branch from c55ee41 to d06a8ec Compare June 5, 2026 15:48

robtaylor merged commit d0e25a8 into main Jun 5, 2026
15 checks passed

robtaylor deleted the feat/ci-runner-allocation branch June 5, 2026 21:04
