ci: use T4 + xlarge runners — re-enable CUDA/HIP, de-serialize Metal #112
Merged
The gpu-eda team plan adds two GitHub-hosted larger runners: tesla4-runner (4 vCPU + 1 NVIDIA T4) and macos-runner-xlarge (M2 Pro, 14 GB). Put them to work alongside the free self-hosted macos-runner-1.

- CUDA Tests + HIP Tests (NVIDIA backend): un-gate (`if: false` removed) and move from the offline nvidia-runner-1 to tesla4-runner, every push. CUDA has had no CI coverage since 2026-05-01; the T4 runs CUDA natively and the HIP-on-NVIDIA codepath (HIP_PLATFORM=nvidia). Native AMD/HIP still needs an AMD runner.
- Metal Tests + JTAG Minimal Cosim: conditional runs-on, with the free self-hosted macos-runner-1 on routine PR pushes (full coverage, no cost), offloading to the billed macos-runner-xlarge on `main` or a `ci:metal-xl`-labelled PR so they run in parallel with the disk-heavy MCU SoC Metal job. That job stays pinned to macos-runner-1 (xlarge has only 14 GB storage).
- Add a workflow-level concurrency group (cancel-in-progress per ref) so rapid pushes don't pile up on the self-hosted / billed runners.
- Register tesla4-runner + macos-runner-xlarge in .github/actionlint.yaml.

Co-developed-by: Claude Code v2.1.162 (claude-opus-4-8)
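As an illustration of the conditional runs-on and the concurrency group this commit describes, a workflow fragment along the following lines would produce that behaviour. It is a sketch, not the repository's actual workflow: the job id `metal-tests`, the group name, and the steps are assumptions.

```yaml
# Hypothetical fragment; job id, group name and steps are placeholders.
concurrency:
  # One group per workflow and ref: a newer push cancels the run in progress.
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  metal-tests:
    name: Metal Tests
    # Billed xlarge runner on pushes to main or on PRs labelled ci:metal-xl,
    # free self-hosted runner in every other case.
    runs-on: ${{ (github.ref == 'refs/heads/main' || contains(github.event.pull_request.labels.*.name, 'ci:metal-xl')) && 'macos-runner-xlarge' || 'macos-runner-1' }}
    steps:
      - uses: actions/checkout@v4
```

The labels are read from the event payload, so in a setup like this a label added after a run has started would only take effect on the next triggering event.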
Drop the `branches: [main, staged-aig-release]` filter on the pull_request trigger. It filtered by *base* branch, so PRs stacked on other feature branches got no CI until they cascaded down to a main base. Plain `pull_request:` runs CI on every PR regardless of base. The push trigger keeps its branch filter (we only want push-CI on main/staged-aig-release, not on every feature-branch push; PRs cover those).

Co-developed-by: Claude Code v2.1.162 (claude-opus-4-8)
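The resulting trigger block would look roughly like this. The branch names come from the commit message; everything else about the workflow file is assumed.

```yaml
on:
  push:
    # Push-triggered CI stays limited to the long-lived branches.
    branches: [main, staged-aig-release]
  # No branches filter: CI runs for every PR, whatever its base branch.
  pull_request:
```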
Summary
Puts the two new gpu-eda team-plan runners to work alongside the free self-hosted `macos-runner-1`:

- `tesla4-runner`: 4 vCPU + 1 NVIDIA T4 (GitHub-hosted)
- `macos-runner-xlarge`: M2 Pro, 5-core CPU / 8-core GPU, 14 GB RAM/storage (GitHub-hosted)

What changes
CUDA + HIP CI back online (on the T4, every push). Both jobs were `if: ${{ false }}` and pinned to the offline `nvidia-runner-1` since 2026-05-01, so CUDA has had zero CI coverage since then. They now run on `tesla4-runner`: CUDA natively, and HIP through `hipcc` + `HIP_PLATFORM=nvidia`, so the T4 validates the HIP code path too. Native AMD/HIP still needs an AMD/ROCm runner (future).

This directly restores backend coverage and unblocks #104 (CUDA/HIP sim-timing parity).
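A minimal sketch of what the two un-gated jobs might look like on the T4 runner. The job ids and the test scripts are hypothetical placeholders; only the runner label, the job names, and `HIP_PLATFORM=nvidia` come from this PR.

```yaml
jobs:
  cuda-tests:
    name: CUDA Tests
    # No `if: ${{ false }}` gate any more, so the job runs on every trigger.
    runs-on: tesla4-runner
    steps:
      - uses: actions/checkout@v4
      - name: Check that the T4 is visible
        run: nvidia-smi
      - name: Run CUDA tests
        run: ./ci/run-cuda-tests.sh  # hypothetical script

  hip-tests:
    name: HIP Tests (NVIDIA backend)
    runs-on: tesla4-runner
    env:
      # Makes hipcc target the CUDA toolchain instead of ROCm.
      HIP_PLATFORM: nvidia
    steps:
      - uses: actions/checkout@v4
      - name: Run HIP tests
        run: ./ci/run-hip-tests.sh  # hypothetical script
```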
Metal de-serialized (xlarge, gated to main/label). The three Metal jobs used to serialize on the single self-hosted runner (the reason we stacked #91→#110→#111). The two light jobs (`Metal Tests`, `JTAG Minimal Cosim`) now have a conditional `runs-on`:

- Routine PR pushes: `macos-runner-1` (full coverage, no cost).
- `main` / a `ci:metal-xl`-labelled PR: offload to the billed `macos-runner-xlarge`, running in parallel with the disk-heavy `MCU SoC Metal Simulation` job, which stays pinned to `macos-runner-1` because xlarge has only 14 GB storage (big designs won't fit).

Concurrency group added (`cancel-in-progress` per ref) so rapid pushes don't pile up on the self-hosted / billed runners.

Cost posture
T4 runs on every push (GPU coverage was the big gap); the billed macOS xlarge is gated to main/label so routine PRs cost nothing extra on macOS. Added the `ci:metal-xl` label for opting a PR into xlarge.

The workflow uses the runner labels `tesla4-runner` and `macos-runner-xlarge`. These must match the runner names registered in org settings. Note that GitHub's standard Apple-Silicon larger-runner labels are `macos-latest-xlarge` / `macos-15-xlarge`; if `macos-runner-xlarge` isn't a custom larger runner you named, the gated jobs won't schedule. Easy to adjust if so.

Stacking
Stacked on #111 (base `feat/bidir-tristate-readback`). Cascade: #91 → #110 → #111 → #112. Each auto-retargets toward `main` as the one below merges.

Registered both labels in `.github/actionlint.yaml`; `actionlint` clean (remaining findings are pre-existing shellcheck).
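For reference, registering custom runner labels with actionlint usually takes the shape below. This is a sketch of the config format only; any labels already present in the repository's file are omitted here.

```yaml
# .github/actionlint.yaml
self-hosted-runner:
  # Extra labels actionlint should accept in `runs-on`.
  # Entries that existed before this PR are not shown.
  labels:
    - tesla4-runner
    - macos-runner-xlarge
```

Without these entries actionlint reports the two labels as unknown in `runs-on`, which is why they are registered alongside the workflow change.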