ci: decouple playwright/kani/rocq from needs:[test] to reduce blast radius of rust-cpu queue saturation #299

@avrabe

Problem

When the self-hosted rust-cpu pool is saturated (rivet-core mutation testing held it for hours on 2026-05-17; see smithy commit 70401cd), the entire downstream CI tree blocks, because 8 jobs chain off test:

  • playwright (ubuntu-latest)
  • vscode-extension (ubuntu-latest)
  • coverage (rust-cpu, legitimate — runs same suite under llvm-cov)
  • mutants (lean-mem, legitimate)
  • kani (ubuntu-latest, runs Kani harnesses — independent)
  • verus (lean-mem, Verus proofs — independent of test pass)
  • rocq (ubuntu-latest, builds Rocq theorems — independent)
  • release-results (light, but needs 5 upstream)

Smithy operator note (2026-05-17): ubuntu-latest-labeled jobs had their own runner capacity available, but couldn't start because they were queued behind the bottleneck.

Proposal — Phase 1

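Remove needs: [test] from playwright, kani, and rocq. None of them depends on cargo test passing, and all three run on ubuntu-latest, so they can start as soon as a hosted runner is free instead of waiting on the rust-cpu queue.

Keep needs: [test] on coverage and mutants. Defer verus to Phase 2.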
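
As a rough illustration of the Phase 1 change, the workflow could end up shaped like the fragment below. This is a sketch, not the repository's actual file: the `runs-on` label syntax for the self-hosted pool and every step body are assumptions, and only the job names and the presence or absence of `needs: [test]` come from this issue.

```yaml
# Hypothetical excerpt; step bodies are placeholders.
jobs:
  test:
    runs-on: [self-hosted, rust-cpu]   # assumed label syntax for the self-hosted pool
    steps:
      - uses: actions/checkout@v4
      - run: cargo test --workspace    # assumed test command

  playwright:
    runs-on: ubuntu-latest
    # needs: [test] removed: can start without waiting on rust-cpu
    steps:
      - uses: actions/checkout@v4
      - run: echo "run playwright suite"   # placeholder

  kani:
    runs-on: ubuntu-latest
    # needs: [test] removed
    steps:
      - uses: actions/checkout@v4
      - run: echo "run Kani harnesses"     # placeholder

  rocq:
    runs-on: ubuntu-latest
    # needs: [test] removed
    steps:
      - uses: actions/checkout@v4
      - run: echo "build Rocq theorems"    # placeholder

  coverage:
    needs: [test]                      # kept: reruns the same suite under llvm-cov
    runs-on: [self-hosted, rust-cpu]   # assumed label syntax
    steps:
      - uses: actions/checkout@v4
      - run: echo "run coverage"           # placeholder
```

A job without `needs` is scheduled as soon as the workflow starts, so a red test run no longer prevents playwright, kani, or rocq from reporting their own results.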

Phase 2 (out of scope)

  • Audit vscode-extension dep on test.
  • Reconsider release-results five-way fan-in.
  • Lower cargo-mutants --jobs or shard the rivet-core mutation suite.
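
For the last Phase 2 item, one possible shape is a matrix job that splits the mutation run with cargo-mutants' `--shard k/n` option and caps parallelism with `--jobs`. This is only a sketch under assumptions: the shard count of 4, the `--jobs 2` value, the runner label syntax, and the install step are all illustrative and are not taken from the existing workflow.

```yaml
# Hypothetical sketch of a sharded mutants job; values are illustrative.
jobs:
  mutants:
    needs: [test]
    runs-on: [self-hosted, lean-mem]   # assumed label syntax
    strategy:
      fail-fast: false
      matrix:
        shard: [0, 1, 2, 3]            # assumed shard count of 4 (zero-based)
    steps:
      - uses: actions/checkout@v4
      - run: cargo install cargo-mutants --locked
      - run: cargo mutants --package rivet-core --shard ${{ matrix.shard }}/4 --jobs 2
```

Sharding shortens each individual job but keeps the total work the same, so whether it relieves the pool depends on how many runners the shards can occupy at once.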

Refs

  • 2026-05-17 incident: 4+ hour CI stall; smithy 70401cd bumped lean-mem cgroup 24G->32G. This issue addresses the workflow-level lever.
