
Define the release soak standard (ring-promotion gate): time floor + traffic-scaled sample + health #548

Description

@don-petry

Why

The canary-ring promotion model (epic #495) advances a release next → ring0 → ring1 → stable, gated at every step by a soak on the prior ring. Today that gate is qualitative — the runbook says "confirm its callers' runs are healthy before advancing" — which can't be automated or applied consistently. The first real promotion of the six #482 reusables is in flight (.github-private#870, next now at v2.1.0), so we need a concrete, mechanical soak standard the promotion can be held to (and that canary-rollout.sh / #501 can eventually enforce for cross-repo agents).

What the data says (why a naive rule fails)

14-day run volume on the next tier (.github-private). Executed = success + failure (skips excluded). There were 0 failures across all six reusables, so the baseline failure rate is ≈ 0%:

| reusable | executed runs/day | 50% of daily avg | notes |
| --- | --- | --- | --- |
| agent-shield | ~71 | ~36 | high volume |
| dependency-audit | ~71 | ~36 | high volume |
| pr-review-mention | ~52 | ~26 | 38% skipped |
| auto-rebase | 9 | ~5 | medium |
| dependabot-automerge | 3 | ~1.5 | 96% skipped |
| dependabot-rebase | 0 | 0 | no runs in 14d |

A literal "N hours + 50% of daily average" breaks three ways:

  1. The time floor and the count aren't independent — at 71 runs/day, 36 runs takes ~12h, so the count silently dominates for busy reusables and the time floor never binds.
  2. It collapses at the low end: dependabot-rebase (0 runs) gets no sample gate and may not fire in any short window; dependabot-automerge is 96% skips. Only time can protect these.
  3. Skips and the lack of a cap distort the target: the average must count executed runs only, and the busy three are so high-volume that 50% of it balloons the soak.
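The first two failure modes can be checked with a few lines of arithmetic. The sketch below assumes runs arrive at a steady rate over the day (a simplification; real traffic is bursty) and uses the daily averages from the table above. The function name is hypothetical and exists only for this illustration.

```python
import math
from typing import Optional

# Executed runs per day on the next tier, copied from the table above.
DAILY_EXECUTED = {
    "agent-shield": 71,
    "dependency-audit": 71,
    "pr-review-mention": 52,
    "auto-rebase": 9,
    "dependabot-automerge": 3,
    "dependabot-rebase": 0,
}


def hours_to_reach_half_daily(daily_avg: float) -> Optional[float]:
    """Hours to collect 50% of a day's executed runs at a steady rate.

    Returns None when there is no traffic: the target is then zero runs,
    so the naive count gate imposes no wait at all.
    """
    if daily_avg <= 0:
        return None
    # A fractional run cannot be observed, so round the target up.
    target_runs = math.ceil(0.5 * daily_avg)
    return target_runs * 24.0 / daily_avg


if __name__ == "__main__":
    for name, avg in DAILY_EXECUTED.items():
        hours = hours_to_reach_half_daily(avg)
        label = "no sample gate" if hours is None else f"{hours:.1f} h"
        print(f"{name:22s} {label}")
```

Under the steady-rate assumption, half a day's runs takes about half a day at any non-zero volume, so a time floor of 12 h or less never binds on its own, and a reusable with zero runs gets no gate from the count.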

Proposed standard

A ring advance passes for a reusable when all hold (evaluated per reusable, per ring):

  • Time floor: ≥ 12 h since the channel tag moved.
  • Sample: executed healthy runs ≥ clamp(round(0.5 × daily_avg₁₄d), 5, 25)
    • daily_avg₁₄d = executed runs (success+failure; exclude skipped/cancelled) over the last 14 days, computed across the tier's repos (next→.github-private; ring0→.github; ring1→TalkTerm+bmad; stable→broad fleet).
    • floor 5 (below that a count is noise), cap 25 (beyond that is diminishing returns and just stalls promotion).
    • count only runs started after the cut (so they exercise the new candidate).
  • Health: failure_rate ≤ baseline + ε AND zero startup_failures — the count is the sample size; this is the pass/fail (reuses decide_gate / failure_rate_permille).
  • Low-volume fallback: if 0.5 × daily_avg < 5, drop the count gate and require 24 h + ≥ 1 executed healthy run instead (covers dependabot-automerge, dependabot-rebase).

Net effect on the in-flight promotion: busy three gate ~12 h (floor-bound); auto-rebase ~12–13 h (5 runs); the two dependabot reusables on the 24 h fallback — i.e. two waves, not six independent timers.
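The thresholds above can be sketched as follows. This is an illustration, not the eventual soak-check, and it makes two assumptions that the proposal does not pin down: rounding is half-up (so 4.5 becomes 5, matching the "~5" in the table), and the low-volume fallback is tested on the rounded value. A literal reading of "0.5 × daily_avg < 5" would instead put auto-rebase (4.5) on the 24 h path. All names are hypothetical, and the time estimate assumes steady traffic.

```python
import math
from typing import NamedTuple, Optional

SAMPLE_MIN, SAMPLE_MAX = 5, 25
FLOOR_HOURS, FALLBACK_HOURS = 12.0, 24.0


class Requirement(NamedTuple):
    path: str         # "count" or "fallback"
    min_hours: float  # time floor since the channel tag moved
    min_runs: int     # executed healthy runs required


def soak_requirement(daily_avg: float) -> Requirement:
    """Map a 14-day executed-run daily average to the gate it must clear."""
    # Assumption: round half up. Python's built-in round() rounds 4.5 to 4.
    half = math.floor(0.5 * daily_avg + 0.5)
    if half < SAMPLE_MIN:
        # Low-volume fallback: drop the count gate, lengthen the time floor.
        return Requirement("fallback", FALLBACK_HOURS, 1)
    # clamp(half, SAMPLE_MIN, SAMPLE_MAX); the lower bound is already
    # guaranteed by the check above and is kept to mirror the formula.
    return Requirement("count", FLOOR_HOURS, min(max(half, SAMPLE_MIN), SAMPLE_MAX))


def estimated_gate_hours(daily_avg: float) -> Optional[float]:
    """Rough hours until the gate can pass, assuming steady traffic.

    Returns None with no traffic, since the first run may never arrive.
    """
    if daily_avg <= 0:
        return None
    req = soak_requirement(daily_avg)
    return max(req.min_hours, req.min_runs * 24.0 / daily_avg)
```

With the table's averages this gives 12 h for the busy three (the cap of 25 is reached before the floor), about 13.3 h for auto-rebase, 24 h for dependabot-automerge, and no estimate for dependabot-rebase.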

Open questions (to decide here)

  1. Per-ring floor. 12 h is sized for next (first real exposure; doesn't span a daily cycle at 12 h, but the count drags busy ones there anyway). Should ring0/ring1/stable use a shorter floor (e.g. 4–6 h) since the candidate already soaked upstream, or stay uniform at 12 h?
  2. Sample bounds. Are MIN=5 / MAX=25 right? (Rationale: ~25 clean runs makes a ~30% regression a <0.1% miss; <5 is statistically meaningless.)
  3. ε for the failure-rate comparison — baseline is ~0 today; allowed delta = 0, or a permille threshold (e.g. ≤ baseline + 50‰)?
  4. Per-reusable vs batch promotion (batching lets a 0-volume reusable gate the whole cohort into the 24 h path).
  5. Run attribution — "since cut" by created-after-timestamp (proxy) vs detecting the run actually resolved the candidate SHA.
  6. Where the standard lives: petry-projects/.github/standards/ (org standard), cross-referenced from .github-private/docs/release/{versioning,runbook}.md.
  7. Enforcement: a standalone soak-check (reads channels + run history → per-reusable PASS/WAIT + reason) for the interim, then consumed by canary-rollout.sh for cross-repo agents (#501).
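The rationale behind question 2 can be reproduced with a one-line model. The sketch assumes a regression shows up as an independent per-run failure probability (here 30%), which is a simplification; correlated failures would behave differently. The function name is hypothetical.

```python
def miss_probability(regression_failure_rate: float, clean_runs: int) -> float:
    """Chance that a regression goes unnoticed across `clean_runs` runs.

    Assumes each run fails independently with `regression_failure_rate`,
    so the regression is missed only if every run happens to succeed.
    """
    if not 0.0 <= regression_failure_rate <= 1.0:
        raise ValueError("regression_failure_rate must be within [0, 1]")
    if clean_runs < 0:
        raise ValueError("clean_runs must be non-negative")
    return (1.0 - regression_failure_rate) ** clean_runs


if __name__ == "__main__":
    for n in (5, 25):
        print(f"{n:2d} clean runs: miss probability {miss_probability(0.3, n):.5f}")
```

Under this model, 25 clean runs leave roughly a 0.013% chance of missing a 30% regression, while 5 clean runs leave about 16.8%.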

Acceptance

  • Soak standard written (numbers + rationale + per-ring floor decision) in standards/ and cross-linked from the release runbook/versioning docs.
  • A mechanical soak-check that, given a reusable + tier, emits PASS/WAIT with the gating reason (time / sample / health / fallback).
  • Wired into the promotion path (interim: manual soak-check before each cut-release.sh --channel <ring> --push; target: canary-rollout.sh consumes it for cross-repo agents).
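One possible shape for the soak-check decision is sketched below. It is not the existing decide_gate or failure_rate_permille code, whose signatures are not shown in this issue; every name here is hypothetical. It assumes ε defaults to 0‰ (open question 3 is undecided), that the required hours and run count are supplied by the caller, and that a health problem is reported as WAIT with reason "health", since only PASS/WAIT are named as outputs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SoakObservation:
    hours_since_cut: float  # time since the channel tag moved
    executed: int           # success + failure, started after the cut
    failures: int
    startup_failures: int


def soak_check(
    obs: SoakObservation,
    min_hours: float,
    min_runs: int,
    on_fallback: bool = False,
    baseline_permille: int = 0,
    epsilon_permille: int = 0,  # assumption: allowed delta of 0 for now
) -> Tuple[str, str]:
    """Return (verdict, reason) with verdict in {"PASS", "WAIT"}."""
    # Health is checked first: a bad candidate should never pass on time alone.
    if obs.startup_failures > 0:
        return ("WAIT", "health")
    if obs.executed > 0:
        rate_permille = obs.failures * 1000 // obs.executed
        if rate_permille > baseline_permille + epsilon_permille:
            return ("WAIT", "health")

    unmet_time = obs.hours_since_cut < min_hours
    unmet_sample = (obs.executed - obs.failures) < min_runs
    if on_fallback and (unmet_time or unmet_sample):
        return ("WAIT", "fallback")
    if unmet_time:
        return ("WAIT", "time")
    if unmet_sample:
        return ("WAIT", "sample")
    return ("PASS", "ok")
```

For example, `soak_check(SoakObservation(6, 30, 0, 0), min_hours=12, min_runs=25)` returns `("WAIT", "time")`: the sample is already met, but the floor is not.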

References

Until this lands, the in-flight #870 promotion uses a provisional 12 h + healthy-runs judgment before advancing each ring.

Labels

  • ci: CI/CD pipeline issues
  • dev-lead: For dev-lead agent pickup
  • dev-lead:needs-human: dev-lead could not complete this issue; needs human attention
