skills(review-gates): structural classification doesn't override Gate 1 #602
Merged
The structural carve-out lets a single occurrence pass any evidence level for a targeted fix, even when Gate 1 would require 2-3 or 5+. Tighten so structural raises recurrence confidence but doesn't lower the magnitude bar; only Critical structural failures act on a single occurrence.
max-sixty added a commit that referenced this pull request on May 27, 2026:
## Why

Cut the 0.1.2 release so consumer repos (and tend's own nightly regen) pick up the new `claude-interactive` harness and per-workflow `harness`/`model` overrides.

## What's new since 0.1.1

**New `claude-interactive` harness** (`max-sixty/tend/interactive@0.1.2`): opt-in alternative to the released `claude` harness. PTY-supervised interactive `claude` via `script(1)`, end-of-turn detected through Stop/StopFailure hooks. Built as the trial path ahead of Anthropic's June 15 billing split between Agent-SDK metering and the flat Claude Code subscription. Smoke tested end-to-end on tend itself. PRs: #609, #611, #613, #614, #615, #616.

**Per-workflow `harness` / `model` override**: adopters can flip a single workflow to a different harness or model without changing `.config/tend.yaml` defaults. #612.

**Skill refinements**: nightly upstream-bot rebases (#605), running-in-ci PR bar (#604) and recheck (#573), env-filter loophole fix (#599), authorAssociation warning (#600), review-gates Gate 1 (#602).

**Bug fix**: mention queue-delay now uses `comment.updated_at` so edit events report accurately (#595).

## Compatibility

Released `claude` harness path is byte-identical; `claude-interactive` is strictly additive and opt-in. Consumer repos that don't touch `harness:` see no change beyond the new skill text and the mention-edit fix.
Problem
The structural-vs-stochastic classification in `review-gates.md` currently includes a 1-occurrence bypass: "One clear occurrence is sufficient evidence for a targeted fix." This lets non-Critical structural findings clear Gate 1 with a single occurrence, even when the evidence level says High (needs 2–3) or Medium (needs 5+).

In practice this routes findings into PRs when they should sit in the evidence gist until they accumulate. Feedback on #593:
PR #593 was structural + High (1 occurrence) and passed gates only because the structural carve-out overrode the High threshold.
Fix
Tighten the structural bullet so the classification only affects recurrence confidence, not the magnitude bar. Non-Critical structural failures fall back to their evidence-level thresholds (High = 2–3, Medium = 5+). Only Critical structural failures act on a single occurrence — which matches what Gate 1 already says for Critical.
Stochastic guidance is unchanged: still needs 5+ occurrences.
Effect on the rejected PR
Applying the revised criteria to #593: evidence level High, structural, 1 occurrence → falls short of High's 2–3 threshold → record in evidence gist, no PR.
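
As a rough illustration, the revised Gate 1 rule can be sketched in Python. This is not code from the repository (the skill is prose in `review-gates.md`); the names `Level`, `MIN_OCCURRENCES`, `STOCHASTIC_MIN`, and `gate1_decision` are hypothetical. The sketch assumes that High's "2–3" means a minimum of 2, and that stochastic findings need at least 5 occurrences regardless of evidence level.

```python
from enum import Enum


class Level(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"


# Minimum occurrences per evidence level, as quoted in the description above.
# Assumption: High's "2-3" is read as "at least 2".
MIN_OCCURRENCES = {Level.CRITICAL: 1, Level.HIGH: 2, Level.MEDIUM: 5}

# Stochastic guidance is unchanged: 5+ occurrences.
# Assumption: this floor applies at every evidence level.
STOCHASTIC_MIN = 5


def gate1_decision(level: Level, structural: bool, occurrences: int) -> str:
    """Return where a finding goes under the revised Gate 1 rule."""
    required = MIN_OCCURRENCES[level]
    if not structural:
        # Stochastic findings never get a lower bar than the 5+ floor.
        required = max(required, STOCHASTIC_MIN)
    # Structural classification no longer lowers `required`; it only raises
    # confidence that the failure will recur.
    if occurrences >= required:
        return "open PR"
    return "record in evidence gist"


# The #593 case: High, structural, 1 occurrence.
print(gate1_decision(Level.HIGH, structural=True, occurrences=1))
```

Under these assumptions the #593 case lands in the evidence gist, while a Critical structural failure still proceeds on a single occurrence.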