Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,9 @@

## 0.4.5 — 2026-03-16

- **Review findings now actually get fixed, not just listed.** `/review` and `/ship` used to print informational findings (dead code, test gaps, N+1 queries) and then ignore them. Now every finding gets action: obvious mechanical fixes are applied automatically, and genuinely ambiguous issues are batched into a single question instead of 8 separate prompts. You see `[AUTO-FIXED] file:line Problem → what was done` for each auto-fix.
- **You control the line between "just fix it" and "ask me first."** Dead code, stale comments, N+1 queries get auto-fixed. Security issues, race conditions, design decisions get surfaced for your call. The classification lives in one place (`review/checklist.md`) so both `/review` and `/ship` stay in sync.

### Fixed

- **`$B js "const x = await fetch(...); return x.status"` now works.** The `js` command used to wrap everything as an expression — so `const`, semicolons, and multi-line code all broke. It now detects statements and uses a block wrapper, just like `eval` already did.
Expand All @@ -10,6 +13,9 @@

### For contributors

- Gate Classification → Severity Classification rename (severity determines presentation order, not whether you see a prompt).
- Fix-First Heuristic section added to `review/checklist.md` — the canonical AUTO-FIX vs ASK classification.
- New validation test: `Fix-First Heuristic exists in checklist and is referenced by review + ship`.
- Extracted `needsBlockWrapper()` and `wrapForEvaluate()` helpers in `read-commands.ts` — shared by both `js` and `eval` commands (DRY).
- Added `getRefRole()` to `BrowserManager` — exposes ARIA role for ref selectors without changing `resolveRef` return type.
- Click handler auto-routes `[role=option]` refs to `selectOption()` via parent `<select>`, with DOM `tagName` check to avoid blocking custom listbox components.
Expand Down
55 changes: 47 additions & 8 deletions review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,14 +157,53 @@ Follow the output format specified in the checklist. Respect the suppressions

---

## Step 5: Output findings
## Step 5: Fix-First Review

**Always output ALL findings**both critical and informational. The user must see every issue.
**Every finding gets actionnot just critical ones.**

- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
- If only non-critical issues found: output findings. No further action needed.
- If no issues found: output `Pre-Landing Review: No issues found.`
Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`

### Step 5a: Classify each finding

For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
checklist.md. Critical findings lean toward ASK; informational findings lean
toward AUTO-FIX.

### Step 5b: Auto-fix all AUTO-FIX items

Apply each fix directly. For each one, output a one-line summary:
`[AUTO-FIXED] [file:line] Problem → what you did`

### Step 5c: Batch-ask about ASK items

If there are ASK items remaining, present them in ONE AskUserQuestion:

- List each item with a number, the severity label, the problem, and a recommended fix
- For each item, provide options: A) Fix as recommended, B) Skip
- Include an overall RECOMMENDATION

Example format:
```
I auto-fixed 5 issues. 2 need your input:
1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
Fix: Add `WHERE status = 'draft'` to the UPDATE
→ A) Fix B) Skip
2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
Fix: Add JSON schema validation
→ A) Fix B) Skip
RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
```

If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.

### Step 5d: Apply user-approved fixes

Apply fixes for items where the user chose "Fix." Output what was fixed.

If no ASK items exist (everything was AUTO-FIX), skip the question entirely.

### Greptile comment resolution

Expand All @@ -174,7 +213,7 @@ After outputting your own findings, if Greptile comments were classified in Step

Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.

1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow (auto-fixed if mechanical, batched into ASK if not) (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.

2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
Expand Down Expand Up @@ -223,7 +262,7 @@ If no documentation files exist, skip this step silently.
## Important Rules

- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
- **Be terse.** One line problem, one line fix. No preamble.
- **Only flag real problems.** Skip anything that's fine.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
55 changes: 47 additions & 8 deletions review/SKILL.md.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -75,14 +75,53 @@ Follow the output format specified in the checklist. Respect the suppressions

---

## Step 5: Output findings
## Step 5: Fix-First Review

**Always output ALL findings**both critical and informational. The user must see every issue.
**Every finding gets actionnot just critical ones.**

- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then `RECOMMENDATION: Choose A because [one-line reason]`, then options (A: Fix it now, B: Acknowledge, C: False positive — skip).
After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
- If only non-critical issues found: output findings. No further action needed.
- If no issues found: output `Pre-Landing Review: No issues found.`
Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`

### Step 5a: Classify each finding

For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in
checklist.md. Critical findings lean toward ASK; informational findings lean
toward AUTO-FIX.

### Step 5b: Auto-fix all AUTO-FIX items

Apply each fix directly. For each one, output a one-line summary:
`[AUTO-FIXED] [file:line] Problem → what you did`

### Step 5c: Batch-ask about ASK items

If there are ASK items remaining, present them in ONE AskUserQuestion:

- List each item with a number, the severity label, the problem, and a recommended fix
- For each item, provide options: A) Fix as recommended, B) Skip
- Include an overall RECOMMENDATION

Example format:
```
I auto-fixed 5 issues. 2 need your input:
1. [CRITICAL] app/models/post.rb:42 — Race condition in status transition
Fix: Add `WHERE status = 'draft'` to the UPDATE
→ A) Fix B) Skip
2. [INFORMATIONAL] app/services/generator.rb:88 — LLM output not type-checked before DB write
Fix: Add JSON schema validation
→ A) Fix B) Skip
RECOMMENDATION: Fix both — #1 is a real race condition, #2 prevents silent data corruption.
```

If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead of batching.

### Step 5d: Apply user-approved fixes

Apply fixes for items where the user chose "Fix." Output what was fixed.

If no ASK items exist (everything was AUTO-FIX), skip the question entirely.

### Greptile comment resolution

Expand All @@ -92,7 +131,7 @@ After outputting your own findings, if Greptile comments were classified in Step

Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.

1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
1. **VALID & ACTIONABLE comments:** These are included in your findings — they follow the Fix-First flow (auto-fixed if mechanical, batched into ASK if not) (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.

2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
Expand Down Expand Up @@ -141,7 +180,7 @@ If no documentation files exist, skip this step silently.
## Important Rules

- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
- **Fix-first, not read-only.** AUTO-FIX items are applied directly. ASK items are only applied after user approval. Never commit, push, or create PRs — that's /ship's job.
- **Be terse.** One line problem, one line fix. No preamble.
- **Only flag real problems.** Skip anything that's fine.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
51 changes: 42 additions & 9 deletions review/checklist.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,21 +5,23 @@
Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems.

**Two-pass review:**
- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. These can block `/ship`.
- **Pass 2 (INFORMATIONAL):** Run all remaining categories. These are included in the PR body but do not block.
- **Pass 1 (CRITICAL):** Run SQL & Data Safety and LLM Output Trust Boundary first. Highest severity.
- **Pass 2 (INFORMATIONAL):** Run all remaining categories. Lower severity but still actioned.

All findings get action via Fix-First Review: obvious mechanical fixes are applied automatically,
genuinely ambiguous issues are batched into a single user question.

**Output format:**

```
Pre-Landing Review: N issues (X critical, Y informational)
**CRITICAL** (blocking /ship):
- [file:line] Problem description
Fix: suggested fix
**AUTO-FIXED:**
- [file:line] Problem → fix applied
**Issues** (non-blocking):
**NEEDS INPUT:**
- [file:line] Problem description
Fix: suggested fix
Recommended fix: suggested fix
```

If no issues found: `Pre-Landing Review: No issues found.`
Expand Down Expand Up @@ -102,10 +104,10 @@ To do this: use Grep to find all references to the sibling values (e.g., grep fo

---

## Gate Classification
## Severity Classification

```
CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
CRITICAL (highest severity): INFORMATIONAL (lower severity):
├─ SQL & Data Safety ├─ Conditional Side Effects
├─ Race Conditions & Concurrency ├─ Magic Numbers & String Coupling
├─ LLM Output Trust Boundary ├─ Dead Code & Consistency
Expand All @@ -115,7 +117,38 @@ CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
├─ Time Window Safety
├─ Type Coercion at Boundaries
└─ View/Frontend
All findings are actioned via Fix-First Review. Severity determines
presentation order and classification of AUTO-FIX vs ASK — critical
findings lean toward ASK (they're riskier), informational findings
lean toward AUTO-FIX (they're more mechanical).
```

---

## Fix-First Heuristic

This heuristic is referenced by both `/review` and `/ship`. It determines whether
the agent auto-fixes a finding or asks the user.

```
AUTO-FIX (agent fixes without asking): ASK (needs human judgment):
├─ Dead code / unused variables ├─ Security (auth, XSS, injection)
├─ N+1 queries (missing .includes()) ├─ Race conditions
├─ Stale comments contradicting code ├─ Design decisions
├─ Magic numbers → named constants ├─ Large fixes (>20 lines)
├─ Missing LLM output validation ├─ Enum completeness
├─ Version/path mismatches ├─ Removing functionality
├─ Variables assigned but never read └─ Anything changing user-visible
└─ Inline styles, O(n*m) view lookups behavior
```

**Rule of thumb:** If the fix is mechanical and a senior engineer would apply it
without discussion, it's AUTO-FIX. If reasonable engineers could disagree about
the fix, it's ASK.

**Critical findings default toward ASK** (they're inherently riskier).
**Informational findings default toward AUTO-FIX** (they're more mechanical).

---

Expand Down
29 changes: 18 additions & 11 deletions ship/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -107,7 +107,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- On the base branch (abort)
- Merge conflicts that can't be auto-resolved (stop, show conflicts)
- Test failures (stop, show failures)
- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
- Pre-landing review finds ASK items that need user judgment
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
Expand All @@ -120,6 +120,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)
- Auto-fixable review findings (dead code, N+1, stale comments — fixed automatically)

---

Expand Down Expand Up @@ -243,19 +244,25 @@ Review the diff for structural issues that tests don't catch.
- **Pass 1 (CRITICAL):** SQL & Data Safety, LLM Output Trust Boundary
- **Pass 2 (INFORMATIONAL):** All remaining categories

4. **Always output ALL findings** — both critical and informational. The user must see every issue found.
4. **Classify each finding as AUTO-FIX or ASK** per the Fix-First Heuristic in
checklist.md. Critical findings lean toward ASK; informational lean toward AUTO-FIX.

5. Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
5. **Auto-fix all AUTO-FIX items.** Apply each fix. Output one line per fix:
`[AUTO-FIXED] [file:line] Problem → what you did`

6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with:
- The problem (`file:line` + description)
- `RECOMMENDATION: Choose A because [one-line reason]`
- Options: A) Fix it now, B) Acknowledge and ship anyway, C) It's a false positive — skip
After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
6. **If ASK items remain,** present them in ONE AskUserQuestion:
- List each with number, severity, problem, recommended fix
- Per-item options: A) Fix B) Skip
- Overall RECOMMENDATION
- If 3 or fewer ASK items, you may use individual AskUserQuestion calls instead

7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8.
7. **After all fixes (auto + user-approved):**
- If ANY fixes were applied: commit fixed files by name (`git add <fixed-files> && git commit -m "fix: pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test.
- If no fixes applied (all ASK items skipped, or no issues found): continue to Step 4.

8. **If no issues found:** Output `Pre-Landing Review: No issues found.` and continue.
8. Output summary: `Pre-Landing Review: N issues — M auto-fixed, K asked (J fixed, L skipped)`

If no issues found: `Pre-Landing Review: No issues found.`

Save the review output — it goes into the PR body in Step 8.

Expand Down Expand Up @@ -488,7 +495,7 @@ EOF
- **Never skip tests.** If tests fail, stop.
- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
- **Never force push.** Use regular `git push` only.
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
- **Never ask for confirmation** except for MINOR/MAJOR version bumps and pre-landing review ASK items (batched into at most one AskUserQuestion).
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
Expand Down
Loading