feat: add agent reviewer (--review) and bump to v0.4.4#292
feat: add agent reviewer (--review) and bump to v0.4.4#292rubenmarcus wants to merge 1 commit intomainfrom
Conversation
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
ed90fa8 to
b549ad2
Compare
✔️ Bundle Size Analysis
Bundle breakdown |
Greptile SummaryThis PR introduces an LLM-powered agent reviewer step ( Key changes:
Issues found:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant Executor
participant Validation
participant Reviewer
participant Git
participant LLM
User->>Executor: runLoop({ review: true, ... })
loop Each Iteration (i > 1, pastWarmup)
Executor->>Executor: Agent runs task, produces changes
alt validationCommands configured
Executor->>Validation: runAllValidations()
Validation-->>Executor: ValidationResult[]
alt Validation fails
Executor->>Executor: lastValidationFeedback = feedback
Executor->>Executor: continue (next iteration)
else Validation passes
Executor->>Executor: circuitBreaker.recordSuccess()
Executor->>Executor: lastValidationFeedback = ''
end
end
alt options.review && hasChanges
Executor->>Reviewer: runReview(cwd)
Reviewer->>Git: git add --intent-to-add --all
Reviewer->>Git: git diff HEAD
Git-->>Reviewer: diff output
Reviewer->>Git: git restore --staged .
Reviewer->>LLM: tryCallLLM(diff, REVIEW_SYSTEM_PROMPT)
LLM-->>Reviewer: JSON findings[]
Reviewer->>Reviewer: parseFindings()
Reviewer-->>Executor: ReviewResult { passed, findings }
alt reviewResult.passed == false (has errors)
Executor->>Executor: circuitBreaker.recordFailure()
Executor->>Executor: lastValidationFeedback = formatReviewFeedback()
Executor->>Executor: continue (next iteration)
else reviewResult.passed == true
Executor->>Executor: circuitBreaker.recordSuccess()
Executor->>Executor: lastValidationFeedback = ''
end
end
alt options.commit && hasChanges
Executor->>Git: gitCommit()
end
end
Executor-->>User: LoopResult
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/loop/reviewer.ts
Line: 41-46
Comment:
**`git restore --staged .` unstages genuine pre-existing staged changes**
`git restore --staged .` is designed to clean up the intent-to-add entries, but it indiscriminately unstages **all** staged changes — not just the ones added by `git add --intent-to-add --all`. If a user had previously staged specific files with `git add` (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.
For example:
1. User runs `git add src/config.ts` to stage a specific file
2. User runs `ralph-starter run "my task" --review`
3. After `getDiff` completes, `src/config.ts` is no longer staged — the user's staging state is destroyed
A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling `git add --intent-to-add --all`, and then restore only those files afterward:
```typescript
// Capture untracked files before registering them
const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false });
const untrackedFiles = untracked.stdout.split('\n').filter(Boolean);
await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });
// Clean up: only remove the intent-to-add entries we added, preserving other staged changes
if (untrackedFiles.length > 0) {
await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false });
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/loop/executor.ts
Line: 1514-1515
Comment:
**Reviewer LLM cost not counted by `costTracker`**
`runReview` calls `tryCallLLM` internally, but this additional LLM call is invisible to the `costTracker`. When a user sets `maxCost` to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.
Consider passing a cost-tracking callback or the `CostTracker` instance into `runReview` so reviewer token usage is accounted for:
```typescript
const reviewResult = await runReview(options.cwd, costTracker ?? undefined);
```
And in `reviewer.ts`:
```typescript
export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> {
// ...
const response = await tryCallLLM({ ... });
if (response && costTracker) {
costTracker.recordUsage(response); // track reviewer tokens
}
// ...
}
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: e1f9ef7 |
65a2d98 to
1f06989
Compare
1f06989 to
40b306d
Compare
| if (reviewResult) { | ||
| const warnCount = reviewResult.findings.filter((f) => f.severity === 'warning').length; | ||
| const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : ''; | ||
| spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`)); | ||
| circuitBreaker.recordSuccess(); | ||
| lastValidationFeedback = ''; | ||
| } else { | ||
| spinner.info(chalk.dim(`Loop ${i}: Agent review skipped (no diff or no LLM key)`)); | ||
| } |
There was a problem hiding this comment.
Warning findings are silently swallowed when review passes
When the agent review passes but contains warning-severity findings, only the count is shown in the spinner suffix (e.g., "Agent review passed (2 warning(s))"). The actual warning messages are never logged to the console and are never fed back to the agent via lastValidationFeedback (which is correctly cleared to ''). This means the user and the agent have no visibility into the content of the warnings.
Compare this to the failure path (lines 1528–1531) where every finding is individually logged. Warnings should receive the same treatment so the operator can see and act on them:
| if (reviewResult) { | |
| const warnCount = reviewResult.findings.filter((f) => f.severity === 'warning').length; | |
| const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : ''; | |
| spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`)); | |
| circuitBreaker.recordSuccess(); | |
| lastValidationFeedback = ''; | |
| } else { | |
| spinner.info(chalk.dim(`Loop ${i}: Agent review skipped (no diff or no LLM key)`)); | |
| } | |
| if (reviewResult) { | |
| const warnFindings = reviewResult.findings.filter((f) => f.severity === 'warning'); | |
| const warnCount = warnFindings.length; | |
| const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : ''; | |
| spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`)); | |
| for (const f of warnFindings) { | |
| log(chalk.dim(` ⚠️ ${f.message}`)); | |
| } | |
| circuitBreaker.recordSuccess(); | |
| lastValidationFeedback = ''; | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/executor.ts
Line: 1543-1551
Comment:
**Warning findings are silently swallowed when review passes**
When the agent review passes but contains `warning`-severity findings, only the count is shown in the spinner suffix (e.g., `"Agent review passed (2 warning(s))"`). The actual warning messages are never logged to the console and are never fed back to the agent via `lastValidationFeedback` (which is correctly cleared to `''`). This means the user and the agent have no visibility into the content of the warnings.
Compare this to the failure path (lines 1528–1531) where every finding is individually logged. Warnings should receive the same treatment so the operator can see and act on them:
```suggestion
if (reviewResult) {
const warnFindings = reviewResult.findings.filter((f) => f.severity === 'warning');
const warnCount = warnFindings.length;
const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : '';
spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`));
for (const f of warnFindings) {
log(chalk.dim(` ⚠️ ${f.message}`));
}
circuitBreaker.recordSuccess();
lastValidationFeedback = '';
}
```
How can I resolve this? If you propose a fix, please make it concise.40b306d to
282f255
Compare
282f255 to
82c7706
Compare
82c7706 to
0750d3c
Compare
0750d3c to
695ec81
Compare
695ec81 to
5d76d7d
Compare
- Add LLM-powered diff review step that runs after lint/build/test pass but before commit, catching security issues, logic errors, and pattern violations that automated checks miss - New src/loop/reviewer.ts module using existing tryCallLLM infrastructure (Anthropic/OpenAI/OpenRouter) with structured JSON findings output - Errors block commit and feed back via lastValidationFeedback; warnings are logged but non-blocking; gracefully skips if no diff or no API key - Wire --review CLI flag on run command and export public API types - Close 14 delivered issues (#212, #224, #225, #226, #227, #228, #229, #231, #232, #233, #237, #239, #240, #241) - Bump version to 0.4.4 Co-authored-by: Amp <amp@ampcode.com> Amp-Thread-ID: https://ampcode.com/threads/T-019ce2d4-0f5f-742e-9721-3181f57267df
5d76d7d to
e1f9ef7
Compare
| await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false }); | ||
|
|
||
| const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false }); | ||
|
|
||
| // Clean up: remove intent-to-add entries so the index isn't permanently mutated | ||
| await execa('git', ['restore', '--staged', '.'], { cwd, reject: false }); |
There was a problem hiding this comment.
git restore --staged . unstages genuine pre-existing staged changes
git restore --staged . is designed to clean up the intent-to-add entries, but it indiscriminately unstages all staged changes — not just the ones added by git add --intent-to-add --all. If a user had previously staged specific files with git add (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.
For example:
- User runs
git add src/config.tsto stage a specific file - User runs
ralph-starter run "my task" --review - After
getDiffcompletes,src/config.tsis no longer staged — the user's staging state is destroyed
A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling git add --intent-to-add --all, and then restore only those files afterward:
// Capture untracked files before registering them
const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false });
const untrackedFiles = untracked.stdout.split('\n').filter(Boolean);
await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });
// Clean up: only remove the intent-to-add entries we added, preserving other staged changes
if (untrackedFiles.length > 0) {
await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false });
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/reviewer.ts
Line: 41-46
Comment:
**`git restore --staged .` unstages genuine pre-existing staged changes**
`git restore --staged .` is designed to clean up the intent-to-add entries, but it indiscriminately unstages **all** staged changes — not just the ones added by `git add --intent-to-add --all`. If a user had previously staged specific files with `git add` (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.
For example:
1. User runs `git add src/config.ts` to stage a specific file
2. User runs `ralph-starter run "my task" --review`
3. After `getDiff` completes, `src/config.ts` is no longer staged — the user's staging state is destroyed
A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling `git add --intent-to-add --all`, and then restore only those files afterward:
```typescript
// Capture untracked files before registering them
const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false });
const untrackedFiles = untracked.stdout.split('\n').filter(Boolean);
await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });
// Clean up: only remove the intent-to-add entries we added, preserving other staged changes
if (untrackedFiles.length > 0) {
await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false });
}
```
How can I resolve this? If you propose a fix, please make it concise.| // --- Agent reviewer: LLM-powered diff review before commit --- | ||
| if (options.review && hasChanges && i > 1 && pastWarmup) { |
There was a problem hiding this comment.
Reviewer LLM cost not counted by costTracker
runReview calls tryCallLLM internally, but this additional LLM call is invisible to the costTracker. When a user sets maxCost to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.
Consider passing a cost-tracking callback or the CostTracker instance into runReview so reviewer token usage is accounted for:
const reviewResult = await runReview(options.cwd, costTracker ?? undefined);And in reviewer.ts:
export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> {
// ...
const response = await tryCallLLM({ ... });
if (response && costTracker) {
costTracker.recordUsage(response); // track reviewer tokens
}
// ...
}Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/executor.ts
Line: 1514-1515
Comment:
**Reviewer LLM cost not counted by `costTracker`**
`runReview` calls `tryCallLLM` internally, but this additional LLM call is invisible to the `costTracker`. When a user sets `maxCost` to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.
Consider passing a cost-tracking callback or the `CostTracker` instance into `runReview` so reviewer token usage is accounted for:
```typescript
const reviewResult = await runReview(options.cwd, costTracker ?? undefined);
```
And in `reviewer.ts`:
```typescript
export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> {
// ...
const response = await tryCallLLM({ ... });
if (response && costTracker) {
costTracker.recordUsage(response); // track reviewer tokens
}
// ...
}
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
Adds an LLM-powered agent reviewer step to the executor loop and prepares the v0.4.4 release.
Agent Reviewer (
--review)A new validation step that runs after lint/build/test pass but before commit. It analyzes the git diff with an LLM to catch issues that automated checks miss:
Usage:
ralph-starter run "my task" --reviewFindings with
severity: errorblock the commit and feed back into the next iteration vialastValidationFeedback. Warnings are logged but non-blocking. Gracefully skips if no diff exists or no LLM API key is available.Issue Cleanup
Closed 14 delivered issues that were still marked as open:
Open issues reduced from 19 → 5 (all genuine future work).
Files Changed
src/loop/reviewer.ts— new module (reviewer logic, LLM prompt, JSON parsing)src/loop/executor.ts— integrate reviewer step after validation, before commitsrc/cli.ts— add--reviewflag toruncommandsrc/commands/run.ts— wirereviewoption through toLoopOptionssrc/index.ts— export public API types (ReviewResult,runReview, etc.)package.json— bump version to 0.4.4