Skip to content

feat: add agent reviewer (--review) and bump to v0.4.4#292

Open
rubenmarcus wants to merge 1 commit intomainfrom
feat/agent-reviewer
Open

feat: add agent reviewer (--review) and bump to v0.4.4#292
rubenmarcus wants to merge 1 commit intomainfrom
feat/agent-reviewer

Conversation

@rubenmarcus
Copy link
Member

Summary

Adds an LLM-powered agent reviewer step to the executor loop and prepares the v0.4.4 release.

Agent Reviewer (--review)

A new validation step that runs after lint/build/test pass but before commit. It analyzes the git diff with an LLM to catch issues that automated checks miss:

  • Security: hardcoded secrets, insecure defaults, injection vulnerabilities
  • Logic errors: off-by-one, race conditions, null access
  • Pattern violations: deviations from codebase conventions
  • Missing error handling: unhandled promises, missing try/catch

Usage: ralph-starter run "my task" --review

Findings with severity: error block the commit and feed back into the next iteration via lastValidationFeedback. Warnings are logged but non-blocking. Gracefully skips if no diff exists or no LLM API key is available.

Issue Cleanup

Closed 14 delivered issues that were still marked as open:

Open issues reduced from 19 → 5 (all genuine future work).

Files Changed

  • src/loop/reviewer.ts — new module (reviewer logic, LLM prompt, JSON parsing)
  • src/loop/executor.ts — integrate reviewer step after validation, before commit
  • src/cli.ts — add --review flag to run command
  • src/commands/run.ts — wire review option through to LoopOptions
  • src/index.ts — export public API types (ReviewResult, runReview, etc.)
  • package.json — bump version to 0.4.4

@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 12, 2026

✔️ Bundle Size Analysis

Metric Value
Base 2590.27 KB
PR 2613.93 KB
Diff 23.66 KB (0%)
Bundle breakdown
156K	dist/auth
80K	dist/automation
4.0K	dist/cli.d.ts
4.0K	dist/cli.d.ts.map
20K	dist/cli.js
12K	dist/cli.js.map
584K	dist/commands
28K	dist/config
4.0K	dist/index.d.ts
4.0K	dist/index.d.ts.map
4.0K	dist/index.js
4.0K	dist/index.js.map
896K	dist/integrations
100K	dist/llm
1.1M	dist/loop
188K	dist/mcp
60K	dist/presets
92K	dist/setup
40K	dist/skills
392K	dist/sources
76K	dist/ui
144K	dist/utils
336K	dist/wizard

@rubenmarcus
Copy link
Member Author

@greptileai

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR introduces an LLM-powered agent reviewer step (--review flag) that runs after lint/build/test validation passes but before auto-commit, and bumps the version to v0.4.4. It also adds white-label support (productName, dotDir) throughout the loop infrastructure and exports the new reviewer types/functions as public API.

Key changes:

  • src/loop/reviewer.ts — new module handling diff retrieval, LLM prompting, JSON parsing, and feedback formatting for structured code review findings
  • src/loop/executor.ts — integrates the reviewer between the validation block and the commit step; correctly respects pastWarmup and handles circuit-breaker tripping on reviewer errors
  • src/loop/memory.ts / src/loop/progress.ts — updated to accept a configurable dotDir for white-labeling the .ralph state directory
  • src/cli.ts / src/commands/run.ts — wire --review from CLI through to LoopOptions

Issues found:

  • git restore --staged . destroys pre-existing staged changes (src/loop/reviewer.ts:46): The cleanup call that was added to undo git add --intent-to-add --all is too broad — it unstages all staged files, not just the intent-to-add entries. A user who had staged specific changes before starting the loop will silently lose that staging state.
  • Reviewer LLM cost not counted by costTracker (src/loop/executor.ts:1518): runReview calls tryCallLLM internally with no hook into the executor's CostTracker, so reviewer token spend is invisible to maxCost enforcement. In a long loop, this can cause actual spend to silently exceed the configured budget.

Confidence Score: 3/5

  • Safe to merge after addressing the git index mutation bug and cost tracking gap; the core reviewer logic and loop integration are otherwise sound.
  • The reviewer integration is well-structured and the warmup/circuit-breaker handling is correct. However, the git restore --staged . cleanup is too broad and can silently destroy a user's staged changes — a concrete, reproducible data-loss scenario. The missing cost tracking is a budget-enforcement gap that becomes material in long loops with maxCost configured. These two issues lower confidence enough to warrant fixes before shipping.
  • src/loop/reviewer.ts (git index mutation) and src/loop/executor.ts (missing cost tracking) need attention before merge.

Important Files Changed

Filename Overview
src/loop/reviewer.ts New module implementing LLM-powered diff review. Contains a logic bug where git restore --staged . is too broad and destroys pre-existing user-staged changes. Core logic (JSON parsing, finding validation, feedback formatting) is well-structured.
src/loop/executor.ts Integrates the reviewer step after validation, before commit. Reviewer LLM cost is not tracked by costTracker, which can cause actual spend to silently exceed maxCost limits. Also adds productName and dotDir white-label options.
src/loop/memory.ts Minor update to support configurable dotDir parameter for white-labeling the .ralph directory. Changes are straightforward and correct.
src/loop/progress.ts Adds dotDir parameter to createProgressTracker to support configurable state directory. Change is clean and backward-compatible with the default value.
src/cli.ts Adds --review CLI flag wired to the new agent reviewer feature. Clean, minimal change.
src/commands/run.ts Adds review?: boolean to RunCommandOptions and threads it through to LoopOptions. Straightforward wiring.
src/index.ts Exports new public types (ReviewFinding, ReviewResult, ReviewSeverity) and runReview function from the reviewer module. Clean addition to the public API.
package.json Version bump from 0.4.3 to 0.4.4. No issues.

Sequence Diagram

sequenceDiagram
    participant User
    participant Executor
    participant Validation
    participant Reviewer
    participant Git
    participant LLM

    User->>Executor: runLoop({ review: true, ... })
    loop Each Iteration (i > 1, pastWarmup)
        Executor->>Executor: Agent runs task, produces changes
        alt validationCommands configured
            Executor->>Validation: runAllValidations()
            Validation-->>Executor: ValidationResult[]
            alt Validation fails
                Executor->>Executor: lastValidationFeedback = feedback
                Executor->>Executor: continue (next iteration)
            else Validation passes
                Executor->>Executor: circuitBreaker.recordSuccess()
                Executor->>Executor: lastValidationFeedback = ''
            end
        end
        alt options.review && hasChanges
            Executor->>Reviewer: runReview(cwd)
            Reviewer->>Git: git add --intent-to-add --all
            Reviewer->>Git: git diff HEAD
            Git-->>Reviewer: diff output
            Reviewer->>Git: git restore --staged .
            Reviewer->>LLM: tryCallLLM(diff, REVIEW_SYSTEM_PROMPT)
            LLM-->>Reviewer: JSON findings[]
            Reviewer->>Reviewer: parseFindings()
            Reviewer-->>Executor: ReviewResult { passed, findings }
            alt reviewResult.passed == false (has errors)
                Executor->>Executor: circuitBreaker.recordFailure()
                Executor->>Executor: lastValidationFeedback = formatReviewFeedback()
                Executor->>Executor: continue (next iteration)
            else reviewResult.passed == true
                Executor->>Executor: circuitBreaker.recordSuccess()
                Executor->>Executor: lastValidationFeedback = ''
            end
        end
        alt options.commit && hasChanges
            Executor->>Git: gitCommit()
        end
    end
    Executor-->>User: LoopResult
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/loop/reviewer.ts
Line: 41-46

Comment:
**`git restore --staged .` unstages genuine pre-existing staged changes**

`git restore --staged .` is designed to clean up the intent-to-add entries, but it indiscriminately unstages **all** staged changes — not just the ones added by `git add --intent-to-add --all`. If a user had previously staged specific files with `git add` (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.

For example:
1. User runs `git add src/config.ts` to stage a specific file
2. User runs `ralph-starter run "my task" --review`
3. After `getDiff` completes, `src/config.ts` is no longer staged — the user's staging state is destroyed

A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling `git add --intent-to-add --all`, and then restore only those files afterward:

```typescript
// Capture untracked files before registering them
const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false });
const untrackedFiles = untracked.stdout.split('\n').filter(Boolean);

await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });

// Clean up: only remove the intent-to-add entries we added, preserving other staged changes
if (untrackedFiles.length > 0) {
  await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false });
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/loop/executor.ts
Line: 1514-1515

Comment:
**Reviewer LLM cost not counted by `costTracker`**

`runReview` calls `tryCallLLM` internally, but this additional LLM call is invisible to the `costTracker`. When a user sets `maxCost` to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.

Consider passing a cost-tracking callback or the `CostTracker` instance into `runReview` so reviewer token usage is accounted for:

```typescript
const reviewResult = await runReview(options.cwd, costTracker ?? undefined);
```

And in `reviewer.ts`:
```typescript
export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> {
  // ...
  const response = await tryCallLLM({ ... });
  if (response && costTracker) {
    costTracker.recordUsage(response); // track reviewer tokens
  }
  // ...
}
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: e1f9ef7

@rubenmarcus rubenmarcus force-pushed the feat/agent-reviewer branch 2 times, most recently from 65a2d98 to 1f06989 Compare March 12, 2026 16:36
@rubenmarcus
Copy link
Member Author

@greptileai

@rubenmarcus rubenmarcus force-pushed the feat/agent-reviewer branch from 1f06989 to 40b306d Compare March 12, 2026 16:50
@rubenmarcus
Copy link
Member Author

@greptileai

Comment on lines +1543 to +1551
if (reviewResult) {
const warnCount = reviewResult.findings.filter((f) => f.severity === 'warning').length;
const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : '';
spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`));
circuitBreaker.recordSuccess();
lastValidationFeedback = '';
} else {
spinner.info(chalk.dim(`Loop ${i}: Agent review skipped (no diff or no LLM key)`));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning findings are silently swallowed when review passes

When the agent review passes but contains warning-severity findings, only the count is shown in the spinner suffix (e.g., "Agent review passed (2 warning(s))"). The actual warning messages are never logged to the console and are never fed back to the agent via lastValidationFeedback (which is correctly cleared to ''). This means the user and the agent have no visibility into the content of the warnings.

Compare this to the failure path (lines 1528–1531) where every finding is individually logged. Warnings should receive the same treatment so the operator can see and act on them:

Suggested change
if (reviewResult) {
const warnCount = reviewResult.findings.filter((f) => f.severity === 'warning').length;
const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : '';
spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`));
circuitBreaker.recordSuccess();
lastValidationFeedback = '';
} else {
spinner.info(chalk.dim(`Loop ${i}: Agent review skipped (no diff or no LLM key)`));
}
if (reviewResult) {
const warnFindings = reviewResult.findings.filter((f) => f.severity === 'warning');
const warnCount = warnFindings.length;
const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : '';
spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`));
for (const f of warnFindings) {
log(chalk.dim(` ⚠️ ${f.message}`));
}
circuitBreaker.recordSuccess();
lastValidationFeedback = '';
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/executor.ts
Line: 1543-1551

Comment:
**Warning findings are silently swallowed when review passes**

When the agent review passes but contains `warning`-severity findings, only the count is shown in the spinner suffix (e.g., `"Agent review passed (2 warning(s))"`). The actual warning messages are never logged to the console and are never fed back to the agent via `lastValidationFeedback` (which is correctly cleared to `''`). This means the user and the agent have no visibility into the content of the warnings.

Compare this to the failure path (lines 1528–1531) where every finding is individually logged. Warnings should receive the same treatment so the operator can see and act on them:

```suggestion
        if (reviewResult) {
          const warnFindings = reviewResult.findings.filter((f) => f.severity === 'warning');
          const warnCount = warnFindings.length;
          const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : '';
          spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`));
          for (const f of warnFindings) {
            log(chalk.dim(`  ⚠️ ${f.message}`));
          }
          circuitBreaker.recordSuccess();
          lastValidationFeedback = '';
        }
```

How can I resolve this? If you propose a fix, please make it concise.

@rubenmarcus rubenmarcus force-pushed the feat/agent-reviewer branch from 40b306d to 282f255 Compare March 12, 2026 16:57
@rubenmarcus
Copy link
Member Author

@greptileai

@rubenmarcus rubenmarcus force-pushed the feat/agent-reviewer branch from 282f255 to 82c7706 Compare March 12, 2026 17:06
@rubenmarcus
Copy link
Member Author

@greptileai

@rubenmarcus rubenmarcus force-pushed the feat/agent-reviewer branch from 82c7706 to 0750d3c Compare March 12, 2026 17:23
@rubenmarcus
Copy link
Member Author

@greptileai

@rubenmarcus rubenmarcus force-pushed the feat/agent-reviewer branch from 0750d3c to 695ec81 Compare March 12, 2026 17:30
@rubenmarcus
Copy link
Member Author

@greptileai

@rubenmarcus rubenmarcus force-pushed the feat/agent-reviewer branch from 695ec81 to 5d76d7d Compare March 12, 2026 17:37
@rubenmarcus
Copy link
Member Author

@greptileai

- Add LLM-powered diff review step that runs after lint/build/test pass
  but before commit, catching security issues, logic errors, and pattern
  violations that automated checks miss
- New src/loop/reviewer.ts module using existing tryCallLLM infrastructure
  (Anthropic/OpenAI/OpenRouter) with structured JSON findings output
- Errors block commit and feed back via lastValidationFeedback; warnings
  are logged but non-blocking; gracefully skips if no diff or no API key
- Wire --review CLI flag on run command and export public API types
- Close 14 delivered issues (#212, #224, #225, #226, #227, #228, #229,
  #231, #232, #233, #237, #239, #240, #241)
- Bump version to 0.4.4

Co-authored-by: Amp <amp@ampcode.com>
Amp-Thread-ID: https://ampcode.com/threads/T-019ce2d4-0f5f-742e-9721-3181f57267df
@rubenmarcus rubenmarcus force-pushed the feat/agent-reviewer branch from 5d76d7d to e1f9ef7 Compare March 12, 2026 17:46
@rubenmarcus
Copy link
Member Author

@greptileai

Comment on lines +41 to +46
await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });

const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });

// Clean up: remove intent-to-add entries so the index isn't permanently mutated
await execa('git', ['restore', '--staged', '.'], { cwd, reject: false });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git restore --staged . unstages genuine pre-existing staged changes

git restore --staged . is designed to clean up the intent-to-add entries, but it indiscriminately unstages all staged changes — not just the ones added by git add --intent-to-add --all. If a user had previously staged specific files with git add (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.

For example:

  1. User runs git add src/config.ts to stage a specific file
  2. User runs ralph-starter run "my task" --review
  3. After getDiff completes, src/config.ts is no longer staged — the user's staging state is destroyed

A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling git add --intent-to-add --all, and then restore only those files afterward:

// Capture untracked files before registering them
const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false });
const untrackedFiles = untracked.stdout.split('\n').filter(Boolean);

await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });

// Clean up: only remove the intent-to-add entries we added, preserving other staged changes
if (untrackedFiles.length > 0) {
  await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false });
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/reviewer.ts
Line: 41-46

Comment:
**`git restore --staged .` unstages genuine pre-existing staged changes**

`git restore --staged .` is designed to clean up the intent-to-add entries, but it indiscriminately unstages **all** staged changes — not just the ones added by `git add --intent-to-add --all`. If a user had previously staged specific files with `git add` (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.

For example:
1. User runs `git add src/config.ts` to stage a specific file
2. User runs `ralph-starter run "my task" --review`
3. After `getDiff` completes, `src/config.ts` is no longer staged — the user's staging state is destroyed

A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling `git add --intent-to-add --all`, and then restore only those files afterward:

```typescript
// Capture untracked files before registering them
const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false });
const untrackedFiles = untracked.stdout.split('\n').filter(Boolean);

await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });

// Clean up: only remove the intent-to-add entries we added, preserving other staged changes
if (untrackedFiles.length > 0) {
  await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false });
}
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +1514 to +1515
// --- Agent reviewer: LLM-powered diff review before commit ---
if (options.review && hasChanges && i > 1 && pastWarmup) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer LLM cost not counted by costTracker

runReview calls tryCallLLM internally, but this additional LLM call is invisible to the costTracker. When a user sets maxCost to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.

Consider passing a cost-tracking callback or the CostTracker instance into runReview so reviewer token usage is accounted for:

const reviewResult = await runReview(options.cwd, costTracker ?? undefined);

And in reviewer.ts:

export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> {
  // ...
  const response = await tryCallLLM({ ... });
  if (response && costTracker) {
    costTracker.recordUsage(response); // track reviewer tokens
  }
  // ...
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/loop/executor.ts
Line: 1514-1515

Comment:
**Reviewer LLM cost not counted by `costTracker`**

`runReview` calls `tryCallLLM` internally, but this additional LLM call is invisible to the `costTracker`. When a user sets `maxCost` to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.

Consider passing a cost-tracking callback or the `CostTracker` instance into `runReview` so reviewer token usage is accounted for:

```typescript
const reviewResult = await runReview(options.cwd, costTracker ?? undefined);
```

And in `reviewer.ts`:
```typescript
export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> {
  // ...
  const response = await tryCallLLM({ ... });
  if (response && costTracker) {
    costTracker.recordUsage(response); // track reviewer tokens
  }
  // ...
}
```

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant