feat: add agent reviewer (--review) and bump to v0.4.4 by rubenmarcus · Pull Request #292 · multivmlabs/ralph-starter

rubenmarcus · 2026-03-12T16:23:31Z

Summary

Adds an LLM-powered agent reviewer step to the executor loop and prepares the v0.4.4 release.

Agent Reviewer (`--review`)

A new validation step that runs after lint/build/test pass but before commit. It analyzes the git diff with an LLM to catch issues that automated checks miss:

Security: hardcoded secrets, insecure defaults, injection vulnerabilities
Logic errors: off-by-one, race conditions, null access
Pattern violations: deviations from codebase conventions
Missing error handling: unhandled promises, missing try/catch

Usage: ralph-starter run "my task" --review

Findings with severity: error block the commit and feed back into the next iteration via lastValidationFeedback. Warnings are logged but non-blocking. Gracefully skips if no diff exists or no LLM API key is available.

Issue Cleanup

Closed 14 delivered issues that were still marked as open:

Open issues reduced from 19 → 5 (all genuine future work).

Files Changed

src/loop/reviewer.ts — new module (reviewer logic, LLM prompt, JSON parsing)
src/loop/executor.ts — integrate reviewer step after validation, before commit
src/cli.ts — add --review flag to run command
src/commands/run.ts — wire review option through to LoopOptions
src/index.ts — export public API types (ReviewResult, runReview, etc.)
package.json — bump version to 0.4.4

chatgpt-codex-connector · 2026-03-12T16:23:38Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

github-actions · 2026-03-12T16:26:00Z

✔️ Bundle Size Analysis

Metric	Value
Base	2590.27 KB
PR	2613.93 KB
Diff	23.66 KB (0%)

Bundle breakdown

156K	dist/auth
80K	dist/automation
4.0K	dist/cli.d.ts
4.0K	dist/cli.d.ts.map
20K	dist/cli.js
12K	dist/cli.js.map
584K	dist/commands
28K	dist/config
4.0K	dist/index.d.ts
4.0K	dist/index.d.ts.map
4.0K	dist/index.js
4.0K	dist/index.js.map
896K	dist/integrations
100K	dist/llm
1.1M	dist/loop
188K	dist/mcp
60K	dist/presets
92K	dist/setup
40K	dist/skills
392K	dist/sources
76K	dist/ui
144K	dist/utils
336K	dist/wizard

rubenmarcus · 2026-03-12T16:26:48Z

@greptileai

greptile-apps · 2026-03-12T16:30:24Z

Greptile Summary

This PR introduces an LLM-powered agent reviewer step (--review flag) that runs after lint/build/test validation passes but before auto-commit, and bumps the version to v0.4.4. It also adds white-label support (productName, dotDir) throughout the loop infrastructure and exports the new reviewer types/functions as public API.

Key changes:

src/loop/reviewer.ts — new module handling diff retrieval, LLM prompting, JSON parsing, and feedback formatting for structured code review findings
src/loop/executor.ts — integrates the reviewer between the validation block and the commit step; correctly respects pastWarmup and handles circuit-breaker tripping on reviewer errors
src/loop/memory.ts / src/loop/progress.ts — updated to accept a configurable dotDir for white-labeling the .ralph state directory
src/cli.ts / src/commands/run.ts — wire --review from CLI through to LoopOptions

Issues found:

git restore --staged . destroys pre-existing staged changes (src/loop/reviewer.ts:46): The cleanup call that was added to undo git add --intent-to-add --all is too broad — it unstages all staged files, not just the intent-to-add entries. A user who had staged specific changes before starting the loop will silently lose that staging state.
Reviewer LLM cost not counted by costTracker (src/loop/executor.ts:1518): runReview calls tryCallLLM internally with no hook into the executor's CostTracker, so reviewer token spend is invisible to maxCost enforcement. In a long loop, this can cause actual spend to silently exceed the configured budget.

Confidence Score: 3/5

Safe to merge after addressing the git index mutation bug and cost tracking gap; the core reviewer logic and loop integration are otherwise sound.
The reviewer integration is well-structured and the warmup/circuit-breaker handling is correct. However, the git restore --staged . cleanup is too broad and can silently destroy a user's staged changes — a concrete, reproducible data-loss scenario. The missing cost tracking is a budget-enforcement gap that becomes material in long loops with maxCost configured. These two issues lower confidence enough to warrant fixes before shipping.
src/loop/reviewer.ts (git index mutation) and src/loop/executor.ts (missing cost tracking) need attention before merge.

Important Files Changed

Filename	Overview
src/loop/reviewer.ts	New module implementing LLM-powered diff review. Contains a logic bug where `git restore --staged .` is too broad and destroys pre-existing user-staged changes. Core logic (JSON parsing, finding validation, feedback formatting) is well-structured.
src/loop/executor.ts	Integrates the reviewer step after validation, before commit. Reviewer LLM cost is not tracked by `costTracker`, which can cause actual spend to silently exceed `maxCost` limits. Also adds `productName` and `dotDir` white-label options.
src/loop/memory.ts	Minor update to support configurable `dotDir` parameter for white-labeling the `.ralph` directory. Changes are straightforward and correct.
src/loop/progress.ts	Adds `dotDir` parameter to `createProgressTracker` to support configurable state directory. Change is clean and backward-compatible with the default value.
src/cli.ts	Adds `--review` CLI flag wired to the new agent reviewer feature. Clean, minimal change.
src/commands/run.ts	Adds `review?: boolean` to `RunCommandOptions` and threads it through to `LoopOptions`. Straightforward wiring.
src/index.ts	Exports new public types (`ReviewFinding`, `ReviewResult`, `ReviewSeverity`) and `runReview` function from the reviewer module. Clean addition to the public API.
package.json	Version bump from 0.4.3 to 0.4.4. No issues.

Sequence Diagram

sequenceDiagram
    participant User
    participant Executor
    participant Validation
    participant Reviewer
    participant Git
    participant LLM

    User->>Executor: runLoop({ review: true, ... })
    loop Each Iteration (i > 1, pastWarmup)
        Executor->>Executor: Agent runs task, produces changes
        alt validationCommands configured
            Executor->>Validation: runAllValidations()
            Validation-->>Executor: ValidationResult[]
            alt Validation fails
                Executor->>Executor: lastValidationFeedback = feedback
                Executor->>Executor: continue (next iteration)
            else Validation passes
                Executor->>Executor: circuitBreaker.recordSuccess()
                Executor->>Executor: lastValidationFeedback = ''
            end
        end
        alt options.review && hasChanges
            Executor->>Reviewer: runReview(cwd)
            Reviewer->>Git: git add --intent-to-add --all
            Reviewer->>Git: git diff HEAD
            Git-->>Reviewer: diff output
            Reviewer->>Git: git restore --staged .
            Reviewer->>LLM: tryCallLLM(diff, REVIEW_SYSTEM_PROMPT)
            LLM-->>Reviewer: JSON findings[]
            Reviewer->>Reviewer: parseFindings()
            Reviewer-->>Executor: ReviewResult { passed, findings }
            alt reviewResult.passed == false (has errors)
                Executor->>Executor: circuitBreaker.recordFailure()
                Executor->>Executor: lastValidationFeedback = formatReviewFeedback()
                Executor->>Executor: continue (next iteration)
            else reviewResult.passed == true
                Executor->>Executor: circuitBreaker.recordSuccess()
                Executor->>Executor: lastValidationFeedback = ''
            end
        end
        alt options.commit && hasChanges
            Executor->>Git: gitCommit()
        end
    end
    Executor-->>User: LoopResult

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/loop/reviewer.ts
Line: 41-46

Comment:
**`git restore --staged .` unstages genuine pre-existing staged changes**

`git restore --staged .` is designed to clean up the intent-to-add entries, but it indiscriminately unstages **all** staged changes — not just the ones added by `git add --intent-to-add --all`. If a user had previously staged specific files with `git add` (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.

For example:
1. User runs `git add src/config.ts` to stage a specific file
2. User runs `ralph-starter run "my task" --review`
3. After `getDiff` completes, `src/config.ts` is no longer staged — the user's staging state is destroyed

A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling `git add --intent-to-add --all`, and then restore only those files afterward:

```typescript
// Capture untracked files before registering them
const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false });
const untrackedFiles = untracked.stdout.split('\n').filter(Boolean);

await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });

// Clean up: only remove the intent-to-add entries we added, preserving other staged changes
if (untrackedFiles.length > 0) {
  await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false });
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/loop/executor.ts
Line: 1514-1515

Comment:
**Reviewer LLM cost not counted by `costTracker`**

`runReview` calls `tryCallLLM` internally, but this additional LLM call is invisible to the `costTracker`. When a user sets `maxCost` to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.

Consider passing a cost-tracking callback or the `CostTracker` instance into `runReview` so reviewer token usage is accounted for:

```typescript
const reviewResult = await runReview(options.cwd, costTracker ?? undefined);
```

And in `reviewer.ts`:
```typescript
export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> {
  // ...
  const response = await tryCallLLM({ ... });
  if (response && costTracker) {
    costTracker.recordUsage(response); // track reviewer tokens
  }
  // ...
}
```

How can I resolve this? If you propose a fix, please make it concise.

_{Last reviewed commit: e1f9ef7}

src/loop/executor.ts

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T16:41:37Z

@greptileai

src/loop/executor.ts

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T16:51:14Z

@greptileai

greptile-apps · 2026-03-12T16:57:03Z

src/loop/executor.ts

+        if (reviewResult) {
+          const warnCount = reviewResult.findings.filter((f) => f.severity === 'warning').length;
+          const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : '';
+          spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`));
+          circuitBreaker.recordSuccess();
+          lastValidationFeedback = '';
+        } else {
+          spinner.info(chalk.dim(`Loop ${i}: Agent review skipped (no diff or no LLM key)`));
+        }


Warning findings are silently swallowed when review passes

When the agent review passes but contains warning-severity findings, only the count is shown in the spinner suffix (e.g., "Agent review passed (2 warning(s))"). The actual warning messages are never logged to the console and are never fed back to the agent via lastValidationFeedback (which is correctly cleared to ''). This means the user and the agent have no visibility into the content of the warnings.

Compare this to the failure path (lines 1528–1531) where every finding is individually logged. Warnings should receive the same treatment so the operator can see and act on them:

Suggested change

if (reviewResult) {

const warnCount = reviewResult.findings.filter((f) => f.severity === 'warning').length;

const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : '';

spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`));

circuitBreaker.recordSuccess();

lastValidationFeedback = '';

} else {

spinner.info(chalk.dim(`Loop ${i}: Agent review skipped (no diff or no LLM key)`));

}

if (reviewResult) {

const warnFindings = reviewResult.findings.filter((f) => f.severity === 'warning');

const warnCount = warnFindings.length;

const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : '';

spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`));

for (const f of warnFindings) {

log(chalk.dim(` ⚠️ ${f.message}`));

}

circuitBreaker.recordSuccess();

lastValidationFeedback = '';

}

Prompt To Fix With AI

This is a comment left during a code review. Path: src/loop/executor.ts Line: 1543-1551 Comment: **Warning findings are silently swallowed when review passes** When the agent review passes but contains `warning`-severity findings, only the count is shown in the spinner suffix (e.g., `"Agent review passed (2 warning(s))"`). The actual warning messages are never logged to the console and are never fed back to the agent via `lastValidationFeedback` (which is correctly cleared to `''`). This means the user and the agent have no visibility into the content of the warnings. Compare this to the failure path (lines 1528–1531) where every finding is individually logged. Warnings should receive the same treatment so the operator can see and act on them: ```suggestion if (reviewResult) { const warnFindings = reviewResult.findings.filter((f) => f.severity === 'warning'); const warnCount = warnFindings.length; const suffix = warnCount > 0 ? ` (${warnCount} warning(s))` : ''; spinner.succeed(chalk.green(`Loop ${i}: Agent review passed${suffix}`)); for (const f of warnFindings) { log(chalk.dim(` ⚠️ ${f.message}`)); } circuitBreaker.recordSuccess(); lastValidationFeedback = ''; } ``` How can I resolve this? If you propose a fix, please make it concise.

rubenmarcus · 2026-03-12T17:01:07Z

@greptileai

src/loop/executor.ts

rubenmarcus · 2026-03-12T17:08:01Z

@greptileai

src/loop/executor.ts

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T17:24:52Z

@greptileai

src/loop/executor.ts

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T17:31:56Z

@greptileai

src/loop/reviewer.ts

rubenmarcus · 2026-03-12T17:38:28Z

@greptileai

src/loop/reviewer.ts

- Add LLM-powered diff review step that runs after lint/build/test pass but before commit, catching security issues, logic errors, and pattern violations that automated checks miss - New src/loop/reviewer.ts module using existing tryCallLLM infrastructure (Anthropic/OpenAI/OpenRouter) with structured JSON findings output - Errors block commit and feed back via lastValidationFeedback; warnings are logged but non-blocking; gracefully skips if no diff or no API key - Wire --review CLI flag on run command and export public API types - Close 14 delivered issues (#212, #224, #225, #226, #227, #228, #229, #231, #232, #233, #237, #239, #240, #241) - Bump version to 0.4.4 Co-authored-by: Amp <amp@ampcode.com> Amp-Thread-ID: https://ampcode.com/threads/T-019ce2d4-0f5f-742e-9721-3181f57267df

rubenmarcus · 2026-03-12T17:46:31Z

@greptileai

greptile-apps · 2026-03-12T17:50:18Z

src/loop/reviewer.ts

+  await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false });
+
+  const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false });
+
+  // Clean up: remove intent-to-add entries so the index isn't permanently mutated
+  await execa('git', ['restore', '--staged', '.'], { cwd, reject: false });


git restore --staged . unstages genuine pre-existing staged changes

git restore --staged . is designed to clean up the intent-to-add entries, but it indiscriminately unstages all staged changes — not just the ones added by git add --intent-to-add --all. If a user had previously staged specific files with git add (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs.

For example:

User runs git add src/config.ts to stage a specific file

User runs ralph-starter run "my task" --review

After getDiff completes, src/config.ts is no longer staged — the user's staging state is destroyed

A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling git add --intent-to-add --all, and then restore only those files afterward:

// Capture untracked files before registering them const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false }); const untrackedFiles = untracked.stdout.split('\n').filter(Boolean); await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false }); const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false }); // Clean up: only remove the intent-to-add entries we added, preserving other staged changes if (untrackedFiles.length > 0) { await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false }); }

Prompt To Fix With AI

This is a comment left during a code review. Path: src/loop/reviewer.ts Line: 41-46 Comment: **`git restore --staged .` unstages genuine pre-existing staged changes** `git restore --staged .` is designed to clean up the intent-to-add entries, but it indiscriminately unstages **all** staged changes — not just the ones added by `git add --intent-to-add --all`. If a user had previously staged specific files with `git add` (e.g., as part of a partial commit workflow) before kicking off the agent loop, those staged changes are silently unstaged after the reviewer runs. For example: 1. User runs `git add src/config.ts` to stage a specific file 2. User runs `ralph-starter run "my task" --review` 3. After `getDiff` completes, `src/config.ts` is no longer staged — the user's staging state is destroyed A safer cleanup would only remove the entries that were actually registered by the intent-to-add step, rather than blanket-resetting the entire index. One approach is to capture the list of untracked files before calling `git add --intent-to-add --all`, and then restore only those files afterward: ```typescript // Capture untracked files before registering them const untracked = await execa('git', ['ls-files', '--others', '--exclude-standard'], { cwd, reject: false }); const untrackedFiles = untracked.stdout.split('\n').filter(Boolean); await execa('git', ['add', '--intent-to-add', '--all'], { cwd, reject: false }); const head = await execa('git', ['diff', 'HEAD'], { cwd, reject: false }); // Clean up: only remove the intent-to-add entries we added, preserving other staged changes if (untrackedFiles.length > 0) { await execa('git', ['restore', '--staged', '--', ...untrackedFiles], { cwd, reject: false }); } ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-12T17:50:19Z

src/loop/executor.ts

+    // --- Agent reviewer: LLM-powered diff review before commit ---
+    if (options.review && hasChanges && i > 1 && pastWarmup) {


Reviewer LLM cost not counted by costTracker

runReview calls tryCallLLM internally, but this additional LLM call is invisible to the costTracker. When a user sets maxCost to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit.

Consider passing a cost-tracking callback or the CostTracker instance into runReview so reviewer token usage is accounted for:

const reviewResult = await runReview(options.cwd, costTracker ?? undefined);

And in reviewer.ts:

export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> { // ... const response = await tryCallLLM({ ... }); if (response && costTracker) { costTracker.recordUsage(response); // track reviewer tokens } // ... }

Prompt To Fix With AI

This is a comment left during a code review. Path: src/loop/executor.ts Line: 1514-1515 Comment: **Reviewer LLM cost not counted by `costTracker`** `runReview` calls `tryCallLLM` internally, but this additional LLM call is invisible to the `costTracker`. When a user sets `maxCost` to cap spending, reviewer calls are silently excluded from the budget. In a long-running loop with many iterations, the reviewer can make as many LLM calls as there are iterations, potentially pushing real spend significantly above the stated limit. Consider passing a cost-tracking callback or the `CostTracker` instance into `runReview` so reviewer token usage is accounted for: ```typescript const reviewResult = await runReview(options.cwd, costTracker ?? undefined); ``` And in `reviewer.ts`: ```typescript export async function runReview(cwd: string, costTracker?: CostTracker): Promise<ReviewResult | null> { // ... const response = await tryCallLLM({ ... }); if (response && costTracker) { costTracker.recordUsage(response); // track reviewer tokens } // ... } ``` How can I resolve this? If you propose a fix, please make it concise.

rubenmarcus added enhancement core labels Mar 12, 2026

rubenmarcus force-pushed the feat/agent-reviewer branch from ed90fa8 to b549ad2 Compare March 12, 2026 16:25

github-actions bot assigned rubenmarcus Mar 12, 2026

github-actions bot added candidate-release PR is ready for release config labels Mar 12, 2026