Fix BUG-7, BUG-8, BUG-9 from demo findings #32
- BUG-7: Add try/catch and array validation around eval-set file loading in evolve() so parse errors surface as user-facing messages instead of a silent exit.
- BUG-8: Add cold-start bootstrap: when extractFailurePatterns returns empty but the eval set has positive entries, treat those positives as missed queries so evolve can work on skills with zero usage history.
- BUG-9: Add --out flag to evals CLI parseArgs as an alias for --output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
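The BUG-7 fix can be pictured with a minimal sketch. The actual change in evolve() is not shown in this conversation, so everything here is an assumption for illustration: the `parseEvalSet` name, the `LoadResult` type, the `query` field, and the message wording. Only `should_trigger` appears in the diff under review.

```typescript
// Hypothetical shape of one eval-set entry; only `should_trigger` is confirmed by the PR diff.
interface EvalEntry {
  query: string;
  should_trigger: boolean;
}

// Hypothetical result type: either the parsed entries or a user-facing message.
type LoadResult =
  | { ok: true; entries: EvalEntry[] }
  | { ok: false; message: string };

// Parse raw eval-set JSON and report problems as messages instead of throwing
// or exiting silently. `path` is only used to make the message readable.
function parseEvalSet(raw: string, path: string): LoadResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, message: `Could not parse eval set ${path}: ${reason}` };
  }
  // Array validation: valid JSON that is not an array is still an error.
  if (!Array.isArray(parsed)) {
    return { ok: false, message: `Eval set ${path} must be a JSON array` };
  }
  return { ok: true, entries: parsed as EvalEntry[] };
}
```

A caller would print `message` and return early when `ok` is false, which is the "user-facing message instead of silent exit" behavior the description asks for.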
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aedd329c74
cli/selftune/evolution/evolve.ts
    // match but there are zero skill usage records, treat the positive eval
    // entries themselves as "missed queries" — they ARE the failure signal.
    const positiveEvals = evalSet.filter((e) => e.should_trigger);
    if (positiveEvals.length > 0) {
Restrict cold-start bootstrap to empty usage history
The new cold-start branch now runs whenever failurePatterns is empty and there is at least one positive eval entry, but it never checks that the skill actually has zero prior usage. In normal runs this can happen when a mature skill already triggers correctly on all positives (so extraction returns no misses), and this code will still fabricate a coldstart failure pattern and force proposal generation/deployment from non-failures, which can regress a stable description.
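One way to address this, sketched under assumptions: gate the bootstrap on an explicit usage count, so a mature skill with no misses is left alone. The `bootstrapColdStart` helper, the `usageRecordCount` parameter, and the `FailurePattern` shape are hypothetical. The real evolve.ts may track usage history differently.

```typescript
// Hypothetical shapes; only `should_trigger` and the "coldstart" label come from the PR discussion.
interface EvalEntry {
  query: string;
  should_trigger: boolean;
}

interface FailurePattern {
  id: string;
  missedQueries: string[];
}

// Synthesize a cold-start pattern only when there are no extracted failures
// AND the skill has no usage history at all.
function bootstrapColdStart(
  failurePatterns: FailurePattern[],
  evalSet: EvalEntry[],
  usageRecordCount: number, // assumed to be available from the usage log
): FailurePattern[] {
  if (failurePatterns.length > 0) return failurePatterns;
  // Mature skill with zero misses: nothing to fix, so do not fabricate failures.
  if (usageRecordCount > 0) return failurePatterns;
  const positiveEvals = evalSet.filter((e) => e.should_trigger);
  if (positiveEvals.length === 0) return failurePatterns;
  return [{ id: "coldstart", missedQueries: positiveEvals.map((e) => e.query) }];
}
```

With this guard, an empty result for a skill that already has usage records means "no proposal needed" rather than triggering proposal generation from non-failures.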
Summary
Fixed three critical bugs in the evolve workflow:
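- BUG-7: eval-set file loading in evolve() is wrapped in try/catch with array validation, so parse errors surface as user-facing messages
- BUG-8: cold-start bootstrap treats positive eval entries as missed queries when extractFailurePatterns returns empty
- BUG-9: --out flag added to evals CLI as alias for --output

All 285 evolution tests pass, including the new cold-start bootstrap test.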
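The --out alias can be illustrated with a small stand-alone parser. This is a hypothetical sketch, not the evals CLI's actual parseArgs, which is not shown in this conversation; the `parseOutputPath` name and the "last flag wins" behavior are assumptions.

```typescript
// Hypothetical minimal parser: accepts --output or its alias --out,
// each followed by a path. If both appear, the later one wins.
function parseOutputPath(argv: string[]): string | undefined {
  let output: string | undefined;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if ((arg === "--output" || arg === "--out") && i + 1 < argv.length) {
      output = argv[i + 1];
      i++; // skip the value that was just consumed
    }
  }
  return output;
}
```

Treating the alias in the same branch as the canonical flag keeps the two spellings from drifting apart in later changes.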
🤖 Generated with Claude Code