fix: GEPA max_steps + validation artifact check #46

Open
zjdc49zctq-glitch wants to merge 3 commits into NousResearch:main from zjdc49zctq-glitch:fix/gepa-max-steps-and-validation
@zjdc49zctq-glitch

Summary

Two bug fixes in evolution/skills/evolve_skill.py:

1. GEPA max_steps → max_metric_calls

DSPy 3.2.0 removed the max_steps parameter from GEPA. GEPA initialization therefore fails and the code currently falls back to MIPROv2, so GEPA never runs, and it would not run even if it became available in future DSPy versions.

Fix: Use max_metric_calls instead of max_steps.
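A minimal sketch of why the stale keyword hides GEPA behind the fallback. The classes `FakeGEPA` and `FakeMIPROv2` and the helper `build_optimizer` are hypothetical stand-ins so the example runs without DSPy installed; the real code in evolve_skill.py may structure its fallback differently.

```python
class FakeGEPA:
    """Hypothetical stand-in for an optimizer that accepts max_metric_calls only."""

    def __init__(self, metric, max_metric_calls):
        self.metric = metric
        self.max_metric_calls = max_metric_calls


class FakeMIPROv2:
    """Hypothetical stand-in for the fallback optimizer."""

    def __init__(self, metric):
        self.metric = metric


def build_optimizer(metric, budget, gepa_kwarg="max_metric_calls"):
    """Try the GEPA stand-in first; fall back if its constructor rejects the kwargs."""
    try:
        return FakeGEPA(metric=metric, **{gepa_kwarg: budget})
    except TypeError:
        # An unknown keyword such as max_steps raises TypeError, so the
        # fallback is chosen on every call and GEPA never runs.
        return FakeMIPROv2(metric=metric)


def exact_match(example, prediction, trace=None):
    """Toy metric used only for illustration."""
    return float(example == prediction)


old = build_optimizer(exact_match, 30, gepa_kwarg="max_steps")
new = build_optimizer(exact_match, 30)
print(type(old).__name__)  # FakeMIPROv2
print(type(new).__name__)  # FakeGEPA
```

Logging which optimizer was actually selected would make this kind of regression visible in the run output.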

2. Constraint validation checked wrong artifact

validate_all() was called on evolved_body (markdown body only, no frontmatter), so the skill_structure check — which looks for --- frontmatter markers — always failed, even for valid skills.

The same issue existed for baseline constraint validation (skill["body"] vs skill["raw"]).

Fix: Validate the full reassembled artifact (evolved_full / skill["raw"]) so skill_structure can find the frontmatter markers.
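The difference can be illustrated with a small sketch. `has_frontmatter` is a hypothetical simplification of the skill_structure check, not the project's actual implementation, and the sample skill text is made up for the example.

```python
def has_frontmatter(text: str) -> bool:
    """Return True if text opens with a '---' line and has a later closing '---' line."""
    lines = text.lstrip().splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    return any(line.strip() == "---" for line in lines[1:])


# Illustrative skill content (assumed, not taken from the repository).
frontmatter = "---\nname: obsidian\ndescription: Example skill\n---\n"
evolved_body = "# Obsidian\n\nUse the vault for notes.\n"
evolved_full = frontmatter + evolved_body

print(has_frontmatter(evolved_body))  # False: the body alone has no markers
print(has_frontmatter(evolved_full))  # True: the reassembled artifact does
```

Running a structural check against the body alone can never succeed, which matches the always-failing behavior described above.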

Verification

Both fixes confirmed working with --iterations 3 on the obsidian skill:

  • ✓ skill_structure now passes (was always failing before)
  • Holdout score improved: 36.2% → 43.2% (+19.4% relative)

Moli added 3 commits May 2, 2026 22:25
…l artifact

- DSPy 3.2.0 removed max_steps param from GEPA, use max_metric_calls
- Constraint validation was checking evolved_body (no frontmatter) so
  skill_structure check (looking for --- markers) always failed
- Now validates the full reassembled artifact (frontmatter + body) so
  skill_structure check passes correctly
- Also fixes baseline constraint validation to use full artifact
…ring trailing LLM reasoning

- Change re.search(r'\[.*\]') to r'\[.*?\]' (non-greedy) to stop at first
  complete JSON array instead of last ] in the entire response
- Add markdown code fence stripping as secondary fallback
- Expand error message to show 500 chars of LLM output for debugging
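A small sketch of the greedy versus non-greedy behavior this commit describes. The sample response and the use of `re.DOTALL` are assumptions for illustration; the flags used in the actual code are not shown in this PR.

```python
import json
import re

# Assumed LLM output: a JSON array followed by trailing reasoning with brackets.
response = (
    'Here are the cases: [{"id": 1}, {"id": 2}] '
    "I chose these because [they cover] both paths."
)

greedy = re.search(r"\[.*\]", response, re.DOTALL).group(0)
lazy = re.search(r"\[.*?\]", response, re.DOTALL).group(0)

# The greedy match runs to the last ] in the response and is not valid JSON.
try:
    json.loads(greedy)
    greedy_ok = True
except json.JSONDecodeError:
    greedy_ok = False

print(greedy_ok)         # False
print(json.loads(lazy))  # [{'id': 1}, {'id': 2}]
```

The non-greedy pattern works here only because the array contains no inner `]`; a nested array would be cut short at its first closing bracket.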
…ested arrays

Previous regex-based approaches failed on deeply nested JSON: the
non-greedy .*? stops at the first inner ], while the greedy .* runs to
the last ] in the entire response. Now uses a proper bracket-counting
parser that tracks depth and string context to find the outermost
complete JSON array.
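A bracket-counting extractor along these lines might look like the sketch below. The function name `extract_outer_array` is hypothetical, and the code is an illustration of the technique under stated assumptions, not the code from this PR. It assumes the first `[` in the text opens the array of interest, so a bracket in prose before the array would still mislead it.

```python
import json


def extract_outer_array(text: str):
    """Return the first balanced [...] span in text, or None if none is complete."""
    start = text.find("[")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a JSON string, brackets do not count toward depth.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None


response = 'Result: [[1, 2], ["a]b", [3]]] trailing [note]'
span = extract_outer_array(response)
print(span)              # [[1, 2], ["a]b", [3]]]
print(json.loads(span))  # [[1, 2], ['a]b', [3]]]
```

Passing the extracted span to `json.loads` remains the final validity check, since the scanner only balances brackets and does not verify the JSON itself.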