
Fix DSPy 3.2+ API compat (max_full_evals + metric kwargs) and 2 evolve_skill validator bugs #73

Open
0L1v3DaD wants to merge 4 commits into NousResearch:main from 0L1v3DaD:fix/dspy-3.2-compat-and-validator-bugs

Conversation

@0L1v3DaD

Summary

Four small but high-impact fixes to make evolution.skills.evolve_skill
work end-to-end on a fresh install with DSPy 3.2+. Tested empirically with
a 5-iteration GEPA run on a real skill against ~30 evaluation examples.

Bugs fixed (one commit each)

| # | File | Symptom before fix |
|---|------|--------------------|
| 1 | evolution/skills/evolve_skill.py | `TypeError: GEPA.compile() got unexpected keyword 'max_steps'`. DSPy 3.2 renamed it to `max_full_evals` and added a required `reflection_lm` argument |
| 2 | evolution/core/fitness.py | `TypeError: skill_fitness_metric() takes 2-3 args but 4 were given`. DSPy 3.2 GEPA passes `(gold, pred, trace, pred_name, pred_trace)`; the metric only accepted 3 |
| 3 | evolution/skills/evolve_skill.py | Silent no-op: the optimizer "succeeded" with full convergence reported, but the saved file was byte-identical to the input. Cause: the code read `optimized_module.skill_text` (the input field) instead of `optimized_module.predictor.predict.signature.instructions` (the actual evolved prompt) |
| 4 | evolution/skills/evolve_skill.py | False-negative validation: `skill_structure: missing YAML frontmatter` even though the file written to disk had valid frontmatter. Cause: `validator.validate_all(evolved_body, ...)` was called on the body-only string a few lines before `reassemble_skill()` prepended the frontmatter |

Why this matters

The silent no-op (commit 3) is the worst of the four because it's invisible:

  • No exception is raised
  • All constraint gates pass (the unchanged baseline trivially satisfies size / growth / structure gates)
  • The file written to disk is a valid skill, just identical to the input
  • Significant token spend per run for zero learning

Once that's fixed, the validator wiring bug (commit 4) becomes visible
and easy to fix.

Verification

After all four commits applied:

  • 5-iteration GEPA run completes without errors
  • Real evolved prompt: ~6.5KB (vs ~211-char baseline prompt) — substantial growth in the evolved instruction prefix
  • Skill body preserved verbatim — correct, GEPA never had access to mutate it
  • 4/4 constraint gates pass: size_limit, growth_limit, non_empty, skill_structure
  • Manual diff of evolved vs baseline confirms real behavioral changes (response shape, anti-pattern guidance, etc.)

Backward compatibility

All four fixes are additive or substitute-equivalent on the DSPy 3.2+ path:

  • max_full_evals is the new name; max_steps was dropped outright, so no compat shim is possible
  • reflection_lm is now required by GEPA; we provide one explicitly, built from the existing config.optimizer_model
  • The metric signature accepts the extra DSPy 3.2 kwargs but defaults them to None, so 3-arg callers like MIPROv2 / BootstrapFewShot still work (see the sketch below)
  • The validator change just hands a strict-superset string to validate_all — all constraint checks remain valid
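
To make the metric bullet concrete, here is how the two calling conventions resolve against the widened signature (a sketch; `gold`, `prediction`, `trace`, `pred_name`, and `pred_trace` are hypothetical placeholder objects):

```python
# 3-arg callers (MIPROv2, BootstrapFewShot): unchanged behavior.
score = skill_fitness_metric(gold, prediction, trace)

# 5-arg caller (GEPA in DSPy 3.2+): the two extra positional args land in
# the new pred_name / pred_trace parameters, which default to None.
score = skill_fitness_metric(gold, prediction, trace, pred_name, pred_trace)
```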

Test plan

Each symptom was reproduced before its commit and re-verified as fixed after it:

  1. After commit 1: GEPA initializes without TypeError on max_steps
  2. After commit 2: metric runs without TypeError on extra positional args
  3. After commit 3: evolved file size differs from input file size (real artifact, not a copy)
  4. After commit 4: 4/4 gates green on the real evolved artifact

0L1v3DaD added 4 commits May 10, 2026 05:14
DSPy 3.2 renamed the dspy.GEPA iteration argument from max_steps to
max_full_evals, and now requires a reflection_lm parameter; without it the
optimizer falls back to a default LM that may not be configured.

Before: TypeError: GEPA.compile() got unexpected keyword argument max_steps
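
A minimal sketch of the updated optimizer construction under DSPy 3.2+, assuming the constructor-level `max_full_evals` / `reflection_lm` parameters; the model string and the `skill_module` / `trainset` names are illustrative stand-ins for the repo's actual values:

```python
import dspy

# Stand-in for the LM built from the repo's existing config.optimizer_model.
reflection_lm = dspy.LM("openai/gpt-4o-mini")

optimizer = dspy.GEPA(
    metric=skill_fitness_metric,
    max_full_evals=5,             # DSPy 3.2+ name for the old max_steps budget
    reflection_lm=reflection_lm,  # now required; no silent default-LM fallback
)
optimized_module = optimizer.compile(skill_module, trainset=trainset)
```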
dspy.GEPA in 3.2+ calls the metric with five positional args:
  metric(gold, prediction, trace, pred_name, pred_trace)

The old 3-arg signature crashes:
  TypeError: skill_fitness_metric() takes 2-3 args but 4 were given

Adding pred_name and pred_trace keyword args satisfies GEPA while
remaining backward-compatible with MIPROv2 and BootstrapFewShot
(which only pass the first 3).
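
A sketch of the widened signature; only the parameter list reflects the actual fix, the body is illustrative:

```python
def skill_fitness_metric(gold, prediction, trace=None, pred_name=None, pred_trace=None):
    """Score a prediction against its gold example.

    GEPA (DSPy 3.2+) passes all five arguments positionally; MIPROv2 and
    BootstrapFewShot pass only the first three, so pred_name and pred_trace
    default to None and are simply ignored on that path.
    """
    # Illustrative scoring only; the real logic lives in evolution/core/fitness.py,
    # and the .answer fields are hypothetical.
    return float(prediction.answer == gold.answer)
```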
…_text

This is the most impactful fix in the bundle.

Before: after GEPA compile() returned, the code read
  evolved_body = optimized_module.skill_text

But skill_text is the INPUT field on SkillModule (the original
unchanged skill we fed in). What GEPA actually mutates each iteration
is the predictor signature.instructions - the prompt prefix that
gets composed with the input to produce the output.

Symptom: GEPA appears to succeed (no errors, full convergence reported,
all constraint gates pass), but the saved file is byte-identical to the
input baseline. Zero learning, significant token spend per run, and the
bug is invisible because the unchanged baseline trivially passes all
gates (0% growth, valid structure, etc.).

Fix:
  - Extract evolved_instruction from optimized_module.predictor.predict.signature.instructions
  - Add fallback for MIPROv2 flat predictor structure
  - Compare against baseline_instruction and warn if no improvement
  - Log evolved-prompt size vs baseline-prompt size on success
  - Preserve original body (GEPA never had access to mutate it)
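
A sketch of that extraction path, assuming the predictor layout named above; `logger` and `baseline_instruction` are hypothetical names for the surrounding code's equivalents:

```python
predictor = optimized_module.predictor

try:
    # GEPA: nested layout, a dspy.Predict named `predict` carries the signature.
    evolved_instruction = predictor.predict.signature.instructions
except AttributeError:
    # MIPROv2 fallback: flat layout, the predictor itself carries the signature.
    evolved_instruction = predictor.signature.instructions

if evolved_instruction == baseline_instruction:
    logger.warning("Evolved instruction identical to baseline; optimizer learned nothing.")
else:
    logger.info("Evolved prompt: %d chars (baseline: %d)",
                len(evolved_instruction), len(baseline_instruction))

# The original skill body is reattached verbatim; GEPA never saw it.
```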
The skill_structure constraint checks that the artifact starts with
YAML frontmatter (---). Frontmatter is added by reassemble_skill() at
line 217, but the validator was being called on evolved_body (the
body-only string) on line 219. Result: false negatives like

  skill_structure: Skill missing: YAML frontmatter (---), name field, description field

even though the file written to disk a few lines later (line 261)
DOES have valid frontmatter. A confusing failure mode: a genuinely
useful evolution gets rejected as failed.

Fix: pass evolved_full to validate_all so all four constraint gates
operate on the same artifact that ends up on disk.
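
A sketch of the corrected wiring, with hypothetical shapes for `reassemble_skill` / `validate_all` / `output_path` standing in for the repo's actual signatures:

```python
# Line 217 equivalent: prepend the YAML frontmatter ("---" block) to the body.
evolved_full = reassemble_skill(frontmatter, evolved_body)

# The buggy version validated evolved_body (body only), producing the false
# negative above. The fix validates the exact artifact written to disk:
results = validator.validate_all(evolved_full, baseline=baseline_full)

if all(r.passed for r in results):
    output_path.write_text(evolved_full)  # line 261 equivalent
```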