
Fix DSPy 3.2+ API compat (max_full_evals + metric kwargs) and 2 evolve_skill validator bugs #73

Open
0L1v3DaD wants to merge 4 commits into NousResearch:main from 0L1v3DaD:fix/dspy-3.2-compat-and-validator-bugs

Conversation

@0L1v3DaD

Summary

Four small but high-impact fixes to make evolution.skills.evolve_skill
work end-to-end on a fresh install with DSPy 3.2+. Tested empirically with
a 5-iteration GEPA run on a real skill against ~30 evaluation examples.

Bugs fixed (one commit each)

| # | File | Symptom before fix |
|---|------|--------------------|
| 1 | evolution/skills/evolve_skill.py | `TypeError: GEPA.compile() got unexpected keyword 'max_steps'`. DSPy 3.2 renamed it to `max_full_evals` and added a required `reflection_lm` argument |
| 2 | evolution/core/fitness.py | `TypeError: skill_fitness_metric() takes 2-3 args but 4 were given`. DSPy 3.2 GEPA passes `(gold, pred, trace, pred_name, pred_trace)`; the metric only accepted 3 |
| 3 | evolution/skills/evolve_skill.py | Silent no-op: the optimizer "succeeded" with full convergence reported, but the saved file was byte-identical to the input. Cause: the code read `optimized_module.skill_text` (the input field) instead of `optimized_module.predictor.predict.signature.instructions` (the actual evolved prompt) |
| 4 | evolution/skills/evolve_skill.py | False-negative validation: `skill_structure: missing YAML frontmatter` even though the file written to disk had valid frontmatter. Cause: `validator.validate_all(evolved_body, ...)` was called on the body-only string a few lines before `reassemble_skill()` prepended the frontmatter |

Why this matters

The silent no-op (commit 3) is the worst of the four because it's invisible:

  • No exception is raised
  • All constraint gates pass (the unchanged baseline trivially satisfies size / growth / structure gates)
  • The file written to disk is a valid skill, just identical to the input
  • Significant token spend per run for zero learning

Once that's fixed, the validator wiring bug (commit 4) becomes visible
and easy to fix.

Verification

After all four commits applied:

  • 5-iteration GEPA run completes without errors
  • Real evolved prompt: ~6.5KB (vs ~211-char baseline prompt) — substantial growth in the evolved instruction prefix
  • Skill body preserved verbatim — correct, GEPA never had access to mutate it
  • 4/4 constraint gates pass: size_limit, growth_limit, non_empty, skill_structure
  • Manual diff of evolved vs baseline confirms real behavioral changes (response shape, anti-pattern guidance, etc.)

Backward compatibility

All four fixes are additive or substitute-equivalent on the DSPy 3.2+ path:

  • max_full_evals is the new name; max_steps was dropped outright, so no compat shim is possible
  • reflection_lm is now required by GEPA; we provide one explicitly, built from the existing config.optimizer_model
  • The metric signature accepts the extra DSPy 3.2 kwargs but defaults them to None, so 3-arg callers like MIPROv2 / BootstrapFewShot still work (see the sketch below)
  • The validator change just hands a strict-superset string to validate_all — all constraint checks remain valid
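
To make the metric bullet concrete, here is how the two calling conventions resolve against the widened signature (a sketch; `gold`, `prediction`, `trace`, `pred_name`, and `pred_trace` are hypothetical placeholder objects):

```python
# 3-arg callers (MIPROv2, BootstrapFewShot): unchanged behavior.
score = skill_fitness_metric(gold, prediction, trace)

# 5-arg caller (GEPA in DSPy 3.2+): the two extra positional args land in
# the new pred_name / pred_trace parameters, which default to None.
score = skill_fitness_metric(gold, prediction, trace, pred_name, pred_trace)
```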

Test plan

Each symptom was reproduced before its commit and re-verified as fixed after it:

  1. After commit 1: GEPA initializes without TypeError on max_steps
  2. After commit 2: metric runs without TypeError on extra positional args
  3. After commit 3: evolved file size differs from input file size (real artifact, not a copy)
  4. After commit 4: 4/4 gates green on the real evolved artifact

0L1v3DaD added 4 commits May 10, 2026 05:14
DSPy 3.2 renamed the dspy.GEPA iteration argument from max_steps to
max_full_evals, and now requires a reflection_lm parameter; without it the
optimizer falls back to a default LM that may not be configured.

Before: TypeError: GEPA.compile() got unexpected keyword argument max_steps
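
A minimal sketch of the updated optimizer construction under DSPy 3.2+, assuming the constructor-level `max_full_evals` / `reflection_lm` parameters; the model string and the `skill_module` / `trainset` names are illustrative stand-ins for the repo's actual values:

```python
import dspy

# Stand-in for the LM built from the repo's existing config.optimizer_model.
reflection_lm = dspy.LM("openai/gpt-4o-mini")

optimizer = dspy.GEPA(
    metric=skill_fitness_metric,
    max_full_evals=5,             # DSPy 3.2+ name for the old max_steps budget
    reflection_lm=reflection_lm,  # now required; no silent default-LM fallback
)
optimized_module = optimizer.compile(skill_module, trainset=trainset)
```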
dspy.GEPA in 3.2+ calls the metric with five positional args:
  metric(gold, prediction, trace, pred_name, pred_trace)

The old 3-arg signature crashes:
  TypeError: skill_fitness_metric() takes 2-3 args but 4 were given

Adding pred_name and pred_trace keyword args satisfies GEPA while
remaining backward-compatible with MIPROv2 and BootstrapFewShot
(which only pass the first 3).
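
A sketch of the widened signature; only the parameter list reflects the actual fix, the body is illustrative:

```python
def skill_fitness_metric(gold, prediction, trace=None, pred_name=None, pred_trace=None):
    """Score a prediction against its gold example.

    GEPA (DSPy 3.2+) passes all five arguments positionally; MIPROv2 and
    BootstrapFewShot pass only the first three, so pred_name and pred_trace
    default to None and are simply ignored on that path.
    """
    # Illustrative scoring only; the real logic lives in evolution/core/fitness.py,
    # and the .answer fields are hypothetical.
    return float(prediction.answer == gold.answer)
```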
…_text

This is the most impactful fix in the bundle.

Before: after GEPA compile() returned, the code read
  evolved_body = optimized_module.skill_text

But skill_text is the INPUT field on SkillModule (the original
unchanged skill we fed in). What GEPA actually mutates each iteration
is the predictor signature.instructions - the prompt prefix that
gets composed with the input to produce the output.

Symptom: GEPA appears to succeed (no errors, full convergence reported,
all constraint gates pass), but the saved file is byte-identical to the
input baseline. Zero learning, significant token spend per run, and the
bug is invisible because the unchanged baseline trivially passes all
gates (0% growth, valid structure, etc.).

Fix:
  - Extract evolved_instruction from optimized_module.predictor.predict.signature.instructions
  - Add fallback for MIPROv2 flat predictor structure
  - Compare against baseline_instruction and warn if no improvement
  - Log evolved-prompt size vs baseline-prompt size on success
  - Preserve original body (GEPA never had access to mutate it)
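
A sketch of that extraction path, assuming the predictor layout named above; `logger` and `baseline_instruction` are hypothetical names for the surrounding code's equivalents:

```python
predictor = optimized_module.predictor

try:
    # GEPA: nested layout, a dspy.Predict named `predict` carries the signature.
    evolved_instruction = predictor.predict.signature.instructions
except AttributeError:
    # MIPROv2 fallback: flat layout, the predictor itself carries the signature.
    evolved_instruction = predictor.signature.instructions

if evolved_instruction == baseline_instruction:
    logger.warning("Evolved instruction identical to baseline; optimizer learned nothing.")
else:
    logger.info("Evolved prompt: %d chars (baseline: %d)",
                len(evolved_instruction), len(baseline_instruction))

# The original skill body is reattached verbatim; GEPA never saw it.
```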
The skill_structure constraint checks that the artifact starts with
YAML frontmatter (---). Frontmatter is added by reassemble_skill() at
line 217, but the validator was being called on evolved_body (the
body-only string) on line 219. Result: false negatives like

  skill_structure: Skill missing: YAML frontmatter (---), name field, description field

even though the file written to disk a few lines later (line 261)
DOES have valid frontmatter. A confusing failure mode: a genuinely
useful evolution gets rejected as failed.

Fix: pass evolved_full to validate_all so all four constraint gates
operate on the same artifact that ends up on disk.
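
A sketch of the corrected wiring, with hypothetical shapes for `reassemble_skill` / `validate_all` / `output_path` standing in for the repo's actual signatures:

```python
# Line 217 equivalent: prepend the YAML frontmatter ("---" block) to the body.
evolved_full = reassemble_skill(frontmatter, evolved_body)

# The buggy version validated evolved_body (body only), producing the false
# negative above. The fix validates the exact artifact written to disk:
results = validator.validate_all(evolved_full, baseline=baseline_full)

if all(r.passed for r in results):
    output_path.write_text(evolved_full)  # line 261 equivalent
```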