
Add ProposeCurateEngine: unified evolution engine for OSWorld & CL-bench#20

Open
weitianxin wants to merge 2 commits into main from feat/propose-curate-engine



@weitianxin

Summary

  • New ProposeCurateEngine (agent_evolve/algorithms/propose_curate/): a shared EvolutionEngine implementation that encapsulates the propose+curate pipeline used by both OSWorld and CL-bench
  • Engine-based example scripts for both benchmarks (evolve_osworld_engine.py, evolve_cl_bench_engine.py) that drive the new engine through engine.step()
  • Registered in agent_evolve/algorithms/__init__.py alongside the existing engines

The engine handles:

  1. Extract proposals from observation metadata (feedback.raw["proposal"])
  2. Group proposals by topic/context
  3. Per-topic LLM curation (ACCEPT/MERGE/SKIP)
  4. General cross-topic failure pattern curation (NEW/UPDATE/DELETE)
  5. Write skills to workspace
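The five steps above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: Feedback, Observation, extract_proposals, group_proposals and step are hypothetical stand-ins, the LLM curation calls (steps 3 and 4) are injected as plain callables, and feeding all proposals to the general curator is an assumption. Only the feedback.raw["proposal"] location and the mutated=False result for empty input come from this PR.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Proposal = Dict[str, Any]

# Hypothetical stand-ins for the framework's observation types.
@dataclass
class Feedback:
    raw: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Observation:
    feedback: Feedback

def extract_proposals(observations: List[Observation]) -> List[Proposal]:
    """Step 1: collect feedback.raw["proposal"], skipping observations without one."""
    return [o.feedback.raw["proposal"] for o in observations if o.feedback.raw.get("proposal")]

def group_proposals(proposals: List[Proposal], key: str = "topic") -> Dict[str, List[Proposal]]:
    """Step 2: bucket proposals by topic (OSWorld) or context (CL-bench)."""
    groups: Dict[str, List[Proposal]] = defaultdict(list)
    for p in proposals:
        groups[p.get(key, "general")].append(p)  # ungrouped proposals fall into "general"
    return dict(groups)

def step(
    observations: List[Observation],
    curate_topic: Callable[[str, List[Proposal]], List[Proposal]],
    curate_general: Callable[[List[Proposal]], List[Proposal]],
    write_skills: Callable[[List[Proposal]], None],
    key: str = "topic",
) -> bool:
    """Steps 1-5; the boolean result plays the role of the engine's `mutated` flag."""
    proposals = extract_proposals(observations)
    if not proposals:
        return False  # nothing proposed, so the workspace is left untouched
    changes: List[Proposal] = []
    for group, items in group_proposals(proposals, key).items():
        changes += curate_topic(group, items)  # step 3: per-topic ACCEPT/MERGE/SKIP (LLM)
    changes += curate_general(proposals)  # step 4: cross-topic NEW/UPDATE/DELETE (LLM)
    if changes:
        write_skills(changes)  # step 5: persist curated skills
    return bool(changes)
```

Injecting the curators keeps the orchestration testable without Bedrock calls: a stub that accepts every proposal is enough to check extraction, grouping and the mutated flag.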

Configurable via the skill_layout parameter: "topic" (OSWorld), "context" (CL-bench), or "flat".
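One way to picture what skill_layout controls is as a choice of grouping key and output directory. The sketch below is an assumption for illustration only: LAYOUT_KEYS, skill_path, the "general" fallback and the one-markdown-file-per-skill convention are hypothetical and not taken from the engine.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, Optional

# Hypothetical mapping: which proposal field each layout groups by (None = no subdirectory).
LAYOUT_KEYS: Dict[str, Optional[str]] = {"topic": "topic", "context": "context", "flat": None}

def skill_path(root: str, layout: str, proposal: Dict[str, Any]) -> PurePosixPath:
    """Return where a curated skill would be written under `root` for the given layout."""
    if layout not in LAYOUT_KEYS:
        raise ValueError(f"unknown skill_layout: {layout!r}")
    filename = f"{proposal['name']}.md"  # assumed: one markdown file per skill
    key = LAYOUT_KEYS[layout]
    if key is None:
        return PurePosixPath(root) / filename
    return PurePosixPath(root) / proposal.get(key, "general") / filename

p = {"name": "crop_image", "topic": "gimp", "context": "ctx_017"}
print(skill_path("skills", "topic", p))    # skills/gimp/crop_image.md
print(skill_path("skills", "context", p))  # skills/ctx_017/crop_image.md
print(skill_path("skills", "flat", p))     # skills/crop_image.md
```

Under this reading, "topic" and "context" share the same directory logic and differ only in which proposal field supplies the subdirectory.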

Test plan

  • Engine imports and instantiates correctly
  • step() with empty observations returns mutated=False
  • Proposal extraction from feedback.raw["proposal"] works
  • Both example scripts parse without syntax errors
  • Full integration test with real Bedrock calls (manual)

🤖 Generated with Claude Code

Ubuntu and others added 2 commits May 7, 2026 00:19
…pipelines

OSWorld and CL-bench both implement the same evolution pattern (propose skills
after solve, curate per-topic, curate general patterns) but as standalone
scripts without using the engine abstraction. This adds a shared
ProposeCurateEngine that implements EvolutionEngine.step() with the common
pipeline, and provides engine-based example scripts for both benchmarks.

New files:
- agent_evolve/algorithms/propose_curate/ — engine + prompts
- examples/osworld_examples/evolve_osworld_engine.py
- examples/cl_bench_examples/evolve_cl_bench_engine.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…al prompts

- Rename prompts.py constants to DEFAULT_TOPIC_CURATOR_PROMPT/DEFAULT_GENERAL_CURATOR_PROMPT
- Engine constructor now accepts topic_curator_prompt, general_curator_prompt,
  and format_failed_summary parameters for full customization
- Rewrite evolve_osworld_engine.py with IDENTICAL original prompts and full
  helper functions (trajectory signals, compress, bot detection, eval text,
  build_propose_messages) to reproduce results exactly
- Rewrite evolve_cl_bench_engine.py with IDENTICAL original prompts and full
  helper functions (rephrase feedback, in-context propose, parse proposal)
- Both examples pass exact original curator prompts to engine constructor

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
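The customization described in the second commit (default curator prompts that callers can override, plus a pluggable format_failed_summary) follows a common "defaults with overrides" pattern. The sketch below is hypothetical: CuratorSettings, general_prompt, the placeholder prompt text and the {failed_summary} token are illustrative and are not the contents of prompts.py.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Placeholder text; the real DEFAULT_* prompts live in the engine's prompts.py.
DEFAULT_TOPIC_CURATOR_PROMPT = "Curate proposals for topic {topic}."
DEFAULT_GENERAL_CURATOR_PROMPT = "Recurring failures:\n{failed_summary}"

def default_format_failed_summary(failures: List[Dict[str, str]]) -> str:
    """Render failed tasks as one bullet per task (hypothetical format)."""
    return "\n".join(f"- {f['task']}: {f['reason']}" for f in failures)

@dataclass
class CuratorSettings:
    topic_curator_prompt: str = DEFAULT_TOPIC_CURATOR_PROMPT
    general_curator_prompt: str = DEFAULT_GENERAL_CURATOR_PROMPT
    format_failed_summary: Callable[[List[Dict[str, str]]], str] = default_format_failed_summary

    def general_prompt(self, failures: List[Dict[str, str]]) -> str:
        # str.replace instead of str.format so literal braces (e.g. JSON examples)
        # inside a benchmark's original prompt do not need escaping.
        summary = self.format_failed_summary(failures)
        return self.general_curator_prompt.replace("{failed_summary}", summary)

failures = [{"task": "t1", "reason": "timeout"}]
print(CuratorSettings().general_prompt(failures))  # uses the defaults
custom = CuratorSettings(
    general_curator_prompt='OSWorld failures ({"format": "json"}):\n{failed_summary}',
    format_failed_summary=lambda fs: f"{len(fs)} failed tasks",
)
print(custom.general_prompt(failures))
```

Passing the benchmark's original prompt strings through the constructor, as the example scripts do, leaves the shared pipeline untouched while reproducing each benchmark's exact curator wording.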
