P1: Mastering Physics Olympiads with Reinforcement Learning

Paper | Blog | P1-30B | P1-235B | Leaderboard

[Figure: IPhO 2025 Score]

Overview

Physics reasoning is central to understanding and shaping the real world. Top contests such as the International Physics Olympiad (IPhO) set a high bar for complex reasoning and deep physical understanding, making them a natural benchmark for evaluating how well AI grasps reality.

P1 is the first open-source model series designed to tackle Olympiad-level physics reasoning through multi-stage reinforcement learning (RL) and a co-evolutionary multi-agent system (PhysicsMinions). It achieved gold medal-level performance on IPhO 2025. We release two model versions:

  • P1-30B-A3B: A 30B parameter model that surpasses larger closed-source models, demonstrating exceptional efficiency
  • P1-235B-A22B: A 235B parameter model achieving gold medal performance on IPhO 2025, rivaling top closed-source models
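
Both releases follow the standard Hugging Face chat-model workflow of their Qwen3 base models. Below is a minimal inference sketch using transformers; the repository ID PRIME-RL/P1-30B-A3B, the sample problem, and the sampling settings are assumptions for illustration, not verified defaults.

```python
# Minimal inference sketch for P1-30B-A3B (assumed Hub ID: "PRIME-RL/P1-30B-A3B").
# Sampling settings are illustrative placeholders, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PRIME-RL/P1-30B-A3B"  # assumption: the actual Hub ID may differ

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

problem = (
    "A uniform rod of length L and mass M pivots freely about one end. "
    "It is released from rest in a horizontal position. "
    "Find its angular velocity as it passes through the vertical."
)

# Qwen3-style chat template; the model reasons step by step before answering.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": problem}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=4096, temperature=0.6, top_p=0.95, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For higher-throughput serving, the same checkpoint can also be deployed with sglang (see Acknowledgements).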

Results

P1 models demonstrate top-tier physics reasoning across all HiPhO contests.

[Figure: HiPhO Leaderboard]


P1’s physics reasoning transfers effectively across other STEM domains.

STEM Benchmarks

| Benchmark | P1-235B-A22B | Qwen3-235B-A22B-Thinking-2507 | P1-30B-A3B | Qwen3-30B-A3B-Thinking-2507 |
| --- | --- | --- | --- | --- |
| AIME24 | 95.0 | 94.6 | 91.0 | 90.4 |
| AIME25 | 95.0 | 94.2 | 91.0 | 85.0 |
| HMMT | 80.8 | 81.7 | 76.9 | 71.3 |
| GPQA | 81.4 | 79.4 | 74.4 | 73.0 |
| HLE | 19.1 | 17.5 | 14.3 | 11.6 |
| LiveCodeBench | 75.8 | 76.2 | 68.1 | 66.7 |
| LiveBench | 79.8 | 80.3 | 77.0 | 76.6 |

🧮 HiPhO Benchmark

HiPhO (High School Physics Olympiad) is the first benchmark focused on recent Olympiad-level physics contests with human-aligned evaluation.

📚 It compiles 13 competitions (IPhO, APhO, EuPhO, etc.) from 2024–2025, using official rubrics and fine-grained scoring aligned with medal cutoffs.
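
To make the medal-aligned scoring concrete, the sketch below shows rubric-style grading: per-problem rubric points are summed and the contest total is compared against medal cutoffs. The function names and the cutoff numbers are hypothetical placeholders; HiPhO itself uses each contest's official rubrics and cutoffs.

```python
# Hypothetical sketch of rubric-based, medal-aligned scoring (not the HiPhO grader).
from typing import Dict, List

def grade_solution(awarded_points: List[float]) -> float:
    """Sum the rubric points awarded to each graded step of one problem."""
    return sum(awarded_points)

def assign_medal(total_score: float, cutoffs: Dict[str, float]) -> str:
    """Map a contest total to a medal using that contest's official cutoffs."""
    for medal in ("gold", "silver", "bronze", "honourable mention"):
        if medal in cutoffs and total_score >= cutoffs[medal]:
            return medal
    return "no award"

# Placeholder numbers for illustration only; real contests publish their own cutoffs.
example_cutoffs = {"gold": 30.0, "silver": 21.0, "bronze": 14.0, "honourable mention": 10.0}
problem_scores = [
    grade_solution([2.5, 3.0, 1.5]),
    grade_solution([4.0, 2.0]),
    grade_solution([5.5, 3.5]),
]
print(assign_medal(sum(problem_scores), example_cutoffs))  # -> "silver" with these placeholders
```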


Co-Evolutionary Multi-Agent System: PhysicsMinions

To go beyond single-model limits, P1 introduces PhysicsMinions, a co-evolutionary multi-agent system that iteratively refines solutions through self-verification and reflection.

| Module | Function |
| --- | --- |
| Visual Studio | Extracts structured visual information from diagrams (not used in current experiments). |
| Logic Studio | Generates and refines initial reasoning chains. |
| Review Studio | Performs two-stage validation: physical consistency and logical correctness. |

Failures trigger a feedback loop that refines the reasoning process, resulting in stronger robustness and reliability.
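
The code below is a conceptual sketch of how such a generate/review/refine cycle can be wired together; the Minion protocol and its method names (generate_solution, check_physics, check_logic, refine) are hypothetical stand-ins, not the released PhysicsMinions API.

```python
# Conceptual sketch of the co-evolutionary generate/review/refine loop.
# The Minion protocol and its method names are hypothetical stand-ins,
# not the released PhysicsMinions API.
from typing import Protocol, Tuple

class Minion(Protocol):
    def generate_solution(self, problem: str) -> str: ...                          # Logic Studio
    def check_physics(self, problem: str, solution: str) -> Tuple[bool, str]: ...  # Review Studio, stage 1
    def check_logic(self, problem: str, solution: str) -> Tuple[bool, str]: ...    # Review Studio, stage 2
    def refine(self, problem: str, solution: str, feedback: str) -> str: ...       # feedback loop

def solve_with_minions(problem: str, minion: Minion, max_rounds: int = 4) -> str:
    solution = minion.generate_solution(problem)
    for _ in range(max_rounds):
        physics_ok, physics_fb = minion.check_physics(problem, solution)
        logic_ok, logic_fb = minion.check_logic(problem, solution)
        if physics_ok and logic_ok:
            return solution                               # both reviews pass: accept the solution
        feedback = "\n".join(fb for fb in (physics_fb, logic_fb) if fb)
        solution = minion.refine(problem, solution, feedback)
    return solution                                       # best effort after max_rounds
```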


Acknowledgements

We are grateful to the open-source community for their invaluable contributions. Special thanks to:

  • Qwen3 - for providing the foundational base models that powered our research
  • slime - for the efficient reinforcement learning framework that powered our training pipeline
  • verl - for the versatile reinforcement learning framework that enabled our training pipeline
  • sglang - for the efficient LLM serving and inference infrastructure
  • Megatron-LM - for the large-scale model training framework

We also thank the colleagues and collaborators who supported the development of the P1 models, the accompanying datasets, and the visual assets.

🧾 Citation

If you find this work useful, please cite:

@misc{p12025,
  title={P1: Mastering Physics Olympiads with Reinforcement Learning},
  author={P1 Team},
  year={2025},
  url={https://prime-rl.github.io/P1/}
}
