P1: Mastering Physics Olympiads with Reinforcement Learning

Paper | Blog | P1-30B | P1-235B | Leaderboard

[Figure: IPhO 2025 Score]

Overview

Physics reasoning is central to understanding and shaping the real world. Top contests such as the International Physics Olympiad (IPhO) set a high bar for complex reasoning and deep physical understanding, making them a natural benchmark for evaluating how well AI grasps reality.

P1 is the first open-source model series designed to tackle Olympiad-level physics reasoning through multi-stage reinforcement learning (RL) and a co-evolutionary multi-agent system (PhysicsMinions). It achieved gold medal-level performance on IPhO 2025. We release two model versions:

  • P1-30B-A3B: A 30B parameter model that surpasses larger closed-source models, demonstrating exceptional efficiency
  • P1-235B-A22B: A 235B parameter model achieving gold medal performance on IPhO 2025, rivaling top closed-source models
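
Both releases follow the standard Hugging Face chat-model workflow of their Qwen3 base models. Below is a minimal inference sketch using transformers; the repository ID PRIME-RL/P1-30B-A3B, the sample problem, and the sampling settings are assumptions for illustration, not verified defaults.

```python
# Minimal inference sketch for P1-30B-A3B (assumed Hub ID: "PRIME-RL/P1-30B-A3B").
# Sampling settings are illustrative placeholders, not official recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PRIME-RL/P1-30B-A3B"  # assumption: the actual Hub ID may differ

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

problem = (
    "A uniform rod of length L and mass M pivots freely about one end. "
    "It is released from rest in a horizontal position. "
    "Find its angular velocity as it passes through the vertical."
)

# Qwen3-style chat template; the model reasons step by step before answering.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": problem}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=4096, temperature=0.6, top_p=0.95, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For higher-throughput serving, the same checkpoint can also be deployed with sglang (see Acknowledgements).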

Results

P1 models demonstrate top-tier physics reasoning across all HiPhO contests.

[Figure: HiPhO Leaderboard]


P1’s physics reasoning transfers effectively across other STEM domains.

STEM Benchmarks

| Benchmark | P1-235B-A22B | Qwen3-235B-A22B-Thinking-2507 | P1-30B-A3B | Qwen3-30B-A3B-Thinking-2507 |
| --- | --- | --- | --- | --- |
| AIME24 | 95.0 | 94.6 | 91.0 | 90.4 |
| AIME25 | 95.0 | 94.2 | 91.0 | 85.0 |
| HMMT | 80.8 | 81.7 | 76.9 | 71.3 |
| GPQA | 81.4 | 79.4 | 74.4 | 73.0 |
| HLE | 19.1 | 17.5 | 14.3 | 11.6 |
| LiveCodeBench | 75.8 | 76.2 | 68.1 | 66.7 |
| LiveBench | 79.8 | 80.3 | 77.0 | 76.6 |

🧮 HiPhO Benchmark

HiPhO (High School Physics Olympiad) is the first benchmark focused on recent Olympiad-level physics contests with human-aligned evaluation.

📚 It compiles 13 competitions (IPhO, APhO, EuPhO, etc.) from 2024–2025, using official rubrics and fine-grained scoring aligned with medal cutoffs.
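
To make the medal-aligned scoring concrete, the sketch below shows rubric-style grading: per-problem rubric points are summed and the contest total is compared against medal cutoffs. The function names and the cutoff numbers are hypothetical placeholders; HiPhO itself uses each contest's official rubrics and cutoffs.

```python
# Hypothetical sketch of rubric-based, medal-aligned scoring (not the HiPhO grader).
from typing import Dict, List

def grade_solution(awarded_points: List[float]) -> float:
    """Sum the rubric points awarded to each graded step of one problem."""
    return sum(awarded_points)

def assign_medal(total_score: float, cutoffs: Dict[str, float]) -> str:
    """Map a contest total to a medal using that contest's official cutoffs."""
    for medal in ("gold", "silver", "bronze", "honourable mention"):
        if medal in cutoffs and total_score >= cutoffs[medal]:
            return medal
    return "no award"

# Placeholder numbers for illustration only; real contests publish their own cutoffs.
example_cutoffs = {"gold": 30.0, "silver": 21.0, "bronze": 14.0, "honourable mention": 10.0}
problem_scores = [
    grade_solution([2.5, 3.0, 1.5]),
    grade_solution([4.0, 2.0]),
    grade_solution([5.5, 3.5]),
]
print(assign_medal(sum(problem_scores), example_cutoffs))  # -> "silver" with these placeholders
```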


Co-Evolutionary Multi-Agent System: PhysicsMinions

To go beyond single-model limits, P1 introduces PhysicsMinions, a co-evolutionary multi-agent system that iteratively refines solutions through self-verification and reflection.

| Module | Function |
| --- | --- |
| Visual Studio | Extracts structured visual information from diagrams (not used in current experiments). |
| Logic Studio | Generates and refines initial reasoning chains. |
| Review Studio | Performs two-stage validation: physical consistency and logical correctness. |

Failures trigger a feedback loop that refines the reasoning process, resulting in stronger robustness and reliability.
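
The code below is a conceptual sketch of how such a generate/review/refine cycle can be wired together; the Minion protocol and its method names (generate_solution, check_physics, check_logic, refine) are hypothetical stand-ins, not the released PhysicsMinions API.

```python
# Conceptual sketch of the co-evolutionary generate/review/refine loop.
# The Minion protocol and its method names are hypothetical stand-ins,
# not the released PhysicsMinions API.
from typing import Protocol, Tuple

class Minion(Protocol):
    def generate_solution(self, problem: str) -> str: ...                          # Logic Studio
    def check_physics(self, problem: str, solution: str) -> Tuple[bool, str]: ...  # Review Studio, stage 1
    def check_logic(self, problem: str, solution: str) -> Tuple[bool, str]: ...    # Review Studio, stage 2
    def refine(self, problem: str, solution: str, feedback: str) -> str: ...       # feedback loop

def solve_with_minions(problem: str, minion: Minion, max_rounds: int = 4) -> str:
    solution = minion.generate_solution(problem)
    for _ in range(max_rounds):
        physics_ok, physics_fb = minion.check_physics(problem, solution)
        logic_ok, logic_fb = minion.check_logic(problem, solution)
        if physics_ok and logic_ok:
            return solution                               # both reviews pass: accept the solution
        feedback = "\n".join(fb for fb in (physics_fb, logic_fb) if fb)
        solution = minion.refine(problem, solution, feedback)
    return solution                                       # best effort after max_rounds
```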


Acknowledgements

We are grateful to the open-source community for their invaluable contributions. Special thanks to:

  • Qwen3 - for providing the foundational base models that powered our research
  • slime - for the efficient reinforcement learning framework that powered our training pipeline
  • verl - for the versatile reinforcement learning framework that enabled our training pipeline
  • sglang - for the efficient LLM serving and inference infrastructure
  • Megatron-LM - for the large-scale model training framework

We also thank the colleagues and collaborators who supported the development of the P1 models, the accompanying datasets, and the visual assets.

🧾 Citation

If you find this work useful, please cite:

@misc{p12025,
  title={P1: Mastering Physics Olympiads with Reinforcement Learning},
  author={P1 Team},
  year={2025},
  url={https://prime-rl.github.io/P1/}
}
