Physics reasoning is central to understanding and shaping the real world. Top contests such as the International Physics Olympiad (IPhO) demand complex reasoning and deep physical understanding, making them a natural benchmark for evaluating AI's grasp of physical reality.
P1 is the first open-source model series designed to tackle Olympiad-level physics reasoning through multi-stage reinforcement learning (RL) and a co-evolutionary multi-agent system (PhysicsMinions). It achieved gold medal-level performance on IPhO 2025. We release two model versions (a loading sketch follows the list):
- P1-30B-A3B: A 30B-parameter model that surpasses larger closed-source models, demonstrating exceptional efficiency
- P1-235B-A22B: A 235B-parameter model that achieves gold-medal performance on IPhO 2025, rivaling top closed-source models
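A minimal inference sketch using Hugging Face `transformers` is shown below. The repo id is an assumption for illustration; substitute the actual published checkpoint id.

```python
# Minimal inference sketch with Hugging Face transformers.
# NOTE: the repo id below is an assumption; replace it with the
# actual published checkpoint id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PRIME-RL/P1-30B-A3B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": "A ball is thrown straight up at 20 m/s. "
               "How high does it rise? Take g = 9.8 m/s^2.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models benefit from a generous generation budget.
outputs = model.generate(inputs, max_new_tokens=4096)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```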
P1 models demonstrate top-tier physics reasoning across all HiPhO contests.
P1’s physics reasoning transfers effectively to other STEM domains, as the benchmark comparison below shows.
| Benchmark | P1-235B-A22B | Qwen3-235B-A22B-Thinking-2507 | P1-30B-A3B | Qwen3-30B-A3B-Thinking-2507 |
|---|---|---|---|---|
| AIME24 | 95.0 | 94.6 | 91.0 | 90.4 |
| AIME25 | 95.0 | 94.2 | 91.0 | 85.0 |
| HMMT | 80.8 | 81.7 | 76.9 | 71.3 |
| GPQA | 81.4 | 79.4 | 74.4 | 73.0 |
| HLE | 19.1 | 17.5 | 14.3 | 11.6 |
| LiveCodeBench | 75.8 | 76.2 | 68.1 | 66.7 |
| LiveBench | 79.8 | 80.3 | 77.0 | 76.6 |
HiPhO (High School Physics Olympiad) is the first benchmark focused on recent Olympiad-level physics contests with human-aligned evaluation.
📚 It compiles 13 competitions (IPhO, APhO, EuPhO, etc.) from 2024–2025, using official rubrics and fine-grained scoring aligned with medal cutoffs.
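To make the scoring scheme concrete, here is a minimal sketch of rubric-based marking with medal cutoffs. The rubric items, point values, and cutoff numbers are illustrative assumptions, not HiPhO's actual data.

```python
# Illustrative sketch of rubric-based scoring with medal cutoffs.
# All rubric items, marks, and cutoff values below are made-up
# placeholders, not HiPhO's actual numbers.

def score_solution(awarded: dict[str, float], rubric: dict[str, float]) -> float:
    """Sum fine-grained partial credit, capped at each rubric item's marks."""
    return sum(min(awarded.get(item, 0.0), marks) for item, marks in rubric.items())

def medal(total: float, cutoffs: dict[str, float]) -> str:
    """Map a total score onto medal boundaries, highest cutoff first."""
    for name, cutoff in sorted(cutoffs.items(), key=lambda kv: -kv[1]):
        if total >= cutoff:
            return name
    return "no medal"

rubric = {"setup": 2.0, "conservation law": 3.0, "final numeric answer": 1.0}
cutoffs = {"gold": 5.0, "silver": 4.0, "bronze": 3.0}

total = score_solution({"setup": 2.0, "conservation law": 2.5}, rubric)
print(total, medal(total, cutoffs))  # 4.5 silver
```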
To go beyond single-model limits, P1 introduces PhysicsMinions, a co-evolutionary multi-agent system that iteratively refines solutions through self-verification and reflection.
| Module | Function |
|---|---|
| Visual Studio | Extracts structured visual information from diagrams (not used in current experiments). |
| Logic Studio | Generates and refines initial reasoning chains. |
| Review Studio | Performs two-stage validation: physical consistency and logical correctness. |
Failures at either stage trigger a feedback loop that refines the reasoning process, yielding stronger robustness and reliability.
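A conceptual sketch of this loop follows; the function names, signatures, and round budget are illustrative assumptions, and the Visual Studio module is omitted since it is not used in the current experiments.

```python
# Conceptual sketch of the PhysicsMinions refinement loop. The three
# studio functions below are trivial stand-ins for LLM calls; their
# names, signatures, and the round budget are illustrative assumptions.

def logic_studio(problem, prior=None, feedback=None):
    # Placeholder: a real implementation would prompt the solver model,
    # conditioning on the prior draft and reviewer feedback.
    return f"solution draft for: {problem} (feedback applied: {feedback})"

def review_physics(problem, solution):
    # Placeholder for stage 1: physical consistency checks
    # (units, limiting cases, conservation laws).
    return True, None

def review_logic(problem, solution):
    # Placeholder for stage 2: logical correctness checks
    # (each derivation step follows from the previous one).
    return True, None

def solve(problem: str, max_rounds: int = 4) -> str:
    solution, feedback = None, None
    for _ in range(max_rounds):
        # Logic Studio drafts or revises the reasoning chain.
        solution = logic_studio(problem, prior=solution, feedback=feedback)

        # Review Studio validates in two stages; a failure at either
        # stage sends feedback back into the next drafting round.
        ok, feedback = review_physics(problem, solution)
        if not ok:
            continue
        ok, feedback = review_logic(problem, solution)
        if ok:
            return solution
    return solution  # best effort once the round budget is exhausted

print(solve("A ball is thrown straight up at 20 m/s; find the maximum height."))
```

Separating drafting from the two-stage review keeps the feedback structured, so each new draft targets a concrete physical or logical failure rather than starting from scratch.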
We are grateful to the open-source community for their invaluable contributions. Special thanks to:
- Qwen3 - for providing the foundational base models that powered our research
- slime - for their innovative and efficient reinforcement learning framework that powered our training pipeline
- verl - for the versatile reinforcement learning framework that enabled our training pipeline
- sglang - for the efficient LLM serving and inference infrastructure
- Megatron-LM - for the large-scale model training framework
We also thank the colleagues and collaborators who supported the development of the P1 models, the accompanying datasets, and the visual assets.
If you find this work useful, please cite:
@misc{p12025,
  title={P1: Mastering Physics Olympiads with Reinforcement Learning},
  author={P1 Team},
  year={2025},
  url={https://prime-rl.github.io/P1/}
}
