GRACE

GRPO-based Reasoning Assistance Calling Efficiently.

An LLM meta-policy for Overcooked: it learns when to call the high-level LLM planner, so that the total number of LLM calls drops without losing task performance.

TL;DR

Most LLM-as-planner work calls the LLM at fixed intervals. GRACE instead learns when to call by training a small meta-policy with GRPO (Group Relative Policy Optimization). Result: the same task performance with fewer LLM calls (a Pareto improvement).
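To make the GRPO idea concrete, here is a minimal sketch of the group-relative advantage computation, where each rollout in a group is scored against the group mean and standard deviation. It is not taken from the GRACE code: the per-call penalty `call_cost`, the `shaped_reward` helper, and the example numbers are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def shaped_reward(task_return: float, n_llm_calls: int, call_cost: float = 0.1) -> float:
    """Hypothetical reward shaping: task return minus a fixed penalty per LLM call."""
    return task_return - call_cost * n_llm_calls


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: normalize each reward by the group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of rollouts as (task_return, n_llm_calls); not measured data.
group = [(20.0, 10), (20.0, 2), (0.0, 0), (40.0, 10)]
rewards = [shaped_reward(ret, calls) for ret, calls in group]
advantages = group_relative_advantages(rewards)

for (ret, calls), adv in zip(group, advantages):
    print(f"return={ret:5.1f} calls={calls:2d} advantage={adv:+.3f}")
```

Under this assumed shaping, two rollouts with the same task return are ranked by how few calls they made, which is the pressure a call-efficient meta-policy needs.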

Headline figure

Placeholder — figures/pareto.png will go here once Phase 11 sweeps run.
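Until the figure exists, the Pareto comparison it is meant to show can be sketched as follows. This is a hedged illustration, not the project's plotting code: `pareto_front` is a hypothetical helper, and the `(llm_calls, reward)` points are made-up placeholders rather than experimental results.

```python
def pareto_front(points: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Return the (llm_calls, reward) points that no other point dominates.

    A point is dominated if another point uses no more calls and earns no less
    reward, and is strictly better on at least one of the two.
    """
    front = []
    for i, (calls_i, reward_i) in enumerate(points):
        dominated = any(
            calls_j <= calls_i
            and reward_j >= reward_i
            and (calls_j < calls_i or reward_j > reward_i)
            for j, (calls_j, reward_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((calls_i, reward_i))
    return sorted(front)


# Made-up points for illustration only: (LLM calls per episode, mean reward).
points = [(0, 10.0), (50, 30.0), (20, 30.0), (100, 32.0), (80, 25.0)]
print(pareto_front(points))
```

In this toy data, the point with 20 calls dominates the one with 50 calls because it reaches the same reward with fewer calls.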

Installation

Quick start (uv)

git clone https://github.com/idaun/grace.git
cd grace
uv sync --extra dev --extra overcooked
.venv/bin/pytest -v   # all 99+ tests should pass

Optional extras

--extra play — pygame for human-play mode (Phase 9)
--extra unity — mlagents-envs for Unity environments (Phase 6)

Docker

docker build -t grace:latest .
docker run --rm grace:latest pytest -v

Try it: human-play (no LLM, no training required)

.venv/bin/python scripts/play_human.py --env dummy --mode coop
# Player 1: WASD + Space (interact) + E (stay)
# Player 2: arrows + RShift (interact) + RCtrl (stay)

For the Unity build, see unity_env/README.md.

Train a baseline (DESIGN.md §5)

# Plain PPO baseline (no LLM)
PYTHONPATH=$(pwd) python scripts/train.py env=cramped_room policy=ppo meta=never seed=0

# LLM-augmented with fixed-K calls
PYTHONPATH=$(pwd) python scripts/train.py \
    env=cramped_room policy=llm_augmented meta=fixed_k100 llm=qwen3.6_35b seed=0

# Learned meta-policy (the contribution)
PYTHONPATH=$(pwd) python scripts/train_meta.py \
    env=cramped_room policy=llm_augmented meta=learned llm=qwen3.6_35b seed=0
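The three `meta=` settings above differ only in when the planner is consulted. The sketch below illustrates that difference under stated assumptions: it reads `fixed_k100` as "call every 100 environment steps", uses a hypothetical 400-step horizon, and stands in for the learned meta-policy with a simple probability threshold. None of these names come from the GRACE code base.

```python
from typing import Callable

Gate = Callable[[int], bool]  # maps an environment step to "call the LLM now?"


def never_gate(step: int) -> bool:
    """Assumed behavior of meta=never: the LLM planner is never called."""
    return False


def fixed_k_gate(k: int) -> Gate:
    """Assumed behavior of a fixed-K schedule: call on every k-th step."""
    def gate(step: int) -> bool:
        return step % k == 0
    return gate


def threshold_gate(call_prob: Callable[[int], float], threshold: float = 0.5) -> Gate:
    """Stand-in for a learned gate: call when the predicted probability is high."""
    def gate(step: int) -> bool:
        return call_prob(step) >= threshold
    return gate


def count_calls(gate: Gate, horizon: int) -> int:
    """Count how many LLM calls a gate would make over one episode."""
    return sum(gate(t) for t in range(horizon))


HORIZON = 400  # hypothetical episode length
toy_prob = lambda t: 0.9 if t in {0, 37} else 0.1  # toy probabilities, not a trained model

print(count_calls(never_gate, HORIZON))
print(count_calls(fixed_k_gate(100), HORIZON))
print(count_calls(threshold_gate(toy_prob), HORIZON))
```

The point of the comparison is that a learned gate can place calls at arbitrary steps, whereas a fixed-K schedule spends its budget on a grid regardless of the game state.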

Evaluate

PYTHONPATH=$(pwd) python scripts/eval.py +run_dir=runs/<run_dir>/ +n_episodes=20

PYTHONPATH=$(pwd) python scripts/eval_transfer.py \
    +train_run=runs/<run_dir>/ \
    '+test_layouts=[asymmetric_advantages]' \
    +n_episodes=10

Reproduce the paper experiments

See docs/REPRODUCIBILITY.md for the three hypothesis sweeps (H1, H2, H3) and expected wallclock.

Project structure

| Path | Contents |
| --- | --- |
| `src/` | Reusable library (envs, llm, policies, training, eval) |
| `configs/` | Hydra configs for experiments |
| `scripts/` | Entry points (`train.py`, `eval.py`, `sweep.py`, `plot_results.py`, ...) |
| `tests/` | Unit + smoke tests |
| `unity_env/` | Unity ML-Agents project (C#) |
| `docs/` | Versioned prompts and experiment journal |

Status

Phase 0-7 — Scaffolding (LLM client, env, PPO, GRPO, eval)
Phase 8 — Real Carroll's overcooked-ai integration + checkpoints
Phase 9 — Human-play (pygame + Unity) + BC warm-start
Phase 10 — Prompt v2 + latency diagnostics
Phase 11 — Sweep harness + statistics
Phase 12 — Public-readiness polish
Phase 13 — Full experimental sweep (compute-bound — user runs)

Citation

Placeholder — to be filled once the manuscript is on arXiv.

@misc{grace2026,
  title  = {GRACE: GRPO-based Reasoning Assistance Calling Efficiently},
  author = {GRACE Authors},
  year   = {2026},
  note   = {Preprint}
}

License

MIT — see LICENSE.
