rl4reasoning/chota-tinker


🥜 chota-tinker

Some infra to speed up tinkering with RL at smaller scale

why another RL lib?

  • we like tinker and the API it exposes, but it doesn't support small models like qwen3 0.6b, 1b, 4b, etc. (and it's paid) -- this lib tries to match the tinker-like API while running locally on PyTorch
  • there are other tinker-like libs (in jax, and in pytorch too), but they are all bloated -- we want something people can hack inside of and add their own algorithms to (new gradient update rules, etc.) -- hence we keep everything minimal, with hard constraints on LOC and the number of files (continuing the tradition from: https://github.com/rl4reasoning/rl-baselines)
  • we make sure it's fast, so minimalism doesn't come at a price (or at least we target a better trade-off b/w minimalism and efficiency :))
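To give a flavor of the kind of hacking the bullets above describe -- swapping in your own gradient update rule with plain PyTorch -- here is a minimal, self-contained REINFORCE sketch on a toy bandit. All of it is illustrative: none of these names come from chota-tinker's actual API.

```python
import torch

# Toy 3-armed bandit: arm 2 always pays reward 1.0, the others pay 0.0.
# A tinker-style client would split this into forward_backward() and
# optim_step() calls; here both are inlined to show how little machinery
# a minimal RL update actually needs.
torch.manual_seed(0)
logits = torch.zeros(3, requires_grad=True)  # the "policy"
opt = torch.optim.Adam([logits], lr=0.1)

for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = 1.0 if action.item() == 2 else 0.0
    # REINFORCE loss: -log pi(a) * R. This single line is the "gradient
    # update rule" a user might replace with their own variant.
    loss = -dist.log_prob(action) * reward
    opt.zero_grad()
    loss.backward()
    opt.step()

probs = torch.softmax(logits, dim=0)  # the policy concentrates on arm 2
```

Because the update rule is one line of autograd code rather than buried in a trainer class, trying a new algorithm means editing that line, not subclassing a framework.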

Tip

"chota" stands for mini in Hindi 😄

Installation

# Load modules (for Compute Canada clusters)
module load cuda/12.6 httpproxy gcc arrow/19.0.1 python/3.12 opencv/4.11

# Create venv and install
export UV_CACHE_DIR=$SCRATCH/.uv_cache
uv venv --python=3.12 && source .venv/bin/activate
uv pip install -e .

TODOs
