🥜 chota-tinker

Some infra to speed up tinkering with RL at a smaller scale

why another RL lib?

we like tinker and the API it exposes, but it doesn't support small models like qwen3 0.6b, 1b, 4b etc. (and it's paid) -- this lib tries to match the tinker API while running locally on PyTorch
there are other tinker-like libs (in jax, and pytorch too), but they are all bloated -- we want something whose internals people can hack on and tinker with, to add their own algorithms (new gradient update rules etc.) -- hence we try to keep everything minimal, setting very hard constraints on LOC and number of files (trying to continue the tradition from https://github.com/rl4reasoning/rl-baselines)
we make sure it's fast, so minimalism doesn't come at a price (or at least we try to target a better trade-off b/w minimalism and efficiency :))
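
To make the "hackable update rule" idea concrete, here is a toy sketch of the tinker-style split between `forward_backward` (accumulate gradients) and `optim_step` (apply an update). It is pure Python on a 1-D linear regression so that it runs without any dependencies. `ToyTrainingClient`, `sgd`, and `UpdateRule` are hypothetical names made up for illustration, not chota-tinker's actual API, and the real library runs on PyTorch rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical update-rule signature: (params, grads) -> new params.
UpdateRule = Callable[[List[float], List[float]], List[float]]


def sgd(lr: float) -> UpdateRule:
    """Plain gradient descent; swap this out to try a new update rule."""
    def rule(params: List[float], grads: List[float]) -> List[float]:
        return [p - lr * g for p, g in zip(params, grads)]
    return rule


@dataclass
class ToyTrainingClient:
    """Toy stand-in for a tinker-style training client (illustration only)."""
    update_rule: UpdateRule
    params: List[float] = field(default_factory=lambda: [0.0, 0.0])  # [w, b]
    grads: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def forward_backward(self, batch: List[Tuple[float, float]]) -> float:
        """Return the mean squared error and accumulate its gradients."""
        w, b = self.params
        n = len(batch)
        loss = 0.0
        for x, y in batch:
            err = (w * x + b) - y
            loss += err * err / n
            self.grads[0] += 2.0 * err * x / n  # d(loss)/dw
            self.grads[1] += 2.0 * err / n      # d(loss)/db
        return loss

    def optim_step(self) -> None:
        """Apply the update rule, then clear the accumulated gradients."""
        self.params = self.update_rule(self.params, self.grads)
        self.grads = [0.0 for _ in self.grads]


# Fit y = 2x + 1 from three samples.
batch = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
client = ToyTrainingClient(update_rule=sgd(lr=0.1))
losses = []
for _ in range(200):
    losses.append(client.forward_backward(batch))
    client.optim_step()
```

Trying a different algorithm in this sketch only means passing another function as `update_rule`; the training loop itself stays unchanged.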

Tip

"chota" stands for mini in Hindi 😄

Installation

# Load modules (for Compute Canada clusters)
module load cuda/12.6 httpproxy gcc arrow/19.0.1 python/3.12 opencv/4.11

# Create venv and install
export UV_CACHE_DIR=$SCRATCH/.uv_cache
uv venv --python=3.12 && source .venv/bin/activate
uv pip install -e .

TODOs

see if training works or not
we need to speed up training
- fused loss fns
- we can have custom model impl. for better memory and throughput during training, like [SkyRL tx](https://github.com/NovaSky-AI/SkyRL/blob/main/skyrl-tx/tx/models/qwen3.py)
- add other baselines
  - MARA
