hyper-optimized AlphaZero implementation with Ray + Cython for speed
train an agent that beats random actions and pure MCTS in 2 minutes
- train.py: distributed training with Ray
- ctree/: MCTS nodes in Cython (node.py = pure Python)
- mcts.py: MCTS playouts
- network.py: neural net stuff
- board.py: Gomoku board
- ray distributed parts (train.py), sketched in the example after this list:
  - one distributed replay buffer
  - N actors with the 'best model' weights, which self-play games and store the data in the replay buffer
  - M 'candidate models', which pull from the replay buffer and train
    - each iteration they play against the 'best model'; if they win, the 'best model' weights are updated
    - includes write/evaluation locks on 'best weights'
  - one best-model weights store (PS / parameter server)
    - stores the best weights, which are retrieved by the self-play actors and updated when a candidate wins
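a minimal sketch of how these Ray pieces might fit together. actor and function names (`ReplayBuffer`, `ParameterServer`, `self_play`, `train_candidate`) and the game/net stubs are illustrative, not the exact API in train.py:

```python
import random
import ray

def play_one_game(weights):
    # placeholder for an MCTS self-play game; returns (state, pi, z) tuples
    return [("state", "pi", random.choice([-1, 1]))]

def sgd_step(weights, batch):
    # placeholder for one optimizer step on a sampled batch
    return weights

def candidate_beats_best(candidate, best):
    # placeholder head-to-head evaluation (e.g. win rate over k games)
    return random.random() > 0.5

@ray.remote
class ReplayBuffer:
    """One distributed replay buffer shared by all self-play actors."""
    def __init__(self, capacity=50_000):
        self.capacity, self.data = capacity, []

    def add(self, samples):
        self.data.extend(samples)
        del self.data[:-self.capacity]  # drop oldest entries past capacity

    def sample(self, n):
        return random.sample(self.data, min(n, len(self.data)))

@ray.remote
class ParameterServer:
    """Holds the 'best model' weights. Ray actors process one method
    call at a time, which serves as the write/evaluation lock."""
    def __init__(self, weights):
        self.weights = weights

    def get_weights(self):
        return self.weights

    def set_weights(self, weights):
        self.weights = weights

@ray.remote
def self_play(ps, buffer, games):
    for _ in range(games):
        w = ray.get(ps.get_weights.remote())  # always play with best weights
        buffer.add.remote(play_one_game(w))

@ray.remote
def train_candidate(ps, buffer, steps):
    w = ray.get(ps.get_weights.remote())
    for _ in range(steps):
        w = sgd_step(w, ray.get(buffer.sample.remote(512)))
    best = ray.get(ps.get_weights.remote())
    if candidate_beats_best(w, best):         # promote candidate on a win
        ps.set_weights.remote(w)

if __name__ == "__main__":
    ray.init()
    buffer = ReplayBuffer.remote()
    ps = ParameterServer.remote(weights={})
    workers = [self_play.remote(ps, buffer, 10) for _ in range(4)]          # N actors
    trainers = [train_candidate.remote(ps, buffer, 100) for _ in range(2)]  # M candidates
    ray.get(workers + trainers)
```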
- cython impl:
  - ctree/: C++/Cython MCTS
  - node.py: pure Python MCTS (see the sketch below)
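a minimal pure-Python sketch of the node structure that node.py keeps and ctree/ compiles down with Cython. field and method names here are illustrative guesses, not the repo's actual API:

```python
import math

class Node:
    """One MCTS tree node: tracks visit counts, value sum, and the
    network prior, and supports select/expand/backup."""
    def __init__(self, prior, parent=None):
        self.parent = parent
        self.prior = prior        # P(s, a) from the policy network
        self.children = {}        # action -> Node
        self.visit_count = 0
        self.value_sum = 0.0

    def value(self):
        # mean value of this node over all visits
        return self.value_sum / self.visit_count if self.visit_count else 0.0

    def ucb(self, c_puct=5.0):
        # PUCT score used to pick which child to descend into
        u = c_puct * self.prior * math.sqrt(self.parent.visit_count) / (1 + self.visit_count)
        return self.value() + u

    def select(self):
        # pick the (action, child) with the highest PUCT score
        return max(self.children.items(), key=lambda kv: kv[1].ucb())

    def expand(self, action_priors):
        # attach one child per legal action, seeded with network priors
        for action, p in action_priors:
            if action not in self.children:
                self.children[action] = Node(p, parent=self)

    def backup(self, leaf_value):
        # propagate the leaf evaluation to the root, flipping the sign
        # at each level since players alternate
        node, v = self, leaf_value
        while node is not None:
            node.visit_count += 1
            node.value_sum += v
            v = -v
            node = node.parent
```

the Cython version in ctree/ can keep the same structure with `cdef`-typed fields to avoid Python attribute lookups in the hot select/backup loops.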
-- todos --
- jax network impl
- tpu + gpu support
- saving/loading model weights
