Using self-play, create and train the model:
- First, randomly initialise the model's weights
- In each epoch, play a batch of games guided by MCTS, recording every state visited along with the model's predictions and the final rewards
- At the end of the epoch, retrain the model on the recorded data: the target for each state is either the final game reward (Monte Carlo style) or the model's predicted value of the next state (temporal-difference bootstrapping)
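The loop above can be sketched as follows. This is a minimal illustration, not a real implementation: the game is a hypothetical toy random walk, the model is a single linear weight, and the MCTS-guided move selection is stubbed out with random moves. It uses the Monte Carlo variant, where every recorded state targets the final game reward.

```python
import random

class ToyModel:
    """Minimal value model: predicts a reward from a single numeric state."""
    def __init__(self):
        # Step 1: random initialisation
        self.w = random.uniform(-0.1, 0.1)

    def predict(self, state):
        return self.w * state

    def train(self, states, targets, lr=0.01):
        # One squared-error gradient step per recorded example
        for s, t in zip(states, targets):
            error = self.predict(s) - t
            self.w -= lr * error * s

def play_game(model, length=5):
    """Stand-in for an MCTS-guided game: moves are random here.
    Returns the visited states and the final reward."""
    state, states = 0, []
    for _ in range(length):
        states.append(state)
        state += random.choice([-1, 1])
    return states, (1.0 if state > 0 else -1.0)

def train_by_self_play(epochs=10, games_per_epoch=20):
    model = ToyModel()
    for _ in range(epochs):
        all_states, all_targets = [], []
        # Step 2: play games and record states and rewards
        for _ in range(games_per_epoch):
            states, reward = play_game(model)
            all_states.extend(states)
            # Monte Carlo target: every state predicts the final reward.
            # (The TD alternative would target model.predict(next_state).)
            all_targets.extend([reward] * len(states))
        # Step 3: retrain at the end of the epoch
        model.train(all_states, all_targets)
    return model
```

In a real system the random move selection would be replaced by an MCTS search that uses `model.predict` to evaluate leaf positions, and the linear model by a neural network.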