When training with different reward functions, it's hard to compare two bots, because their reward values are not directly comparable. A callback that runs n games between the current agent and another agent would be a useful way to measure progress.
I will look into it, but if anyone knows how to do this, help is welcome.
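
To make the request more concrete, here is a rough, library-agnostic sketch of what I have in mind. Everything in it is an assumption for illustration: `play_game`, `evaluate_head_to_head`, `HeadToHeadCallback`, and the `on_step` hook are hypothetical names, not part of any existing API. `play_game(a, b)` stands in for whatever actually runs one match and is assumed to return a positive number if `a` wins, a negative number if `b` wins, and 0 for a draw.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class MatchResult:
    """Win/loss/draw tally from the current agent's point of view."""
    wins: int = 0
    losses: int = 0
    draws: int = 0

    @property
    def games(self) -> int:
        return self.wins + self.losses + self.draws

    @property
    def win_rate(self) -> float:
        # Avoid dividing by zero when no games were played.
        return self.wins / self.games if self.games else 0.0


def evaluate_head_to_head(
    play_game: Callable[[Any, Any], float],  # hypothetical match runner
    current_agent: Any,
    opponent: Any,
    n_games: int,
) -> MatchResult:
    """Play n_games between the two agents and tally the outcomes."""
    result = MatchResult()
    for _ in range(n_games):
        outcome = play_game(current_agent, opponent)
        if outcome > 0:
            result.wins += 1
        elif outcome < 0:
            result.losses += 1
        else:
            result.draws += 1
    return result


class HeadToHeadCallback:
    """Hypothetical callback: evaluates every `eval_every` training steps."""

    def __init__(
        self,
        play_game: Callable[[Any, Any], float],
        opponent: Any,
        n_games: int = 10,
        eval_every: int = 1000,
    ) -> None:
        self.play_game = play_game
        self.opponent = opponent
        self.n_games = n_games
        self.eval_every = eval_every
        # (step, result) pairs, so progress can be plotted afterwards.
        self.history: List[Tuple[int, MatchResult]] = []

    def on_step(self, step: int, current_agent: Any) -> Optional[MatchResult]:
        # Assumed hook: the training loop calls this once per step.
        if step % self.eval_every != 0:
            return None
        result = evaluate_head_to_head(
            self.play_game, current_agent, self.opponent, self.n_games
        )
        self.history.append((step, result))
        return result
```

Since the win rate against a fixed opponent does not depend on either bot's reward function, it should give a common scale for comparing runs. A real version would probably also need to swap sides between games and hook into the training loop's actual callback interface, which this sketch does not attempt.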