[feature request] Callback returning wins (or win rate) against another agent

When training with different reward functions, it's hard to compare two bots, because their episode rewards are not on a common scale. A `callback` that runs `n` games between the current agent and another agent would be a useful way to measure progress.
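Below is a minimal, library-agnostic sketch of what such a callback could look like. All of it is assumed. `evaluate_vs`, `WinRateCallback`, the `play_game(first, second)` contract and the `callback(step, agent)` hook are hypothetical names, not part of any existing API, and they would need adapting to whatever callback interface the training library actually exposes. The agents swap who moves first on every game, so a first-move advantage does not skew the win rate.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical contract: play_game(first, second) plays one full game and
# returns +1 if `first` wins, -1 if `second` wins, 0 for a draw.
PlayGame = Callable[[Any, Any], int]


def evaluate_vs(agent: Any, opponent: Any, play_game: PlayGame, n_games: int) -> Dict[str, float]:
    """Play `n_games` between `agent` and `opponent`, alternating who starts."""
    wins = losses = draws = 0
    for i in range(n_games):
        if i % 2 == 0:
            result = play_game(agent, opponent)   # agent moves first
        else:
            result = -play_game(opponent, agent)  # agent moves second, so flip the sign
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            draws += 1
    return {
        "wins": wins,
        "losses": losses,
        "draws": draws,
        "win_rate": wins / n_games if n_games else 0.0,
    }


@dataclass
class WinRateCallback:
    """Hypothetical hook: every `eval_freq` steps, play the current agent against a fixed opponent."""
    opponent: Any
    play_game: PlayGame
    n_games: int = 100
    eval_freq: int = 10_000
    history: List[Dict[str, float]] = field(default_factory=list)

    def __call__(self, step: int, agent: Any) -> bool:
        if step % self.eval_freq == 0:
            stats = evaluate_vs(agent, self.opponent, self.play_game, self.n_games)
            stats["step"] = step
            self.history.append(stats)
            print(f"step {step}: win rate {stats['win_rate']:.2%} over {self.n_games} games")
        return True  # a training loop could treat False as "stop training"


# Toy usage: each "agent" just returns a strength value and the higher value wins.
def toy_game(first: Callable[[], int], second: Callable[[], int]) -> int:
    a, b = first(), second()
    return (a > b) - (a < b)


strong = lambda: 2
weak = lambda: 1
cb = WinRateCallback(opponent=weak, play_game=toy_game, n_games=10, eval_freq=5)
for step in range(11):  # stand-in for a training loop calling the hook each step
    cb(step, strong)
```

Keeping a frozen copy of an earlier checkpoint (or a bot trained with a different reward function) as `opponent` gives a score that stays comparable across runs, unlike the raw episode reward.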

I will look into it, but if anyone knows how to do this, help is welcome.