I'm a bit late to the game, but while reviewing the code I noticed that the reward member variable is set by some actions and not by others that pass. So once an action does produce a reward, does that reward stick and get repeated for every subsequent non-rewarding action?
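To make the concern concrete, here's a minimal Python sketch of the pattern I mean. The class names and the `"goal"`/`"noop"` actions are made up for illustration and are not taken from the code under review. The sketch assumes `step()` simply returns whatever `self.reward` currently holds. The second class shows one common way to avoid the problem: clear the reward at the start of every step.

```python
class StaleRewardEnv:
    """Hypothetical env: self.reward is assigned only on rewarding actions."""

    def __init__(self):
        self.reward = 0.0

    def step(self, action):
        # Only the rewarding action touches self.reward; any other action
        # leaves behind whatever value the previous step stored.
        if action == "goal":
            self.reward = 1.0
        return self.reward


class ResetRewardEnv(StaleRewardEnv):
    """Same hypothetical env, but the reward is cleared before each step."""

    def step(self, action):
        self.reward = 0.0  # reset so non-rewarding actions report 0
        return super().step(action)


actions = ["noop", "goal", "noop", "noop"]

stale = StaleRewardEnv()
print([stale.step(a) for a in actions])  # [0.0, 1.0, 1.0, 1.0]

fixed = ResetRewardEnv()
print([fixed.step(a) for a in actions])  # [0.0, 1.0, 0.0, 0.0]
```

If the real code follows the first pattern, the reward from one successful action would keep being reported until some other branch overwrites it.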