The model doesn't add the restart_padding for the self-loop diagonal. This creates a small bug in the plus-times semiring, as the model considers matches that start at the beginning of the input and take multiple self-loops.
A simple solution is to give the first self-loop a fixed score of one. We could do the same for the final self-loop, which would avoid the max/sum-pooling of the final hidden states.
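To make the effect concrete, here is a minimal sketch in plain Python, not the model's actual code. It assumes a simplified plus-times recurrence in which state `i` either keeps its score through its self-loop or receives it from state `i - 1` through the main path. The names `step`, `run`, `fix_first`, and `fix_last` are hypothetical, and the toy scores are made up for illustration.

```python
from typing import List, Sequence


def step(hidden: Sequence[float],
         self_loops: Sequence[float],
         main_path: Sequence[float]) -> List[float]:
    """One token update in the plus-times semiring (simplified assumption).

    State i keeps its score via its self-loop, or receives the score of
    state i - 1 via the main-path transition i - 1 -> i.
    """
    new_hidden = []
    for i in range(len(hidden)):
        score = hidden[i] * self_loops[i]
        if i > 0:
            score += hidden[i - 1] * main_path[i - 1]
        new_hidden.append(score)
    return new_hidden


def run(self_loops: Sequence[Sequence[float]],
        main_path: Sequence[Sequence[float]],
        fix_first: bool = False,
        fix_last: bool = False) -> List[List[float]]:
    """Run the recurrence over all tokens and return the hidden states per step."""
    n_states = len(self_loops[0])
    # Start state holds the semiring one; all other states hold the semiring zero.
    hidden = [1.0] + [0.0] * (n_states - 1)
    trace = []
    for sl, mp in zip(self_loops, main_path):
        sl = list(sl)
        if fix_first:
            sl[0] = 1.0   # first self-loop gets a fixed score of one
        if fix_last:
            sl[-1] = 1.0  # final self-loop gets a fixed score of one
        hidden = step(hidden, sl, mp)
        trace.append(hidden)
    return trace


# Toy input: 3 tokens, a 2-state pattern, made-up transition scores.
self_loops = [[0.5, 0.0]] * 3
main_path = [[0.5]] * 3

unfixed = run(self_loops, main_path)
first_fixed = run(self_loops, main_path, fix_first=True)
both_fixed = run(self_loops, main_path, fix_first=True, fix_last=True)

# Without the fix, the start-state score decays with every self-loop taken.
print([h[0] for h in unfixed])          # [0.5, 0.25, 0.125]
# With the fix, the start state stays at one, so a match can start anywhere.
print([h[0] for h in first_fixed])      # [1.0, 1.0, 1.0]
# Sum-pooling the final state over time ...
print(sum(h[-1] for h in first_fixed))  # 1.5
# ... equals the last final-state score once the final self-loop is also one.
print(both_fixed[-1][-1])               # 1.5
```

In this toy setting, fixing the final self-loop to one makes the last hidden state accumulate every match score, which is why sum-pooling over time becomes unnecessary. The equality shown here is for the plus-times case only; whether the same shortcut is acceptable for max-pooling depends on the semiring in use.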