Hi, I'm Ting-Wei,
I just read your paper and have a question.
In the experiment for Figure 3, what does "None" mean? If no regularization term is added, the decoding objective should reduce to MAP, right? Then why does beam size affect the result?