Just wanted to let you know that you are calculating the BLEU scores incorrectly. You should call split() on the predicted captions as well as the reference captions, so that both are lists of word tokens rather than raw strings. See this tutorial.
Your correctly calculated scores should be half of what you have now. Sorry to be the bearer of bad news.
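To illustrate the split() point, here is a minimal sketch of the clipped n-gram precision that BLEU is built on. It is not your code or any particular library's implementation; the helper names and the two example captions are made up for illustration. It assumes the scoring function expects sequences of tokens (as NLTK's sentence_bleu and corpus_bleu do), in which case an unsplit string is treated as a sequence of characters.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def modified_precision(references, hypothesis, n):
    """Clipped n-gram precision (hypothetical helper, for illustration only)."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    if not hyp_counts:
        return 0.0
    # For each n-gram, the maximum count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    # Clip each hypothesis n-gram count by its reference maximum.
    clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
    return clipped / sum(hyp_counts.values())


# Made-up captions, not taken from any dataset.
reference = "a dog runs across the grass"
predicted = "a dog is running on the grass"

# Without split(): Python iterates over characters, so these are character bigrams.
char_level = modified_precision([reference], predicted, 2)

# With split(): these are word bigrams, which is what BLEU is meant to measure.
word_level = modified_precision([reference.split()], predicted.split(), 2)

print(f"unsplit strings: {char_level:.3f}")
print(f"split tokens:    {word_level:.3f}")
```

With these captions the unsplit version scores noticeably higher, because character bigrams match far more often than word bigrams. How large the gap is on real data depends on the captions and on which BLEU variant you report.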