Paper Reading - Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

Link of the Paper: https://arxiv.org/abs/1609.06647

Main Points ( Improvements Over the CVPR2015 Model ):

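  1. Image Model Improvement: GoogLeNet ( 22 layers ) -> a better image model that uses Batch Normalization.
  2. Image Model Fine Tuning: fine tuning the image model must be carried out only after the LSTM parameters have settled on a good language model; when both are trained jointly from the start, the noisy initial gradients coming from the LSTM corrupt the already trained image model.
  3. Scheduled Sampling: a fully guided training scheme that feeds the true previous word -> a less guided scheme that mostly feeds the model-generated word instead.
  4. Ensembling: predictions from several separately trained models are combined.
  5. Beam Size Reduction: the best-performing beam size turned out to be small ( 3 ).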
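
Points 3 and 5 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the names `pick_previous_word`, `beam_search`, `TOY_MODEL` and `toy_next_log_probs` are hypothetical, and the toy bigram table stands in for a trained LSTM decoder. The first function shows the scheduled sampling choice of which word to feed to the next decoding step; the second keeps only the `beam_size` best partial captions at every step.

```python
import math
import random


def pick_previous_word(true_word, model_word, p_use_model, rng):
    """Scheduled sampling: choose the word fed to the next decoder step.

    p_use_model = 0.0 is the fully guided scheme (always the true word);
    raising it during training moves towards the less guided scheme.
    """
    return model_word if rng.random() < p_use_model else true_word


def beam_search(next_log_probs, start, end, beam_size=3, max_len=10):
    """Return the highest-scoring caption found with the given beam size."""
    beams = [(0.0, [start])]  # (cumulative log-probability, words so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, words in beams:
            for word, logp in next_log_probs(words).items():
                candidates.append((score + logp, words + [word]))
        # Keep only the beam_size best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, words in candidates[:beam_size]:
            if words[-1] == end:
                finished.append((score, words))
            else:
                beams.append((score, words))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished captions if needed
    return max(finished, key=lambda c: c[0])[1]


# Hypothetical toy "language model": next-word probabilities given the last word.
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_next_log_probs(words):
    return {w: math.log(p) for w, p in TOY_MODEL[words[-1]].items()}


if __name__ == "__main__":
    rng = random.Random(0)
    print(pick_previous_word("dog", "cat", p_use_model=0.75, rng=rng))
    print(beam_search(toy_next_log_probs, "<s>", "</s>", beam_size=3))
    # -> ['<s>', 'the', 'dog', '</s>']
```

The toy table only demonstrates the mechanics of the search. Which beam size works best for a real captioning model is an empirical question, and the notes above report that a small value ( 3 ) worked best.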

Reposted from www.cnblogs.com/zlian2016/p/9476471.html