Paper Reading - Learning a Recurrent Visual Representation for Image Caption Generation

Link of the Paper: https://arxiv.org/abs/1411.5654

Main Points:

  1. A bi-directional mapping model built on recurrent neural networks: unlike previous approaches, which map both sentences and images into a common embedding (and then compute similarity to match or generate, or simply use an encoder-decoder model, I guess), this model can convert back and forth between visual features and visual descriptions.
  2. A novel recurrent visual memory that automatically learns to remember long-term visual concepts (a minimal sketch of both ideas follows this list).
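
The two directions can share a single recurrent hidden state: at each step the state is updated from the current word, is used to predict the next word (image-to-sentence), and is also trained to reconstruct the visual features from the words seen so far (sentence-to-image), which is what lets it act as a long-term visual memory. Below is a minimal sketch of that idea in PyTorch; the class name, layer sizes, and the plain RNN cell are my illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class BidirectionalCaptionRNN(nn.Module):
    """Minimal sketch: one recurrent state that both generates words
    and reconstructs visual features (names/sizes are assumptions)."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256, visual_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Recurrent state over words, conditioned on the image features.
        self.rnn = nn.RNNCell(embed_dim + visual_dim, hidden_dim)
        # Image -> sentence direction: predict the next word.
        self.word_decoder = nn.Linear(hidden_dim, vocab_size)
        # Sentence -> image direction: the "recurrent visual memory",
        # trained to reconstruct visual features from the words seen so far.
        self.visual_decoder = nn.Linear(hidden_dim, visual_dim)

    def forward(self, words, visual_feats):
        # words: (T, B) word indices; visual_feats: (B, visual_dim)
        h = visual_feats.new_zeros(words.size(1), self.rnn.hidden_size)
        word_logits, visual_recons = [], []
        for t in range(words.size(0)):
            x = torch.cat([self.embed(words[t]), visual_feats], dim=1)
            h = self.rnn(x, h)
            word_logits.append(self.word_decoder(h))      # generation direction
            visual_recons.append(self.visual_decoder(h))  # reconstruction direction
        return torch.stack(word_logits), torch.stack(visual_recons)
```

Training would then combine a cross-entropy loss on `word_logits` with a reconstruction loss (e.g. MSE) between `visual_recons` and the true visual features, so one set of recurrent weights is pushed to serve both directions.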


Reposted from www.cnblogs.com/zlian2016/p/9477937.html