Paper Reading - Deep Visual-Semantic Alignments for Generating Image Descriptions ( CVPR 2015 )

Link of the Paper: https://arxiv.org/abs/1412.2306

Main Points:

  1. An Alignment Model: Convolutional Neural Networks over image regions, Bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding ( CNN - Structured Objective - BiRNN ).
  2. A Multimodal Recurrent Neural Network architecture that uses the inferred alignments to learn to generate novel descriptions of image regions.
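
To make the structured objective in point 1 more concrete, here is a minimal sketch in plain Python. It assumes the simplified image-sentence score as I read it in the paper (each word embedding picks its best-matching region embedding by dot product, and the per-word maxima are summed) together with a max-margin ranking loss over a batch of matching image-sentence pairs. The function names, the toy 2-D vectors, and the choice to skip the l == k term are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two embeddings of the same dimension."""
    return sum(a * b for a, b in zip(u, v))


def image_sentence_score(regions: List[Vector], words: List[Vector]) -> float:
    """Assumed simplified score: every word aligns to its single best
    region, and the per-word best scores are summed over the sentence."""
    return sum(max(dot(v, s) for v in regions) for s in words)


def ranking_loss(images: List[List[Vector]],
                 sentences: List[List[Vector]],
                 margin: float = 1.0) -> float:
    """Max-margin ranking loss; images[k] and sentences[k] are a true pair.

    The l == k terms are skipped here (an assumption of this sketch);
    including them would only add a constant to the loss.
    """
    n = len(images)
    # S[k][l] is the score of image k against sentence l.
    S = [[image_sentence_score(images[k], sentences[l]) for l in range(n)]
         for k in range(n)]
    loss = 0.0
    for k in range(n):
        for l in range(n):
            if l == k:
                continue
            # Image k should prefer its own sentence over sentence l.
            loss += max(0.0, S[k][l] - S[k][k] + margin)
            # Sentence k should prefer its own image over image l.
            loss += max(0.0, S[l][k] - S[k][k] + margin)
    return loss


# Toy data: two images with two region embeddings each, and two
# one-word sentences. All values are made up for illustration.
toy_images = [[[1, 0], [0, 1]], [[-1, 0], [0, -1]]]
aligned_sentences = [[[1, 0]], [[-1, 0]]]
swapped_sentences = [[[-1, 0]], [[1, 0]]]
```

With the toy data, `ranking_loss(toy_images, aligned_sentences)` is zero because each true pair outscores the mismatched pair by the full margin, while swapping the sentences gives a positive loss. In the paper the region and word embeddings would come from the CNN and the BiRNN respectively; here they are hard-coded vectors.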

Other Key Points:

  1. The 

Reposted from www.cnblogs.com/zlian2016/p/9492483.html