Link of the Paper: https://arxiv.org/abs/1711.09151
Innovations:
- The authors develop a convolutional ( CNN-based ) image captioning method that shows comparable performance to an LSTM based method on standard metrics.
- The authors analyze the characteristics of CNN and LSTM nets and provide useful insights such as -- CNNs produce more entropy ( useful for diverse predictions ), better classification accuracy, and do not suffer from vanishing gradients.
Improvements:
- A Convolutional Neural Network with Attention mechanism.
General Points:
- Image Captioning is applicable to virtual assistants, editing tools, image indexing and support of the disabled.
- Image Captioning is a basic ingredient for more complex operations such as storytelling and visual summarization.
- An illustration of a classical RNN architecture for image captioning is provided below.