Link of the Paper: https://arxiv.org/abs/1705.03122
Motivations:
Innovations:
- The authors propose an architecture for sequence to sequence modeling based entirely on convolutional neural networks.
- The authors introduce a separate attention mechanism for each decoder layer (multi-step attention), rather than a single attention module shared across the decoder.
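A minimal sketch of what one decoder layer's attention computes, assuming plain dot-product scoring over the encoder outputs `z_enc` (the paper additionally mixes the current decoder state with the previous target embedding before scoring, which is omitted here; the function and variable names are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_attention(h_dec, z_enc, e_enc):
    """One decoder layer's attention (sketch): every decoder state in h_dec
    attends over the encoder outputs z_enc; the returned contexts are
    weighted sums of z_enc + e_enc (encoder outputs plus input embeddings,
    as in the paper)."""
    scores = softmax(h_dec @ z_enc.T)   # (t_dec, t_enc) attention weights
    return scores @ (z_enc + e_enc)     # (t_dec, d) context vectors
```

Because each decoder layer computes its own `scores`, attention at a given layer can take into account which inputs earlier layers already attended to.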
Improvements:
- The model is equipped with gated linear units (Language Modeling with Gated Convolutional Networks - Dauphin et al., arXiv 2016) and residual connections (Deep Residual Learning for Image Recognition - He et al., CVPR 2016).
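A minimal sketch of how a GLU and a residual connection combine in one convolutional block, assuming a naive 1-D convolution with an odd kernel width and a weight layout `(k, d, 2d)` chosen for illustration (the actual model uses learned projections and weight normalization not shown here):

```python
import numpy as np

def glu(x):
    """Gated linear unit: GLU([a; b]) = a * sigmoid(b), halving the channel dim."""
    a, b = np.split(x, 2, axis=-1)
    return a * (1.0 / (1.0 + np.exp(-b)))

def conv_block(x, w, bias):
    """Hypothetical convolutional block: a 1-D convolution producing 2*d
    channels, a GLU that gates them back down to d, and a residual
    connection to the block input."""
    t, d = x.shape
    k = w.shape[0]                        # kernel width (assumed odd)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad so output length stays t
    out = np.stack([(xp[i:i + k].reshape(-1) @ w.reshape(k * d, 2 * d)) + bias
                    for i in range(t)])   # (t, 2*d) pre-activation
    return glu(out) + x                   # gate, then add the residual
```

The GLU lets the network control which parts of each convolution's output flow forward, while the residual connection keeps deep stacks of such blocks trainable.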
General Points:
- Multi-layer convolutional neural networks create hierarchical representations over the input sequence in which nearby input elements interact at lower layers while distant elements interact at higher layers.
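This hierarchy can be quantified by the receptive field of the stack: with stride-1 convolutions, each layer widens the window of input positions a unit can see by `kernel_width - 1`, so depth controls how distant two inputs can be and still interact. A small illustration:

```python
def receptive_field(num_layers, kernel_width):
    """Receptive field of a stack of stride-1 convolutions: one input
    position, widened by (kernel_width - 1) per layer."""
    return 1 + num_layers * (kernel_width - 1)

# e.g. six layers of width-5 kernels jointly see 25 input positions
print(receptive_field(6, 5))  # -> 25
```

Nearby elements thus already interact after one layer, while interactions between distant elements only emerge once enough layers have been stacked.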