Ten Must-Read NMT Papers (9): Sequence to Sequence Learning with Neural Networks

The neural machine translation reading list compiled by the Tsinghua University NLP group names ten must-read papers:

https://github.com/THUNLP-MT/MT-Reading-List

Back to neural machine translation. This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. It uses two LSTMs as the encoder and the decoder (no attention), replaces out-of-vocabulary words with UNK, and on the WMT'14 English-to-French dataset it beats a traditional phrase-based statistical machine translation baseline by about 1.5 BLEU (34.8 vs. 33.3).

The overall idea is the same as the RNN Encoder-Decoder, only implemented with LSTMs.
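
To make this concrete, here is a minimal sketch of such an encoder-decoder with two separate LSTMs, written in PyTorch. This is not the authors' code: the hyperparameters are illustrative (far smaller than the paper's), and the decoder is run with teacher forcing.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder with two separate LSTMs and no attention.
    Sizes here are illustrative, not the paper's configuration."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden=512, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        # Two distinct LSTMs: one reads the source sentence (reversed in
        # preprocessing, per the paper), the other generates the target.
        self.encoder = nn.LSTM(emb_dim, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)  # projection to target vocabulary

    def forward(self, src_ids, tgt_ids):
        # Encode: keep only the final (h, c) state, the fixed-size vector
        # that summarizes the whole source sentence.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decode with teacher forcing, initialized from the encoder state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # logits per target position

# Toy usage: batch of 2, source length 7, target length 5.
model = Seq2Seq(src_vocab=1000, tgt_vocab=800)
logits = model(torch.randint(0, 1000, (2, 7)), torch.randint(0, 800, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 800])
```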

The paper highlights three important points:

1) The encoder and the decoder are two different LSTMs (they do not share parameters).

2) Deep LSTMs perform better than shallow ones, so a 4-layer LSTM is used (a parameter-count sanity check follows after this list):

We found that the LSTM models are fairly easy to train. We used deep LSTMs with 4 layers, with 1000 cells at each layer and 1000 dimensional word embeddings, with an input vocabulary of 160,000 and an output vocabulary of 80,000. We found deep LSTMs to significantly outperform shallow LSTMs, where each additional layer reduced perplexity by nearly 10%, possibly due to their much larger hidden state. We used a naive softmax over 80,000 words at each output. The resulting LSTM has 380M parameters of which 64M are pure recurrent connections (32M for the “encoder” LSTM and 32M for the “decoder” LSTM).

380 million parameters... trained on 8 GPUs for about ten days.

3) In practice, training on reversed source sentences works noticeably better, improving the test BLEU of the decoded translations by nearly 5 points (from 25.9 to 30.6); see the reversal sketch below.
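
As a sanity check on point 2, the 32M / 64M / 380M figures can be roughly recovered from the quoted hyperparameters. The back-of-the-envelope calculation below is my own approximation (bias terms are ignored):

```python
# Rough parameter count for the paper's configuration.
# Each LSTM layer has 4 gates, and each gate has an input->hidden and a
# hidden->hidden weight matrix; biases are ignored for simplicity.
d = 1000                        # cell size == word embedding size
layers = 4
src_vocab, tgt_vocab = 160_000, 80_000

recurrent_per_lstm = layers * 4 * (d * d + d * d)   # weights inside one LSTM
recurrent = 2 * recurrent_per_lstm                  # encoder + decoder LSTMs
embeddings = src_vocab * d + tgt_vocab * d          # input + output embeddings
softmax = tgt_vocab * d                             # naive softmax over 80k words

print(recurrent_per_lstm, recurrent, embeddings + softmax + recurrent)
# 32000000 64000000 384000000  -> roughly the paper's 32M / 64M / 380M
```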
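
Point 3 is only a preprocessing step: the source sentence is reversed while the target is left in its natural order, which (as the paper argues) introduces many short-term dependencies between the beginning of the source and the beginning of the target and makes optimization easier. A trivial sketch:

```python
def reverse_source(pair):
    """Reverse only the source side of a (source, target) sentence pair."""
    src, tgt = pair
    return list(reversed(src)), tgt

# Example with a hypothetical tokenized pair:
print(reverse_source((["je", "suis", "etudiant"], ["i", "am", "a", "student"])))
# (['etudiant', 'suis', 'je'], ['i', 'am', 'a', 'student'])
```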

Conclusion:

The final paragraph is quite encouraging:

Most importantly, we demonstrated that a simple, straightforward and a relatively unoptimized approach can outperform a mature SMT system, so further work will likely lead to even greater translation accuracies. These results suggest that our approach will likely do well on other challenging sequence to sequence problems.
 

Reference blogs:

https://www.cnblogs.com/duye/p/9433013.html

https://www.codercto.com/a/37657.html

https://blog.csdn.net/u013713117/article/details/54773467

https://blog.csdn.net/MebiuW/article/details/52832847

Paper reproduction:

https://blog.csdn.net/as472780551/article/details/82289944

Reposted from blog.csdn.net/weixin_40240670/article/details/86076172