Neural Machine Translation: WMT14 English-French Baseline Systems


Recent (2017 onward) WMT14 English-French baseline results

1. GNMT

   https://arxiv.org/pdf/1609.08144.pdf

   Corpus processing: a shared source and target vocabulary of 32K wordpieces (a wordpiece sketch follows at the end of this section)

     For the wordpiece models, we train 3 different models with vocabulary sizes of 8K, 16K, and 32K. Table 4 summarizes our results on the WMT En→Fr dataset. In this table, we also compare against other strong baselines without model ensembling. As can be seen from the table, “WPM-32K”, a wordpiece model with a shared source and target vocabulary of 32K wordpieces, performs well on this dataset and achieves the best quality as well as the fastest inference speed.

    On WMT En→Fr, the training set contains 36M sentence pairs. In both cases, we use newstest2014 as the test sets to compare against previous work. The combination of newstest2012 and newstest2013 is used as the development set.

   Results: Table 4 on page 16: En→Fr, WPM-32K: 38.95 BLEU

            (or Table 6 on page 17: En→Fr, trained with log-likelihood: 38.95 BLEU)
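   As a rough illustration of the shared 32K-wordpiece setup above, here is a minimal sketch using the open-source SentencePiece library. This is an assumption of this post, not the GNMT pipeline itself: GNMT used its own wordpiece implementation, and the file names below are placeholders.

```python
# Minimal sketch (assumed tooling): train one subword model on the concatenation
# of the English and French training text so that source and target share a
# single 32K vocabulary, then encode a sentence into wordpiece-like units.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="train.en-fr.concat",    # placeholder: En and Fr training sides concatenated
    model_prefix="wpm32k",
    vocab_size=32000,
    model_type="unigram",          # SentencePiece default; GNMT's wordpiece model differs in detail
)

sp = spm.SentencePieceProcessor(model_file="wpm32k.model")
print(sp.encode("The cat sat on the mat .", out_type=str))
```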

   

2. Transformer

    https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

   Corpus processing: a 32000 joint word-piece vocabulary

         For English-French, we used the significantly larger WMT 2014 English-French dataset consisting of 36M sentences and split tokens into a 32000 word-piece vocabulary.

   Results: Table 2 on page 8: Transformer (base model): 38.1 BLEU; Transformer (big): 41.0 BLEU
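   All scores collected in this post are reported on newstest2014. Note that these papers compute tokenized BLEU with their own scripts; the sketch below scores a detokenized output with sacreBLEU (an assumption of this post: version 2.x Python API, placeholder file names), whose numbers are not strictly comparable to the tokenized scores quoted here.

```python
# Sketch (assumed tooling): corpus-level BLEU of a detokenized system output
# against the newstest2014 reference, using sacreBLEU's Python API.
from sacrebleu.metrics import BLEU

with open("newstest2014.hyp.fr", encoding="utf-8") as f:
    hyps = [line.rstrip("\n") for line in f]
with open("newstest2014.ref.fr", encoding="utf-8") as f:
    refs = [line.rstrip("\n") for line in f]

bleu = BLEU()
print(bleu.corpus_score(hyps, [refs]))   # prints a BLEU signature line
```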

3. RNMT+

http://aclweb.org/anthology/P18-1008

Corpus processing: 32K joint sub-word units (in fact, 32K wordpieces; a tokenization sketch follows at the end of this section)

       We train our models on the standard WMT’14 En→Fr and En→De datasets that comprise 36.3M and 4.5M sentence pairs, respectively. Each sentence was encoded into a sequence of sub-word units obtained by first tokenizing the sentence with the Moses tokenizer, then splitting tokens into subword units (also known as “wordpieces”) using the approach described in (Schuster and Nakajima, 2012). We use a shared vocabulary of 32K sub-word units for each source-target language pair.

Results: Table 1 on page 81: RNMT+: 41.00 ± 0.05 BLEU
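The Moses tokenization step quoted above can be sketched with the sacremoses reimplementation of the Moses tokenizer (an assumption of this post, not the paper's toolchain); the wordpiece splitting would then follow, e.g. as in the SentencePiece sketch under section 1.

```python
# Sketch (assumed tooling): Moses-style tokenization before subword splitting.
from sacremoses import MosesTokenizer

tok_en = MosesTokenizer(lang="en")
print(tok_en.tokenize("It's a strong baseline, isn't it?", return_str=True, escape=False))
# roughly: It 's a strong baseline , isn 't it ?
```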

4. ConvS2S

https://arxiv.org/pdf/1705.03122.pdf

GitHub: https://github.com/facebookresearch/fairseq/

        https://github.com/facebookresearch/fairseq/issues/59 (corpus processing discussion)

Corpus processing: 40K joint BPE (the length/ratio filtering quoted below is sketched at the end of this section)

    We use the full training set of 36M sentence pairs, and remove sentences longer than 175 words as well as pairs with a source/target length ratio exceeding 1.5. This results in 35.5M sentence-pairs for training. Results are reported on newstest2014. We use a source and target vocabulary with 40K BPE types.

      Note the validation set setup: In all setups a small subset of the training data serves as validation set (about 0.5-1% for each dataset) for early stopping and learning rate annealing.

Results: Table 1: ConvS2S (BPE 40K): 40.51 BLEU
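The length and ratio filtering quoted above is easy to reproduce; the sketch below shows the idea (it may differ in small details from fairseq's own filtering).

```python
# Sketch: drop pairs where either side exceeds 175 tokens or where the
# source/target length ratio exceeds 1.5, as described in the ConvS2S setup.
def keep_pair(src, tgt, max_len=175, max_ratio=1.5):
    s, t = len(src.split()), len(tgt.split())
    if s == 0 or t == 0:
        return False
    if s > max_len or t > max_len:
        return False
    return max(s, t) / min(s, t) <= max_ratio

pairs = [
    ("the cat sat on the mat .", "le chat est assis sur le tapis ."),
    ("short", "une phrase beaucoup trop longue pour sa source " * 40),
]
print(sum(keep_pair(s, t) for s, t in pairs))  # 1 -> the second pair is filtered out
```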

5. Fairseq 

   https://arxiv.org/pdf/1806.00187.pdf

   GitHub: https://github.com/pytorch/fairseq

Corpus processing: 40K joint BPE (a BPE sketch follows at the end of this section)

    For En–Fr, we train on WMT’14 and borrow the setup of Gehring et al. (2017) with 36M training sentence pairs. We use newstest12+13 for validation and newstest14 for test. The 40K vocabulary is based on a joint source and target BPE factorization. 

      Validation set: newstest12+13

Results: Table 2: Our result: 43.2 BLEU
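A rough sketch of learning and applying a joint 40K BPE code with the subword-nmt package is given below (an assumption of this post; file names are placeholders, and the actual preprocessing scripts ship with the fairseq repository linked above). Note that subword-nmt counts merge operations, which is close to but not identical to a 40K vocabulary.

```python
# Sketch (assumed tooling): learn BPE merges on the concatenated En+Fr training
# text and apply the same codes to both languages -> joint source/target vocabulary.
import codecs
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

with codecs.open("train.en-fr.concat", encoding="utf-8") as inp, \
     codecs.open("bpe40k.codes", "w", encoding="utf-8") as out:
    learn_bpe(inp, out, 40000)          # 40K merge operations

with codecs.open("bpe40k.codes", encoding="utf-8") as codes:
    bpe = BPE(codes)
print(bpe.process_line("the quick brown fox jumps over the lazy dog"))
```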
