AMBERT: Beyond BERT! A pre-trained language model with multi-grained tokens

AMBERT: A PRE-TRAINED LANGUAGE MODEL WITH MULTI-GRAINED TOKENIZATION

1. What are the problems with the previous BERT?

In brief: the tokens in BERT are fine-grained (e.g., words or sub-words in English). Fine-grained tokenization handles multi-word expressions in English poorly, such as "ice cream" and "New York", because the meaning of such an expression can be far from the meanings of its individual tokens.
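To make the granularity difference concrete, here is a minimal Python sketch (not from the paper; the phrase vocabulary is a toy assumption) contrasting fine-grained word tokenization with coarse-grained phrase tokenization of the same sentence.

```python
# Toy phrase vocabulary of multi-word expressions kept as single tokens
# (hypothetical example, not the paper's vocabulary).
PHRASE_VOCAB = {"new york", "ice cream"}

def fine_tokenize(text: str) -> list:
    """Split into individual words (BERT-style fine-grained tokens)."""
    return text.lower().split()

def coarse_tokenize(text: str) -> list:
    """Greedily merge adjacent words that form a known phrase."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and f"{words[i]} {words[i + 1]}" in PHRASE_VOCAB:
            tokens.append(f"{words[i]} {words[i + 1]}")
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

sentence = "a New York style ice cream shop"
print(fine_tokenize(sentence))    # ['a', 'new', 'york', 'style', 'ice', 'cream', 'shop']
print(coarse_tokenize(sentence))  # ['a', 'new york', 'style', 'ice cream', 'shop']
```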

2. Author's solution

In this paper, the authors propose a multi-grained BERT model (AMBERT) that uses both fine-grained and coarse-grained tokens. For English, AMBERT extends BERT by constructing representations of both the words and the phrases in the input text with two encoders. Specifically, AMBERT first tokenizes the text at the word level and at the phrase level. It then feeds the word embeddings and the phrase embeddings into the two encoders, which share the same parameters. Finally, it obtains a contextual representation of the word and a contextual representation of the phrase at each position. Note that, because of the parameter sharing, the number of parameters in AMBERT is comparable to that of BERT. AMBERT can therefore represent the input text at both the word level and the phrase level, exploiting the strengths of both tokenization methods and producing richer multi-granularity representations of the input text.
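As a rough illustration of the shared-parameter design described above, here is a minimal PyTorch sketch. The class name, hyperparameters, and structure are illustrative assumptions, not the authors' implementation: each granularity gets its own embedding table, and both streams pass through one shared Transformer encoder.

```python
import torch
import torch.nn as nn

class AMBERTSketch(nn.Module):
    """Sketch of the two-stream, shared-encoder idea (illustrative only)."""

    def __init__(self, fine_vocab=30000, coarse_vocab=50000,
                 d_model=768, n_heads=12, n_layers=12):
        super().__init__()
        # Each granularity has its own embedding table...
        self.fine_embed = nn.Embedding(fine_vocab, d_model)
        self.coarse_embed = nn.Embedding(coarse_vocab, d_model)
        # ...but the Transformer encoder parameters are shared between the
        # two streams, keeping the encoder size close to a comparable BERT.
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, fine_ids, coarse_ids):
        # Contextual representations at each position, per granularity.
        fine_repr = self.shared_encoder(self.fine_embed(fine_ids))
        coarse_repr = self.shared_encoder(self.coarse_embed(coarse_ids))
        return fine_repr, coarse_repr

model = AMBERTSketch()
fine_ids = torch.randint(0, 30000, (2, 16))    # batch of fine-grained token ids
coarse_ids = torch.randint(0, 50000, (2, 10))  # batch of coarse-grained token ids
fine_repr, coarse_repr = model(fine_ids, coarse_ids)
print(fine_repr.shape, coarse_repr.shape)      # [2, 16, 768] and [2, 10, 768]
```

A downstream task can then use either representation, or combine both, depending on which granularity suits the task.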

3. Author's contributions (innovations)

  1. A study of multi-granularity pre-trained language models
  2. A new pre-trained language model, AMBERT, proposed as an extension of BERT that uses multi-grained tokens and shared parameters
  3. Empirical verification of AMBERT on the English benchmarks GLUE, SQuAD, and RACE, and the Chinese benchmark CLUE

For details, see Zhuanzhi and the paper: https://www.zhuanzhi.ai/vip/bc6b030cfb7f96c81f1eb5440fcb7f94
Paper address

Origin: blog.csdn.net/qq_40199232/article/details/108333383