Simple BERT Models for Relation Extraction and Semantic Role Labeling
1 Paper motivation
- Proposes a simple BERT-based model for relation extraction (Relation Extraction) and semantic role labeling (Semantic Role Labeling)
- It requires no lexical or syntactic features, reaches SOTA performance, and provides a strong baseline for follow-up studies
2 Model Introduction
2.1 Relation extraction model
A schematic of the relation extraction model:
The input is constructed as: [[CLS] sentence [SEP] subject [SEP] object [SEP]]
To prevent overfitting, the subject and object entities in the sentence are replaced with special mask tokens; for example, [S-PER] marks a subject entity of type PERSON. After WordPiece tokenization, the masked sentence is fed into the BERT encoder.
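The input construction above can be sketched in plain Python. This is an illustrative reconstruction, not the paper's code; the helper names and the entity-type tags (PER, ORG) are my own choices.

```python
def mask_entities(tokens, subj_span, obj_span, subj_type, obj_type):
    """Replace the subject/object entity tokens with special mask tokens,
    e.g. [S-PER] for a PERSON subject, [O-ORG] for an ORGANIZATION object.
    Spans are inclusive (start, end) word indices."""
    out = []
    for i, tok in enumerate(tokens):
        if subj_span[0] <= i <= subj_span[1]:
            if i == subj_span[0]:          # whole span collapses to one mask token
                out.append(f"[S-{subj_type}]")
        elif obj_span[0] <= i <= obj_span[1]:
            if i == obj_span[0]:
                out.append(f"[O-{obj_type}]")
        else:
            out.append(tok)
    return out

def build_input(tokens, subj_tokens, obj_tokens):
    """Assemble "[CLS] sentence [SEP] subject [SEP] object [SEP]"."""
    return (["[CLS]"] + tokens + ["[SEP]"] + subj_tokens
            + ["[SEP]"] + obj_tokens + ["[SEP]"])

sent = "Bill Gates founded Microsoft".split()
masked = mask_entities(sent, (0, 1), (3, 3), "PER", "ORG")
seq = build_input(masked, ["Bill", "Gates"], ["Microsoft"])
print(" ".join(seq))
# [CLS] [S-PER] founded [O-ORG] [SEP] Bill Gates [SEP] Microsoft [SEP]
```

In practice the assembled sequence would then go through WordPiece tokenization before entering BERT.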
The BERT output vectors of the tokens between [CLS] and the first [SEP] are used as the sentence representation; the length of this vector sequence is not necessarily the length of the sentence, since WordPiece may split a word into several sub-words. Vector representations are likewise obtained for the subject entity and for the object entity.
The position of each token relative to the subject entity is defined as:

$$p_i^{subj} = \begin{cases} i - s_1 & i < s_1 \\ 0 & s_1 \le i \le s_2 \\ i - s_2 & i > s_2 \end{cases}$$

where $s_1$ and $s_2$ are the start and end token indices of the subject entity, and $p_i^{subj}$ is the position of token $i$ relative to it. Likewise, a position sequence $p_i^{obj}$ relative to the object entity is obtained.
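The piecewise definition above is easy to verify with a small sketch (function name is my own):

```python
def relative_positions(n, start, end):
    """Position of each of n tokens relative to an entity span [start, end]:
    negative before the span, 0 inside it, positive after it."""
    pos = []
    for i in range(n):
        if i < start:
            pos.append(i - start)
        elif i <= end:
            pos.append(0)
        else:
            pos.append(i - end)
    return pos

# 6 tokens, entity occupying token indices 2..3
print(relative_positions(6, 2, 3))
# [-2, -1, 0, 0, 1, 2]
```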
The two position sequences are mapped to position embedding vectors and concatenated with the BERT representations, as shown in (a).
The vector sequence is then fed into a one-layer Bi-LSTM, and the final hidden state in each direction is taken; these states are fed into a one-hidden-layer neural network to predict the relation.
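The classification head described above can be sketched in PyTorch. This is a minimal reconstruction under my own assumptions: all dimensions, the class name, and the position-index shift are illustrative, not taken from the paper's released code.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    """Sketch: BERT vectors concatenated with two position embeddings,
    a one-layer Bi-LSTM, then a one-hidden-layer MLP over the final
    hidden states of both directions. Dimensions are illustrative."""

    def __init__(self, bert_dim=768, pos_dim=20, hidden=256,
                 n_relations=42, max_dist=100):
        super().__init__()
        # relative positions are shifted by max_dist to get valid indices
        self.subj_pos_emb = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.obj_pos_emb = nn.Embedding(2 * max_dist + 1, pos_dim)
        self.max_dist = max_dist
        self.lstm = nn.LSTM(bert_dim + 2 * pos_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.mlp = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_relations))

    def forward(self, bert_vecs, subj_pos, obj_pos):
        # concatenate BERT vectors with the two position embeddings
        x = torch.cat([bert_vecs,
                       self.subj_pos_emb(subj_pos + self.max_dist),
                       self.obj_pos_emb(obj_pos + self.max_dist)], dim=-1)
        _, (h_n, _) = self.lstm(x)                   # h_n: (2, batch, hidden)
        final = torch.cat([h_n[0], h_n[1]], dim=-1)  # last state of each direction
        return self.mlp(final)                       # (batch, n_relations)

head = RelationHead()
vecs = torch.randn(1, 12, 768)                 # stand-in for BERT outputs
pos = torch.arange(12).unsqueeze(0) - 5        # toy relative positions
logits = head(vecs, pos, pos)
print(logits.shape)  # torch.Size([1, 42])
```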
2.2 Semantic role labeling model
A schematic of the semantic role labeling model:
2.2.1 Predicate sense disambiguation
This task is treated as sequence labeling. After WordPiece tokenization, the token of the predicate is labeled with its sense, tokens of other words are labeled O, and continuation sub-word tokens are labeled X. Each token's BERT vector, concatenated with a predicate indicator embedding, is fed into a one-hidden-layer neural network for classification.
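The labeling scheme above can be sketched as follows. This is my reading of the note: the O/sense labels go on a word's first sub-token and X on continuation pieces; the function name and the `word_ids` convention (one word index per WordPiece token) are my own.

```python
def sense_labels(word_ids, predicate_word, sense):
    """Build the label sequence for predicate sense disambiguation:
    the predicate's first sub-token gets its sense label, other words'
    first sub-tokens get 'O', continuation sub-tokens get 'X'."""
    labels, prev = [], None
    for wid in word_ids:
        if wid == prev:
            labels.append("X")       # continuation WordPiece of the same word
        elif wid == predicate_word:
            labels.append(sense)     # predicate token carries the sense label
        else:
            labels.append("O")
        prev = wid
    return labels

# "he playing chess" -> WordPiece ["he", "play", "##ing", "chess"];
# word index 1 ("playing") is the predicate
word_ids = [0, 1, 1, 2]
print(sense_labels(word_ids, 1, "play.01"))
# ['O', 'play.01', 'X', 'O']
```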
2.2.2 Argument identification and classification
The model structure is shown above. The input sequence is [[CLS] sentence [SEP] predicate [SEP]]. The BERT token vectors, concatenated with predicate indicator embeddings, are fed through a one-layer Bi-LSTM, which yields a hidden-layer representation for each token in the sequence. To predict the label of a token, the predicate's hidden vector is concatenated with that token's hidden vector, and the result is fed into a one-hidden-layer neural network for classification.
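The per-token concatenation described above can be sketched with NumPy; the function name and the hidden size are illustrative assumptions.

```python
import numpy as np

def argument_features(hidden, pred_index):
    """For each token t, build the feature [h_t ; h_p]: the token's
    Bi-LSTM hidden state concatenated with the predicate's hidden state."""
    h_p = hidden[pred_index]                       # (d,)
    rep = np.repeat(h_p[None, :], hidden.shape[0], axis=0)
    return np.concatenate([hidden, rep], axis=1)   # (T, 2d)

H = np.random.randn(7, 64)        # 7 tokens, hidden size 64 (illustrative)
feats = argument_features(H, 2)   # predicate at token index 2
print(feats.shape)  # (7, 128)
```

Each row of `feats` would then go through the one-hidden-layer classifier to predict that token's argument label.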
3 Experimental performance
A comparison of the relation extraction model with other models on the TACRED dataset:
A comparison of the semantic role labeling model with other models on the CoNLL 2009 in-domain and out-of-domain datasets: