[Paper Notes] Enhancing Pre-Trained Language Representations with Rich Knowledge for MRC

KT-NET: Knowledge and Text fusion NET

 

KBs: WordNet + NELL; distributed representations of the KBs (KB embeddings).

WordNet: records lexical relations, such as (organism, hypernym of, animal)

NELL: stores beliefs about entities, e.g. (Coca Cola, headquartered in, Atlanta)

Datasets: ReCoRD, SQuAD 1.1

Differences from other models that use extra knowledge (such as the Knowledgeable Reader):

KT-NET first learns embeddings of KB concepts, then retrieves these pre-learned KB embeddings and integrates them into the MRC system (i.e., the structured knowledge graph and the context are integrated together). In this way the relevant KB is exploited in a global manner, which is more useful for MRC systems.

Previous KB-based models first retrieve the relevant KB facts, then encode them and integrate them into the MRC system, so the relevant KB is only exploited locally.

Evaluation metrics: EM, F1, and the combined EM + F1 score.

The model, and the knowledge-related papers cited in this work, are worth a closer look.

Contributions

1. Pre-trained LMs + knowledge is a promising future research direction: enhancing LMs with curated knowledge from KBs / knowledge graphs.

2. The design of KT-NET for MRC.

 

The effect of adding KBs on top of BERT

Example from ReCoRD (2018): after introducing knowledge from WordNet and NELL, the answer prediction accuracy improves.

 

Real-world entities, synsets, concepts

 

KT-NET model

 

Model Description

① First learn embeddings of the two KBs;

② Retrieve the potentially relevant KB embeddings;

③ Encode the input and fuse the selected KB embeddings with BERT's hidden states;

④ Make context- and knowledge-aware predictions.

 

To encode the knowledge, knowledge graph embedding techniques are used to learn vector representations of the KB concepts.

Given passage P and question Q, a set of potentially relevant KB concepts C(w) is retrieved for every token w (w ∈ P ∪ Q), where each concept c ∈ C(w) has a learned vector embedding. These pre-trained KB embeddings are then fed into the four major components below.

 

Then, in order:

  1. BERT Encoding Layer: computes deep, context-aware representations of the question and passage;
  2. Knowledge Integration Layer: makes the representations not only context-aware but also knowledge-aware; an attention mechanism selects the most relevant KB embeddings from the KB memory and integrates them with the BERT-encoded representations (see the sketch after this list);
  3. Self-Matching Layer: fuses the BERT and KB representations, allowing further rich interactions;
  4. Output Layer: makes knowledge-aware predictions.
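
A minimal sketch of the knowledge integration idea described in item 2, assuming a bilinear attention between BERT hidden states and candidate KB concept embeddings plus a learnable sentinel vector for "no relevant knowledge". Names, shapes, and the final concatenation are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KnowledgeIntegration(nn.Module):
    def __init__(self, hidden_dim: int, kb_dim: int):
        super().__init__()
        self.W = nn.Linear(hidden_dim, kb_dim, bias=False)   # bilinear attention weight
        self.sentinel = nn.Parameter(torch.zeros(kb_dim))    # "attend to no knowledge" option

    def forward(self, h, concept_emb, concept_mask):
        # h:            (batch, seq_len, hidden_dim)    BERT hidden states
        # concept_emb:  (batch, seq_len, n_cand, kb_dim) candidate KB embeddings per token
        # concept_mask: (batch, seq_len, n_cand)        1 for real candidates, 0 for padding
        B, L, N, D = concept_emb.shape
        sentinel = self.sentinel.expand(B, L, 1, D)
        cands = torch.cat([concept_emb, sentinel], dim=2)              # (B, L, N+1, D)
        mask = torch.cat([concept_mask, torch.ones(B, L, 1, device=h.device)], dim=2)

        scores = torch.einsum('bld,blnd->bln', self.W(h), cands)       # bilinear scores
        scores = scores.masked_fill(mask == 0, float('-inf'))
        alpha = F.softmax(scores, dim=-1)                               # attention over candidates
        k = torch.einsum('bln,blnd->bld', alpha, cands)                 # knowledge state per token
        return torch.cat([h, k], dim=-1)                                # knowledge-enriched representation
```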

 

Details

Two KBs are used; knowledge is stored as triples (subject, relation, object).

Knowledge embedding

Given a triple (s, r, o), learn vector embeddings of the subject s, relation r, and object o.

The BILINEAR model is then used: f(s, r, o) = s^T diag(r) o.

Valid triples in the KB should receive higher scores. A margin-based ranking loss is used to learn the embeddings, yielding a vector representation for every concept in the two KBs.
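
A minimal sketch of this BILINEAR scoring function and margin-based ranking loss (Yang et al., 2015), as described above; the dimension, margin, and negative-sampling scheme here are illustrative.

```python
import torch
import torch.nn.functional as F

def bilinear_score(s, r, o):
    # f(s, r, o) = s^T diag(r) o, computed row-wise for a batch of triples
    return (s * r * o).sum(dim=-1)

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    # valid triples should score higher than corrupted ones by at least `margin`
    return F.relu(margin - pos_score + neg_score).mean()

# toy usage: 4 triples with 50-dim entity/relation embeddings
dim = 50
s, r, o = (torch.randn(4, dim, requires_grad=True) for _ in range(3))
s_neg = torch.randn(4, dim)  # corrupted subjects as negative samples
loss = margin_ranking_loss(bilinear_score(s, r, o), bilinear_score(s_neg, r, o))
loss.backward()
```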

Retrieval

WordNet: return the synsets of a word as candidate concepts;

NELL: first identify the named entities in P and Q, link the identified entities to NELL entities by string matching, then collect the associated NELL concepts as a set of potential candidates.
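
A minimal sketch of this candidate-concept retrieval, assuming NLTK's WordNet interface for synset lookup and a simple string match against a NELL entity dictionary; `nell_concepts` is a hypothetical mapping, not a real API.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def wordnet_candidates(token: str):
    # return the names of all synsets the surface token may belong to
    return [syn.name() for syn in wn.synsets(token)]

def nell_candidates(entity_mention: str, nell_concepts: dict):
    # link a recognized named entity to NELL by (case-insensitive) string match
    # and return the concepts associated with it, if any
    return nell_concepts.get(entity_mention.lower(), [])

print(wordnet_candidates("bank"))   # e.g. ['bank.n.01', 'bank.n.02', ...]
print(nell_candidates("Coca Cola", {"coca cola": ["company", "beverage_brand"]}))
```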

Figure: for each passage/question token, the 3 most relevant KB concepts are shown (selected with attention).

 

The 4 components

 

 

 

 

 

Experiments

Preprocessing: use BERT's BasicTokenizer for basic tokenization, find synsets/synonyms with NLTK, and use BERT's built-in FullTokenizer to further segment words into word pieces.
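
A minimal sketch of this preprocessing pipeline. The paper uses the original BERT tokenization code; here the Hugging Face equivalents of BasicTokenizer/FullTokenizer are used for illustration, with NLTK's WordNet for the synset lookup step.

```python
from transformers import BertTokenizer
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

text = "Coca Cola is headquartered in Atlanta"
# basic (whitespace + punctuation) tokenization, roughly BasicTokenizer
basic_tokens = tokenizer.basic_tokenizer.tokenize(text)
# retrieve WordNet synsets for each basic token (used for concept retrieval)
synsets_per_token = {tok: [s.name() for s in wn.synsets(tok)] for tok in basic_tokens}
# full tokenization into word pieces, roughly FullTokenizer
wordpieces = tokenizer.tokenize(text)

print(basic_tokens)
print(wordpieces)
```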

 

 

 

Consider all content words (nouns, verbs, adjectives, adverbs); for each word, take its final hidden-layer representation, and then compute the cosine similarity between each question word q_i and each passage word s_j.

After fine-tuning BERT on the MRC task, question words tend to learn similar representations. But once knowledge is integrated, different question words show different similarities to the passage words, and these similarities better reflect the relationships encoded in the KBs.

KT-NET can thus learn more accurate representations and achieve better question-passage matching.
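
A minimal sketch of the similarity analysis described above: take each content word's final hidden-layer vector and compute the cosine similarity between every question word and every passage word. `q_hidden` / `p_hidden` are assumed to be final-layer vectors already extracted from the model.

```python
import torch
import torch.nn.functional as F

def qp_cosine_similarities(q_hidden: torch.Tensor, p_hidden: torch.Tensor):
    # q_hidden: (n_q_words, dim), p_hidden: (n_p_words, dim)
    q = F.normalize(q_hidden, dim=-1)
    p = F.normalize(p_hidden, dim=-1)
    return q @ p.t()   # (n_q_words, n_p_words) matrix of cosine similarities

# toy usage with random 768-dim vectors
sims = qp_cosine_similarities(torch.randn(3, 768), torch.randn(5, 768))
print(sims.shape)  # torch.Size([3, 5])
```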

 

Techniques mentioned

Knowledge graph embedding techniques (Yang et al., 2015): used to encode the knowledge and learn vector representations of KB concepts;

Element-wise multiplication;

Row-wise softmax;

BILINEAR model (Yang et al., 2015): triple validity measured by a bilinear function f(s, r, o), with a margin-based ranking loss to learn the embeddings;

 

Datasets that require external knowledge

ReCoRD: extractive MRC dataset

ARC, MCScript, OpenBookQA, CommonsenseQA: multiple-choice MRC datasets

Structured knowledge from KBs: a series of papers (see the paper for references)

 

Some of the papers mentioned

(Bishan Yang and Tom Mitchell, 2017) Leveraging knowledge bases in LSTMs for improving machine reading;

(2018) Commonsense for generative multi-hop question answering tasks;

[Already read] (2018) Knowledgeable Reader: Enhancing cloze-style reading comprehension with external commonsense knowledge;

(2018, commonsense reasoning) Bridging the gap between human and machine commonsense reading comprehension
