KT-NET: Knowledge and Text fusion NET
KBs: WordNet + NELL; distributed representations of the KBs (KB embeddings).
WordNet: records lexical relations, e.g., (organism, hypernym of, animal)
NELL: stores beliefs about entities, e.g., (Coca Cola, headquartered in, Atlanta)
Datasets: ReCoRD, SQuAD 1.1
Differences from other models that use extra knowledge (e.g., Knowledgeable Reader):
KT-NET first learns embeddings of KB concepts over the whole KB, then retrieves these pre-learned KB embeddings and integrates them into the MRC system (i.e., the structured KG and the context are fused together). The KB is thus exploited globally, which is more useful for MRC systems.
Previous KB-based models instead first retrieve the relevant KB facts, then encode them and integrate them into the MRC system, so only the locally relevant part of the KB is used.
Evaluation metrics: EM, F1 (and their combined EM + F1 score)
The models and papers on using knowledge that this paper cites are worth a look.
Contributions
1. Pre-trained LMs + knowledge: a promising future research direction, enhancing pre-trained LMs with rich knowledge from KBs/KGs.
2. The design of KT-NET for MRC.
Effect of KBs on BERT:
A ReCoRD (2018) example shows that after introducing knowledge from WordNet and NELL, prediction accuracy improves.
Real-world entities, synsets, concepts
KT-NET model
Model Description
① first learn embeddings of the two KBs;
② retrieve the potentially relevant KB embeddings;
③ encode: fuse the selected KB embeddings with BERT's hidden-layer states;
④ make context- and knowledge-aware predictions.
To encode the KG, knowledge graph embedding techniques are used to learn vector representations of KB concepts.
Given passage P and question Q, retrieve a set of potentially relevant KB concepts C(w) for each token w (w ∈ P ∪ Q), where each concept c ∈ C(w) has a learned vector embedding. This yields pre-trained KB embeddings, which are then fed into the four major components.
Then, layer by layer:
- BERT Encoding Layer: computes deep, context-aware representations of the question and passage;
- Knowledge Integration Layer: makes the representations not only context-aware but also knowledge-aware. An attention mechanism selects the most relevant KB embeddings from the KB memory and integrates them with the BERT-encoded representations;
- Self-Matching Layer: fuses the BERT and KB representations further through rich interactions;
- Output Layer: makes knowledge-aware predictions.
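The attention-based selection in the knowledge integration layer can be sketched as follows. This is a minimal NumPy sketch with toy dimensions and hypothetical names, assuming the token state and concept embeddings share a dimension (the real layer adds projections and a sentinel vector for "no relevant knowledge"):

```python
import numpy as np

def knowledge_attention(h, concepts):
    """Select relevant KB embeddings for one token via attention.

    h: BERT hidden state of the token, shape (d,)
    concepts: candidate KB concept embeddings, shape (k, d)
    Returns the knowledge-enriched representation [h; attended].
    """
    scores = concepts @ h                    # relevance score per concept
    weights = np.exp(scores - scores.max())  # row-wise softmax, stabilized
    weights /= weights.sum()
    attended = weights @ concepts            # weighted sum of KB embeddings
    return np.concatenate([h, attended])     # fuse with the BERT representation

# toy example: 4-dim token state, 4 candidate concepts
h = np.array([1.0, 0.0, 0.0, 1.0])
concepts = np.eye(4)
enriched = knowledge_attention(h, concepts)
```

Concepts whose embeddings align with the token's BERT state receive higher attention weights, so the fused representation is both context- and knowledge-aware.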
Details
Two KBs are used; knowledge is stored as triples: (subject, relation, object).
Knowledge embedding
Given a triple (s, r, o), learning vector embeddings of subject s, relation r, and object o.
Then the BILINEAR model scores each triple: f(s, r, o) = s^T diag(r) o.
Valid triples in the KB should receive higher scores. A margin-based ranking loss is then used to learn the embeddings, yielding a vector representation for each entity of the two KBs.
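A minimal sketch of the BILINEAR score and the margin-based ranking loss, using toy random vectors (names and dimensions are illustrative, not from the paper):

```python
import numpy as np

def bilinear_score(s, r, o):
    # BILINEAR score f(s, r, o) = s^T diag(r) o,
    # computed via element-wise multiplication
    return float(np.sum(s * r * o))

def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    # margin-based ranking loss: valid triples should outscore corrupted ones
    return max(0.0, margin - pos_score + neg_score)

rng = np.random.default_rng(0)
s, r, o = rng.normal(size=(3, 8))   # toy subject/relation/object embeddings
o_corrupt = rng.normal(size=8)      # corrupted object for a negative triple
loss = margin_ranking_loss(bilinear_score(s, r, o),
                           bilinear_score(s, r, o_corrupt))
```

In training, the embeddings are updated to drive this loss toward zero, i.e., to score valid KB triples at least one margin above corrupted ones.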
Retrieval
WordNet: return the synsets of a word as candidate concepts;
NELL: first identify the named entities in P and Q, link the identified entities to NELL entities by string matching, and gather the related NELL concepts as a set of potential candidates.
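The string-matching retrieval step can be sketched in pure Python. `NELL_CONCEPTS` below is a tiny hypothetical lookup table standing in for the real KB:

```python
# Minimal sketch of NELL-style candidate retrieval by string matching.
# NELL_CONCEPTS is a toy lookup table, not the real NELL KB.
NELL_CONCEPTS = {
    "coca cola": ["company", "beverage_brand"],
    "atlanta": ["city", "us_location"],
}

def retrieve_concepts(tokens, max_ngram=2):
    """Match token n-grams against entity names and gather their concepts."""
    candidates = {}
    for n in range(max_ngram, 0, -1):        # prefer longer mentions first
        for i in range(len(tokens) - n + 1):
            mention = " ".join(tokens[i:i + n]).lower()
            if mention in NELL_CONCEPTS:
                candidates.setdefault(mention, NELL_CONCEPTS[mention])
    return candidates

candidates = retrieve_concepts("Coca Cola is headquartered in Atlanta".split())
# candidates maps "coca cola" and "atlanta" to their NELL-style concepts
```

The retrieved concept sets C(w) are then looked up in the pre-trained KB embeddings for the integration layer.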
Figure: for each passage/question token, the 3 most relevant KB concepts are shown (selected by attention).
Four components
Experiments
Preprocessing: use BERT's BasicTokenizer, find synsets with NLTK, and use BERT's built-in FullTokenizer to segment words into wordpieces.
Consider all content words in the sentences (nouns, verbs, adjectives, adverbs); for each word, take its last-hidden-layer representation, then compute the cosine similarity between each question word q_i and each passage word s_j.
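The similarity computation in this analysis is plain cosine similarity over the word representations; a minimal sketch with toy vectors (the real vectors would come from BERT's last hidden layer):

```python
import numpy as np

def cosine_sim(q_vec, p_vec):
    # cosine similarity between a question-word and a passage-word representation
    return float(q_vec @ p_vec /
                 (np.linalg.norm(q_vec) * np.linalg.norm(p_vec)))

# toy word vectors; parallel directions give similarity 1.0
q_vec = np.array([1.0, 2.0, 0.0])
p_vec = np.array([2.0, 4.0, 0.0])
sim = cosine_sim(q_vec, p_vec)
```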
After fine-tuning BERT on the MRC task, question words tend to learn similar representations. But after integrating knowledge, different question words show distinct similarities to different passage words, and these similarities well reflect the relationships encoded in the KBs.
KT-NET can thus learn more accurate representations and achieve better question-passage matching.
Techniques mentioned
Knowledge graph embedding techniques (Yang et al., 2015): used to encode knowledge, learning vector representations of KB concepts;
Element-wise multiplication;
Row-wise softmax;
BILINEAR model (Yang et al., 2015): triple validity measured by the bilinear function f(s, r, o), with a margin-based ranking loss to learn the embeddings;
Datasets that need external knowledge
ReCoRD: extractive MRC dataset
ARC, MCScript, OpenBookQA, CommonsenseQA: multiple-choice MRC datasets
Structured knowledge from KBs: a series of papers (see the paper)
Some of the papers mentioned
(Bishan Yang and Tom Mitchell, 2017) Leveraging knowledge bases in LSTMs for improving machine reading;
(2018) Commonsense for generative multi-hop question answering tasks;
[read] (2018) Knowledgeable Reader: Enhancing cloze-style reading comprehension with external commonsense knowledge;
(2018, commonsense reasoning) Bridging the gap between human and machine commonsense reading comprehension