【Paper Notes】Commonsense Knowledge Aware Conversation Generation with Graph Attention

Commonsense Knowledge Aware Conversation Generation with Graph Attention

  • Conference : IJCAI 2018

  • Task : Open Domain Dialog Generation

  • Code : project address

1. Motivation

Introducing commonsense knowledge into the dialogue task as real-world background knowledge can strengthen the model's semantic understanding of the dialogue context and thereby help it generate more appropriate and informative replies.

Past work on introducing external knowledge has two shortcomings:

(1) It is highly dependent on the quality of unstructured text, or is limited to small-scale, domain-specific knowledge;

(2) It usually exploits knowledge triples (entities) individually and independently rather than treating the triples of a graph as a whole, and therefore cannot express the semantics of a graph through its linked entities and relations.

2. Main idea

This paper proposes CCM, a commonsense knowledge-aware conversation model. Given a user post, the model first retrieves relevant knowledge graphs from the knowledge base and then encodes them with a static graph attention mechanism to enhance the semantic understanding of the post.

Then, in the generation stage, the model attentively reads the retrieved knowledge graphs and the knowledge triples within each graph through a dynamic graph attention mechanism to facilitate better generation. This is the first work that attempts to use large-scale commonsense knowledge in dialogue generation. Moreover, unlike existing models that use knowledge triples independently and separately, CCM considers each graph as a whole, which encodes more structured and connected semantic information from the knowledge graphs.

3. Model

[Figure: overall architecture of the CCM model]

3.1 Knowledge retrieval

The user's post is used as a query to retrieve a set of graphs G from the knowledge base. Each graph consists of a set of triples, and each triple contains a head entity, a relation, and a tail entity. Note that for a matched word, only its neighboring nodes and relations are retrieved, and only one graph is retrieved for each word.
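To make the retrieval step concrete, here is a toy sketch (my own data layout and function names, not the authors' code) of matching post words against a ConceptNet-style knowledge base and collecting one one-hop graph per matched word:

```python
# Toy knowledge base: entity -> list of (head, relation, tail) one-hop triples.
kb = {
    "rays": [("rays", "RelatedTo", "sun"), ("rays", "IsA", "light")],
    "sun":  [("sun", "RelatedTo", "sky")],
}

def retrieve_graphs(post_tokens, kb):
    """Return one graph (a set of one-hop triples) per word that matches the KB."""
    graphs = []
    for w in post_tokens:
        if w in kb:                     # only one graph is retrieved for each word
            graphs.append({"word": w, "triples": kb[w]})
    return graphs

print(retrieve_graphs("the rays of the sun".split(), kb))
```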

TransE is used to represent the relations and entities in the knowledge base. To bridge the gap between the structured knowledge base and the unstructured dialogue text, an MLP is applied:
$$\mathbf{k} = (\mathbf{h}; \mathbf{r}; \mathbf{t}) = \mathrm{MLP}(\mathrm{TransE}(h, r, t))$$
That is, the triple is first encoded with TransE and then transformed by the MLP to obtain the final triple embedding $\mathbf{k}$.
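As a rough illustration (a minimal PyTorch sketch with hypothetical class and dimension names, not the released code), the triple embedding can be computed by looking up pretrained TransE vectors and passing their concatenation through an MLP:

```python
import torch
import torch.nn as nn

class TripleEmbedder(nn.Module):
    """k = MLP(TransE(h, r, t)): concatenate TransE vectors of h, r, t and project."""
    def __init__(self, num_entities, num_relations, transe_dim=100, out_dim=100):
        super().__init__()
        # In practice these tables would be initialized from pretrained TransE embeddings.
        self.ent = nn.Embedding(num_entities, transe_dim)
        self.rel = nn.Embedding(num_relations, transe_dim)
        self.mlp = nn.Sequential(nn.Linear(3 * transe_dim, out_dim), nn.Tanh())

    def forward(self, h_idx, r_idx, t_idx):
        h, r, t = self.ent(h_idx), self.rel(r_idx), self.ent(t_idx)
        return self.mlp(torch.cat([h, r, t], dim=-1))

emb = TripleEmbedder(num_entities=1000, num_relations=40)
k = emb(torch.tensor([3]), torch.tensor([7]), torch.tensor([12]))  # shape (1, out_dim)
```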

3.2 Static Graph Attention Mechanism

Through the static graph attention mechanism, each word in the input post $X$ gets a knowledge graph vector that statically represents the knowledge graph retrieved for that word.

Specifically, the knowledge graph vector is computed by taking the triple vectors (the TransE-plus-MLP embeddings above) as input.

Bahdanau-style additive attention is used here, adapted to the triple setting: the attention weight measures the relationship between the relation $r$ and the head and tail entities (I don't quite understand the purpose of this step), and the resulting knowledge graph vector is the attention-weighted sum of the head and tail vectors.

[Figure: static graph attention equations]
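Below is a minimal PyTorch sketch of how such a static graph attention can be implemented (my own reading of the mechanism; the parameter names W_h, W_t, W_r and shapes are assumptions): the relation vector scores each triple against its head and tail, and the graph vector is the weighted sum of the concatenated head and tail vectors.

```python
import torch
import torch.nn as nn

class StaticGraphAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W_h = nn.Linear(dim, dim, bias=False)
        self.W_t = nn.Linear(dim, dim, bias=False)
        self.W_r = nn.Linear(dim, dim, bias=False)

    def forward(self, h, r, t):
        # h, r, t: (N, dim) vectors of the N triples in one retrieved graph
        beta = (self.W_r(r) * torch.tanh(self.W_h(h) + self.W_t(t))).sum(-1)  # triple scores (N,)
        alpha = torch.softmax(beta, dim=-1)                                   # attention weights
        g = (alpha.unsqueeze(-1) * torch.cat([h, t], dim=-1)).sum(0)          # graph vector (2*dim,)
        return g

att = StaticGraphAttention(dim=100)
g = att(torch.randn(4, 100), torch.randn(4, 100), torch.randn(4, 100))  # a graph with 4 triples
```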

3.3 Knowledge Interpreter

[Figure: knowledge interpreter]

The knowledge interpreter takes the user's post and the retrieved graphs G as input, concatenates each word embedding with its corresponding knowledge graph vector to obtain a knowledge-aware representation of the word, and feeds the sequence into a GRU encoder.
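A minimal sketch of this step (hypothetical dimensions; assuming a zero graph vector for words that match no graph):

```python
import torch
import torch.nn as nn

word_dim, graph_dim, hidden = 300, 200, 512
encoder = nn.GRU(word_dim + graph_dim, hidden, batch_first=True)

word_emb  = torch.randn(1, 10, word_dim)    # embeddings of a 10-token post
graph_vec = torch.randn(1, 10, graph_dim)   # g_i for each token (zeros if no match)
enc_out, enc_state = encoder(torch.cat([word_emb, graph_vec], dim=-1))
```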

3.4 Knowledge-aware generator

[Figure: knowledge-aware generator]

The knowledge-aware generator has two main functions:

(1) Selectively read the retrieved graphs to obtain a graph-aware context vector, and use this vector to update the decoder state.

(2) Adaptively select a generic word from the vocabulary or an entity word from the retrieved graphs for generation.

[Figure: decoder state update equation]

Here, $c_t$ is the context vector produced by the encoder-side attention mechanism, while $c_t^g$ and $c_t^k$ are the knowledge context vectors obtained by attending over the knowledge graph vectors and over the triples within those graphs, respectively.
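As far as I recall from the paper (notation approximate, so treat this as a reconstruction rather than a quote), the decoder state update has roughly the following form:

```latex
% c_t: encoder attention context; c_t^g, c_t^k: graph- and triple-level knowledge
% contexts from the dynamic graph attention; e(y_t): embedding of the previously
% generated word (paired with the triple it was copied from, if it was an entity word).
s_{t+1} = \mathrm{GRU}\big(s_t,\; [\,c_t;\, c_t^g;\, c_t^k;\, e(y_t)\,]\big)
```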

3.5 Dynamic Graph Attention Mechanism

The dynamic graph attention mechanism is hierarchical. First, given the current decoder state $s_t$, it computes an attention-weighted sum of all knowledge graph vectors:

[Figure: graph-level (first-level) attention equations]

The attention weight measures the relevance between the decoder state $s_t$ and each knowledge graph vector $g_i$.

Next, the attention between $s_t$ and all triples within each knowledge graph is computed, using a bilinear attention scoring function:

[Figure: triple-level (second-level) attention equations]

In the final weighting, each triple's weight is the product of the first-level weight of its graph and its own weight within that graph. In other words, the model first attends to a specific graph and then to specific triples inside that graph.
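The following PyTorch sketch is my own simplified rendering of this two-level attention (additive scoring at the graph level, bilinear at the triple level; all names and dimensions are assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class DynamicGraphAttention(nn.Module):
    def __init__(self, s_dim, g_dim, k_dim, hid=128):
        super().__init__()
        self.W_b = nn.Linear(s_dim, hid, bias=False)   # level 1: additive attention over graphs
        self.U_b = nn.Linear(g_dim, hid, bias=False)
        self.v_b = nn.Linear(hid, 1, bias=False)
        self.W_c = nn.Bilinear(s_dim, k_dim, 1)        # level 2: bilinear attention over triples

    def forward(self, s_t, graphs, triples):
        # s_t: (s_dim,)   graphs: (G, g_dim)   triples: (G, N, k_dim)
        G, N, _ = triples.shape
        beta_g = self.v_b(torch.tanh(self.W_b(s_t) + self.U_b(graphs))).squeeze(-1)
        alpha_g = torch.softmax(beta_g, dim=0)                            # graph weights (G,)
        beta_k = self.W_c(s_t.repeat(G * N, 1), triples.reshape(G * N, -1)).reshape(G, N)
        alpha_k = torch.softmax(beta_k, dim=-1)                           # triple weights per graph
        c_g = (alpha_g.unsqueeze(-1) * graphs).sum(0)                     # graph-level context
        # final triple weight = graph weight * within-graph triple weight
        c_k = ((alpha_g.unsqueeze(-1) * alpha_k).unsqueeze(-1) * triples).sum((0, 1))
        return c_g, c_k

dga = DynamicGraphAttention(s_dim=512, g_dim=200, k_dim=100)
c_g, c_k = dga(torch.randn(512), torch.randn(3, 200), torch.randn(3, 5, 100))
```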

When decoding, the model selects either a word from the vocabulary or an entity word from the graphs to generate, i.e., it introduces a copy mechanism; one can refer to the pointer-generator network (PGN) here:

[Figure: generation/copy probability equations]
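To make the selection step concrete, here is a toy numeric sketch (entirely my own, with made-up sizes) of how a sigmoid gate can split probability mass between a generic-vocabulary distribution and an entity distribution, pointer-generator style:

```python
import torch

vocab_p  = torch.softmax(torch.randn(30000), dim=0)  # distribution over generic vocabulary words
entity_p = torch.softmax(torch.randn(50), dim=0)      # distribution over candidate entity words
gamma_t  = torch.sigmoid(torch.randn(()))             # selector computed from the decoder state

# Concatenate the two scaled distributions into one output distribution.
final_p = torch.cat([(1 - gamma_t) * vocab_p, gamma_t * entity_p])
assert torch.allclose(final_p.sum(), torch.tensor(1.0), atol=1e-4)
```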

3.6 Loss function

[Figure: loss function]
During training, a supervision signal from teacher forcing indicates whether the gold word at each step is an entity word or a generic word. The first term of the loss is the standard cross-entropy loss over the generated words; the second term uses this signal $q_t \in \{0, 1\}$ to supervise the probability of selecting an entity word versus a generic word.
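Putting the description above into a formula, the objective is roughly of the form below (my reconstruction from these notes, not copied from the paper):

```latex
% First term: word-level cross-entropy of the generated response.
% Second term: binary cross-entropy supervising the selector gamma_t with q_t,
% where q_t = 1 if the gold word at step t is an entity word and 0 otherwise.
\mathcal{L}(\theta) =
  -\sum_{t}\log P\!\left(y_t \mid y_{<t}, X, G\right)
  -\sum_{t}\Big( q_t \log \gamma_t + (1 - q_t)\log\left(1 - \gamma_t\right) \Big)
```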


Origin blog.csdn.net/m0_47779101/article/details/131546545