[Title]
"Macro Discourse Relation Recognition via Discourse Argument Pair Graph"
[Code Address]
None
[Knowledge Reserve]
Table of Contents
1. Background and overview
1.1 Related research
None.
1.2 Contributions
- First application of graph neural networks (GNNs) to Chinese discourse relation recognition.
- Achieves good performance.
1.3 Related work
None.
2. The Model
2.0 Overview
Summary of the graph and the model:
- argument-word edges: keyword information via TF-IDF, serving as prior attention.
- word-word edges: global information via PMI, reflecting topic coherence across sentences.
2.1 Building the Graph
2.1.0 Node representation
The graph is built over the entire corpus and contains all argument nodes and word nodes.
Using word2vec word vectors alleviates the cold-start problem and provides more precise word semantics.
An argument node is represented by the average of its words' vectors.
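As a minimal sketch of the node representation described above, the snippet below averages word2vec vectors to form an argument node's feature. The vocabulary and embedding matrix here are illustrative stand-ins; the paper would use word2vec vectors trained on the corpus.

```python
import numpy as np

# Hypothetical vocabulary and word-vector matrix (illustrative values only).
vocab = {"discourse": 0, "relation": 1, "graph": 2}
word_vectors = np.arange(12, dtype=float).reshape(3, 4)  # 3 words, dim 4

def argument_vector(tokens, vocab, word_vectors):
    """Represent an argument (paragraph) as the mean of its word vectors."""
    idxs = [vocab[t] for t in tokens if t in vocab]
    if not idxs:  # fully out-of-vocabulary argument: fall back to zeros
        return np.zeros(word_vectors.shape[1])
    return word_vectors[idxs].mean(axis=0)

arg_vec = argument_vector(["discourse", "graph"], vocab, word_vectors)
# arg_vec is the mean of rows 0 and 2: [4., 5., 6., 7.]
```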
2.1.1 Connecting edges
Word-word: Using the PMI indicator, a positive PMI value represents a higher semantic connection between words.
Word-sentence: TF is the frequency of words appearing in sentences, IDF is the
frequency of the inverse document after log normalization (? IDF is the frequency of the inverse document after log normalization).
Self-loop: Not only learn the new, but also keep the old.
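The two edge weights above can be sketched as follows. This is an assumption-laden reconstruction: PMI is estimated TextGCN-style from sliding co-occurrence windows (the window size is my choice, not the paper's), and TF-IDF uses the plain tf * log(N/df) form.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_weights(docs, window=2):
    """Positive PMI between word pairs, estimated from sliding windows."""
    windows = []
    for doc in docs:
        if len(doc) <= window:
            windows.append(doc)
        else:
            windows += [doc[i:i + window] for i in range(len(doc) - window + 1)]
    n = len(windows)
    word_cnt, pair_cnt = Counter(), Counter()
    for w in windows:
        for t in set(w):
            word_cnt[t] += 1
        for a, b in combinations(sorted(set(w)), 2):
            pair_cnt[(a, b)] += 1
    pmi = {}
    for (a, b), c in pair_cnt.items():
        v = math.log(c * n / (word_cnt[a] * word_cnt[b]))
        if v > 0:  # only positive PMI becomes a word-word edge
            pmi[(a, b)] = v
    return pmi

def tfidf_weights(docs):
    """TF-IDF weight of each word in each argument: tf * log(N / df)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        for t in set(doc):
            df[t] += 1
    weights = {}
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        for t, c in tf.items():
            weights[(i, t)] = (c / len(doc)) * math.log(n / df[t])
    return weights
```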
2.1.2 Graph construction
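Putting the pieces together, the heterogeneous graph's adjacency matrix combines TF-IDF argument-word edges, positive-PMI word-word edges, and self-loops. The sketch below also applies the symmetric normalization used by standard GCNs; that normalization step is an assumption, not something stated in the note.

```python
import numpy as np

def build_adjacency(n_args, n_words, tfidf, pmi):
    """Assemble the graph's adjacency matrix.

    Node order: [argument nodes | word nodes].
    tfidf maps (arg_idx, word_idx) -> weight;
    pmi maps (word_i, word_j) -> weight.
    """
    n = n_args + n_words
    A = np.eye(n)                      # self-loops keep each node's own features
    for (a, w), v in tfidf.items():    # argument-word edges (TF-IDF)
        A[a, n_args + w] = A[n_args + w, a] = v
    for (i, j), v in pmi.items():      # word-word edges (positive PMI)
        A[n_args + i, n_args + j] = A[n_args + j, n_args + i] = v
    # Symmetric normalization D^{-1/2} A D^{-1/2}, as in a standard GCN
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A @ D_inv_sqrt

A_hat = build_adjacency(2, 2, {(0, 0): 0.5}, {(0, 1): 1.2})
```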
2.2 Model
2.2.0 Input layer
The inputs are the adjacency matrix A and the initial node feature matrix H^0.
2.2.1 Encoding layer
After the first graph-convolution layer, each argument node aggregates the word nodes connected to it, and each word node aggregates the word nodes connected to it.
After the second layer, each argument node aggregates the global semantic information carried by the word nodes connected to it.
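The two-hop aggregation above can be sketched as a plain two-layer GCN forward pass. The identity adjacency and random weights below are toy placeholders; only the layer structure mirrors the description.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gcn_forward(A_hat, H0, W1, W2):
    """Two-layer GCN: H1 = ReLU(A_hat H0 W1), H2 = A_hat H1 W2.
    The first layer mixes direct neighbours; the second layer lets each
    argument node receive information from nodes two hops away."""
    H1 = relu(A_hat @ H0 @ W1)
    return A_hat @ H1 @ W2

rng = np.random.default_rng(0)
n, d_in, d_hid, d_out = 6, 8, 16, 4
A_hat = np.eye(n)                       # toy normalized adjacency
H0 = rng.normal(size=(n, d_in))         # initial node features
W1 = rng.normal(size=(d_in, d_hid))
W2 = rng.normal(size=(d_hid, d_out))
H2 = gcn_forward(A_hat, H0, W1, W2)
```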
2.2.2 Classification layer
Since an argument is a paragraph containing multiple sentences, the sentence representations within each argument are first concatenated to obtain H_arg1 and H_arg2; these two are then concatenated into H, which is fed to the classifier.
Training minimizes the cross-entropy loss.
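A minimal sketch of the classification step, assuming a single linear layer with softmax over relation classes (the classifier's exact shape is not specified in the note):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(h_arg1, h_arg2, W, b):
    """Concatenate the two argument representations, then apply a
    linear layer + softmax to get relation-class probabilities."""
    h = np.concatenate([h_arg1, h_arg2])
    return softmax(W @ h + b)

def cross_entropy(probs, label):
    """Negative log-likelihood of the gold relation label."""
    return -np.log(probs[label])

rng = np.random.default_rng(1)
d, n_classes = 4, 3
probs = classify(rng.normal(size=d), rng.normal(size=d),
                 rng.normal(size=(n_classes, 2 * d)), np.zeros(n_classes))
loss = cross_entropy(probs, 0)
```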
3. Experiments and Evaluation
Baseline models:
- LSTM
- MSRM: uses global information, but ignores that word importance varies across sentences.
- STGSN: a sequence model that cannot capture intra-sentence dependencies in long texts well; its attention degrades on long texts; it ignores global information.
4. Ablation Experiments
Remove word-word edges: without PMI, word-word edge weights fall back to 1 (?).
Remove word-argument edges: without TF-IDF, each argument weights each of its words uniformly as 1/length.
5. Conclusion and personal summary
The learned argument vector representations may transfer to other tasks.
Future work: how to model the problem better.