Hierarchical Macro Discourse Parsing Based on Topic Segmentation Reading Notes

[ Related Information ]

Title: "Hierarchical Macro Discourse Parsing Based on Topic Segmentation"
Authors: Feng Jiang, Yaxin Fan, Xiaomin Chu, Peifeng Li, Qiaoming Zhu *, Fang Kong
Venue: AAAI 2021

[ Code address ]

None provided

[ Prerequisites ]

Topic segmentation, discourse parsing

1. Background and overview

1.1 Related research

The goal of discourse parsing is to identify the nuclearity of discourse units (which unit is the nucleus and which is the satellite) and the relations between them.
Discourse parsing is divided into:

  • Micro discourse parsing: relations within and between sentences
  • Macro discourse parsing: relations between paragraphs and larger sections

Micro discourse structure only guarantees that local parts of a text are coherent; units larger than the relations between individual sentences need macro discourse parsing to explain them. Macro discourse parsing reveals the topic and overall structure of a text from a higher level. Accurate macro discourse parsing is essential for obtaining a good discourse dependency tree and for improving the performance of downstream NLP tasks.

The current problems are as follows:

  • Discourse units in macro discourse parsing are larger, and there are fewer connections between units
  • Macro discourse parsing has to handle many discourse units
  • In macro discourse parsing, there are no clear boundaries between higher-level groups of paragraphs

1.2 Contribution points

  • Hierarchical discourse parsing
  • Topic boundaries are used instead of explicit sentence and paragraph boundaries

1.3 Related work

  • Very little work studies macro discourse parsing
  • MCDTB is the only open-source Chinese macro discourse corpus
  • Almost no work builds the macro discourse structure hierarchically

2. Model

Preparing Data for Topic Segmentation

Training a topic segmentation model requires labeled topic boundaries, but the discourse parsing data set has none. The annotated discourse structure trees are therefore converted into topic boundaries using the following rules:

  • One topic corresponds to one subtree
  • The number of paragraphs in a subtree does not exceed half of the length of the entire document

Discourse parsing data set -----> discourse parsing data set with topic boundaries
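
The two conversion rules can be illustrated with a minimal Python sketch. It is not the authors' code: the `Node` class, the `leaf` helper, the top-down choice of the largest subtrees that satisfy the half-length limit, and the 0/1 boundary encoding are all assumptions made for illustration, since these notes only state the two rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical discourse tree node; leaves are paragraphs."""
    children: List["Node"] = field(default_factory=list)
    paragraph: Optional[int] = None  # paragraph index for leaves, None otherwise

    def leaves(self) -> List[int]:
        # Collect paragraph indices under this node, left to right.
        if not self.children:
            return [self.paragraph]
        return [p for child in self.children for p in child.leaves()]


def leaf(i: int) -> Node:
    """Hypothetical helper that builds a paragraph leaf."""
    return Node(paragraph=i)


def topic_segments(root: Node) -> List[List[int]]:
    """Split the tree into subtrees covering at most half of the paragraphs.

    Assumption: subtrees are taken top-down, so each topic is the largest
    subtree that satisfies the limit. A single paragraph is always a topic.
    """
    limit = len(root.leaves()) / 2
    segments: List[List[int]] = []

    def visit(node: Node) -> None:
        paragraphs = node.leaves()
        if len(paragraphs) <= limit or not node.children:
            segments.append(paragraphs)  # rule 1: one topic per subtree
        else:
            for child in node.children:  # rule 2: too large, so descend
                visit(child)

    visit(root)
    return segments


def boundary_labels(segments: List[List[int]]) -> List[int]:
    """Label each paragraph: 1 if it ends a topic, 0 otherwise (assumed encoding)."""
    labels: List[int] = []
    for seg in segments:
        labels.extend([0] * (len(seg) - 1) + [1])
    return labels


# Five paragraphs: ((p0 p1) (p2 p3)) p4
root = Node(children=[
    Node(children=[
        Node(children=[leaf(0), leaf(1)]),
        Node(children=[leaf(2), leaf(3)]),
    ]),
    leaf(4),
])
segments = topic_segments(root)
print(segments)                   # [[0, 1], [2, 3], [4]]
print(boundary_labels(segments))  # [0, 1, 0, 1, 1]
```

In this example the left subtree covers four of five paragraphs, which exceeds the limit of 2.5, so it is split into its two children, while the single paragraph p4 forms a topic of its own.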

Model Specifics for Topic Segmentation


Model Specifics for Discourse Parsing

3. Experiments and evaluation

4. Conclusion and personal summary

5. References

6. Extensions


Origin blog.csdn.net/jokerxsy/article/details/115252698