4. Text similarity

The main purpose of text similarity analysis is to analyze and measure how far apart two text entities are from each other. These entities can be simple text tokens, such as a single word, or entire documents made up of sentences and paragraphs. There are many methods for analyzing text similarity, and broadly they fall into the following two categories.

  • Lexical similarity: study the contents of text documents in terms of their syntax, structure and content, and measure their similarity based on these parameters.
  • Semantic similarity: first identify the semantic meaning and context of the documents, then measure how far apart they are. Dependency grammars and entity recognition are useful tools here.
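To make the lexical side concrete, here is a minimal sketch that compares two texts purely by the words they contain, using Jaccard similarity (shared distinct words divided by all distinct words). Jaccard is a swapped-in example chosen for illustration, not a method named in this section, and the lowercase regex tokenizer is a simplifying assumption.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common (1.0 = same vocabulary)."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


print(jaccard_similarity("The sky is blue", "The sky is not blue"))  # 0.8
```

The two example sentences score 0.8 even though they mean opposite things, and "the dog bit the man" versus "the man bit the dog" would score 1.0. A purely lexical measure cannot see that difference, which is the gap semantic similarity tries to close.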

The most popular area of research is lexical similarity analysis, because its techniques are simpler and easier to implement, and some aspects of semantic similarity can also be captured with simple models such as the bag-of-words model. Typically, distance metrics are used to measure the similarity between text entities. Next, we will focus on two areas of text similarity.
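The combination of a bag-of-words model and a distance metric can be sketched as follows: each text becomes a vector of word counts (word order is discarded), and cosine similarity compares the directions of the two vectors, with 1 minus the similarity serving as a distance. The function names and the lowercase regex tokenizer are assumptions made for this illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Bag-of-words model: map each lowercase word to its count, ignoring order."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two word-count vectors (1.0 = same direction)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare against
    return dot / (norm_a * norm_b)


def cosine_distance(a: str, b: str) -> float:
    """Turn the similarity into a distance: 0.0 for identical word distributions."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_similarity("The sky is blue", "The sun is bright"))  # 0.5
```

Cosine is used here because it ignores vector length, so a short and a long text with the same word proportions still count as similar; other metrics are equally valid choices.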

  • Term similarity: measure the identity or similarity between individual words.
  • Document similarity: measure the similarity between entire text documents.
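For term similarity, one common choice is the Levenshtein (edit) distance: the minimum number of single-character insertions, deletions and substitutions that turn one word into another. The sketch below uses the standard dynamic-programming formulation; the helper names and the normalization by the longer word's length are assumptions for illustration.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions
    needed to turn `source` into `target` (row-by-row dynamic programming)."""
    # previous[j] holds the distance between the current prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (s_char != t_char)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def term_similarity(a: str, b: str) -> float:
    """Normalize the edit distance into a 0..1 similarity score (1.0 = same word)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(term_similarity("believe", "beleive"))
```

Keeping only two rows of the table means memory grows with the length of one word rather than the product of both lengths.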

The idea is to apply several distance metrics: first to see how they measure the similarity between simple entities such as individual words, and then to see what changes when similarity is measured between documents, which are complex entities made up of many phrases.
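A small sketch of applying several distance metrics to single words follows. Hamming distance compares equal-length words position by position, while Manhattan and Euclidean distances here compare letter-count vectors over a to z. That vector encoding, which ignores characters outside a to z, is an assumption for this illustration, and the function names are hypothetical.

```python
import math
from collections import Counter
from string import ascii_lowercase


def hamming_distance(u: str, v: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs words of equal length")
    return sum(a != b for a, b in zip(u, v))


def char_vector(word: str) -> list[int]:
    """Assumed encoding: letter counts over a..z (other characters are ignored)."""
    counts = Counter(word.lower())
    return [counts[c] for c in ascii_lowercase]


def manhattan_distance(u: str, v: str) -> int:
    """Sum of absolute differences between the two letter-count vectors."""
    return sum(abs(a - b) for a, b in zip(char_vector(u), char_vector(v)))


def euclidean_distance(u: str, v: str) -> float:
    """Straight-line distance between the two letter-count vectors."""
    return math.dist(char_vector(u), char_vector(v))


print(hamming_distance("believe", "beleive"), manhattan_distance("believe", "beleive"))  # 2 0
print(manhattan_distance("believe", "bargain"), euclidean_distance("believe", "bargain"))
```

"believe" and its misspelling "beleive" differ at two positions, yet have identical letter counts, so the vector-based metrics report no distance at all. The choice of representation matters as much as the choice of metric.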

Source: www.cnblogs.com/dalton/p/11354014.html