Introduction to and practice of text classification models in NLP

Author: Zen and the Art of Computer Programming

1. Introduction

In natural language processing (NLP), text classification is the task of automatically assigning a given text to the predefined category it belongs to. For example, given a piece of text, decide whether it concerns law, politics, culture, entertainment, or another field; or, given a Weibo post, determine which hashtag it belongs under. Text classification is an important branch of computer information processing technology. Its application scenarios include news sentiment analysis, spam filtering, web search and recommendation, question-answering robots, chatbots, information retrieval systems, and enterprise marketing strategy optimization.

This article introduces the current mainstream text classification models, including Naive Bayes, Support Vector Machines (SVM), Neural Networks (NN), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN), and describes the characteristics, scope of application, and specific operating steps of each. Possible future models and methods are discussed at the end of the article.

2. Basic concepts

(1) Document

In NLP, a document is a sequence of words or phrases and is generally the unit used to represent input data. A document consists of a set of words, phrases, or symbols, and in a classification task each document corresponds to a predefined category or label. For example, one document may correspond to a news report, while another may correspond to a video of a speech.
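As a rough illustration of this definition, a labeled document can be modeled as a token sequence paired with a category. The sketch below is only one possible representation: the `Document` class, the `make_document` helper, the whitespace tokenizer, and the two sample texts are all assumptions made for this example, not something prescribed by any particular library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    """A labeled document: a sequence of tokens plus its category."""
    tokens: List[str]
    label: str


def make_document(text: str, label: str) -> Document:
    # Assumption: lowercasing and splitting on whitespace is enough here.
    # Real pipelines usually need a proper tokenizer, especially for
    # languages such as Chinese that do not separate words with spaces.
    return Document(tokens=text.lower().split(), label=label)


# Hypothetical two-document corpus, invented for illustration.
corpus = [
    make_document("The court upheld the new privacy law", "law"),
    make_document("The film festival opens this weekend", "entertainment"),
]

for doc in corpus:
    print(doc.label, doc.tokens)
```

Keeping the label next to the tokens makes it easy to split such a corpus into training and test sets later without losing the pairing.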

(2) Features

In NLP, features can be extracted at the level of words, phrases, sentences, or the entire document. Features take many forms, such as letter counts, lexical features, morphological features, grammatical features, and contextual features.
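To make a few of these feature types concrete, here is a minimal sketch using only the Python standard library. It computes a letter count, word counts as a simple lexical feature, and adjacent-word pairs (bigrams) as a crude stand-in for contextual features. The function name `extract_features`, the feature-key naming scheme, and the regular-expression tokenizer are assumptions made for this example.

```python
import re
from collections import Counter
from typing import Dict


def extract_features(text: str) -> Dict[str, int]:
    """Map a raw text to a dictionary of named feature counts."""
    # Assumption: a token is a run of ASCII letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    features: Dict[str, int] = {}

    # Character-level feature: number of alphabetic characters.
    features["num_letters"] = sum(ch.isalpha() for ch in text)

    # Lexical features: how often each word occurs (bag of words).
    for word, count in Counter(tokens).items():
        features[f"word={word}"] = count

    # Rough contextual features: counts of adjacent word pairs.
    for left, right in zip(tokens, tokens[1:]):
        key = f"bigram={left}_{right}"
        features[key] = features.get(key, 0) + 1

    return features


feats = extract_features("The cat saw the dog")
print(feats["num_letters"], feats["word=the"], feats["bigram=the_cat"])
```

A dictionary of named counts like this is one common intermediate form; it would typically be converted into a fixed-length numeric vector before being passed to a classifier.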

Origin blog.csdn.net/universsky2015/article/details/132438459