textRNN & textCNN network structure and code implementation!

1. What is textRNN

textRNN refers to using a recurrent neural network (RNN) to solve text classification problems. Text classification is a fundamental task in natural language processing that tries to infer the label or label set of a given piece of text (a sentence, a document, etc.).

Text classification has a wide range of applications, for example:

  • Spam detection: binary classification, determining whether an email is spam.
  • Sentiment analysis: binary classification, judging whether the sentiment of a text is positive or negative; or multi-class classification, deciding which of {very negative, negative, neutral, positive, very positive} the sentiment belongs to.
  • News categorization: deciding which category a news article belongs to, such as finance, sports, entertainment, and so on. Depending on the number of category labels this can be binary or multi-class classification.
  • Question classification in question-answering systems.
  • Question classification in Q&A communities: multi-class, multi-label classification (a piece of text is assigned to multiple classes and may carry several labels), e.g. the Zhihu "Kanshan Cup" competition.
  • Letting AI act as a judge: classifying the fine level based on the facts described in a case text (multi-class classification) and classifying the applicable statutes (multi-class, multi-label classification).
  • Determining whether a news article was written by a robot: binary classification.

1.1 textRNN principle

In natural language processing tasks that involve processing sequences, we generally use recurrent neural networks (RNNs), especially their variants such as LSTM (more common) and GRU. Of course, we can also apply RNNs to text classification tasks.

Here the text can be a sentence, a document (short text of several sentences), or a passage (long text), and the lengths of different texts vary. For text classification we generally specify a fixed input sequence/text length. This length can be the length of the longest text/sequence in the training set, in which case all other texts/sequences are padded to that length; or it can be the average length of all texts/sequences in the training set, in which case texts/sequences that are too long are truncated and those that are too short are padded. In short, all texts/sequences in the training set are made the same length, and besides the two options just mentioned this length can be any other reasonable value. At test time, the texts/sequences in the test set need to be processed in the same way.
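As a minimal sketch of this preprocessing step (plain Python; the function and variable names are hypothetical, not from the original code), padding/truncating every tokenized text to a fixed length might look like this:

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Force every tokenized text to the same length max_len:
    truncate sequences that are too long, pad short ones with pad_id."""
    if len(token_ids) >= max_len:
        return token_ids[:max_len]
    return token_ids + [pad_id] * (max_len - len(token_ids))

# Example: unify a tiny "training set" to length 5
corpus = [[4, 12, 7], [9, 3, 25, 6, 1, 8], [2, 2, 2, 2, 2]]
padded = [pad_or_truncate(seq, max_len=5) for seq in corpus]
# -> [[4, 12, 7, 0, 0], [9, 3, 25, 6, 1], [2, 2, 2, 2, 2]]
```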

Suppose the unified length of all texts/sequences in the training set is n. We first split the text into words and use word embeddings to obtain a fixed-dimensional vector representation for each word. For each input text/sequence, at every time step we feed the vector of one word into the RNN: the hidden state of the current time step is computed, used for the output of the current time step, and passed to the next time step, where it is combined with the vector of the next word as the input to the RNN unit; the hidden state of the next time step is then computed, and so on until every word of the input text has been processed. Since the input text length is n, this takes n time steps.
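The step-by-step computation described above can be made concrete with a small PyTorch sketch (the sizes and the use of `nn.LSTMCell` are illustrative assumptions, not the original implementation):

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim, n = 5000, 100, 128, 20   # assumed sizes

embedding = nn.Embedding(vocab_size, embed_dim)
cell = nn.LSTMCell(embed_dim, hidden_dim)

tokens = torch.randint(0, vocab_size, (1, n))   # one padded text of length n
x = embedding(tokens)                           # (1, n, embed_dim) word vectors

h = torch.zeros(1, hidden_dim)                  # initial hidden state
c = torch.zeros(1, hidden_dim)                  # initial cell state
for t in range(n):                              # feed one word vector per time step
    h, c = cell(x[:, t, :], (h, c))             # hidden state flows to the next step
# after n steps, h summarizes the whole text and can be fed to a classifier
```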

RNN-based text classification models are very flexible and come in a variety of structures. Next, we introduce two typical structures.

2. textRNN network structure

2.1 Structure 1

Pipeline: embedding ---> BiLSTM ---> concat final output / average all outputs -----> softmax layer

The structure is shown below:

Generally, we take the hidden states of the forward and backward LSTMs at the last time step, concatenate them, and pass them through a softmax layer (the output layer uses the softmax activation function) for multi-class classification. Alternatively, we take the hidden states of the forward and backward LSTMs at every time step, concatenate the two hidden states at each time step, take the mean of these concatenated hidden states over all time steps, and then pass it through a softmax layer (the output layer uses the softmax activation function) for multi-class classification (for binary classification, use the sigmoid activation function).

Dropout, L2 regularization, or BatchNormalization can also be added to the above structure to prevent overfitting and speed up model training.
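Here is a minimal PyTorch sketch of Structure 1 under these assumptions (the class name, hyperparameters, and the use of `CrossEntropyLoss`-style logits instead of an explicit softmax layer are illustrative choices, not the original code):

```python
import torch
import torch.nn as nn

class TextRNN(nn.Module):
    """Structure 1: embedding -> BiLSTM -> concat last forward/backward
    hidden states -> fully connected layer (softmax applied in the loss)."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=10):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                      # x: (batch, seq_len) token ids
        emb = self.embedding(x)                # (batch, seq_len, embed_dim)
        out, (h_n, _) = self.bilstm(emb)       # h_n: (2, batch, hidden_dim)
        feat = torch.cat([h_n[0], h_n[1]], dim=-1)   # concat fwd/bwd final states
        # alternative: feat = out.mean(dim=1)  # average over all time steps instead
        return self.fc(feat)                   # logits; use nn.CrossEntropyLoss

model = TextRNN(vocab_size=5000)
logits = model(torch.randint(0, 5000, (8, 20)))   # batch of 8 texts, length 20
```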

2.2 Structure 2

Pipeline: embedding --> BiLSTM ----> (dropout) --> concat output ---> UniLSTM ---> (dropout) --> softmax layer

The structure is shown below:

The difference from the previous structure is that a unidirectional LSTM is stacked on top of the bidirectional LSTM (the figure above is not entirely accurate; the bottom layer should be a bidirectional LSTM). The two hidden states of the bidirectional LSTM at each time step are concatenated and used as the input at the corresponding time step of the upper unidirectional LSTM. Finally, the hidden state of the upper unidirectional LSTM at the last time step is taken and passed through a softmax layer (the output layer uses the softmax activation function; for binary classification, use sigmoid) for multi-class classification.
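A minimal PyTorch sketch of Structure 2 under the same assumptions as the previous sketch (names and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

class TextRNN2(nn.Module):
    """Structure 2: embedding -> BiLSTM -> (dropout) -> unidirectional LSTM
    -> last hidden state -> (dropout) -> fully connected layer."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128,
                 num_classes=10, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.unilstm = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                       # x: (batch, seq_len) token ids
        emb = self.embedding(x)
        out, _ = self.bilstm(emb)               # (batch, seq_len, 2*hidden_dim)
        out = self.dropout(out)                 # concat of fwd/bwd per time step
        _, (h_n, _) = self.unilstm(out)         # h_n: (1, batch, hidden_dim)
        feat = self.dropout(h_n[-1])            # hidden state at the last time step
        return self.fc(feat)                    # logits for softmax/sigmoid loss
```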

2.3 Summary

The structure of TextRNN is very flexible and can be modified at will: for example, replacing the LSTM units with GRU units, changing bidirectional to unidirectional, adding dropout or BatchNormalization, stacking one more layer, and so on. TextRNN works very well on text classification tasks, on par with TextCNN, but RNNs are relatively slow to train, and two layers are generally enough.

3. What is textCNN

In the "convolutional neural networks" chapter we explored how to process two-dimensional image data with two-dimensional convolutional neural networks. In the earlier language modeling and text classification tasks, we treated text data as a time series with only one dimension and naturally used recurrent neural networks to represent such data. In fact, we can also treat text as a one-dimensional image, so that a one-dimensional convolutional neural network can capture the associations between adjacent words. This section introduces one of the pioneering works that applied convolutional neural networks to text analysis: textCNN.

3.1 One-dimensional convolutional layer

Before introducing the model, let us explain how a one-dimensional convolutional layer works. Like a two-dimensional convolutional layer, a one-dimensional convolutional layer uses a one-dimensional cross-correlation operation. In the one-dimensional cross-correlation operation, the convolution window starts at the leftmost position of the input array and slides over the input array from left to right. When the convolution window reaches a certain position, the input subarray in the window and the kernel array are multiplied element-wise and summed, giving the element at the corresponding position of the output array. As shown in the figure below, the input is a one-dimensional array of width 7 and the kernel array has width 2. The output width is 7 - 2 + 1 = 6, and the first element is obtained by multiplying the leftmost width-2 subarray of the input element-wise with the kernel array and summing: 0 × 1 + 1 × 2 = 2.
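A short sketch of this one-dimensional cross-correlation in PyTorch, assuming the input array [0, 1, 2, 3, 4, 5, 6] and kernel [1, 2] implied by the figure:

```python
import torch

def corr1d(X, K):
    """One-dimensional cross-correlation: slide kernel K over input X."""
    w = K.shape[0]
    Y = torch.zeros(X.shape[0] - w + 1)
    for i in range(Y.shape[0]):
        Y[i] = (X[i: i + w] * K).sum()
    return Y

X = torch.tensor([0, 1, 2, 3, 4, 5, 6], dtype=torch.float)
K = torch.tensor([1, 2], dtype=torch.float)
print(corr1d(X, K))   # tensor([ 2.,  5.,  8., 11., 14., 17.]); first element 0*1 + 1*2 = 2
```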

One-dimensional cross-correlation with multiple input channels is also similar to two-dimensional cross-correlation with multiple input channels: on each channel, the kernel is cross-correlated with the corresponding input in one dimension, and the results are summed across channels to give the output. The figure below shows a one-dimensional cross-correlation with 3 input channels, where the shaded part is the first output element together with the input and kernel array elements used in its computation: 0 × 1 + 1 × 2 + 1 × 3 + 2 × 4 + 2 × (-1) + 3 × (-3) = 2.
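The multi-input-channel case can be sketched the same way; the three input channels and kernels below are assumed from the figure (only the first column of each is fixed by the formula above):

```python
import torch

def corr1d(X, K):
    """Compact restatement of corr1d from the previous sketch."""
    w = K.shape[0]
    return torch.stack([(X[i: i + w] * K).sum() for i in range(X.shape[0] - w + 1)])

def corr1d_multi_in(X, K):
    """Correlate each input channel with its kernel, then sum across channels."""
    return sum(corr1d(x, k) for x, k in zip(X, K))

X = torch.tensor([[0, 1, 2, 3, 4, 5, 6],
                  [1, 2, 3, 4, 5, 6, 7],
                  [2, 3, 4, 5, 6, 7, 8]], dtype=torch.float)
K = torch.tensor([[1, 2], [3, 4], [-1, -3]], dtype=torch.float)
print(corr1d_multi_in(X, K))   # tensor([ 2.,  8., 14., 20., 26., 32.]); first element is 2
```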

From the definition of the two-dimensional cross-correlation operation, a one-dimensional cross-correlation with multiple input channels can be seen as a two-dimensional cross-correlation with a single input channel. As shown in the figure below, we can present the multi-input-channel one-dimensional cross-correlation of the figure above as an equivalent single-input-channel two-dimensional cross-correlation, where the height of the kernel equals the height of the input. The shaded part in the figure below is the first output element together with the input and kernel array elements used in its computation: 2 × (-1) + 3 × (-3) + 1 × 3 + 2 × 4 + 0 × 1 + 1 × 2 = 2.
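This equivalence can be checked with a small 2D cross-correlation sketch, stacking the (assumed) channels and kernels from the previous sketch into single 2D arrays:

```python
import torch

def corr2d(X, K):
    """Single-input-channel 2D cross-correlation."""
    h, w = K.shape
    Y = torch.zeros(X.shape[0] - h + 1, X.shape[1] - w + 1)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i: i + h, j: j + w] * K).sum()
    return Y

# The kernel height equals the input height (3), so the output has a single row.
X2d = torch.tensor([[0, 1, 2, 3, 4, 5, 6],
                    [1, 2, 3, 4, 5, 6, 7],
                    [2, 3, 4, 5, 6, 7, 8]], dtype=torch.float)
K2d = torch.tensor([[1, 2], [3, 4], [-1, -3]], dtype=torch.float)
print(corr2d(X2d, K2d))   # tensor([[ 2.,  8., 14., 20., 26., 32.]]); same result as above
```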

In all of the above, the output has only one channel. In the section "multiple input channels and multiple output channels" we introduced how to specify multiple output channels in a two-dimensional convolutional layer. Similarly, we can also specify multiple output channels in a one-dimensional convolutional layer, thereby extending the model parameters of the convolutional layer.

3.2 Max-over-time pooling layer

Similarly, we have one-dimensional pooling layers. The max-over-time pooling layer used in textCNN actually corresponds to a one-dimensional global max pooling layer: assuming the input contains multiple channels, each consisting of values at different time steps, the output of each channel is the maximum value over all of that channel's time steps. Therefore, the number of time steps of the input to a max-over-time pooling layer can differ across channels. To improve computational performance, we often combine time-series samples of different lengths into a minibatch and append special characters (such as 0) to the shorter sequences so that all samples in the batch have the same length. These artificially added special characters are of course meaningless. Since the main purpose of max-over-time pooling is to capture the most important feature in the time series, it usually keeps the model unaffected by the artificially added characters.
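A tiny PyTorch sketch of such a pooling layer (the class name is an assumption for illustration):

```python
import torch
import torch.nn as nn

class GlobalMaxPool1d(nn.Module):
    """Max-over-time pooling: take the maximum over the time-step dimension of
    each channel, so channels with different numbers of time steps are all
    reduced to a single value per channel."""
    def forward(self, x):                        # x: (batch, channels, time_steps)
        return torch.max(x, dim=-1).values       # -> (batch, channels)

pool = GlobalMaxPool1d()
print(pool(torch.tensor([[[1., 5., 2.], [0., -1., 3.]]])))   # tensor([[5., 3.]])
```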

3.3 The textCNN model

The textCNN model mainly uses one-dimensional convolutional layers and max-over-time pooling layers. Suppose the input text sequence consists of n words, each represented by a d-dimensional word vector. Then the width of the input sample is n, its height is 1, and the number of input channels is d. The computation of textCNN mainly consists of the following steps:

  1. Define multiple one-dimensional convolution kernels and use them to perform separate convolution computations on the input. Kernels of different widths may capture the correlations of different numbers of adjacent words.
  2. Apply max-over-time pooling to every output channel, then concatenate the pooled outputs of all channels into a vector.
  3. Transform the concatenated vector into outputs over the categories through a fully connected layer. A dropout layer can be used in this step to counter overfitting.

The figure below explains the design of textCNN with an example. The input here is a sentence of 11 words, each represented by a 6-dimensional word vector, so the input sequence has width 11 and 6 input channels. Given 2 one-dimensional convolution kernels with widths 2 and 4, and output channel counts set to 4 and 5 respectively, after the one-dimensional convolution the 4 output channels have width 11 - 2 + 1 = 10 while the other 5 channels have width 11 - 4 + 1 = 8. Although the channel widths differ, we can still apply max-over-time pooling to each channel and concatenate the pooled outputs of the 9 channels into a 9-dimensional vector. Finally, a fully connected layer transforms the 9-dimensional vector into a 2-dimensional output, i.e., the predictions for positive and negative sentiment.
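A minimal PyTorch sketch of this design, using the numbers from the example above (the class name and dropout rate are assumptions; the softmax is left to the loss function):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """textCNN sketch: parallel 1D convolutions of different kernel widths,
    max-over-time pooling, concatenation, dropout, and a linear output layer."""
    def __init__(self, vocab_size, embed_dim=6, kernel_sizes=(2, 4),
                 num_channels=(4, 5), num_classes=2, dropout=0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, c, k) for c, k in zip(num_channels, kernel_sizes))
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(sum(num_channels), num_classes)

    def forward(self, x):                         # x: (batch, seq_len) token ids
        emb = self.embedding(x).permute(0, 2, 1)  # (batch, embed_dim, seq_len)
        # convolve, max-pool over time, then concatenate the pooled channels
        pooled = [torch.max(conv(emb), dim=-1).values for conv in self.convs]
        feat = self.dropout(torch.cat(pooled, dim=-1))   # (batch, 4 + 5 = 9)
        return self.fc(feat)                      # 2-way logits

model = TextCNN(vocab_size=5000)
logits = model(torch.randint(0, 5000, (1, 11)))   # the 11-word example sentence
```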

4. Code implementation

THUCNews (Tsinghua news) classification dataset download: https://www.lanzous.com/i5t0lsd
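As a purely illustrative training sketch (PyTorch, reusing the `TextCNN` class sketched above; the dummy tensors, vocabulary size, and assumed 10 news categories stand in for the real tokenized and padded THUCNews data):

```python
import torch
import torch.nn as nn

# Dummy tensors stand in for a real tokenized/padded THUCNews batch
token_ids = torch.randint(0, 50000, (32, 200))   # 32 texts of length 200
labels = torch.randint(0, 10, (32,))             # assumed 10 news categories

model = TextCNN(vocab_size=50000, embed_dim=100, num_classes=10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()                # applies log-softmax internally

for step in range(10):                           # a few illustrative updates
    optimizer.zero_grad()
    loss = criterion(model(token_ids), labels)
    loss.backward()
    optimizer.step()
```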

Easy-to-understand machine learning article series




Author: @mantchs

GitHub:https://github.com/NLP-LOVE/ML-NLP

Everyone is welcome to join the discussion and help improve this project together! QQ group: 541954936 (NLP interview study group)
