RNN (Recurrent Neural Network): process analysis

Introducing the RNN: how do we represent a time series?

 

A convolutional neural network extracts features over spatial positions. Speech and text for translation, however, unfold over time, so the question becomes: how do we extract features from a time series?

For sequence problems, the first question is how to encode text into a numerical representation; this encoding scheme is the embedding.

A single stock price can be represented as a sequence in which each time step carries the corresponding price value.

Multiple stock prices are represented the same way, with one value per stock at each time step.

The same idea applies to pictures: read the pixel values row by row and treat each row as one step of the sequence.

One-hot representation method.
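
To make this concrete, here is a minimal sketch (the toy vocabulary and sentence are invented for illustration) comparing a one-hot representation with a trainable embedding lookup:

import tensorflow as tf

# toy vocabulary: word -> integer index (hypothetical example)
vocab = {'i': 0, 'hate': 1, 'this': 2, 'boring': 3, 'movie': 4}
sentence = tf.constant([vocab[w] for w in ['i', 'hate', 'this', 'boring', 'movie']])

# one-hot: each word becomes a sparse vector as long as the vocabulary
one_hot = tf.one_hot(sentence, depth=len(vocab))              # [5, 5]

# embedding: each word index is looked up in a dense, trainable table
embedding = tf.keras.layers.Embedding(input_dim=len(vocab), output_dim=4)
dense_vectors = embedding(sentence)                           # [5, 4]

print(one_hot.shape, dense_vectors.shape)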

 

 

 


Let's analyze this through a concrete task: judging whether a movie review is positive or negative. When we input "I hate this boring movie", each word is first mapped to an embedding vector.

A naive approach: each word is one input, and we attach one fully connected layer per word, so there are as many fully connected layers as there are words. Each layer extracts the semantic information of its word, and a final classification decides whether the review is positive or negative. This is intuitive, but it has problems. A relatively long review of 100 words would need 100 fully connected layers, and, more importantly, we only obtain the semantic information of each word in isolation: across the 100 words there is no overall semantic context. Shuffling the 100 words would not change the result, even though it clearly changes the meaning of the sentence.

We need to treat the sentence as a whole and analyze it as a whole. How do we deal with these problems?

How do we solve the huge number of parameters? The same way a convolutional network does: one convolution kernel of fixed size is shared across positions, and only its internal values are learned. Here, likewise, we use a single shared network for every word, so regardless of the input length the parameters are just one w and one b. That solves the problem of too many parameters.
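
To make the weight-sharing idea concrete, here is a minimal sketch (the shapes are chosen only for illustration) in which one fully connected layer, with a single w and b, is reused for every word instead of building one layer per word:

import tensorflow as tf

# 4 sentences, 80 words each, every word a 100-dim vector (illustrative shapes)
x = tf.random.normal([4, 80, 100])

# one shared Dense layer: the same w [100, 64] and b [64] are applied to every word
shared_fc = tf.keras.layers.Dense(64)
outputs = [shared_fc(x[:, t, :]) for t in range(x.shape[1])]   # 80 tensors of shape [4, 64]

print(len(outputs), outputs[0].shape)
print(len(shared_fc.trainable_variables))   # only 2 tensors no matter how long the sentence: w and b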

How do we analyze the semantic information as a whole?

We add a memory layer, i.e. a state: every time the shared layer processes a word, the state records the information seen so far. The memory obtained after the last word therefore contains all of the accumulated information, and we only need to process that final state to obtain the semantics of the whole sentence.

There is one input per point on the time axis, so the number of steps equals the length of the sequence. Folding the network along the time axis, we start from an initial state h0 and keep aggregating: at every step the current input is combined with the previous context information. In this way, through continuous aggregation, the final state contains all of the contextual information. This is the principle of the simplest RNN (recurrent network): viewed as a whole, it is one neural network that keeps aggregating its own previous state with the new input.
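
A minimal NumPy sketch of this aggregation (the shapes and the tanh activation are assumptions made for illustration; the real TensorFlow cell follows below):

import numpy as np

b, input_dim, hidden_dim, num_steps = 4, 100, 64, 80    # illustrative sizes
x = np.random.randn(num_steps, b, input_dim)             # the word vectors over time

Wxh = np.random.randn(input_dim, hidden_dim) * 0.01      # shared across all time steps
Whh = np.random.randn(hidden_dim, hidden_dim) * 0.01
bias = np.zeros(hidden_dim)

h = np.zeros((b, hidden_dim))                             # h0: the initial state
for t in range(num_steps):
    # the same Wxh/Whh are reused at every step; h keeps aggregating the context
    h = np.tanh(x[t] @ Wxh + h @ Whh + bias)

print(h.shape)    # (4, 64): the last state summarizes the whole sequence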

The mathematical form of the RNN cell is ht = tanh(xt @ Wxh + h(t-1) @ Whh + b). Here is a code demo:

# -*- coding: utf-8 -*-

import tensorflow as tf
from tensorflow.keras import layers, optimizers, datasets, Sequential
import os

# ht = xt@Wxh + h(t-1)@Whh
cell = layers.SimpleRNNCell(3)
# [b, embedding]: the input feature dimension is 4
cell.build(input_shape=(None, 4))
# print the cell's trainable parameters
print(cell.trainable_variables)

# kernel:0 is Wxh, recurrent_kernel:0 is Whh
# [<tf.Variable 'kernel:0' shape=(4, 3) dtype=float32, numpy=
# array([[ 0.87822306,  0.8071407 , -0.20742077],
#        [-0.5177747 , -0.0864749 ,  0.87705004],
#        [-0.8141298 ,  0.32291627, -0.329014  ],
#        [-0.05522245, -0.5882717 ,  0.74687743]], dtype=float32)>, <tf.Variable 'recurrent_kernel:0' shape=(3, 3) dtype=float32, numpy=
# array([[ 0.29745793,  0.9037314 ,  0.30787772],
#        [-0.9194169 ,  0.35804993, -0.16270511],
#        [-0.25727728, -0.23467004,  0.9374105 ]], dtype=float32)>, <tf.Variable 'bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]

Wxh is the weight applied to the current input xt, and Whh is the weight applied to the previous hidden state; together they produce the new state ht (and the output yt). ht passes the previous context information through an activation, which keeps the whole chain differentiable.

If the state/output dimension is 64, the shapes work out as [b,100]@[100,64] + [b,64]@[64,64] = [b,64].

xt@Wxh + h(t-1)@Whh: this is what a SimpleRNNCell computes (the LSTM and GRU cells are different).

# take 4 sentences, 80 words per sentence, each word a 100-dim vector
x = tf.random.normal([4, 80, 100])
# take the first word of every sentence
xt0 = x[:, 0, :]
# the hidden state dimension is 64
cell = tf.keras.layers.SimpleRNNCell(64)
# start from a zero state; the cell returns the output and the new state list, both [b, 64]
out, xt1 = cell(xt0, [tf.zeros([4, 64])])

print(out.shape, xt1[0].shape)
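
Continuing the demo above, here is a minimal sketch of unrolling the same cell over all 80 words (the explicit loop and the zero initial state are assumptions for illustration; this mirrors what the higher-level layers.SimpleRNN does internally):

import tensorflow as tf

x = tf.random.normal([4, 80, 100])            # 4 sentences, 80 words, 100-dim word vectors
cell = tf.keras.layers.SimpleRNNCell(64)
state = [tf.zeros([4, 64])]                    # h0

for t in range(x.shape[1]):                    # walk along the time axis
    out, state = cell(x[:, t, :], state)       # state carries the aggregated context forward

print(out.shape, state[0].shape)               # (4, 64) (4, 64)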

Gradient dispersion (vanishing) and gradient explosion:

Backpropagating through k time steps multiplies the gradient by Whh k times, i.e. it contains a factor of Whh^k. If Whh > 1, this product grows toward infinity and the gradient explodes; if Whh < 1, the product shrinks toward 0 and the gradient disperses (vanishes).
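
A minimal sketch of why the repeated multiplication causes this (the scalar stand-in for Whh and the step count are chosen purely for illustration):

k = 100   # number of time steps the gradient is propagated through

for whh in (1.1, 0.9):
    grad = 1.0
    for _ in range(k):
        grad *= whh            # the gradient picks up a factor of Whh at every step
    print(whh, grad)           # 1.1 -> ~1.4e4 (explodes), 0.9 -> ~2.7e-5 (vanishes)

In practice, gradient clipping (for example tf.clip_by_norm applied to the gradients) is a common remedy for the explosion case.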

 

RNN sentiment analysis in practice: a single-layer RNN

import os
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

tf.random.set_seed(22)
np.random.seed(22)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
assert tf.__version__.startswith('2.')

embedding_len = 100     # each word is represented by a 100-dim embedding vector
batchsz = 128
total_words = 10000     # vocabulary size: keep only the 10,000 most frequent words
max_review_len = 80     # each review is padded/truncated to 80 words

(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# x_train: [b, 80], 80 words per review
# x_test:  [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)

db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# drop the last incomplete batch
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)

print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test:', x_test.shape)

class MyRNN(keras.Model):
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # initial state [b, 64]: b sentences, each with a 64-dim state
        self.state0 = [tf.zeros([batchsz, units])]
        # layer 1: turn the integer word codes into embeddings,
        # [b, 80] => [b, 80, 100]: b sentences, 80 words each, every word a 100-dim vector
        # total_words: input dimension 10000, embedding_len: output dimension 100,
        # input_length: every sentence is 80 words long
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)

        # layer 2: semantic extraction with a SimpleRNNCell,
        # [b, 80, 100], h_dim: units (100-dim word vectors become a 64-dim state);
        # dropout reduces overfitting
        self.rnn_cell0 = layers.SimpleRNNCell(units, dropout=0.2)

        # layer 3: fc, [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True)
        :param inputs: [b, 80]
        :param training:
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn cell compute: [b, 80, 100] => [b, 64]
        state0 = self.state0
        # unstack x along the time (word) axis
        for word in tf.unstack(x, axis=1):    # word: [b, 100]
            # ht = x@Wxh + h(t-1)@Whh: the previous state feeds the next one
            out, state1 = self.rnn_cell0(word, state0, training=training)
            # the new state becomes the previous state for the next word
            state0 = state1

        # out: [b, 64] => [b, 1] for the binary classification
        x = self.outlayer(out)
        # p(y is pos | x): turn the logit into a probability
        prob = tf.sigmoid(x)

        return prob   # end of the forward pass

def main():
    units = 64
    epochs = 4
    model = MyRNN(units)
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    model.fit(db_train, epochs=epochs, validation_data=db_test)

    model.evaluate(db_test)

if __name__ == '__main__':
    main()
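
For comparison, here is a minimal sketch (not from the original post; it reuses total_words, embedding_len, max_review_len and the datasets defined above) of the same single-layer model written with the built-in layers.SimpleRNN, which performs the per-word unrolling internally:

model = keras.Sequential([
    layers.Embedding(total_words, embedding_len, input_length=max_review_len),
    layers.SimpleRNN(64, dropout=0.2),          # unrolls the cell over the 80 words
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=keras.optimizers.Adam(0.001),
              loss=tf.losses.BinaryCrossentropy(),
              metrics=['accuracy'])
# model.fit(db_train, epochs=4, validation_data=db_test)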

Multilayer RNN:

import os
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

tf.random.set_seed(22)
np.random.seed(22)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
assert tf.__version__.startswith('2.')

embedding_len = 100     # each word is represented by a 100-dim embedding vector
batchsz = 128
total_words = 10000     # vocabulary size: keep only the 10,000 most frequent words
max_review_len = 80     # each review is padded/truncated to 80 words

(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# x_train: [b, 80], 80 words per review
# x_test:  [b, 80]
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)

db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# drop the last incomplete batch
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)

print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
print('x_test:', x_test.shape)

# the model above trained with a single recurrent layer; now add one more layer to improve accuracy

class MyRNN(keras.Model):
    def __init__(self, units):
        super(MyRNN, self).__init__()
        # initial states [b, 64]: b sentences, a 64-dim state for each recurrent layer
        self.state0 = [tf.zeros([batchsz, units])]
        self.state1 = [tf.zeros([batchsz, units])]
        # layer 1: turn the integer word codes into embeddings,
        # [b, 80] => [b, 80, 100]: b sentences, 80 words each, every word a 100-dim vector
        # total_words: input dimension 10000, embedding_len: output dimension 100,
        # input_length: every sentence is 80 words long
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)

        # layers 2 and 3: semantic extraction with two stacked SimpleRNNCells,
        # [b, 80, 100], h_dim: units (100-dim word vectors become a 64-dim state);
        # dropout reduces overfitting
        self.rnn_cell0 = layers.SimpleRNNCell(units, dropout=0.2)
        self.rnn_cell1 = layers.SimpleRNNCell(units, dropout=0.2)

        # fc, [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """
        net(x), net(x, training=True)
        :param inputs: [b, 80]
        :param training:
        :return:
        """
        # [b, 80]
        x = inputs
        # embedding: [b, 80] => [b, 80, 100]
        x = self.embedding(x)
        # rnn cell compute: [b, 80, 100] => [b, 64]
        state0 = self.state0
        state1 = self.state1
        # unstack x along the time (word) axis
        for word in tf.unstack(x, axis=1):    # word: [b, 100]
            # ht = x@Wxh + h(t-1)@Whh: the previous state feeds the next one
            out0, state0 = self.rnn_cell0(word, state0, training=training)
            # the output of the first cell is the input of the second cell
            out1, state1 = self.rnn_cell1(out0, state1, training=training)

        # out: [b, 64] => [b, 1] for the binary classification
        x = self.outlayer(out1)
        # p(y is pos | x): turn the logit into a probability
        prob = tf.sigmoid(x)

        return prob   # end of the forward pass

def main():
    units = 64
    epochs = 4
    model = MyRNN(units)
    model.compile(optimizer=keras.optimizers.Adam(0.001),
                  loss=tf.losses.BinaryCrossentropy(),
                  metrics=['accuracy'])
    model.fit(db_train, epochs=epochs, validation_data=db_test)

    model.evaluate(db_test)

if __name__ == '__main__':
    main()
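
Likewise, a minimal sketch (an alternative formulation, not from the original post) of the two-layer model built from stacked layers.SimpleRNN, where return_sequences=True passes every intermediate state of the first layer on to the second:

model = keras.Sequential([
    layers.Embedding(total_words, embedding_len, input_length=max_review_len),
    layers.SimpleRNN(64, dropout=0.2, return_sequences=True),   # [b, 80, 64]
    layers.SimpleRNN(64, dropout=0.2),                          # [b, 64]
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=keras.optimizers.Adam(0.001),
              loss=tf.losses.BinaryCrossentropy(),
              metrics=['accuracy'])
# model.fit(db_train, epochs=4, validation_data=db_test)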

The result display: (training output screenshot in the original post)


Origin blog.csdn.net/chehec2010/article/details/127102179