tf.nn.bidirectional_dynamic_rnn: usage and verification

Contents

1. Usage

2. Without sequence_length

2.1 How the TensorFlow LSTM cell works

2.2 LSTM forward propagation in TensorFlow

2.3 Verifying the forward propagation of tf.nn.rnn_cell.LSTMCell with numpy

3. With sequence_length specified

4. Dynamic bidirectional RNN


1. Usage

Reference: https://blog.csdn.net/u010223750/article/details/71079036

Introduction
TensorFlow is easy to pick up, but many of its tricks are what really make it powerful. Having covered TensorFlow's reading pipeline before, this post looks at its RNN facilities, which are remarkably convenient. [All experiments below are based on TensorFlow 1.0.]

A brief look at RNNs in TensorFlow
RNNs in TensorFlow have come up in several earlier posts, and my article on text classification with TensorFlow used BasicLSTMCell. Normally, using an RNN means fixing num_step, the number of steps the network is unrolled for, and for variable-length text that forces padding every sequence to that length. The earlier post on TensorFlow's high-level reading pipeline used dynamic padding to do this automatically, but that is not enough: after running the RNN/LSTM you still have to discard the outputs produced on the padded positions ("un-padding"), which in turn requires maintaining a mask matrix. That is not elegant. TensorFlow provides a much cleaner solution that does away with the mask entirely: dynamic_rnn.

tf.nn.dynamic_rnn
A small example illustrates how TensorFlow's dynamic_rnn works. Suppose the RNN input is of shape [2, 20, 128], where 2 is the batch size, 20 is the maximum text length, and 128 is the embedding size. There are two examples, and suppose the second text actually has length 13, with the remaining 7 steps filled by zero padding. dynamic_rnn returns two values, outputs and last_states: outputs has shape [2, 20, 128] (assuming the hidden size is also 128) and holds the hidden state at every time step, while last_states is a (c, h) tuple, each of shape [batch, 128].

So far nothing is special, but dynamic_rnn also takes a sequence_length argument that specifies the valid length of each example. In the example above, setting sequence_length to [20, 13] means the first example has 20 valid steps and the second has 13. With this argument, TensorFlow stops computing the second example after step 13: its last_states simply carries the state of step 13 through to step 20, and the entries of outputs beyond step 13 are set to zero.
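A minimal sketch of such a call (the placeholder, the 128-unit cell, and the lengths are illustrative, chosen only to match the shapes above):

import tensorflow as tf

# Batch of 2, max length 20, embedding size 128, as in the example above.
inputs = tf.placeholder(tf.float32, shape=[2, 20, 128])
# Valid lengths of the two examples.
seq_len = tf.constant([20, 13])

cell = tf.nn.rnn_cell.LSTMCell(num_units=128)
# outputs: [2, 20, 128]; last_states: LSTMStateTuple(c, h), each [2, 128].
outputs, last_states = tf.nn.dynamic_rnn(cell, inputs,
                                         sequence_length=seq_len,
                                         dtype=tf.float32)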

2. Without sequence_length

We use numpy to reproduce the forward propagation of TensorFlow's LSTM:

2.1 How the TensorFlow LSTM cell works

Reference: https://www.cnblogs.com/yuetz/p/6563377.html

class BasicLSTMCell(RNNCell):
  """Basic LSTM recurrent network cell.

  The implementation is based on: http://arxiv.org/abs/1409.2329.

  We add forget_bias (default: 1) to the biases of the forget gate in order to
  reduce the scale of forgetting in the beginning of the training.

  It does not allow cell clipping, a projection layer, and does not
  use peep-hole connections: it is the basic baseline.

  For advanced models, please use the full LSTMCell that follows.
  """

  def __init__(self, num_units, forget_bias=1.0, input_size=None,
               state_is_tuple=True, activation=tanh):
    """Initialize the basic LSTM cell.

    Args:
      num_units: int, The number of units in the LSTM cell.
      forget_bias: float, The bias added to forget gates (see above).
      input_size: Deprecated and unused.
      state_is_tuple: If True, accepted and returned states are 2-tuples of
        the `c_state` and `m_state`.  If False, they are concatenated
        along the column axis.  The latter behavior will soon be deprecated.
      activation: Activation function of the inner states.
    """
    if not state_is_tuple:
      logging.warn("%s: Using a concatenated state is slower and will soon be "
                   "deprecated.  Use state_is_tuple=True.", self)
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._forget_bias = forget_bias
    self._state_is_tuple = state_is_tuple
    self._activation = activation

  @property
  def state_size(self):
    return (LSTMStateTuple(self._num_units, self._num_units)
            if self._state_is_tuple else 2 * self._num_units)

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Long short-term memory cell (LSTM)."""
    with vs.variable_scope(scope or "basic_lstm_cell"):
      # Parameters of gates are concatenated into one multiply for efficiency.
      if self._state_is_tuple:
        c, h = state
      else:
        c, h = array_ops.split(value=state, num_or_size_splits=2, axis=1)

      # Linear transform: concat = [inputs, h] W + b
      # W has shape (input_size + num_units, 4*num_units) and b has shape
      # (4*num_units,); a single kernel holds the parameters of all four gates.
      # concat has shape (batch_size, 4*num_units).
      concat = _linear([inputs, h], 4 * self._num_units, True, scope=scope)

      # i = input_gate, j = new_input, f = forget_gate, o = output_gate
      i, j, f, o = array_ops.split(value=concat, num_or_size_splits=4, axis=1)

      new_c = (c * sigmoid(f + self._forget_bias) + sigmoid(i) *
               self._activation(j))
      new_h = self._activation(new_c) * sigmoid(o)

      if self._state_is_tuple:
        new_state = LSTMStateTuple(new_c, new_h)
      else:
        new_state = array_ops.concat([new_c, new_h], 1)
      return new_h, new_state
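Spelled out, the step implemented in __call__ above is:

    concat = [inputs, h] W + b        # W: (input_size + num_units, 4*num_units), b: (4*num_units,)
    i, j, f, o = split(concat, 4)     # each (batch_size, num_units)
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * tanh(j)
    new_h = tanh(new_c) * sigmoid(o)

Because one kernel holds all four gates, its shape is (input_size + num_units, 4*num_units); for the example in section 2.2 below (input size 3, num_units 4) that is (3 + 4, 4 * 4) = (7, 16).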

2.2 LSTM forward propagation in TensorFlow

import tensorflow as tf
import numpy as np
import random
random.seed(10)

# Create the input data.
# batch = 2; the first sequence has time_step = 4,
# the second has time_step = 2 and is zero-padded to length 4.
data = np.array([[[1, 2, 3],
                  [4, 5, 6],
                  [5, 6, 4],
                  [1, 2, 1]],

                  [[3, 2, 4],
                   [2, 2, 2],
                   [0, 0, 0],
                   [0, 0, 0]]
                 ])/10.0

# Placeholder for the data
data_placeholder = tf.placeholder(dtype=tf.float32, shape=[2, 4, 3])
# Hidden size of the RNN: 4
cell_f = tf.nn.rnn_cell.LSTMCell(num_units=4, forget_bias=0.0)
outputs, last_states = tf.nn.dynamic_rnn(cell=cell_f, dtype=tf.float32, inputs=data_placeholder)

def get_rnn_variables_to_restore():
  return [v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if 'lstm_cell' in v.name]
t = get_rnn_variables_to_restore()
print('Parameters of the LSTM cell:')
print(t[0])
print(t[1])
# We cannot see how the LSTM parameters are initialized, so we cannot set an
# initializer directly, but we can locate the variables as above and overwrite
# them with tf.assign(). Below, `parameter` is assigned to the LSTM kernel.
parameter = [i + 1 for i in range(112)]
parameter = np.array(parameter)
parameter = np.reshape(parameter, (7, 16)) / 100.0
print('parameter: ')
print(parameter)
parameter_placeholder = tf.placeholder(dtype=tf.float32, shape=[7, 16])
# Assign parameter_placeholder to the LSTM cell's kernel t[0].
# Only the kernel is overwritten; the bias t[1] keeps its zero initialization.
assign = tf.assign(t[0], parameter_placeholder)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(assign, feed_dict={parameter_placeholder: parameter}))
    print('Hidden states at each time step:')
    print(sess.run(outputs, feed_dict={data_placeholder: data}))


Output:
Parameters of the LSTM cell:
<tf.Variable 'rnn/lstm_cell/kernel:0' shape=(7, 16) dtype=float32_ref>
<tf.Variable 'rnn/lstm_cell/bias:0' shape=(16,) dtype=float32_ref>
parameter: 
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13 0.14
  0.15 0.16]
 [0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
  0.31 0.32]
 [0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41 0.42 0.43 0.44 0.45 0.46
  0.47 0.48]
 [0.49 0.5  0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6  0.61 0.62
  0.63 0.64]
 [0.65 0.66 0.67 0.68 0.69 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
  0.79 0.8 ]
 [0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94
  0.95 0.96]
 [0.97 0.98 0.99 1.   1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
  1.11 1.12]]
2018-10-19 16:09:07.859667: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13 0.14
  0.15 0.16]
 [0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
  0.31 0.32]
 [0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41 0.42 0.43 0.44 0.45 0.46
  0.47 0.48]
 [0.49 0.5  0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6  0.61 0.62
  0.63 0.64]
 [0.65 0.66 0.67 0.68 0.69 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
  0.79 0.8 ]
 [0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94
  0.95 0.96]
 [0.97 0.98 0.99 1.   1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
  1.11 1.12]]
Hidden states at each time step:
[[[0.04597805 0.04794675 0.04993038 0.05192876]
  [0.21005629 0.2185854  0.22713308 0.23569286]
  [0.5031241  0.51528114 0.5271909  0.5388504 ]
  [0.77954775 0.78653383 0.7932657  0.79975235]]

 [[0.06209857 0.06523891 0.06840938 0.07160922]
  [0.14686525 0.15213361 0.1574358  0.1627702 ]
  [0.2550022  0.2604864  0.26597795 0.27147514]
  [0.45640406 0.4631802  0.4699056  0.4765787 ]]]
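For completeness, the last_states returned above is an LSTMStateTuple; since no sequence_length was given, its h component should equal the hidden state of the last time step, i.e. the last rows of outputs. A small check one could add inside the with block above (my addition, not part of the original run):

    # last_states is LSTMStateTuple(c, h); without sequence_length,
    # last_states.h should match outputs[:, -1, :] shown above.
    c_final, h_final = sess.run(last_states, feed_dict={data_placeholder: data})
    print(h_final)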

2.3 Verifying the forward propagation of tf.nn.rnn_cell.LSTMCell with numpy

import numpy as np
import math

# Simulates tf.nn.sigmoid(x); note that x is modified in place.
def sig(x):
    for i in range(len(x)):
        for j in range(len(x[0])):
            x[i][j] = 1.0 / (1.0 + math.exp(-x[i][j]))
    return x

# Simulates tf.nn.tanh(x); note that x is modified in place.
def tanh(x):
    for i in range(len(x)):
        for j in range(len(x[0])):
            x[i][j] = (math.exp(x[i][j]) - math.exp(-x[i][j])) / (math.exp(x[i][j]) + math.exp(-x[i][j]))
    return x

# Create the input data.
# batch = 2; the first sequence has time_step = 4,
# the second has time_step = 2 and is zero-padded to length 4.
data = np.array([[[1, 2, 3],
                  [4, 5, 6],
                  [5, 6, 4],
                  [1, 2, 1]],

                  [[3, 2, 4],
                   [2, 2, 2],
                   [0, 0, 0],
                   [0, 0, 0]]
                 ])/10.0
"""
input = input/10.0

input1 = np.array([[[1,2,1],[5,6,4],[4,5,6],[1,2,3]],
                  [[0,0,0],[0,0,0],[2,2,2],[3,2,4]]])
input1 = input1/10.0
"""
print("data : ")
print(data)
w = [i + 1 for i in range(112)]
w = np.array(w)
w = np.reshape(w, (7, 16)) / 100.0
print('LSTM_cell中的参数 :')
print(w)
print('LSTM_cell中的初始h: ')
h = np.zeros((2, 4))
h = h / 10.0
print('LSTM_cell中的初始c: ')
c = np.zeros((2, 4))
c = c / 10.0

# The maximum number of time steps is 4:
for t in range(4):
    current = data[:, t, :]
    # Concatenate the current input with h
    current = np.concatenate((current, h), axis=1)
    mid = np.matmul(current, w)
    print("mid : ")
    print(mid)
    # i: pre-activation input gate
    i = mid[:, 0:4]
    # j: pre-activation candidate input
    j = mid[:, 4:8]
    # f: pre-activation forget gate
    f = mid[:, 8:12]
    # o: pre-activation output gate
    o = mid[:, 12:16]
    # New cell state
    c = c * sig(f) + sig(i) * tanh(j)
    # Note: tanh() modifies its argument in place, so after the next line c holds
    # tanh(c) rather than c, and the following step starts from the wrong cell state.
    h = sig(o) * tanh(c)
    print('h at time step ' + str(t+1) + ' : ')
    print(h)


Output:

data : 
[[[0.1 0.2 0.3]
  [0.4 0.5 0.6]
  [0.5 0.6 0.4]
  [0.1 0.2 0.1]]

 [[0.3 0.2 0.4]
  [0.2 0.2 0.2]
  [0.  0.  0. ]
  [0.  0.  0. ]]]
LSTM_cell parameters:
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13 0.14
  0.15 0.16]
 [0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
  0.31 0.32]
 [0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41 0.42 0.43 0.44 0.45 0.46
  0.47 0.48]
 [0.49 0.5  0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6  0.61 0.62
  0.63 0.64]
 [0.65 0.66 0.67 0.68 0.69 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
  0.79 0.8 ]
 [0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94
  0.95 0.96]
 [0.97 0.98 0.99 1.   1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
  1.11 1.12]]
Initial h of the LSTM cell:
Initial c of the LSTM cell:
mid : 
[[0.134 0.14  0.146 0.152 0.158 0.164 0.17  0.176 0.182 0.188 0.194 0.2
  0.206 0.212 0.218 0.224]
 [0.169 0.178 0.187 0.196 0.205 0.214 0.223 0.232 0.241 0.25  0.259 0.268
  0.277 0.286 0.295 0.304]]
h at time step 1 : 
[[0.04597805 0.04794676 0.04993039 0.05192877]
 [0.06209857 0.06523893 0.06840938 0.07160925]]
mid : 
[[0.43150916 0.448467   0.46542484 0.48238268 0.49934052 0.51629836
  0.5332562  0.55021404 0.56717188 0.58412972 0.60108756 0.6180454
  0.63500324 0.65196108 0.66891892 0.68587676]
 [0.29970617 0.30837974 0.3170533  0.32572686 0.33440042 0.34307398
  0.35174754 0.3604211  0.36909466 0.37776823 0.38644179 0.39511535
  0.40378891 0.41246247 0.42113603 0.42980959]]
h at time step 2 : 
[[0.20998371 0.21850343 0.22704085 0.23558962]
 [0.1467197  0.15196619 0.15724435 0.16255241]]
mid : 
[[0.89634427 0.92025544 0.94416662 0.96807779 0.99198897 1.01590015
  1.03981132 1.0637225  1.08763367 1.11154485 1.13545603 1.1593672
  1.18327838 1.20718955 1.23110073 1.25501191]
 [0.45571443 0.46189926 0.46808409 0.47426891 0.48045374 0.48663857
  0.49282339 0.49900822 0.50519304 0.51137787 0.5175627  0.52374752
  0.52993235 0.53611718 0.542302   0.54848683]]
h at time step 3 : 
[[0.49912617 0.51091546 0.52244355 0.53370864]
 [0.25299429 0.25829698 0.26359548 0.26888789]]
mid : 
[[1.58554353 1.61020546 1.6348674  1.65952934 1.68419128 1.70885322
  1.73351515 1.75817709 1.78283903 1.80750097 1.83216291 1.85682485
  1.88148678 1.90614872 1.93081066 1.9554726 ]
 [0.76619382 0.77663157 0.78706932 0.79750706 0.80794481 0.81838256
  0.8288203  0.83925805 0.8496958  0.86013354 0.87057129 0.88100903
  0.89144678 0.90188453 0.91232227 0.92276002]]
h at time step 4 : 
[[0.75492218 0.76107768 0.7670033  0.77270935]
 [0.44528038 0.45158054 0.45781786 0.46399112]]

The hidden states computed by this numpy simulation essentially match those computed by TensorFlow's LSTM cell, but the gap grows with the time step and is quite visible by step 4; the likely cause, and a corrected sketch, follow below.
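The growing discrepancy is most plausibly explained by the in-place mutation flagged in the loop above: tanh(c) overwrites c, so from the second step onward the numpy code carries tanh(c) forward instead of c as the previous cell state. A minimal non-mutating sketch of the same recurrence (the helper name sigmoid and this restructuring are mine, not from the original code), which should track the TensorFlow output much more closely:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Same data and kernel as above; forget_bias is 0 and the bias stays zero.
data = np.array([[[1, 2, 3], [4, 5, 6], [5, 6, 4], [1, 2, 1]],
                 [[3, 2, 4], [2, 2, 2], [0, 0, 0], [0, 0, 0]]]) / 10.0
w = np.arange(1, 113).reshape(7, 16) / 100.0

h = np.zeros((2, 4))
c = np.zeros((2, 4))
for t in range(4):
    mid = np.matmul(np.concatenate((data[:, t, :], h), axis=1), w)
    i, j, f, o = np.split(mid, 4, axis=1)
    c = c * sigmoid(f) + sigmoid(i) * np.tanh(j)  # c is never overwritten in place
    h = sigmoid(o) * np.tanh(c)
    print('h at time step ' + str(t + 1) + ' : ')
    print(h)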

3. With sequence_length specified

import tensorflow as tf
import numpy as np
import random
random.seed(10)

# Create the input data.
# batch = 2; the first sequence has time_step = 4,
# the second has time_step = 2 and is zero-padded to length 4.
data = np.array([[[1, 2, 3],
                  [4, 5, 6],
                  [5, 6, 4],
                  [1, 2, 1]],

                  [[3, 2, 4],
                   [2, 2, 2],
                   [0, 0, 0],
                   [0, 0, 0]]
                 ])/10.0

# Placeholder for the data
data_placeholder = tf.placeholder(dtype=tf.float32, shape=[2, 4, 3])
# Hidden size of the RNN: 4
cell_f = tf.nn.rnn_cell.LSTMCell(num_units=4, forget_bias=0.0)
outputs, last_states = tf.nn.dynamic_rnn(cell=cell_f, dtype=tf.float32, inputs=data_placeholder, sequence_length=[4, 2])

def get_rnn_variables_to_restore():
  return [v for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES) if 'lstm_cell' in v.name]
t = get_rnn_variables_to_restore()
print('Parameters of the LSTM cell:')
print(t[0])
print(t[1])
# We cannot see how the LSTM parameters are initialized, so we cannot set an
# initializer directly, but we can locate the variables as above and overwrite
# them with tf.assign(). Below, `parameter` is assigned to the LSTM kernel.
parameter = [i + 1 for i in range(112)]
parameter = np.array(parameter)
parameter = np.reshape(parameter, (7, 16)) / 100.0
print('parameter: ')
print(parameter)
parameter_placeholder = tf.placeholder(dtype=tf.float32, shape=[7, 16])

assign = tf.assign(t[0], parameter_placeholder)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(assign, feed_dict={parameter_placeholder: parameter}))
    print('Hidden states at each time step:')
    print(sess.run(outputs, feed_dict={data_placeholder: data}))

Output:
Parameters of the LSTM cell:
<tf.Variable 'rnn/lstm_cell/kernel:0' shape=(7, 16) dtype=float32_ref>
<tf.Variable 'rnn/lstm_cell/bias:0' shape=(16,) dtype=float32_ref>
parameter: 
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13 0.14
  0.15 0.16]
 [0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
  0.31 0.32]
 [0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41 0.42 0.43 0.44 0.45 0.46
  0.47 0.48]
 [0.49 0.5  0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6  0.61 0.62
  0.63 0.64]
 [0.65 0.66 0.67 0.68 0.69 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
  0.79 0.8 ]
 [0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94
  0.95 0.96]
 [0.97 0.98 0.99 1.   1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
  1.11 1.12]]
2018-10-19 16:30:13.588079: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
[[0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1  0.11 0.12 0.13 0.14
  0.15 0.16]
 [0.17 0.18 0.19 0.2  0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.3
  0.31 0.32]
 [0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.4  0.41 0.42 0.43 0.44 0.45 0.46
  0.47 0.48]
 [0.49 0.5  0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.6  0.61 0.62
  0.63 0.64]
 [0.65 0.66 0.67 0.68 0.69 0.7  0.71 0.72 0.73 0.74 0.75 0.76 0.77 0.78
  0.79 0.8 ]
 [0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.9  0.91 0.92 0.93 0.94
  0.95 0.96]
 [0.97 0.98 0.99 1.   1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.1
  1.11 1.12]]
Hidden states at each time step:
[[[0.04597805 0.04794675 0.04993038 0.05192876]
  [0.21005629 0.2185854  0.22713308 0.23569286]
  [0.5031241  0.51528114 0.5271909  0.5388504 ]
  [0.77954775 0.78653383 0.7932657  0.79975235]]

 [[0.06209857 0.06523891 0.06840938 0.07160922]
  [0.14686525 0.15213361 0.1574358  0.1627702 ]
  [0.         0.         0.         0.        ]
  [0.         0.         0.         0.        ]]]

As expected, the hidden states of the second sequence at time steps 3 and 4 are now zero, while the first two steps are unchanged. The final state, by contrast, is carried over from the last valid step rather than zeroed, as the check below illustrates.
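To confirm the claim from section 1 that last_states is carried over from the last valid step rather than zeroed, one could additionally run the following inside the with block above (my addition, not part of the original run):

    # For the second example, last_states.h should equal its step-2 hidden state
    # shown above, not a zero vector.
    print('Final hidden state h:')
    print(sess.run(last_states.h, feed_dict={data_placeholder: data}))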

4. Dynamic bidirectional RNN

      To be continued; until then, a minimal usage sketch follows below.
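A minimal sketch of how tf.nn.bidirectional_dynamic_rnn is typically called, reusing the placeholder and lengths from section 3 (the cell names here are illustrative). The forward and backward results come back as tuples, and with sequence_length given, the backward pass reads each sequence in reverse only over its valid steps, so the padded positions should be zero in both directions:

# Forward and backward cells (illustrative names).
cell_fw = tf.nn.rnn_cell.LSTMCell(num_units=4, forget_bias=0.0)
cell_bw = tf.nn.rnn_cell.LSTMCell(num_units=4, forget_bias=0.0)

# outputs is a tuple (output_fw, output_bw), each of shape [2, 4, 4];
# states is a tuple (state_fw, state_bw) of LSTMStateTuples.
outputs, states = tf.nn.bidirectional_dynamic_rnn(
    cell_fw=cell_fw,
    cell_bw=cell_bw,
    inputs=data_placeholder,
    sequence_length=[4, 2],
    dtype=tf.float32)

# The two directions are commonly concatenated along the last axis: [2, 4, 8].
combined = tf.concat(outputs, axis=2)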

Reposted from blog.csdn.net/biubiubiu888/article/details/83183582