1. Introduction

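This is day 1 of trying to run the MNIST handwritten-digit dataset with the TensorFlow framework; the focus is on defining the LeNet network structure. This blog mainly records a learning path and collects the study materials used along the way.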

Note: this code was written for Python 2.7.

Day 1 (building the LeNet network): https://blog.csdn.net/qq_36627158/article/details/108245969

Day 2 (training the network): https://blog.csdn.net/qq_36627158/article/details/108315239

Day 3 (testing the network): https://blog.csdn.net/qq_36627158/article/details/108321673

Day 4 (single-sample test): https://blog.csdn.net/qq_36627158/article/details/108397018

2. Code (mnist_lenet.py)

Thanks to 凯神 for providing the code and for the patient guidance!

import tensorflow as tf


def build_model_and_forward(input_tensor):
    # C1: Input = 28x28x1 (MNIST; the original LeNet-5 uses 32x32x1)    Output = 28x28x6 (SAME padding)
    C1_weights = tf.get_variable(
        name='C1_weights',
        shape=[5, 5, 1, 6],
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    C1_biases = tf.get_variable(
        name='C1_biases',
        shape=[6],
        initializer=tf.constant_initializer(0.1)
    )
    C1 = tf.nn.conv2d(
        input=input_tensor,
        filter=C1_weights,
        strides=[1, 1, 1, 1],
        padding='SAME'
    )
    C1_output = tf.nn.relu(
        tf.nn.bias_add(C1, C1_biases)
    )



    # S2: Input = 28x28x6    Output = 14x14x6
    S2_output = tf.nn.max_pool(
        value=C1_output,
        ksize=[1, 2, 2, 1],
        strides=[1, 2, 2, 1],
        padding='SAME'
    )



    # C3: Input = 14x14x6    Output = 10x10x16
    C3_weights = tf.get_variable(
        name='C3_weights',
        shape=[5, 5, 6, 16],
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    C3_biases = tf.get_variable(
        name='C3_biases',
        shape=[16],
        initializer=tf.constant_initializer(0.1)
    )
    C3 = tf.nn.conv2d(
        input=S2_output,
        filter=C3_weights,
        strides=[1, 1, 1, 1],
        padding='VALID'  # no padding: 14 - 5 + 1 = 10, matching the 10x10x16 output noted above
    )
    C3_output = tf.nn.relu(
        tf.nn.bias_add(C3, C3_biases)
    )



    # S4: Input = 10x10x16   Output = 5x5x16
    S4_output = tf.nn.max_pool(
        value=C3_output,
        ksize=[1, 2, 2, 1],
        strides=[1, 2, 2, 1],
        padding='SAME'
    )

    # flatten the output (C5_input_nodes_num: 16*5*5=400)
    S4_output_shape = S4_output.get_shape().as_list()
    C5_input_nodes_num = S4_output_shape[1] * S4_output_shape[2] * S4_output_shape[3]
    C5_input_nodes_reshaped = tf.reshape(
        tensor=S4_output,
        shape=[-1, C5_input_nodes_num]  # -1 infers the batch size, so a None batch dimension also works
    )



    # C5: Input = 16*5*5 = 400   Output = 120
    C5_weights = tf.get_variable(
        name='C5_weights',
        shape=[C5_input_nodes_num, 120],
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    C5_biases = tf.get_variable(
        name='C5_biases',
        shape=[120],
        initializer=tf.constant_initializer(0.1)
    )
    C5_output = tf.nn.relu(
        tf.matmul(C5_input_nodes_reshaped, C5_weights) + C5_biases
    )



    # F6: Input = 120   Output = 84
    F6_weights = tf.get_variable(
        name='F6_weights',
        shape=[120, 84],
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    F6_biases = tf.get_variable(
        name='F6_biases',
        shape=[84],
        initializer=tf.constant_initializer(0.1)
    )
    F6_output = tf.nn.relu(
        tf.matmul(C5_output, F6_weights) + F6_biases
    )



    # F7: Input = 84   Output = 10
    F7_weights = tf.get_variable(
        name='F7_weights',
        shape=[84, 10],
        initializer=tf.truncated_normal_initializer(stddev=0.1)
    )
    F7_biases = tf.get_variable(
        name='F7_biases',
        shape=[10],
        initializer=tf.constant_initializer(0.1)
    )
    F7_output = tf.nn.relu(
        tf.matmul(F6_output, F7_weights) + F7_biases
    )



    return F7_output

3. Materials

1. Convolution layer: tf.nn.conv2d() documentation

https://www.w3cschool.cn/tensorflow_python/tf_nn_conv2d.html

2. Pooling layer: tf.nn.max_pool() documentation

https://www.w3cschool.cn/tensorflow_python/tf_nn_max_pool.html

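4. Code Details

1. The padding argument of tf.nn.conv2d()

The original LeNet-5 takes a 32*32 input, which this code does not reflect (MNIST images are 28*28), so at first I could not see where the 28*28 output of the first layer came from.

It turns out that with padding set to 'SAME', output_height and output_width equal ceil(input_height / stride_height) and ceil(input_width / stride_width) respectively, so with a stride of 1 a 28*28 input stays 28*28.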

https://blog.csdn.net/zuoyouzouzou/article/details/101871895
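
To make the size arithmetic concrete, here is a minimal pure-Python sketch (it does not call TensorFlow). The helper name conv_output_size is hypothetical. The 'SAME' rule is the one quoted above; the 'VALID' rule, ceil((input - kernel + 1) / stride), is TensorFlow's usual convention and is added here as an assumption, since the post itself does not state it. The trace also assumes that C3 uses 'VALID' padding, as its 10x10x16 comment implies.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Output size of a conv or pooling layer along one spatial dimension."""
    if padding == 'SAME':
        # Padding is added so that only the stride shrinks the output.
        return int(math.ceil(float(input_size) / stride))
    if padding == 'VALID':
        # No padding: the kernel must fit entirely inside the input.
        return int(math.ceil(float(input_size - kernel_size + 1) / stride))
    raise ValueError('unknown padding: %r' % padding)


# Trace the spatial size through the layers for a 28x28 MNIST image.
c1 = conv_output_size(28, 5, 1, 'SAME')    # C1: 5x5 conv, stride 1
s2 = conv_output_size(c1, 2, 2, 'SAME')    # S2: 2x2 max pool, stride 2
c3 = conv_output_size(s2, 5, 1, 'VALID')   # C3: 5x5 conv, stride 1 (assumed VALID)
s4 = conv_output_size(c3, 2, 2, 'SAME')    # S4: 2x2 max pool, stride 2
print('%d %d %d %d' % (c1, s2, c3, s4))

# The original LeNet-5 reaches the same C1 size from a 32x32 input without padding.
lenet5_c1 = conv_output_size(32, 5, 1, 'VALID')
```

Under these assumptions the flattened S4 output has 16 * s4 * s4 nodes, which is where the 400 in the C5 comment comes from.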

2. The strides argument of tf.nn.conv2d()

https://blog.csdn.net/wangpengfei163/article/details/80987145

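3. tf.truncated_normal_initializer()

Draws random values from a truncated normal distribution: any generated value that lies more than 2 standard deviations from the mean is discarded and redrawn.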

https://www.cnblogs.com/qianchaomoon/p/12286410.html
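
The discard-and-redraw rule can be illustrated with a short pure-Python sketch. This is only a stand-in that mimics the described behaviour with the standard random module; it is not how TensorFlow implements the initializer, and the function name truncated_normal_sample is hypothetical.

```python
import random


def truncated_normal_sample(mean=0.0, stddev=0.1):
    """Draw one value, redrawing while it lies more than 2 stddev from the mean."""
    while True:
        value = random.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            return value


# With stddev=0.1 (the value used in mnist_lenet.py), every sample lands in [-0.2, 0.2].
samples = [truncated_normal_sample(stddev=0.1) for _ in range(1000)]
print('min=%.3f max=%.3f' % (min(samples), max(samples)))
```

Cutting off the tails this way keeps the initial weights from starting at unusually large magnitudes.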

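4. tf.nn.bias_add()

For example, the first convolution produces 6 channels, each a 28*28 image. Adding a biases vector of size 6 means that every pixel in the first channel (28*28 of them) has biases[0] added to it, every pixel in the second channel has biases[1] added, every pixel in the third channel has biases[2] added, and so on.

bias: a 1-D Tensor whose size matches the last dimension of value.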

https://www.cnblogs.com/smallredness/p/11197139.html
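
The per-channel addition described above can be sketched on plain nested lists. This is an illustration of the broadcasting rule only, not TensorFlow's implementation; the name bias_add_sketch and the tiny 2x2x3 feature map are made up for the example.

```python
def bias_add_sketch(value, bias):
    """Add bias[c] to channel c of every pixel.

    value: nested list indexed as [height][width][channel]
    bias:  list whose length equals the number of channels
    """
    channels = len(bias)
    for row in value:
        for pixel in row:
            if len(pixel) != channels:
                raise ValueError('bias size must match the last dimension of value')
    return [
        [[pixel[c] + bias[c] for c in range(channels)] for pixel in row]
        for row in value
    ]


# A 2x2 feature map with 3 channels, all zeros, so the result shows the biases directly.
feature_map = [[[0.0, 0.0, 0.0] for _ in range(2)] for _ in range(2)]
biases = [0.1, 0.2, 0.3]
result = bias_add_sketch(feature_map, biases)
print(result[0][0])
```

Every pixel ends up with the same three offsets, one per channel, which is exactly what happens to the 6 channels of C1 in the network above.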

(Series complete) Deep learning from scratch: running the MNIST dataset with the TensorFlow framework, day 1: defining the LeNet network structure