Some Special Layers in Convolutional Neural Networks (CNN) under Caffe (Batch Normalization)

Original URL:

https://blog.csdn.net/xg123321123/article/details/52610919

Batch Normalization

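Purpose: speeds up convergence during network training.
Notes:
- Caffe already provides BN as a layer. It has to be used together with a Scale layer, because the BatchNorm layer only normalizes and the learned scale and shift live in the Scale layer.
- During training, set the BN layer's use_global_stats to false; during testing, set use_global_stats to true. Otherwise training may report "NAN" or the model may fail to converge. This is a senior labmate's experience; I have not tested it myself.
Usage: see how residual networks (ResNet) use it.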
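
As a minimal sketch of the BN plus Scale pairing described in the notes above: the layer names bn_conv1 and scale_conv1 and the blob conv1 are illustrative assumptions, not taken from the original post.

```prototxt
# Training-time definition; all names here are hypothetical.
layer {
  name: "bn_conv1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param {
    use_global_stats: false  # use mini-batch statistics while training
  }
}
layer {
  name: "scale_conv1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true  # learn a shift in addition to the scale
  }
}
```

For the test-time network definition, the same pair would be used with use_global_stats: true. If the field is left out, Caffe is documented to pick the value from the current phase, but it is worth checking this against the Caffe version in use.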

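Dropout

Purpose: prevents overfitting. During training, some hidden-layer nodes are randomly switched off. The switched-off nodes can be regarded as temporarily not part of the network structure, but their weights must be kept (they are simply not updated for now), because those nodes may be active again when the next sample comes in.
Usage: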

layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7-conv"
  top: "fc7-conv"
  dropout_param {
    dropout_ratio: 0.5
  }
}

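ReLU

Purpose: a kind of activation function. For a given input value x, the ReLU layer outputs x if x > 0 and 0 otherwise.
Note: with the optional parameter negative_slope, the ReLU layer outputs negative_slope * x when x < 0. An improved variant of ReLU, PReLU, is now available.
Usage: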

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
  relu_param {
    negative_slope: 0  # default: 0
  }
}
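
Since PReLU is mentioned above as the successor to ReLU, here is a hedged sketch of what a PReLU layer definition can look like. The layer name prelu1 is an assumption, and the filler value shown is simply the documented default initial slope.

```prototxt
# Hypothetical PReLU layer; the negative slope is learned instead of fixed.
layer {
  name: "prelu1"
  type: "PReLU"
  bottom: "conv1"
  top: "conv1"
  prelu_param {
    filler {
      type: "constant"
      value: 0.25          # initial slope for x < 0
    }
    channel_shared: false  # one learned slope per channel
  }
}
```

Unlike negative_slope in the ReLU layer, which is a fixed hyperparameter, the slope here is a parameter updated during training.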

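InnerProduct

Purpose: treats the input as a simple vector and produces a simple vector as output. Put simply, it is a convolution whose kernel has the same size as the feature map, so each output has spatial size 1 * 1.
Drawback: models that contain fully connected layers (such as AlexNet) require a fixed input size, which is sometimes quite unreasonable because the input images have to be warped.
Notes:
- Required parameter:
  num_output (c_o): number of filters
- Strongly recommended parameter:
  weight_filler: initial distribution of the filter weights and its parameters.
- Optional parameters:
  bias_filler: [default: type: 'constant' value: 0]
  bias_term: [default: true] specifies whether to learn and apply an additive bias to the filter outputs.
Usage: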

layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"

  param {  # learning rate and decay multipliers for the weights
    lr_mult: 1
    decay_mult: 1
  }

  param {  # learning rate and decay multipliers for the biases
    lr_mult: 2
    decay_mult: 0
  }

  inner_product_param {
    num_output: 1000

    weight_filler {
      type: "xavier"
      std: 0.01  # only read by the "gaussian" filler; "xavier" ignores it
    }

    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
Note: for example, if the input to the layer above is n * c_i * h_i * w_i, then the output is n * 1000 * 1 * 1.
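
The remark that an inner product is a convolution whose kernel matches the feature map can be sketched as a Convolution layer. This is only an illustration: the layer name fc6-conv, the bottom blob pool5, the 4096 outputs, and the 6 x 6 feature-map size are assumptions that follow the common AlexNet/CaffeNet layout, not something stated in the original post.

```prototxt
# Hypothetical convolutional equivalent of a fully connected layer.
# Assumes the bottom blob pool5 has a 6 x 6 spatial size.
layer {
  name: "fc6-conv"
  type: "Convolution"
  bottom: "pool5"
  top: "fc6-conv"
  convolution_param {
    num_output: 4096
    kernel_size: 6   # same as the feature-map size, so the output is 1 * 1
  }
}
```

Written this way, the layer also accepts inputs larger than the assumed size, in which case the output becomes a spatial map rather than 1 * 1. That is one way around the fixed-input-size drawback noted above.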

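Crop

Purpose: takes two blobs as input and crops bottom[0] to the size of bottom[1].
Notes:
- axis = 0, 1, 2, 3 correspond to N, C, H, W. axis defaults to 2, i.e. cropping starts at H by default (H and W are cropped). A single offset can be given, or a separate one for each cropped dimension.
- offset is the offset at which cropping starts.
Usage: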

layer {
  type: "Crop"
  name: "crop"
  bottom: "score-dsn1-up"
  bottom: "data"
  top: "upscore-dsn1"
  crop_param {
    axis: 2
    offset: 5
  }
}
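
To illustrate setting a separate offset for each cropped dimension, the variant below is a sketch that reuses the blob names from the example above; the offset values 5 and 10 are arbitrary placeholders. With axis: 2, Caffe expects either one offset (applied to every cropped axis) or exactly one per cropped axis, here H and W in that order.

```prototxt
# Sketch: per-dimension offsets; the values are placeholders.
layer {
  type: "Crop"
  name: "crop"
  bottom: "score-dsn1-up"  # blob to be cropped
  bottom: "data"           # reference blob that supplies the target H and W
  top: "upscore-dsn1"
  crop_param {
    axis: 2
    offset: 5    # offset along H
    offset: 10   # offset along W
  }
}
```

Here the crop window starts 5 rows down and 10 columns in, and the output takes its height and width from data.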