Caffe Source Code Study: AlexNet (caffenet.py)

    <div class="author">

江南竹影

2018.03.08 15:47

    <div data-note-content="" class="show-content">
      <div class="show-content-free">
        <p>Welcome to my personal blog: <a href="https://link.jianshu.com?t=http%3A%2F%2Fzengzeyu.com%2F" target="_blank" rel="nofollow">zengzeyu.com</a></p>

Introduction

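Source location: caffe/examples/pycaffe/caffenet.py
This file is a Caffe implementation of the classic AlexNet model. Strictly speaking it is CaffeNet, Caffe's reference variant of AlexNet, which applies pooling before local response normalization. Interested readers should read the paper: ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, Sutskever and Hinton, NIPS 2012).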

Source Code Walkthrough

1. Import modules

from __future__ import print_function
from caffe import layers as L, params as P, to_proto
from caffe.proto import caffe_pb2

2. Define layer helper functions

These cover the convolution layer, the fully connected layer and the pooling layer.

2.1 Convolution layer function

def conv_relu(bottom, ks, nout, stride=1, pad=0, group=1):
    conv = L.Convolution(bottom, kernel_size=ks, stride=stride,
                                num_output=nout, pad=pad, group=group)
    return conv, L.ReLU(conv, in_place=True)

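1. Function inputs

bottom - the input node (blob)
ks - kernel size
nout - number of output channels (num_output), i.e. the output depth
stride - step size of the sliding kernel window
pad - padding: the number of zero-valued pixels added around every border of the input
group - number of groups: the input and output channels are split into group groups, and each group is convolved separately (AlexNet used this to split the model across two GPUs)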
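The following pure-Python sketch shows how ks, stride, pad and group affect a convolution. It does not use Caffe, and the helper names conv_output_size and conv_param_count are hypothetical. It assumes Caffe's convolution output-size rule, floor((in + 2*pad - ks) / stride) + 1, with no dilation. It also assumes one bias per output channel, which is the Convolution layer's default.

```python
def conv_output_size(in_size, ks, stride=1, pad=0):
    """Spatial output size of a convolution (Caffe rounds down for conv)."""
    return (in_size + 2 * pad - ks) // stride + 1


def conv_param_count(in_channels, nout, ks, group=1):
    """Learnable parameters (weights + biases) of a grouped convolution.

    With `group` groups each filter only sees in_channels // group input
    channels, so the weight blob has shape (nout, in_channels // group, ks, ks).
    """
    assert in_channels % group == 0 and nout % group == 0
    weights = nout * (in_channels // group) * ks * ks
    biases = nout
    return weights + biases


# conv1: 227x227x3 input, 11x11 kernel, stride 4
print(conv_output_size(227, 11, stride=4))      # 55
# conv2: 96 input channels, 256 filters of 5x5, with and without grouping
print(conv_param_count(96, 256, 5, group=2))    # 307456
print(conv_param_count(96, 256, 5, group=1))    # 614656
```

With group=2, the weight count is halved, because each filter connects to only half of the input channels.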

2. Call Caffe's convolution layer constructor

conv = L.Convolution(bottom, kernel_size=ks, stride=stride,num_output=nout, pad=pad, group=group)

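3. Return values

conv - the convolution layer (its output blob)
L.ReLU(conv, in_place=True) - the convolution output passed through the ReLU activation; in_place=True makes ReLU overwrite the conv blob instead of allocating a new one, which saves memory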

2.2 Fully connected layer

def fc_relu(bottom, nout):
    fc = L.InnerProduct(bottom, num_output=nout)
    return fc, L.ReLU(fc, in_place=True)

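1. Call Caffe's inner-product (fully connected) layer constructor

fc = L.InnerProduct(bottom, num_output=nout)

2. Return values

fc, L.ReLU(fc, in_place=True) - the fully connected layer output, and the same output passed through ReLU in place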
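A fully connected (InnerProduct) layer flattens its input and computes y = Wx + b. The sketch below is a pure-Python illustration with made-up numbers, not pycaffe code. The names inner_product, relu and fc_param_count are hypothetical helpers. The sketch also counts the parameters of fc6, assuming its input is pool5 flattened to 6 x 6 x 256 = 9216 values.

```python
def inner_product(x, weights, bias):
    """Fully connected forward pass: y[i] = sum_j W[i][j] * x[j] + b[i].

    `x` is the flattened input of one sample; `weights` has one row per output.
    """
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(weights, bias)]


def relu(values):
    """Element-wise ReLU: negative values become 0."""
    return [max(0.0, v) for v in values]


def fc_param_count(in_features, nout):
    """Weights (nout x in_features) plus one bias per output."""
    return in_features * nout + nout


x = [1.0, -2.0, 3.0]
W = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
b = [0.5, 0.5]
y = inner_product(x, W, b)
print(y, relu(y))                          # [4.5, -1.5] [4.5, 0.0]

# fc6: pool5 flattened (6 * 6 * 256 = 9216 inputs) -> 4096 outputs
print(fc_param_count(6 * 6 * 256, 4096))   # 37752832
```

This count shows that fc6 alone holds far more parameters than all the convolution layers combined.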

2.3 Pooling layer

def max_pool(bottom, ks, stride=1):
    return L.Pooling(bottom, pool=P.Pooling.MAX, kernel_size=ks, stride=stride)

Call Caffe's pooling layer constructor

L.Pooling()
pool=P.Pooling.MAX - selects max pooling, i.e. each output is the maximum value within its pooling window
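Max pooling can be sketched in pure Python for a single channel. One assumption is worth stating: Caffe rounds the pooling output size up (ceil), while convolution rounds down. The helper below follows that rule for the no-padding case only, and it assumes the window is at least as large as the stride, as in AlexNet. The names max_pool_2d and pool_output_size are hypothetical.

```python
import math


def pool_output_size(in_size, ks, stride=1):
    """Pooling output size without padding; Caffe rounds up (ceil) here."""
    return int(math.ceil((in_size - ks) / stride)) + 1


def max_pool_2d(grid, ks, stride=1):
    """Max pooling over one channel given as a list of rows (no padding).

    Windows that run past the border are clipped to the input.
    """
    assert ks >= stride, "sketch assumes every window starts inside the input"
    h, w = len(grid), len(grid[0])
    out_h, out_w = pool_output_size(h, ks, stride), pool_output_size(w, ks, stride)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            r0, c0 = i * stride, j * stride
            window = [grid[r][c]
                      for r in range(r0, min(r0 + ks, h))
                      for c in range(c0, min(c0 + ks, w))]
            row.append(max(window))   # keep only the largest value
        out.append(row)
    return out


grid = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(max_pool_2d(grid, 3, stride=2))   # [[11, 12], [15, 16]]
print(pool_output_size(55, 3, 2))       # 27, i.e. conv1's 55x55 -> pool1's 27x27
```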

3. Define the network structure

def caffenet(lmdb, batch_size=256, include_acc=False):
    data, label = L.Data(source=lmdb, backend=P.Data.LMDB, batch_size=batch_size, ntop=2,
        transform_param=dict(crop_size=227, mean_value=[104, 117, 123], mirror=True))

    # the net itself
    conv1, relu1 = conv_relu(data, 11, 96, stride=4)
    pool1 = max_pool(relu1, 3, stride=2)
    norm1 = L.LRN(pool1, local_size=5, alpha=1e-4, beta=0.75)
    conv2, relu2 = conv_relu(norm1, 5, 256, pad=2, group=2)
    pool2 = max_pool(relu2, 3, stride=2)
    norm2 = L.LRN(pool2, local_size=5, alpha=1e-4, beta=0.75)
    conv3, relu3 = conv_relu(norm2, 3, 384, pad=1)
    conv4, relu4 = conv_relu(relu3, 3, 384, pad=1, group=2)
    conv5, relu5 = conv_relu(relu4, 3, 256, pad=1, group=2)
    pool5 = max_pool(relu5, 3, stride=2)
    fc6, relu6 = fc_relu(pool5, 4096)
    drop6 = L.Dropout(relu6, in_place=True)
    fc7, relu7 = fc_relu(drop6, 4096)
    drop7 = L.Dropout(relu7, in_place=True)
    fc8 = L.InnerProduct(drop7, num_output=1000)
    loss = L.SoftmaxWithLoss(fc8, label)

    # the test net additionally reports top-1 accuracy
    if include_acc:
        acc = L.Accuracy(fc8, label)
        return to_proto(loss, acc)
    else:
        return to_proto(loss)

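1. Function inputs

lmdb - path to the LMDB database to read
batch_size - number of samples fed to the network per iteration
include_acc - whether to add an Accuracy layer (acc stands for accuracy, not acceleration); make_net() sets it to True for the test net only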
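For reference, the two layers at the end of the network compute a softmax cross-entropy loss (SoftmaxWithLoss) and the top-1 accuracy (Accuracy) over each batch. Below is a minimal pure-Python sketch. It assumes no ignore_label and averages over the batch. The function names are hypothetical, and this is not Caffe's implementation.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_with_loss(batch_scores, labels):
    """Mean cross-entropy of the softmax probability of the true label."""
    losses = [-math.log(softmax(s)[y]) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)


def accuracy(batch_scores, labels):
    """Top-1 accuracy: fraction of samples whose highest score is the true label."""
    hits = sum(1 for s, y in zip(batch_scores, labels)
               if max(range(len(s)), key=s.__getitem__) == y)
    return hits / len(labels)


scores = [[2.0, 1.0, 0.1],    # highest score: class 0
          [0.5, 2.5, 0.3]]    # highest score: class 1
labels = [0, 2]
print(accuracy(scores, labels))              # 0.5
print(softmax_with_loss(scores, labels))     # mean loss over the two samples
```

The loss is what gets minimized during training. The accuracy is only reported, which is why the generated test net includes it and the training net does not.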

2. Call Caffe's data layer constructor (Data)
L.Data(source=lmdb, backend=P.Data.LMDB, batch_size=batch_size, ntop=2, transform_param=dict(crop_size=227, mean_value=[104, 117, 123], mirror=True))

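backend - database backend (here LMDB)
ntop - number of output blobs; the data layer outputs both data and label, so the value is 2
transform_param - per-image preprocessing:
- crop_size cuts out a 227x227 patch (a random one in the TRAIN phase, the center one in the TEST phase).
- mean_value is subtracted from each channel to center the input around zero. The three values are per channel in BGR order, which is Caffe's default channel order.
- mirror enables random horizontal flipping.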
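The effect of transform_param on one image can be sketched as follows. This is a simplified pure-Python model, not Caffe's DataTransformer. It assumes the following:
- The image is given as C x H x W nested lists in BGR order.
- Training uses a random crop offset; otherwise the center crop is used.
- When mirror is set, the crop is flipped horizontally with probability 1/2.
- mean_value is subtracted per channel.

The transform function and its rng argument are hypothetical.

```python
import random


def transform(image, crop_size, mean_value, mirror, train, rng=random):
    """Crop, optionally mirror, and mean-subtract one C x H x W image."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    if train:                                   # random crop offset while training
        h_off = rng.randint(0, height - crop_size)
        w_off = rng.randint(0, width - crop_size)
    else:                                       # center crop otherwise
        h_off = (height - crop_size) // 2
        w_off = (width - crop_size) // 2
    do_mirror = mirror and rng.randint(0, 1) == 1   # flip with probability 1/2
    out = []
    for c in range(channels):
        plane = []
        for h in range(crop_size):
            row = image[c][h_off + h][w_off:w_off + crop_size]
            if do_mirror:
                row = row[::-1]                 # horizontal flip
            plane.append([v - mean_value[c] for v in row])
        out.append(plane)
    return out


# A tiny 3x4x4 "image": channel c holds c*100 + 4*row + col.
img = [[[c * 100 + h * 4 + w for w in range(4)] for h in range(4)] for c in range(3)]
center = transform(img, 2, [104, 117, 123], mirror=False, train=False)
print(center[0])   # [[-99, -98], [-95, -94]]
```

For the real network the same steps apply with crop_size=227 on the stored images.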

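3. Network structure
For diagrams of the AlexNet architecture and its data flow, which make the structure easier to grasp visually, see the blog post 深度学习之图像分类模型AlexNet解读 (AlexNet Explained: an Image Classification Model in Deep Learning).
Layers 1-5 are convolutional and layers 6-8 are fully connected, as listed in the table below (output sizes are per image, batch dimension omitted):

Layer	Operation	Output (H x W x C)
Data	crop_size: 227, mean_value: [104, 117, 123], mirror: true	data: 227x227x3; label: 1 (one class index per image)
1	conv1 -> relu1 -> pool1 -> norm1	27x27x96 (conv1 itself outputs 55x55x96)
2	conv2 -> relu2 -> pool2 -> norm2	13x13x256
3	conv3 -> relu3	13x13x384
4	conv4 -> relu4	13x13x384
5	conv5 -> relu5 -> pool5	6x6x256
6	fc6 -> relu6 -> drop6	4096
7	fc7 -> relu7 -> drop7	4096
8	fc8 -> loss	1000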
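The Output column can be reproduced by chaining the size formulas through the layers. The sketch below assumes Caffe's convolution rule, floor((in + 2*pad - ks) / stride) + 1, and its pooling rule, ceil((in - ks) / stride) + 1. It also assumes that LRN, ReLU and Dropout keep the shape. The names conv_out, pool_out and caffenet_shapes are hypothetical helpers.

```python
import math


def conv_out(size, ks, stride=1, pad=0):
    return (size + 2 * pad - ks) // stride + 1          # convolution: floor


def pool_out(size, ks, stride=1):
    return int(math.ceil((size - ks) / stride)) + 1     # pooling: ceil


def caffenet_shapes(input_size=227):
    """Trace (height, width, channels) through the layers defined in caffenet()."""
    shapes = {}
    s = conv_out(input_size, 11, stride=4)   # conv1: 55
    shapes["conv1"] = (s, s, 96)
    s = pool_out(s, 3, stride=2)             # pool1 (and norm1): 27
    shapes["norm1"] = (s, s, 96)
    s = conv_out(s, 5, pad=2)                # conv2: 27
    shapes["conv2"] = (s, s, 256)
    s = pool_out(s, 3, stride=2)             # pool2 (and norm2): 13
    shapes["norm2"] = (s, s, 256)
    s = conv_out(s, 3, pad=1)                # conv3: 13
    shapes["conv3"] = (s, s, 384)
    s = conv_out(s, 3, pad=1)                # conv4: 13
    shapes["conv4"] = (s, s, 384)
    s = conv_out(s, 3, pad=1)                # conv5: 13
    shapes["conv5"] = (s, s, 256)
    s = pool_out(s, 3, stride=2)             # pool5: 6
    shapes["pool5"] = (s, s, 256)
    shapes["fc6"], shapes["fc7"], shapes["fc8"] = (4096,), (4096,), (1000,)
    return shapes


for name, shape in caffenet_shapes().items():
    print(name, "x".join(str(d) for d in shape))
```

The trace confirms that conv3 and conv4 keep the 13x13 size from pool2. With a 3x3 kernel and pad=1, the spatial size is preserved.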

Take the code of layer 1 as an example:

Layer 1 = convolution (conv1 + relu1) + pooling (pool1) + normalization (norm1)

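(1) Layer 1: convolution (conv1 + relu1)
Purpose: extract local features. AlexNet uses ReLU as the activation function of the CNN and showed that it works better than saturating functions such as Sigmoid in deeper networks. Because ReLU does not saturate for positive inputs, it mitigates the vanishing-gradient problem that Sigmoid suffers from as the network gets deeper.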
conv1, relu1 = conv_relu(data, 11, 96, stride=4)

Input: data, the output of the data layer
Kernel size: 11
Output depth: 96
Stride: 4
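Here is a simplified way to see the vanishing-gradient argument. During backpropagation, the gradient is multiplied by the activation's derivative at every layer. The sigmoid derivative never exceeds 0.25, while ReLU's derivative is exactly 1 for positive inputs. The sketch below ignores the weights entirely and uses 8 layers (AlexNet's depth) purely as an illustration. The function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)           # at most 0.25, reached at x = 0


def relu_grad(x):
    return 1.0 if x > 0 else 0.0   # exactly 1 for any positive input


# Best-case product of activation derivatives along a path through `depth`
# layers, ignoring the weights: the factor that scales the back-propagated gradient.
depth = 8
print(sigmoid_grad(0.0) ** depth)  # 1.52587890625e-05
print(relu_grad(1.0) ** depth)     # 1.0
```

Even in its best case, the sigmoid path shrinks the gradient by a factor of 65536 over 8 layers, while active ReLU units pass it through unchanged.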

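(2) Layer 1: pooling (pool1)
Purpose: keep the maximum value in each window, which avoids the blurring effect of average pooling. AlexNet also made the stride (2) smaller than the pooling window (3), so adjacent windows overlap. The authors argue this enriches the extracted features, and the paper reports that overlapping pooling slightly reduces the error rate and makes the model a little harder to overfit.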
pool1 = max_pool(relu1, 3, stride=2)

Input: relu1
Pooling window size: 3
Stride: 2
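Overlapping pooling can be checked by listing the window positions along one axis of conv1's 55-pixel output. This pure-Python sketch uses the hypothetical helpers pooling_windows and overlap, with no padding and Caffe-style ceil rounding. It compares kernel 3 / stride 2 with a non-overlapping kernel 2 / stride 2.

```python
def pooling_windows(size, ks, stride):
    """(start, end) indices, end exclusive, of each pooling window along one axis."""
    count = -(-(size - ks) // stride) + 1      # ceil division, as Caffe's pooling does
    return [(i * stride, min(i * stride + ks, size)) for i in range(count)]


def overlap(windows):
    """Number of positions shared by each pair of neighbouring windows."""
    return [a_end - b_start for (_, a_end), (b_start, _) in zip(windows, windows[1:])]


overlapping = pooling_windows(55, ks=3, stride=2)
print(len(overlapping), overlapping[:3])    # 27 [(0, 3), (2, 5), (4, 7)]
print(set(overlap(overlapping)))            # {1}: neighbours share one position

plain = pooling_windows(55, ks=2, stride=2)
print(set(overlap(plain)))                  # {0}: no overlap
```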

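(3) Layer 1: local response normalization (LRN, norm1)
Purpose: create competition among the activities of neighboring neurons (here, neighboring channels at the same position). Relatively large responses become relatively larger, and neurons with smaller responses are suppressed, which improves the model's generalization.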
norm1 = L.LRN(pool1, local_size=5, alpha=1e-4, beta=0.75)

Input: pool1
local_size (number of neighboring channels in each normalization window): 5
alpha: 0.0001
beta: 0.75
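Caffe's LRN layer in its default ACROSS_CHANNELS mode divides each activation x_c by (k + (alpha / local_size) * sum of x_j^2 over the local_size channels centered on c) ^ beta, with k = 1 by default. The sketch below applies that formula at a single spatial position in pure Python. lrn_across_channels is a hypothetical name. Channels beyond the edges are treated as zero, so they simply drop out of the sum.

```python
def lrn_across_channels(values, local_size=5, alpha=1e-4, beta=0.75, k=1.0):
    """Local response normalization over the channel axis at one (h, w) position.

    values[c] is the activation of channel c at that position.
    """
    half = (local_size - 1) // 2
    n = len(values)
    out = []
    for c, v in enumerate(values):
        lo, hi = max(0, c - half), min(n, c + half + 1)     # neighbouring channels
        sq_sum = sum(values[j] ** 2 for j in range(lo, hi))
        out.append(v / (k + alpha / local_size * sq_sum) ** beta)
    return out


alone = lrn_across_channels([100.0, 0.0, 0.0, 0.0, 0.0, 0.0])
crowded = lrn_across_channels([100.0] * 6)
# Channel 0 is damped more when its neighbours are also strongly active.
print(alone[0], crowded[0])
```

This is the competition described in the purpose statement: a response is scaled down according to how active its neighbouring channels are.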

4. Write the network definition files (.prototxt)

def make_net():
    # training net: training LMDB, default batch size, loss only
    with open('train.prototxt', 'w') as f:
        print(caffenet('/path/to/caffe-train-lmdb'), file=f)

    # test net: validation LMDB, batch size 50, plus an Accuracy layer
    with open('test.prototxt', 'w') as f:
        print(caffenet('/path/to/caffe-val-lmdb', batch_size=50, include_acc=True), file=f)

5. Run

if __name__ == '__main__':
    make_net()

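Summary

caffenet.py is good source code for getting started with Caffe. Reading it alongside the original paper deepens your understanding of the network structure and fills in the theory. Next, I will build my own network following the pattern of this example. The first step, and the most important one when learning deep learning, is writing a data input layer for my own data format.

That's all.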


      </div>
    </div>
</div>