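Preface

Why restart a series from three years ago

In September 2016, at the start of my graduate studies, I wrote a series of posts analyzing the Caffe source code. As my projects and research progressed, the deep learning frameworks I used changed as well. As everyone has seen, frameworks such as TensorFlow and PyTorch, which are simple to install, flexible to use, and do not require hand-written backpropagation code, have increasingly come into researchers' view and made their work much easier. At the same time, they have produced many "package callers": people with limited programming ability and no grasp of the underlying principles, who only know how to call Python packages and cannot solve deeper deep learning problems once they step outside the framework. So, three years later, I am restarting the Caffe source code analysis series. The main goal is to understand the code behind the low-level implementation of deep learning and to share it with everyone.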

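Why Caffe

Caffe is written mainly in C++, so its source code is easy to present and is fairly well organized. Although Caffe is a somewhat dated deep learning framework, that does not stop us from studying it and, by generalizing from it, understanding how deep learning is implemented at the low level.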

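Writing plan for the rest of the series

In this in-depth Caffe source code series, I plan to share more practical material, mainly in the following areas:

How the backward-pass code of a Layer is written in Caffe.
Analysis of typical modules in Caffe, such as how data layers are constructed.
How a net (network) is organized in Caffe.
How a solver (optimizer) is organized in Caffe.

Combined with parts 1-6 of the in-depth Caffe source code series that I wrote in 2016-2017 and some of my other posts, these four sub-series should form a relatively complete collection of Caffe source code analyses. I will cover them over a number of posts. Below is the first post after restarting the series: an analysis of the ReLU layer source code.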

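ReLU layer source code analysis

Caffe ships with a large number of layers, so for the analysis of the backward-pass code I plan to start from the simpler ones. I chose the ReLU layer first because the ReLU activation function is used in so many settings. For background on the function itself, see the Baidu Baike entry on the ReLU function.

Gradients in Caffe

In Caffe, the gradient array lives in the memory pointed to by the diff_ pointer of a Blob, and it is used heavily during the backward pass of each layer when training a deep neural network. The gradient array has the same shape as the data array (data_). Let us first look at the declaration of the Backward function in layer.hpp: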

  /**
   * @brief Given the top blob error gradients, compute the bottom blob error
   *        gradients.
   *
   * @param top
   *     the output blobs, whose diff fields store the gradient of the error
   *     with respect to themselves
   *     (annotation: the top blobs store the gradient of the error with
   *     respect to the layer's output data)
   * @param propagate_down
   *     a vector with equal length to bottom, with each index indicating
   *     whether to propagate the error gradients down to the bottom blob at
   *     the corresponding index
   * @param bottom
   *     the input blobs, whose diff fields will store the gradient of the error
   *     with respect to themselves after Backward is run
   *     (annotation: the bottom blobs store the gradient of the error with
   *     respect to the layer's input data)
   *
   * The Backward wrapper calls the relevant device wrapper function
   * (Backward_cpu or Backward_gpu) to compute the bottom blob diffs given the
   * top blob diffs.
   *
   * Your layer should implement Backward_cpu and (optionally) Backward_gpu.
   */
  inline void Backward(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom);

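As the code above shows, the diff_ array of a layer's input (bottom) blob stores the gradient of the network error with respect to the layer's input data. That gradient is obtained by propagating the gradient held in the layer's output (top) blob backward through the Backward function.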
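To make this data flow concrete, here is a minimal sketch that does not use Caffe at all. ToyBlob and ToyScaleLayer are hypothetical names invented for this illustration; the layer computes y = scale * x, so its backward pass only has to multiply the top gradient by scale. The propagate_down flag is reduced to a single bool, whereas Caffe passes one flag per bottom blob.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for caffe::Blob: values and gradients of equal length.
struct ToyBlob {
  std::vector<float> data;  // plays the role of data_
  std::vector<float> diff;  // plays the role of diff_
  explicit ToyBlob(std::size_t n) : data(n, 0.0f), diff(n, 0.0f) {}
};

// Hypothetical layer computing y = scale * x (not part of Caffe).
struct ToyScaleLayer {
  float scale;

  // Forward: reads bottom.data and writes top->data.
  void Forward(const ToyBlob& bottom, ToyBlob* top) const {
    assert(bottom.data.size() == top->data.size());
    for (std::size_t i = 0; i < bottom.data.size(); ++i) {
      top->data[i] = scale * bottom.data[i];
    }
  }

  // Backward: reads top.diff (dE/dy) and writes bottom->diff (dE/dx).
  // Nothing is written when propagate_down is false.
  void Backward(const ToyBlob& top, bool propagate_down,
                ToyBlob* bottom) const {
    assert(top.diff.size() == bottom->diff.size());
    if (!propagate_down) return;
    for (std::size_t i = 0; i < top.diff.size(); ++i) {
      bottom->diff[i] = scale * top.diff[i];  // dE/dx = (dy/dx) * dE/dy
    }
  }
};
```

In use, one would fill bottom.data, call Forward, set top.diff to the incoming gradient, and then call Backward; afterwards bottom.diff holds the gradient with respect to the layer's input.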

Next, within a blob, the gradient and the data have the same dimensions. Take a look at the Reshape function in blob.cpp:

template <typename Dtype>
void Blob<Dtype>::Reshape(const vector<int>& shape) {
  CHECK_LE(shape.size(), kMaxBlobAxes);
  count_ = 1;
  shape_.resize(shape.size());
  if (!shape_data_ || shape_data_->size() < shape.size() * sizeof(int)) {
    shape_data_.reset(new SyncedMemory(shape.size() * sizeof(int)));
  }
  int* shape_data = static_cast<int*>(shape_data_->mutable_cpu_data());
  for (int i = 0; i < shape.size(); ++i) {
    CHECK_GE(shape[i], 0);
    if (count_ != 0) {
      CHECK_LE(shape[i], INT_MAX / count_) << "blob size exceeds INT_MAX";
    }
    count_ *= shape[i];
    shape_[i] = shape[i];
    shape_data[i] = shape[i];
  }
  if (count_ > capacity_) {
    capacity_ = count_;
    data_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));
    diff_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));
  }
}

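In the last two statements of the function, data_ and diff_ are each reset to a newly created SyncedMemory of the same size, capacity_ * sizeof(Dtype). Below, this post presents the annotated source code of the ReLU layer in Caffe.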
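Before moving on, the shared sizing of data_ and diff_ can be illustrated with a toy class. MiniBlob is a hypothetical name, it uses std::vector instead of Caffe's SyncedMemory, and it leaves out the shape_data_ bookkeeping and the INT_MAX overflow check, so treat it as a sketch of the idea rather than a copy of the real implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for caffe::Blob<float>.
class MiniBlob {
 public:
  // Recomputes the element count from the shape. Both buffers grow together,
  // and only when the new count exceeds the current capacity.
  void Reshape(const std::vector<int>& shape) {
    std::size_t count = 1;
    for (int dim : shape) {
      assert(dim >= 0);
      count *= static_cast<std::size_t>(dim);
    }
    count_ = count;
    shape_ = shape;
    if (count_ > capacity_) {
      capacity_ = count_;
      data_.assign(capacity_, 0.0f);  // value buffer
      diff_.assign(capacity_, 0.0f);  // gradient buffer, same size
    }
  }

  std::size_t count() const { return count_; }
  std::size_t capacity() const { return capacity_; }
  std::size_t data_size() const { return data_.size(); }
  std::size_t diff_size() const { return diff_.size(); }

 private:
  std::vector<int> shape_;
  std::size_t count_ = 0;
  std::size_t capacity_ = 0;
  std::vector<float> data_;
  std::vector<float> diff_;
};
```

Reshaping to a smaller shape leaves the capacity untouched, which mirrors the count_ > capacity_ test in the listing above: the buffers are only replaced when they are too small.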

ReLU layer source code

First, the configuration parameters of the ReLU layer in caffe.proto are as follows:

// Message that stores parameters used by ReLULayer
message ReLUParameter {
  // Allow non-zero slope for negative inputs to speed up optimization
  // Described in:
  // Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities
  // improve neural network acoustic models. In ICML Workshop on Deep Learning
  // for Audio, Speech, and Language Processing.
  optional float negative_slope = 1 [default = 0];
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 2 [default = DEFAULT];
}

It contains a negative_slope parameter. Let us see in the source code what this parameter does.
First, the source of relu_layer.hpp:

#ifndef CAFFE_RELU_LAYER_HPP_
#define CAFFE_RELU_LAYER_HPP_

#include <vector>

#include "caffe/blob.hpp"
#include "caffe/layer.hpp"
#include "caffe/proto/caffe.pb.h"

#include "caffe/layers/neuron_layer.hpp"

namespace caffe {

/**
 * @brief Rectified Linear Unit non-linearity @f$ y = \max(0, x) @f$.
 *        The simple max is fast to compute, and the function does not saturate.
 */
template <typename Dtype>
class ReLULayer : public NeuronLayer<Dtype> {
 public:
  /**
   * @param param provides ReLUParameter relu_param,
   *     with ReLULayer options:
   *   - negative_slope (\b optional, default 0).
   *     the value @f$ \nu @f$ by which negative values are multiplied.
   */
  explicit ReLULayer(const LayerParameter& param)
      : NeuronLayer<Dtype>(param) {}

  virtual inline const char* type() const { return "ReLU"; }

 protected:
  /**
   * @param bottom input Blob vector (length 1)
   *   -# @f$ (N \times C \times H \times W) @f$
   *      the inputs @f$ x @f$
   * @param top output Blob vector (length 1)
   *   -# @f$ (N \times C \times H \times W) @f$
   *      the computed outputs @f$
   *        y = \max(0, x)
   *      @f$ by default.  If a non-zero negative_slope @f$ \nu @f$ is provided,
   *      the computed outputs are @f$ y = \max(0, x) + \nu \min(0, x) @f$.
   */
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);  // CPU forward pass
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top);  // GPU forward pass

  /**
   * @brief Computes the error gradient w.r.t. the ReLU inputs.
   *
   * @param top output Blob vector (length 1), providing the error gradient with
   *      respect to the outputs
   *   -# @f$ (N \times C \times H \times W) @f$
   *      containing error gradients @f$ \frac{\partial E}{\partial y} @f$
   *      with respect to computed outputs @f$ y @f$
   * @param propagate_down see Layer::Backward.
   * @param bottom input Blob vector (length 1)
   *   -# @f$ (N \times C \times H \times W) @f$
   *      the inputs @f$ x @f$; Backward fills their diff with
   *      gradients @f$
   *        \frac{\partial E}{\partial x} = \left\{
   *        \begin{array}{lr}
   *            0 & \mathrm{if} \; x \le 0 \\
   *            \frac{\partial E}{\partial y} & \mathrm{if} \; x > 0
   *        \end{array} \right.
   *      @f$ if propagate_down[0], by default.
   *      If a non-zero negative_slope @f$ \nu @f$ is provided,
   *      the computed gradients are @f$
   *        \frac{\partial E}{\partial x} = \left\{
   *        \begin{array}{lr}
   *            \nu \frac{\partial E}{\partial y} & \mathrm{if} \; x \le 0 \\
   *            \frac{\partial E}{\partial y} & \mathrm{if} \; x > 0
   *        \end{array} \right.
   *      @f$.
   */
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);  // CPU backward pass
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);  // GPU backward pass
};

}  // namespace caffe

#endif  // CAFFE_RELU_LAYER_HPP_

Next, the source of relu_layer.cpp:

#include <algorithm>
#include <vector>

#include "caffe/layers/relu_layer.hpp"

namespace caffe {

// CPU forward pass
template <typename Dtype>
void ReLULayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
    const vector<Blob<Dtype>*>& top) {
  const Dtype* bottom_data = bottom[0]->cpu_data();  // read-only pointer to the input data on the CPU
  Dtype* top_data = top[0]->mutable_cpu_data();  // writable pointer to the ReLU layer's output data on the CPU
  const int count = bottom[0]->count();  // total number of elements in the blob (n*c*h*w)
  Dtype negative_slope = this->layer_param_.relu_param().negative_slope();  // negative_slope as set in the prototxt (default 0)
  for (int i = 0; i < count; ++i) {
    // Forward: a positive input is passed through unchanged; a non-positive
    // input is multiplied by negative_slope, which gives 0 with the default
    // negative_slope of 0.
    top_data[i] = std::max(bottom_data[i], Dtype(0))
        + negative_slope * std::min(bottom_data[i], Dtype(0));
  }
}

// CPU backward pass
template <typename Dtype>
void ReLULayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
    const vector<bool>& propagate_down,
    const vector<Blob<Dtype>*>& bottom) {
  if (propagate_down[0]) {
    const Dtype* bottom_data = bottom[0]->cpu_data();  // input data; the backward pass depends on the sign of each input
    const Dtype* top_diff = top[0]->cpu_diff();  // read-only pointer to the top (output) gradient
    Dtype* bottom_diff = bottom[0]->mutable_cpu_diff();  // writable pointer to the bottom (input) gradient
    const int count = bottom[0]->count();  // total number of elements in the blob (n*c*h*w)
    Dtype negative_slope = this->layer_param_.relu_param().negative_slope();  // negative_slope as set in the prototxt (default 0)
    for (int i = 0; i < count; ++i) {
      // Backward: where the input is positive, the bottom gradient equals the
      // top gradient at that position; otherwise it is the top gradient
      // multiplied by negative_slope.
      bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
          + negative_slope * (bottom_data[i] <= 0));
    }
  }
}


#ifdef CPU_ONLY
STUB_GPU(ReLULayer);
#endif

INSTANTIATE_CLASS(ReLULayer);

}  // namespace caffe
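The loop body of Forward_cpu writes the activation as a sum of a max term and a min term rather than as an explicit branch. The standalone sketch below compares the two forms on plain floats. ReluMaxMin, ReluPiecewise and FormsAgree are hypothetical helper names introduced here for illustration; none of them exist in Caffe.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Same expression as the loop body of ReLULayer::Forward_cpu, for one float.
inline float ReluMaxMin(float x, float negative_slope) {
  return std::max(x, 0.0f) + negative_slope * std::min(x, 0.0f);
}

// Explicit piecewise form: x if x > 0, otherwise negative_slope * x.
inline float ReluPiecewise(float x, float negative_slope) {
  return x > 0.0f ? x : negative_slope * x;
}

// Returns true when the two forms give equal results on every input in xs.
inline bool FormsAgree(const std::vector<float>& xs, float negative_slope) {
  assert(!xs.empty());
  for (float x : xs) {
    if (ReluMaxMin(x, negative_slope) != ReluPiecewise(x, negative_slope)) {
      return false;
    }
  }
  return true;
}
```

For a positive input the min term is zero and only x remains; for a non-positive input the max term is zero and only negative_slope * x remains, so with the default negative_slope of 0 the output is simply max(0, x).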

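ReLU layer source code analysis

In the source code above, the forward pass can be written as:
$data\_y = \begin{cases} data\_x, & data\_x > 0 \\ negative\_slope \times data\_x, & data\_x \leq 0 \end{cases}$
For the backward pass, the chain rule gives $diff\_x = diff\_y \times \partial data\_y / \partial data\_x$, so the gradient can be written as:
$diff\_x = \begin{cases} 1 \times diff\_y, & data\_x > 0 \\ negative\_slope \times diff\_y, & data\_x \leq 0 \end{cases}$
During the backward pass it is therefore enough to check whether data_x was greater than 0 or less than or equal to 0 in the forward pass, and to multiply the top gradient by 1 or by negative_slope accordingly. Compare the formulas with the Backward_cpu source above: the expression (bottom_data[i] > 0) + negative_slope * (bottom_data[i] <= 0) evaluates to exactly this multiplier.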
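One way to convince yourself of the backward formula is to compare it with a numerical derivative. The sketch below is independent of Caffe and works on std::vector<double>. LeakyRelu, ReluBackward and NumericGrad are hypothetical helpers written for this post, and the central-difference check assumes the inputs are not within eps of 0, where the function has a kink.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Forward for one element: x if x > 0, otherwise negative_slope * x.
inline double LeakyRelu(double x, double negative_slope) {
  return x > 0 ? x : negative_slope * x;
}

// Analytic backward pass, using the same expression as Backward_cpu.
inline std::vector<double> ReluBackward(const std::vector<double>& bottom_data,
                                        const std::vector<double>& top_diff,
                                        double negative_slope) {
  assert(bottom_data.size() == top_diff.size());
  std::vector<double> bottom_diff(bottom_data.size());
  for (std::size_t i = 0; i < bottom_data.size(); ++i) {
    bottom_diff[i] = top_diff[i] * ((bottom_data[i] > 0)
        + negative_slope * (bottom_data[i] <= 0));
  }
  return bottom_diff;
}

// Central-difference estimate of dE/dx_i, where E = sum_j top_diff[j] * y_j.
// Only y_i depends on x_i, so the other terms drop out.
inline double NumericGrad(const std::vector<double>& bottom_data,
                          const std::vector<double>& top_diff,
                          double negative_slope, std::size_t i,
                          double eps = 1e-6) {
  const double y_plus = LeakyRelu(bottom_data[i] + eps, negative_slope);
  const double y_minus = LeakyRelu(bottom_data[i] - eps, negative_slope);
  return top_diff[i] * (y_plus - y_minus) / (2 * eps);
}
```

Calling ReluBackward on a few inputs and comparing each entry against NumericGrad should show agreement up to floating-point error, for both the default negative_slope of 0 and a non-zero value such as 0.1.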

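That brings the first post on gradient backpropagation in Caffe to a close. The ReLU layer is quite simple, so this post is only a start. More material will follow in later posts, so stay tuned. Readers are sincerely welcome to point out omissions and mistakes in the comments, and I thank you in advance.

You are welcome to read my later posts. Your support and encouragement are my greatest motivation.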

written by jiong
My motherland and I cannot be separated, not even for a moment!

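Restarting the In-Depth Study of Caffe Source Code, Part 7: Analysis of the Backward-Pass Code of Deep Neural Networks in Caffe (1), ReLU Layer Source Code Analysis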
