Converting a Keras model (.h5 file) to a Caffe model (caffemodel and prototxt files): summary and reflections

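The core of the conversion

The core idea of converting a Keras-trained model to a Caffe model is to walk through the Keras model layer by layer and assign each layer to its Caffe counterpart, while guaranteeing the following:

- every layer name is unique;
- each layer's input name matches the output name of the layer that feeds it;
- each layer's data is laid out in the format Caffe expects.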
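
The first two guarantees can be checked mechanically before any Caffe layer is created. The following is a minimal sketch, assuming each converted layer is described by a hypothetical record with `name`, `bottoms` and `tops` keys; this record format and the function `check_layer_graph` are illustrations, not part of the original converter.

```python
def check_layer_graph(layers):
    """Return a list of problems found in an ordered list of layer records."""
    problems = []
    seen_names = set()   # layer names encountered so far
    produced = set()     # blob names that earlier layers have written
    for layer in layers:
        if layer["name"] in seen_names:
            problems.append("duplicate layer name: " + layer["name"])
        seen_names.add(layer["name"])
        for bottom in layer["bottoms"]:
            if bottom not in produced:
                problems.append(layer["name"] + " reads undefined blob: " + bottom)
        produced.update(layer["tops"])
    return problems


layers = [
    {"name": "data", "bottoms": [], "tops": ["data"]},
    {"name": "conv1", "bottoms": ["data"], "tops": ["conv1"]},
    # the bottom below is misspelled on purpose
    {"name": "relu1", "bottoms": ["conv_1"], "tops": ["relu1"]},
]
print(check_layer_graph(layers))  # ['relu1 reads undefined blob: conv_1']
```

Running such a check right after the layer loop makes a broken bottom/top chain visible before Caffe tries to parse the prototxt.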

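Loading the Keras model

import keras
# backbone is not defined in this snippet; it is assumed to be a project module whose
# custom_objects dict maps the names of custom layers stored in the .h5 file to their classes
file = '../xxx.h5'
keras_model = keras.models.load_model(file, custom_objects=backbone.custom_objects)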

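Extracting the parameters and name of every Keras layer

for layer in keras_model.layers:
    # operator (layer) type
    layer_type = type(layer)
    # layer name
    name = layer.name
    # layer configuration (hyperparameters)
    config = layer.get_config()
    # weights and biases of the layer
    blobs = layer.get_weights()
    blobs_num = len(blobs)
    # name of the layer's input tensor
    bottom = layer.input.name
    # name of the layer's output tensor
    top = layer.output.name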

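Creating the Caffe network skeleton

import caffe
from caffe import layers as L
# NetSpec is a collection of Tops, which can be assigned directly as attributes.
# Calling NetSpec.to_proto() creates the network parameters containing all the layers
# that were assigned, using the names they were assigned to.
# In other words, NetSpec builds the Caffe network structure, and .to_proto() then
# produces the network definition that is written to the .prototxt file.
# The network can afterwards be visualized online with Netscope
# (http://ethereon.github.io/netscope/#/editor).
caffe_net = caffe.NetSpec()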

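Assigning parameters and names to every Caffe layer

For operator parameters, see the official documentation, which lists the parameters of each operator.
Some commonly used operators are shown here:

outputs = dict()
if layer_type == keras.layers.InputLayer:
    # input shape
    input_shape = config['batch_input_shape']
    # change (N, h, w, c) to (N, c, h, w), with the batch size fixed to 1
    input_shape = [1, input_shape[3], input_shape[1], input_shape[2]]
    # add the input operator to the Caffe network; NetSpec works like a dict,
    # with the layer name as the key and the operator as the value
    caffe_net[name] = L.Input(shape=[dict(dim=input_shape)])
    outputs[top] = name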
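
The axis reordering used for the input layer can be isolated into a small helper. This is a sketch, assuming the Keras shape is channels-last (N, h, w, c) and that the batch size is replaced by a fixed value as in the snippet above; `nhwc_to_nchw_shape` is a hypothetical name.

```python
def nhwc_to_nchw_shape(shape, batch_size=1):
    """Reorder a channels-last shape (N, h, w, c) into Caffe's (N, c, h, w)."""
    _, h, w, c = shape          # the Keras batch dimension is usually None
    return [batch_size, c, h, w]


print(nhwc_to_nchw_shape((None, 224, 224, 3)))  # [1, 3, 224, 224]
```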

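For the Conv2D operator, the padding mode needs attention, because Keras and Caffe express padding differently.
Keras has two modes, 'same' and 'valid'.
'same':
For commonly used parameters:

kernel_size = 3, strides = 2: with these parameters an even input size needs a total padding of 1, so only one side can be padded. TensorFlow puts this padding on the right and bottom, but Caffe has no parameter for one-sided padding, so try to avoid this configuration when designing the network; otherwise the conversion is difficult. Using kernel_size = 2, strides = 2 instead needs no padding for an even input, at the cost of a slightly smaller receptive field.
kernel_size = 3, strides = 1: pad = 1 (kernel_size // 2).
'valid':
Simply set pad = 0.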
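
TensorFlow's 'same' rule can be written down per axis to see which configurations fit Caffe's symmetric `pad`. The sketch below assumes a dilation of 1; the helper names `tf_same_padding` and `caffe_pad_for_same` are hypothetical.

```python
import math


def tf_same_padding(in_size, kernel_size, stride):
    """Padding (before, after) along one axis in TensorFlow 'same' mode."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel_size - in_size, 0)
    before = total // 2          # any odd leftover pixel goes to the end,
    after = total - before       # i.e. to the right / bottom
    return before, after


def caffe_pad_for_same(in_size, kernel_size, stride):
    """Symmetric Caffe pad, or None when 'same' needs one-sided padding."""
    before, after = tf_same_padding(in_size, kernel_size, stride)
    return before if before == after else None


for in_size, k, s in [(224, 3, 2), (225, 3, 2), (224, 3, 1), (224, 2, 2)]:
    print(in_size, k, s, tf_same_padding(in_size, k, s), caffe_pad_for_same(in_size, k, s))
# 224 3 2 (0, 1) None
# 225 3 2 (1, 1) 1
# 224 3 1 (1, 1) 1
# 224 2 2 (0, 0) 0
```

A `None` result marks the configurations that cannot be expressed with Caffe's `pad` alone.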
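Caffe expresses padding by assigning the value directly. According to the documentation referenced above, a Conv2D (Convolution) layer looks like this: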

  layer {
    name: "conv1"
    type: "Convolution"
    bottom: "data"
    top: "conv1"
    # learning rate and decay multipliers for the filters
    param { lr_mult: 1 decay_mult: 1 }
    # learning rate and decay multipliers for the biases
    param { lr_mult: 2 decay_mult: 0 }
    convolution_param {
      num_output: 96     # learn 96 filters
      kernel_size: 11    # each filter is 11x11
      stride: 4          # step 4 pixels between each filter application
      weight_filler {
        type: "gaussian" # initialize the filters from a Gaussian
        std: 0.01        # distribution with stdev 0.01 (default mean: 0)
      }
      bias_filler {
        type: "constant" # initialize the biases to zero (0)
        value: 0
      }
    }
  }

Parameters (ConvolutionParameter convolution_param)

Required
- num_output (c_o): the number of filters
- kernel_size (or kernel_h and kernel_w): specifies height and width of each filter

Strongly Recommended
- weight_filler [default type: 'constant' value: 0]

Optional
- bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
- pad (or pad_h and pad_w) [default 0]: specifies the number of pixels to (implicitly) add to each side of the input
- stride (or stride_h and stride_w) [default 1]: specifies the intervals at which to apply the filters to the input
- group (g) [default 1]: if g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the ith output group channels will be only connected to the ith input group channels.

The parameters are what we mainly care about:

if layer_type == keras.layers.Conv2D:
    ...
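
One way to fill in this branch is to translate the Keras layer config into the keyword arguments of a Caffe Convolution layer. The following is a minimal sketch, not the author's implementation: the function `conv2d_to_caffe_kwargs` is hypothetical, it assumes a dilation of 1 and no groups, and it conservatively rejects every 'same' configuration whose padding could be one-sided.

```python
def conv2d_to_caffe_kwargs(config):
    """Map a Keras Conv2D config dict onto Caffe convolution_param fields."""
    kernel_h, kernel_w = config["kernel_size"]
    stride_h, stride_w = config["strides"]
    kwargs = dict(
        num_output=config["filters"],
        kernel_h=kernel_h, kernel_w=kernel_w,
        stride_h=stride_h, stride_w=stride_w,
        bias_term=config["use_bias"],
    )
    if config["padding"] == "valid":
        kwargs.update(pad_h=0, pad_w=0)
    elif config["padding"] == "same":
        # With stride 1 and an odd kernel, 'same' is always symmetric.
        # Other cases depend on the input size, so this sketch refuses them.
        if (stride_h, stride_w) != (1, 1) or kernel_h % 2 == 0 or kernel_w % 2 == 0:
            raise ValueError("'same' padding may be one-sided for this layer")
        kwargs.update(pad_h=kernel_h // 2, pad_w=kernel_w // 2)
    else:
        raise ValueError("unsupported padding: " + str(config["padding"]))
    return kwargs


config = {"filters": 64, "kernel_size": (3, 3), "strides": (1, 1),
          "padding": "same", "use_bias": True}
print(conv2d_to_caffe_kwargs(config))
```

The returned dict is meant to be passed on in the same way as the `**kwargs` in the author's `L.Convolution(...)` calls.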

Generating the prototxt file

with open(caffe_net_file, 'w') as fpb:
    print(caffe_net.to_proto(), file=fpb)

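Generating the Caffe parameters

import numpy as np
# net_params is the dict that holds the parameters of every layer, keyed by layer name;
# it is created once (net_params = dict()) before looping over the Keras layers.
# Conv2D kernels: Keras stores (kh, kw, c_in, c_out), Caffe expects (c_out, c_in, kh, kw)
blobs[0] = np.array(blobs[0]).transpose(3, 2, 0, 1)
net_params[name] = blobs
# generate the parameter file
caffe_model = caffe.Net(caffe_net_file, caffe.TEST)
for layer in caffe_model.params.keys():
    for n in range(0, len(caffe_model.params[layer])):
        caffe_model.params[layer][n].data[...] = net_params[layer][n]
caffe_model.save(caffe_params_file)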
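
The effect of `transpose(3, 2, 0, 1)` on a convolution kernel can be verified on a small dummy array. This sketch only needs NumPy; `conv_kernel_to_caffe` is a hypothetical helper name.

```python
import numpy as np


def conv_kernel_to_caffe(kernel):
    """Reorder a Keras kernel (kh, kw, c_in, c_out) to Caffe's (c_out, c_in, kh, kw)."""
    return np.asarray(kernel).transpose(3, 2, 0, 1)


# dummy 3x3 kernel with 2 input channels and 4 output channels
keras_kernel = np.arange(3 * 3 * 2 * 4, dtype=np.float32).reshape(3, 3, 2, 4)
caffe_kernel = conv_kernel_to_caffe(keras_kernel)
print(caffe_kernel.shape)  # (4, 2, 3, 3)
# element [o, i, h, w] of the Caffe kernel is element [h, w, i, o] of the Keras kernel
print(caffe_kernel[1, 0, 2, 1] == keras_kernel[2, 1, 0, 1])  # True
```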

Operators that can be converted to Caffe

- InputLayer
- Slice
- Dense
- Dropout
- ZeroPadding2D
- Multiply, Concatenate, Maximum, Add
- Conv2D, Conv2DTranspose
- BatchNormalization
- MaxPooling2D, AveragePooling2D, GlobalAveragePooling2D
- relu, prelu, elu, softmax, sigmoid, tanh

Similarities and differences between Keras and Caffe operators

When keras.layers.Lambda is used to split the input, Caffe's Slice layer can be used in its place.
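
Caffe's Slice layer is configured through `axis` and `slice_point` in its `slice_param`, where the slice points are the boundaries between consecutive chunks. A small sketch of deriving them from the chunk sizes of the Lambda split; `slice_points` is a hypothetical helper, and the chunk sizes are made up for illustration.

```python
from itertools import accumulate


def slice_points(sizes):
    """Cumulative boundaries between consecutive chunks, excluding the final end."""
    return list(accumulate(sizes))[:-1]


# splitting 9 channels into chunks of 2, 3 and 4 channels
print(slice_points([2, 3, 4]))  # [2, 5]
```

For a channel split on NCHW data the axis would be 1; the points would presumably be passed along the lines of `slice_param=dict(axis=1, slice_point=[2, 5])`, which is an assumption about how the converter wires it up.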

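Network reuse: how to define a Caffe sub-network

If the network contains a sub-network, extract the sub-network separately. This relies on attributes of the Keras layer objects. The key is to recognize how names inside the sub-network are composed: sub-network name/layer name/operator. Then add each layer's name and parameters to the corresponding dicts, and that is all.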

# name prefixes, indexed by the position of the node in submodel._inbound_nodes
submodel_names = ['a'] + [submodel_name] + [submodel_name + '_' + str(i) for i in range(1, 5)]
for i, node in enumerate(submodel._inbound_nodes):  # get inbound nodes to current layer
    input_layers = node.inbound_layers  # get layers pointing to this node
    if input_layers:
        if type(layer._inbound_nodes[0].inbound_layers[0]).__name__ == "InputLayer":
            # first layer of the sub-network: its bottom is the output of the layer feeding this call
            bottom = input_layers[0].output.name
            top_cp = submodel_names[i] + '/' + top
            name_cp = submodel_names[i] + '/' + name
            caffe_net[name_cp] = L.Convolution(caffe_net[outputs[bottom]], **kwargs)
        else:
            # inner layer: the bottom carries the sub-network prefix as well
            bottom_cp = submodel_names[i] + '/' + bottom
            name_cp = submodel_names[i] + '/' + name
            top_cp = submodel_names[i] + '/' + top
            caffe_net[name_cp] = L.Convolution(caffe_net[outputs[bottom_cp]], **kwargs)

        # Keras (kh, kw, c_in, c_out) -> Caffe (c_out, c_in, kh, kw); blobs itself is left
        # untouched so that the kernel is not transposed again on the next iteration
        net_params[name_cp] = [np.array(blobs[0]).transpose(3, 2, 0, 1)] + list(blobs[1:])
        outputs[top_cp] = name_cp
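
The naming scheme behind this snippet can be shown in isolation: every call of a reused sub-network gets its own prefix, so the same inner layer yields a distinct Caffe layer name per call. A minimal sketch, assuming the prefixes follow the `name`, `name_1`, `name_2` pattern used in `submodel_names`; both helper functions and the names "head" and "conv1" are hypothetical.

```python
def call_prefixes(submodel_name, num_calls):
    """Prefixes for repeated calls of one sub-network: name, name_1, name_2, ..."""
    return [submodel_name] + [submodel_name + "_" + str(i) for i in range(1, num_calls)]


def qualify(prefix, layer_name):
    """Compose 'sub-network name/layer name'."""
    return prefix + "/" + layer_name


prefixes = call_prefixes("head", 3)
names = [qualify(p, "conv1") for p in prefixes]
print(names)  # ['head/conv1', 'head_1/conv1', 'head_2/conv1']
```

Because each qualified name is unique, the weights of the shared layer can be stored once per call in the parameter dict without collisions.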