Keras版Faster-RCNN代码学习(IOU,RPN)1
Keras版Faster-RCNN代码学习(Batch Normalization)2
Keras版Faster-RCNN代码学习(loss,xml解析)3
Keras版Faster-RCNN代码学习(roipooling resnet/vgg)4
Keras版Faster-RCNN代码学习(measure_map,train/test)5
RoiPooling
参考文献:Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
ROIs Pooling简单来说,是Pooling层的一种,而且是针对RoIs的Pooling,他的特点是输入特征图尺寸不固定,但是输出特征图尺寸固定;在faster RCNN中输出的是一个7×7的固定特征图。
ROI pooling的图
输出的shape为(1, num_rois, channels, pool_size, pool_size) (为channel first),图片的batch为1故第0维,TimeDistributed包装器默认维度1为时间维,故num_rois在第1维,再参考关于Keras的“层”(Layer) 和 编写自己的层
RoiPoolingConv.py
from keras.engine.topology import Layer
import keras.backend as K
if K.backend() == 'tensorflow':
import tensorflow as tf
class RoiPoolingConv(Layer):
'''ROI pooling layer for 2D inputs.
See Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,
K. He, X. Zhang, S. Ren, J. Sun
# Arguments
pool_size: int
Size of pooling region to use. pool_size = 7 will result in a 7x7 region.
num_rois: number of regions of interest to be used
# Input shape
list of two 4D tensors [X_img,X_roi] with shape:
X_img:
`(1, channels, rows, cols)` if dim_ordering='th'
or 4D tensor with shape:
`(1, rows, cols, channels)` if dim_ordering='tf'.
X_roi:
`(1,num_rois,4)` list of rois, with ordering (x,y,w,h)
# Output shape
3D tensor with shape:
`(1, num_rois, channels, pool_size, pool_size)`
'''
#参数定义
def __init__(self, pool_size, num_rois, **kwargs):
self.dim_ordering = K.image_dim_ordering()
assert self.dim_ordering in {'tf', 'th'}, 'dim_ordering must be in {tf, th}'
self.pool_size = pool_size
self.num_rois = num_rois
super(RoiPoolingConv, self).__init__(**kwargs)
def build(self, input_shape):
if self.dim_ordering == 'th':
self.nb_channels = input_shape[0][1]
elif self.dim_ordering == 'tf':
self.nb_channels = input_shape[0][3]
def compute_output_shape(self, input_shape):
if self.dim_ordering == 'th':
return None, self.num_rois, self.nb_channels, self.pool_size, self.pool_size
else:
return None, self.num_rois, self.pool_size, self.pool_size, self.nb_channels
def call(self, x, mask=None):
assert(len(x) == 2)
img = x[0]
rois = x[1]
input_shape = K.shape(img)
outputs = []
for roi_idx in range(self.num_rois):
x = rois[0, roi_idx, 0]
y = rois[0, roi_idx, 1]
w = rois[0, roi_idx, 2]
h = rois[0, roi_idx, 3]
row_length = w / float(self.pool_size)
col_length = h / float(self.pool_size)
num_pool_regions = self.pool_size
#NOTE: the RoiPooling implementation differs between theano and tensorflow due to the lack of a resize op
# in theano. The theano implementation is much less efficient and leads to long compile times
if self.dim_ordering == 'th':
for jy in range(num_pool_regions):
for ix in range(num_pool_regions):
x1 = x + ix * row_length
x2 = x1 + row_length
y1 = y + jy * col_length
y2 = y1 + col_length
x1 = K.cast(x1, 'int32')
x2 = K.cast(x2, 'int32')
y1 = K.cast(y1, 'int32')
y2 = K.cast(y2, 'int32')
x2 = x1 + K.maximum(1,x2-x1)
y2 = y1 + K.maximum(1,y2-y1)
new_shape = [input_shape[0], input_shape[1],
y2 - y1, x2 - x1]
x_crop = img[:, :, y1:y2, x1:x2]
xm = K.reshape(x_crop, new_shape)
pooled_val = K.max(xm, axis=(2, 3))
outputs.append(pooled_val)
elif self.dim_ordering == 'tf':
x = K.cast(x, 'int32')
y = K.cast(y, 'int32')
w = K.cast(w, 'int32')
h = K.cast(h, 'int32')
rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
outputs.append(rs)
final_output = K.concatenate(outputs, axis=0)
final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))
if self.dim_ordering == 'th':
final_output = K.permute_dimensions(final_output, (0, 1, 4, 2, 3))
else:
final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))
return final_output
def get_config(self):
config = {'pool_size': self.pool_size,
'num_rois': self.num_rois}
base_config = super(RoiPoolingConv, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
值得注意的是,ROI pooling层接收的是由2个张量组成的list,而输出是个5D张量,所有需要配置compute_output_shape(self, input_shape) ,其中input_shape为2维张量。
简单来说,roipooling接受了一个[图像特征图信息,RPN中选取的框(1:1的正负样本)]的list,输出了num_rois个7×7×channel的特征层
resnet/vgg
这里定义了Faster RCNN整个流程所需要的网络,包括基础网络层,RPN,和后面的分类层。
值得注意的是:用resnet50的时候,把resnet50拆成了2个部分,1部分用前面的共享基础网络,一部分放在ROI pooling之后,这样的好处是可以提高后面分类网络的精度。