Reproducing Classic Networks (Part 2): SegNet

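1. Paper summary
SegNet is structurally similar to FCN, except that its encoder uses the 13 convolutional layers of VGG16 and stores the max-pooling indices at each pooling step. During upsampling, the decoder uses these indices to put each value back at its original position, fills every other position with 0, and then applies a deconvolution.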

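The benefits of this design are:
1) Better delineation of object boundaries
2) Fewer parameters to train end to end (lower memory use than FCN)
3) The same upsampling scheme can be used in many encoder-decoder architectures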

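Other work combines RNNs and conditional random fields (CRF) with the decoder for prediction, which helps boundary delineation, and points out that the CRF-RNN module can be appended to any deep segmentation model, including SegNet.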

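Existing multi-scale deep network architectures commonly take one of two forms:

1) Rescale the input to several scales and compute a feature map for each scale
2) Feed a single image to the model and take the feature maps from different layers
The shared idea is to use multi-scale information to fuse the semantic information in high-level feature maps with the fine spatial detail in low-level feature maps. However, these methods have many parameters and are harder to train.
Reference blog: https://blog.csdn.net/u011974639/article/details/78916327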

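2. Network structure

The full SegNet has 13 convolutional layers and 5 pooling layers, matched by 13 deconvolution layers and 5 upsampling layers.
I reproduced a basic version with 4 convolutional layers and 4 pooling layers, matched by 4 deconvolution layers and 4 upsampling layers.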
image
|
lrn(alpha=0.0001, beta=0.5)
|
conv1(conv+bias+batchnormal+relu) [7,7,3,64]
|
pool1(keeps pooling indices pool1_indices)
|
conv2(conv+bias+batchnormal+relu) [7,7,64,64]
|
pool2(keeps pooling indices pool2_indices)
|
conv3(conv+bias+batchnormal+relu) [7,7,64,64]
|
pool3(keeps pooling indices pool3_indices)
|
conv4(conv+bias+batchnormal+relu) [7,7,64,64]
|
pool4(keeps pooling indices pool4_indices)
|
upsample4(with pool4_indices)
|
deconv4(conv+bias+batchnormal) [7,7,64,64]
|
upsample3(with pool3_indices)
|
deconv3(conv+bias+batchnormal) [7,7,64,64]
|
upsample2(with pool2_indices)
|
deconv2(conv+bias+batchnormal) [7,7,64,64]
|
upsample1(with pool1_indices)
|
deconv1(conv+bias+batchnormal) [7,7,64,64]
|
final_deconv(conv+bias+batchnormal) [1,1,64,NUM_CLASSES]
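
To make the diagram easier to follow, the sketch below traces the feature-map shape through the basic network. It assumes 'SAME' padding for every convolution and 2x2 max pooling with stride 2, and it uses a hypothetical 352x480 input with 11 classes; none of these values come from the diagram itself. The lrn layer is left out because it does not change the shape.

```python
def trace_shapes(height, width, num_classes, num_stages=4, channels=64):
    """Return (layer_name, (H, W, C)) pairs for the basic encoder-decoder."""
    shapes = [("image", (height, width, 3))]
    h, w = height, width
    encoder_sizes = []  # spatial size before each pooling step
    for i in range(1, num_stages + 1):
        shapes.append(("conv%d" % i, (h, w, channels)))
        encoder_sizes.append((h, w))
        # 2x2 pooling with stride 2 and SAME padding halves the size (rounding up)
        h, w = (h + 1) // 2, (w + 1) // 2
        shapes.append(("pool%d" % i, (h, w, channels)))
    for i in range(num_stages, 0, -1):
        # unpooling restores the size recorded before the matching pool
        h, w = encoder_sizes[i - 1]
        shapes.append(("upsample%d" % i, (h, w, channels)))
        shapes.append(("deconv%d" % i, (h, w, channels)))
    shapes.append(("final_deconv", (h, w, num_classes)))
    return shapes


shapes = trace_shapes(352, 480, num_classes=11)
for name, shape in shapes:
    print("%-13s %s" % (name, shape))
```

The final layer has one channel per class at the input resolution, which is what the per-pixel cross-entropy loss expects.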

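Points to note:
1) Batch normalization is applied after conv+bias.
2) ReLU is used in the encoder but not in the decoder.
3) The original paper multiplies the loss contributed by each class by a different weight to achieve class balancing. I did not find a ready-made implementation in TensorFlow, so I computed the plain cross-entropy and did not handle class balancing.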
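
For readers who want class balancing, here is a minimal NumPy sketch of a class-weighted cross-entropy. The weights follow the median frequency balancing idea (median class frequency divided by each class's frequency); as a simplification, the frequencies are computed over all pixels of the given label array. The function names and toy data are hypothetical and are not part of the repository.

```python
import numpy as np


def median_frequency_weights(labels, num_classes):
    """Weight per class = median frequency / class frequency (0 for absent classes)."""
    counts = np.bincount(labels.ravel(), minlength=num_classes).astype(np.float64)
    freq = counts / counts.sum()
    present = freq > 0
    weights = np.zeros(num_classes)
    weights[present] = np.median(freq[present]) / freq[present]
    return weights


def weighted_cross_entropy(logits, labels, class_weights):
    """logits: [N, C] scores, labels: [N] class ids. Returns the mean weighted loss."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_pixel = -log_probs[np.arange(labels.size), labels]
    return float(np.mean(class_weights[labels] * per_pixel))


labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])  # class 0 dominates
logits = np.zeros((labels.size, 3))                # uniform predictions
weights = median_frequency_weights(labels, num_classes=3)
loss = weighted_cross_entropy(logits, labels, weights)
print(weights, loss)
```

In TF 1.x, one possible way to get the same effect is to multiply the per-pixel output of tf.nn.sparse_softmax_cross_entropy_with_logits by tf.gather(class_weights, labels) before taking the mean.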

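3. Small tricks used in the reproduction

1) How TensorFlow reads data
Reference blog:
https://blog.csdn.net/lujiandong1/article/details/53376802

TensorFlow can read data in three ways:

Preloaded data: the data is embedded in the graph.
Feeding: Python produces the data and feeds it to the backend.
Reading from file: the graph reads directly from files.

Drawbacks of the first two approaches:
1. Preloading embeds the data directly in the Graph, which is then passed to the Session. With a large dataset, transferring the Graph becomes inefficient.

2. Feeding uses placeholders in place of the data and fills them in at run time.

Both are convenient but struggle with large datasets. Even with feeding, the extra intermediate steps (data type conversion, for example) add noticeable overhead. The better option is to define the file-reading pipeline in the Graph, so that TF reads the files itself and decodes them into usable samples.

This post uses the third approach, reading directly from files.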

Code example:

import tensorflow as tf

# imgs_dir and labels_dir are the lists of image paths and label paths.
# FLAGS holds the command-line flags defined elsewhere in the project.

# Convert the two lists to string tensors
imgs_tensor = tf.convert_to_tensor(imgs_dir, dtype=tf.string)
labels_tensor = tf.convert_to_tensor(labels_dir, dtype=tf.string)

# Build the input queue
filename_queue = tf.train.slice_input_producer([imgs_tensor, labels_tensor])

# Take one image filename and one label filename from the queue
image_filename = filename_queue[0]
label_filename = filename_queue[1]

# Read the raw bytes of the image and of its ground truth with tf.read_file
imgs_values = tf.read_file(image_filename)
label_values = tf.read_file(label_filename)

# Decode the bytes back into images
imgs_decoded = tf.image.decode_png(imgs_values)
labels_decoded = tf.image.decode_png(label_values)

# Reshape to the original shape
imgs_reshaped = tf.reshape(imgs_decoded, [FLAGS.img_height, FLAGS.img_width, 3])
labels_reshaped = tf.reshape(labels_decoded, [FLAGS.img_height, FLAGS.img_width, 1])

# Convert the data type
imgs_reshaped = tf.cast(imgs_reshaped, tf.float32)

# Minimum number of examples kept in the queue, usually a fraction of the total
# number of samples. With a very large dataset, choose a smaller fraction,
# otherwise the minimum becomes too large.
min_fraction_of_examples_in_queue = FLAGS.fraction_of_examples_in_queue
min_queue_examples = int(FLAGS.num_examples_epoch_train * min_fraction_of_examples_in_queue)

print('Filling queue with %d input images before starting to train. This may take some time.' % min_queue_examples)

# Shuffle the order for training; keep the order unchanged for testing
if FLAGS.train:
    images_batch, labels_batch = tf.train.shuffle_batch([imgs_reshaped, labels_reshaped],
                                                        batch_size=FLAGS.batch_size,
                                                        num_threads=6,
                                                        capacity=min_queue_examples + 3 * FLAGS.batch_size,
                                                        min_after_dequeue=min_queue_examples)
else:
    images_batch, labels_batch = tf.train.batch([imgs_reshaped, labels_reshaped],
                                                batch_size=FLAGS.batch_size,
                                                num_threads=6,
                                                capacity=min_queue_examples + 3 * FLAGS.batch_size)

2) Convolution kernels are initialized with the method of He et al.
That is, initializer=tf.contrib.layers.variance_scaling_initializer()
With sigmoid or tanh activations, Xavier initialization works better.
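
The difference between the two schemes comes down to the variance of the initial weights. The NumPy sketch below draws a [7,7,64,64] kernel with each scheme: He initialization uses a standard deviation of sqrt(2 / fan_in), Xavier (uniform) uses the limit sqrt(6 / (fan_in + fan_out)). The helper names are hypothetical, and TensorFlow's own initializer samples from a truncated normal, so its values differ in detail.

```python
import numpy as np


def he_normal(shape, rng):
    """He et al. initialization for a [kh, kw, in_ch, out_ch] kernel."""
    fan_in = shape[0] * shape[1] * shape[2]
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=shape)


def xavier_uniform(shape, rng):
    """Xavier (Glorot) uniform initialization for the same kernel layout."""
    fan_in = shape[0] * shape[1] * shape[2]
    fan_out = shape[0] * shape[1] * shape[3]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=shape)


rng = np.random.RandomState(0)
kernel_shape = (7, 7, 64, 64)
w_he = he_normal(kernel_shape, rng)
w_xavier = xavier_uniform(kernel_shape, rng)
print(w_he.std(), w_xavier.std())
```

He initialization gives the larger spread, which compensates for ReLU zeroing roughly half of the activations.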

3) Common variable initialization methods
Reference blog: https://blog.csdn.net/zlrai5895/article/details/80550924

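4) Using batch normalization
Introduction: https://blog.csdn.net/hjimce/article/details/50866313
Benefits: faster convergence; dropout and careful tuning of the L2 regularization coefficient become unnecessary; LRN is no longer needed.

Usage reference blogs: https://blog.csdn.net/candy_gl/article/details/79551149
https://blog.csdn.net/zlrai5895/article/details/80551528

Note that the model is not saved in the way the second blog proposes; see the code for details.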

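After adding batch normalization, the loss and the training op need to be built under a dependency on the update ops:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # Drop the last axis of labels so that it matches what the sparse loss expects
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.squeeze(labels, axis=[3]), logits=logits))
    train_op = tf.train.AdamOptimizer(0.001).minimize(loss)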
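
Why the update ops matter can be seen in a small NumPy model of batch normalization. During training the layer normalizes with the batch statistics and also updates moving averages; at inference it uses only the moving averages. If the update never runs, inference uses the initial values and the output is not normalized. The class below is a hypothetical sketch, not the TensorFlow implementation.

```python
import numpy as np


class BatchNorm(object):
    """Minimal batch normalization over the first axis of a [batch, features] array."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # This moving-average update plays the role of the ops in UPDATE_OPS.
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta


rng = np.random.RandomState(0)
x = rng.normal(5.0, 2.0, size=(256, 4))

bn = BatchNorm(4)
for _ in range(200):           # training steps keep the moving averages up to date
    bn(x, training=True)
out = bn(x, training=False)    # normalized: mean close to 0

stale = BatchNorm(4)           # moving averages never updated
stale_out = stale(x, training=False)  # mean stays close to 5
print(out.mean(), stale_out.mean())
```

In TF 1.x the moving-average updates are separate ops that nothing depends on by default, so tf.control_dependencies(update_ops) is what ties them to the training step.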

When restoring the model from pre-trained weights:

sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())

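5) Writing the unpool_with_argmax layer
This layer upsamples a pooled feature map back to its size before pooling, and uses the stored pool_indices to put each maximum back at its original position.
The key function is tf.scatter_nd().
Usage reference blog: https://blog.csdn.net/zlrai5895/article/details/80551056

Implementation: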

import numpy as np
import tensorflow as tf


def unpool_with_argmax(pool, ind, output_shape, name=None, ksize=[1, 2, 2, 1]):
    """
       Unpooling layer after max_pool_with_argmax.
       Args:
           pool:          max pooled output tensor
           ind:           argmax indices
           output_shape:  shape of the tensor before pooling
           ksize:         ksize is the SAME as for the pool (not used in the body)
       Return:
           unpool:        unpooling tensor
    """
    # default_name keeps the scope valid when name is None
    with tf.variable_scope(name, default_name='unpool'):
        input_shape = pool.get_shape().as_list()      # static shape of the input tensor
        flat_input_size = np.prod(input_shape)        # np.prod multiplies all elements of the list
        ind_ = tf.cast(tf.reshape(ind, [flat_input_size, 1]), tf.int32)
        pool_ = tf.reshape(pool, [flat_input_size])   # flatten both the indices and the pooled values
        flat_output_shape = tf.constant([output_shape[0] * output_shape[1] * output_shape[2] * output_shape[3]])  # flattened output shape
        ret = tf.scatter_nd(ind_, pool_, flat_output_shape)  # scatter the values to their indices
        ret = tf.reshape(ret, output_shape)           # reshape to the output shape
        return ret
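
The same scatter idea can be checked by hand on a tiny array. The NumPy sketch below pools a single 4x4 channel with 2x2 windows, records the flat index of each maximum, and then scatters the maxima back into a zero array. The function names are hypothetical, and the flat index here is simply row * width + col for one 2-D array, which is simpler than the index layout TensorFlow uses for 4-D tensors.

```python
import numpy as np


def max_pool_2x2_with_argmax(x):
    """2x2, stride-2 max pooling on an [H, W] array with even H and W.

    Returns the pooled values and the flat index (row * W + col) of each maximum.
    """
    h, w = x.shape
    # Group each 2x2 window into the last axis: shape [H/2, W/2, 4]
    windows = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    local = windows.argmax(axis=2)                       # position inside the window, 0..3
    rows = 2 * np.arange(h // 2)[:, None] + local // 2   # row of the maximum in x
    cols = 2 * np.arange(w // 2)[None, :] + local % 2    # column of the maximum in x
    return windows.max(axis=2), rows * w + cols


def unpool_2d(pool, ind, output_shape):
    """Scatter pooled values back to their recorded positions; zeros elsewhere."""
    flat = np.zeros(int(np.prod(output_shape)), dtype=pool.dtype)
    flat[ind.ravel()] = pool.ravel()
    return flat.reshape(output_shape)


x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 7, 0],
              [6, 0, 0, 8]])
pooled, ind = max_pool_2x2_with_argmax(x)
restored = unpool_2d(pooled, ind, x.shape)
print(pooled)
print(restored)
```

Only the four maxima survive the round trip; every other position is 0, which is why the decoder follows each unpooling step with a learned convolution to densify the map.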

6) Using LRN (local response normalization)
Reference blog: https://blog.csdn.net/yangdashi888/article/details/77918311
It has been shown to be of little use.

7) Keeping the indices during max pooling

tf.nn.max_pool_with_argmax()

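8) Using tf.cond()
In TensorFlow, tf.cond() plays a role similar to if...else... in C and controls which way the data flows, but the similarity is only superficial. Both branches are passed as functions, and any op created outside them (z in the code below) is executed no matter which branch is selected.
Code:

a, b = tf.constant(2.0), tf.constant(3.0)
x, y = tf.constant(1.0), tf.constant(4.0)
z = tf.multiply(a, b)
result = tf.cond(x < y, lambda: tf.add(x, z), lambda: tf.square(y))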

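9) Collections
TensorFlow collections provide a graph-wide storage mechanism that is not affected by variable name scopes: store a value in one place and retrieve it anywhere.

tf.add_to_collection(name, value)      # store a value in a collection
tf.get_collection(name, scope=None)    # retrieve the values stored in a collection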

10) tf.one_hot() converts labels to one-hot form.
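
As a rough illustration of what the conversion does, the NumPy sketch below turns a small label map into one-hot form by indexing an identity matrix. The helper name and the toy label map are hypothetical.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return an array with one extra trailing axis of length num_classes."""
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]


label_map = np.array([[0, 2],
                      [1, 1]])
encoded = one_hot(label_map, num_classes=3)
print(encoded.shape)
print(encoded[0, 1])
```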

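11) Instance segmentation is really a meeting point where semantic segmentation and object detection converge, and it is an important research problem. I look forward to instance segmentation algorithms and network architectures that combine the strengths of both (Mask R-CNN and related work are making progress here).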

12) Singular value decomposition of an array
Reference blog: https://blog.csdn.net/u012162613/article/details/42214205
Code:

import numpy as np
from numpy import linalg as la
A = np.array([[1, 2, 3], [4, 5, 6]])
U, sigma, VT = la.svd(A)
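
One detail that is easy to miss is that sigma comes back as a 1-D vector of singular values, not as a matrix. The sketch below, using the same 2x3 matrix, places sigma on the diagonal of a 2x3 zero matrix and checks that the product of the three factors reproduces A.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
U, sigma, VT = np.linalg.svd(A)
print(U.shape, sigma.shape, VT.shape)

# Put the singular values on the diagonal of a matrix shaped like A
S = np.zeros(A.shape)
S[:len(sigma), :len(sigma)] = np.diag(sigma)

A_rebuilt = U.dot(S).dot(VT)
print(np.allclose(A, A_rebuilt))
```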

13) numpy.prod

numpy.prod(a, axis=None, dtype=None, out=None, keepdims=<no value>)

Returns the product of the array elements over the given axis.

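14) Comparing the two queue-based batch readers, tf.train.slice_input_producer and tf.train.string_input_producer
Reference blog: https://blog.csdn.net/qq_30666517/article/details/79715045

With tf.train.string_input_producer(path), the paths do not need to be wrapped in an extra list. The images are then loaded with a tf.WholeFileReader(); otherwise the usage is essentially the same as tf.train.slice_input_producer().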

15) tf.convert_to_tensor
tf.convert_to_tensor converts different kinds of data into tensors, for example a NumPy array or a Python list.

16) tf.concat
tf.concat concatenates tensors along a given axis.

tf.concat(values, axis, name='concat')

17) numpy.bincount(x) explained
For each index value i, it returns the number of times i occurs in x.
Reference blog: https://blog.csdn.net/xlinsist/article/details/51346523
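
A short example may help: the output has one entry per integer from 0 up to the largest value in x, and minlength pads the result with zeros when a fixed length is needed. The input arrays are made up for illustration.

```python
import numpy as np

x = np.array([0, 1, 1, 3, 2, 1, 7])
counts = np.bincount(x)
# counts[i] is the number of times i appears in x; length is max(x) + 1
print(counts)

# minlength guarantees at least that many bins, even if large values are absent
padded = np.bincount(np.array([0, 1, 1]), minlength=5)
print(padded)
```

The minlength argument is what lets the evaluation code in item 19 always obtain a full num_class x num_class histogram.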

18) numpy.diag()
Returns the diagonal elements of a matrix, or builds a diagonal matrix from a 1-D array.
Reference: https://jingyan.baidu.com/article/59703552e03ce18fc0074005.html

19) Evaluating the predictions
Three metrics are used: overall accuracy, per-class accuracy, and IoU (intersection of prediction and label divided by their union).
Code:

import numpy as np


def predict_eval(predictions, label_tensor):
    # predictions: per-pixel class scores for each image; label_tensor: per-pixel class ids
    labels = label_tensor
    num_class = FLAGS.num_class
    size = predictions.shape[0]
    hist = np.zeros((num_class, num_class))  # confusion matrix: rows = label, columns = prediction
    for i in range(size):
        hist += fast_hist(labels[i].flatten(), predictions[i].argmax(2).flatten(), num_class)
    acc_total = np.diag(hist).sum() / hist.sum()
    print('accuracy = %f' % np.nanmean(acc_total))
    iu = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
    print('mean IU  = %f' % np.nanmean(iu))
    for ii in range(num_class):
        if float(hist.sum(1)[ii]) == 0:
            acc = 0.0
        else:
            acc = np.diag(hist)[ii] / float(hist.sum(1)[ii])
        print("    class # %d accuracy = %f " % (ii, acc))


def fast_hist(a, b, n):
    # a: flattened labels, b: flattened predictions, n: number of classes
    k = (a >= 0) & (a < n)  # ignore labels outside [0, n)
    return np.bincount(n * a[k].astype(int) + b[k], minlength=n**2).reshape(n, n)
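
To see how the three metrics fall out of the confusion matrix, here is a small worked example on six made-up pixels and three classes. fast_hist is repeated so that the snippet runs on its own; the label and prediction arrays are hypothetical.

```python
import numpy as np


def fast_hist(a, b, n):
    # Same helper as above: rows index the label, columns index the prediction
    k = (a >= 0) & (a < n)
    return np.bincount(n * a[k].astype(int) + b[k], minlength=n ** 2).reshape(n, n)


labels = np.array([0, 0, 1, 1, 2, 2])
preds = np.array([0, 1, 1, 1, 2, 0])
hist = fast_hist(labels, preds, 3)

overall_acc = np.diag(hist).sum() / float(hist.sum())            # correct pixels / all pixels
class_acc = np.diag(hist) / hist.sum(1).astype(np.float64)       # correct / pixels labelled as the class
iou = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist)).astype(np.float64)
mean_iou = np.nanmean(iou)

print(hist)
print(overall_acc, class_acc, iou, mean_iou)
```

Class 1 is always predicted correctly, yet its IoU is only 2/3 because one class 0 pixel was also predicted as class 1; IoU penalizes false positives, which per-class accuracy does not.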

4. Source code

Code repository:

https://github.com/zlrai5895/SegNet_tensorflow

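5. Experimental results

On the validation set, some images reached an overall accuracy above 90%, but on the test set the overall accuracy dropped to a little over 50%. Neither set was used in training, so this is puzzling; one possible explanation is that the two sets have noticeably different distributions.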
