Code Walkthrough: Monocular Depth Estimation with Deep Learning (1)


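Having gone through the paper Depth Map Prediction from a Single Image, let's now look at the code behind it.

If you are not familiar with the paper, you can read my blog post on it or go straight to the original publication.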


Back to the point: the authors did not release all of their code. The README says:

This tree contains code for depth prediction network inference.  While there is
some code relating to training, much of the training code including most data
processing is not provided here.  We may release this in the future, however.

In short, none of the input-image processing described in the paper is included.


The README also notes:

While developing this project, we made a few modifications in theano not
currently part of the main codeline.  While the above instructions should work
for inference on a current unmodified theano build, it may take up more GPU
memory than needed due to use of test values for shape information.  The git
patch file "theano_test_value_size.patch" is also included and might be used to
enable this feature on your own tree.

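In other words, the authors modified Theano itself. Inference should still work on an unmodified Theano build, but it may use more GPU memory unless the included patch is applied.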


With the README out of the way, let's start on the code. Open test.py:

It begins with initialization, loading the parameters of the trained CNN:

    # location of depth module, config and parameters
    module_fn = 'models/depth.py'
    config_fn = 'models/depth.conf'  # network structure
    params_dir = 'weights/depth'  # trained network parameters

    # load depth network
    machine = net.create_machine(module_fn, config_fn, params_dir)

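Start with depth.conf, which records the training settings and the parameters of each convolutional, pooling and fully connected layer. It is fairly easy to follow; an excerpt: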

[config]
imports = numpy as np

[train]
resumptive = True
learning_rate = 1
bsize = 32
momentum = 0.9
nepochs = 151
evaluate_epochs = 10
save_stats_epochs = 10
checkpoint_all_freq = 50
train_conv = True

[data]
data_dir = /home/deigen/proj/depth/output/000433-data-all-320x240-normals
local_dir = /scratch/deigen/data/000433-data-all-320x240-normals
depth_space = log
zero_mean_images = False
divide_std_images = False
zero_mean_depths = False
divide_std_depths = True

data_dir is the location of the training image set, and depth_space = log means the depth values are represented in log space.
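Because depth.conf is an INI-style file, the values can be inspected outside the repository with Python's standard configparser module. The snippet below is only a minimal sketch: it is not the loader the repository uses (that code is not shown in this post), and the CONF_TEXT fragment is a shortened copy of the excerpt above.

```python
import configparser

# Shortened copy of the [train] and [data] sections from the excerpt above.
CONF_TEXT = """
[train]
learning_rate = 1
bsize = 32
momentum = 0.9
nepochs = 151

[data]
depth_space = log
zero_mean_images = False
divide_std_images = False
zero_mean_depths = False
divide_std_depths = True
"""

# Interpolation is disabled so that raw values are returned untouched.
conf = configparser.ConfigParser(interpolation=None)
conf.read_string(CONF_TEXT)

bsize = conf.getint("train", "bsize")
momentum = conf.getfloat("train", "momentum")
depth_space = conf.get("data", "depth_space")
divide_std_depths = conf.getboolean("data", "divide_std_depths")

print(bsize, momentum, depth_space, divide_std_depths)
```

The typed getters (getint, getfloat, getboolean) make it explicit which settings are numbers and which are flags, which helps when reading the remaining sections.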

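A few terms worth noting:

batch size (bsize in the config): training generally uses mini-batch SGD, where each training step draws batchsize samples from the training set;

iteration: one iteration is one training step on batchsize samples;

one iteration = one forward pass + one backward pass over a single batch

epoch: one epoch is one pass over every sample in the training set;

one epoch = one forward pass and one backward pass over all training samples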
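The relationship between these terms is simple arithmetic, sketched below. bsize and nepochs come from the [train] section; the training-set size is a made-up number used only for illustration, since the source does not state it.

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of SGD steps needed to see every sample once.

    The last batch may be smaller than batch_size, hence the ceiling.
    """
    return math.ceil(num_samples / batch_size)


bsize = 32          # from [train] in depth.conf
nepochs = 151       # from [train] in depth.conf
num_samples = 1000  # hypothetical training-set size, not from the source

steps = iterations_per_epoch(num_samples, bsize)
total_steps = steps * nepochs
print(steps, total_steps)
```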

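In addition, the zero_mean_* options subtract the mean so that the values have zero mean, and the divide_std_* options divide the values by their standard deviation. Both are simple preprocessing steps; in this config only divide_std_depths is enabled.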
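As a rough illustration of what these flags imply, here is a NumPy sketch. It rests on assumptions: the repository's data-processing code is not released, so whether the statistics are computed per image or over the whole dataset is unknown, and the helper names normalize, log_depth_forward and log_depth_inverse are hypothetical.

```python
import numpy as np


def normalize(x, zero_mean, divide_std):
    """Optionally subtract the mean and divide by the standard deviation."""
    x = x.astype(np.float32)
    if zero_mean:
        x = x - x.mean()
    if divide_std:
        x = x / x.std()
    return x


def log_depth_forward(depths, std):
    """Assumed transform for depth_space = log with divide_std_depths = True."""
    return np.log(depths) / std


def log_depth_inverse(t, std):
    """Undo log_depth_forward: rescale, then exponentiate back to depth."""
    return np.exp(t * std)


depths = np.array([[1.0, 2.0], [4.0, 8.0]])  # toy depth values
log_std = np.log(depths).std()               # toy statistic, illustration only
restored = log_depth_inverse(log_depth_forward(depths, log_std), log_std)
print(np.allclose(restored, depths))
```

Predicting in a transformed space means the network output has to be mapped back to real depths before use, which is why a forward/inverse pair is sketched here.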


Configuration of fully connected layer 1 (there are two fully connected layers in total):

[init]

[load]

[full1]
type = full
load_key = coarse_stack
noutput = 4096
init_w = lambda shp: 0.01*np.random.randn(*shp)
bias = True
weight_decay_w = 0.0001
learning_rate_scale_w = 0.1
learning_rate_scale_b = 0.1
dropout = False

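From this we can see:

1. type is the kind of layer; load_key names the parameter group the layer belongs to (the coarse network or the fine network).

2. noutput is the number of output units; init_w is a function that generates the initial weights (here Gaussian random values scaled by 0.01).

3. bias says whether the layer has a bias term; learning_rate_scale_w and learning_rate_scale_b are, judging by their names, scale factors applied to the learning rate for the weights and the bias; weight_decay_w is the weight decay (L2 regularization) coefficient for the weights.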
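To make these fields concrete, the sketch below evaluates the init_w expression from [full1] and shows one common way a per-layer learning-rate scale and weight decay enter a plain SGD update. The update rule is an assumption (the training code is not released, and momentum is left out), and the input dimension 256 is hypothetical; only noutput = 4096 and the numeric settings come from the config.

```python
import numpy as np

# init_w exactly as written in the [full1] section.
init_w = lambda shp: 0.01 * np.random.randn(*shp)

np.random.seed(0)
w = init_w((256, 4096))  # 256 inputs is hypothetical; 4096 outputs is from the config
print(w.shape, float(w.std()))

base_lr = 1.0            # learning_rate from [train]
lr_scale_w = 0.1         # learning_rate_scale_w from [full1]
weight_decay_w = 0.0001  # weight_decay_w from [full1]


def sgd_step(weights, grad, lr=base_lr * lr_scale_w, decay=weight_decay_w):
    """One SGD step with L2 weight decay (assumed form, momentum omitted)."""
    return weights - lr * (grad + decay * weights)
```

Under this reading, the effective learning rate for the weights of full1 would be 1 * 0.1 = 0.1, and the decay term pulls each weight slightly toward zero at every step.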


The configuration of this convolutional layer is easy to read as well:

[conv_s2_1]
type = conv
load_key = fine_stack
filter_shape = (64,3,9,9)
stride = 2
init_w = lambda shp: 0.001*np.random.randn(*shp)
init_b = 0.0
conv_mode = valid
weight_decay_w = 0.0001
learning_rate_scale_w = 0.001
learning_rate_scale_b = 0.001

The pooling layer is also straightforward:

[pool_s2_1]
type = maxpool
poolsize = (3,3)
poolstride = (2,2)
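The spatial sizes these two layers produce can be worked out by hand. The sketch below does the arithmetic under two assumptions that are not confirmed by the code shown here: the network input is 228 x 304 (the size reported in the paper), and the pooling layer keeps a final partial window instead of dropping it.

```python
import math


def conv_valid_out(n, k, stride):
    """Output length of a 'valid' convolution along one axis."""
    return (n - k) // stride + 1


def pool_out(n, k, stride, keep_partial_window=True):
    """Output length of pooling along one axis.

    keep_partial_window=True also counts a last window that only partly
    overlaps the input (ceiling); False drops it (floor).
    """
    span = (n - k) / stride
    rounded = math.ceil(span) if keep_partial_window else math.floor(span)
    return rounded + 1


in_h, in_w = 228, 304  # assumption: input size reported in the paper

# conv_s2_1: 9x9 filters, stride 2, conv_mode = valid
conv_h, conv_w = conv_valid_out(in_h, 9, 2), conv_valid_out(in_w, 9, 2)
print(conv_h, conv_w)  # 110 148

# pool_s2_1: 3x3 max pooling, stride 2
print(pool_out(conv_h, 3, 2), pool_out(conv_w, 3, 2))                # 55 74
print(pool_out(conv_h, 3, 2, False), pool_out(conv_w, 3, 2, False))  # 54 73
```

With the partial window kept, the result is 55 x 74, which matches the depth-map size stated in the infer_depth docstring later in this post; with it dropped, the result would be 54 x 73.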


Next, let's turn to depth.py,

starting with the machine class:

class machine(Machine):
    def __init__(self, conf):
        Machine.__init__(self, conf)

Now the infer_depth function:

    def infer_depth(self, images):
        '''
        Infers depth maps for a list of 320x240 images.
        images is a nimgs x 240 x 320 x 3 numpy uint8 array.
        returns depths (nimgs x 55 x 74) corresponding to the center box
        in the original rgb image.
        '''
        images = images.transpose((0,3,1,2))
        (nimgs, nc, nh, nw) = images.shape
        assert (nc, nh, nw) == (3, 240, 320)  # each input image must be 3 x 240 x 320 after the transpose

        (input_h, input_w) = self.input_size  # spatial size of the network input
        (output_h, output_w) = self.output_size  # spatial size of the network output (depth map)

        bsize = self.bsize
        b = 0

        # pred_depth is the network output, a Theano tensor variable
        v = self.vars
        pred_depth = self.inverse_depth_transform(self.fine.pred_mean)
        infer_f = theano.function([v.images], pred_depth)

        depths = np.zeros((nimgs, output_h, output_w), dtype=np.float32)

        # center box of an image: rows i0:i1 and columns j0:j1
        dh = nh - input_h
        dw = nw - input_w
        (i0, i1) = (dh // 2, nh - dh // 2)  # integer division keeps the slice indices ints
        (j0, j1) = (dw // 2, nw - dw // 2)

        # infer depth for images in batches
        b = 0
        while b < nimgs:
            batch = images[b:b+bsize]
            n = len(batch)
            if n < bsize:
                batch = _zero_pad_batch(batch, bsize)

            # crop to network input size
            batch = batch[:, :, i0:i1, j0:j1]

            # infer depth with nnet
            depths[b:b+n] = infer_f(batch)[:n]
            
            b += n

        return depths

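Roughly speaking, this function maps RGB images to depth maps: the trained CNN takes an RGB image as input and outputs depth. Each image is first cropped to its center box, and images are processed in batches of bsize, with the last batch zero-padded if it is short.

The RGB image is 240×320 and the depth map is 55×74, so the downscaling factor is roughly 4.3 (240/55 ≈ 4.36, 320/74 ≈ 4.32). Per the docstring, the depth map corresponds to the center box of the RGB image.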
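The cropping and batching logic can be tried without Theano by replacing the network with a dummy function. The sketch below is a simplified stand-in, not the repository's code: zero_pad_batch is a hypothetical version of the _zero_pad_batch helper (whose body is not shown above), fake_infer replaces the compiled Theano function, and the 228 x 304 input size is an assumption taken from the paper.

```python
import numpy as np


def center_crop_bounds(n, crop):
    """Return (start, stop) of a centered window of length crop within [0, n)."""
    start = (n - crop) // 2
    return start, start + crop


def zero_pad_batch(batch, bsize):
    """Hypothetical stand-in for _zero_pad_batch: append all-zero images."""
    pad = np.zeros((bsize - len(batch),) + batch.shape[1:], dtype=batch.dtype)
    return np.concatenate([batch, pad], axis=0)


def infer_in_batches(images, infer_f, bsize, input_size, output_size):
    """Crop each image to its center box and run infer_f on fixed-size batches."""
    nimgs, _, nh, nw = images.shape
    i0, i1 = center_crop_bounds(nh, input_size[0])
    j0, j1 = center_crop_bounds(nw, input_size[1])
    depths = np.zeros((nimgs,) + tuple(output_size), dtype=np.float32)
    b = 0
    while b < nimgs:
        batch = images[b:b + bsize]
        n = len(batch)
        if n < bsize:
            batch = zero_pad_batch(batch, bsize)  # network expects full batches
        depths[b:b + n] = infer_f(batch[:, :, i0:i1, j0:j1])[:n]
        b += n
    return depths


# Dummy "network": returns a constant 55 x 74 map for every image in the batch.
fake_infer = lambda batch: np.ones((len(batch), 55, 74), dtype=np.float32)

images = np.zeros((5, 3, 240, 320), dtype=np.uint8)
out = infer_in_batches(images, fake_infer, bsize=2,
                       input_size=(228, 304), output_size=(55, 74))
print(center_crop_bounds(240, 228), center_crop_bounds(320, 304), out.shape)
```

Padding the final short batch and then slicing the result with [:n] lets a function compiled for a fixed batch size handle any number of images.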


A more detailed analysis will follow next time!


Reposted from blog.csdn.net/qq_39732684/article/details/80647093