#Reading Source Code + Papers# 3D Point Cloud Segmentation: Deep Learning Based Semantic Labelling of 3D Point Cloud in Visual SLAM

Notes from the paper Deep Learning Based Semantic Labelling of 3D Point Cloud in Visual SLAM.

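A supervoxel method is used for pre-segmentation: the point cloud is grouped by similarity into surface patches, which reduces the computational complexity of the later steps.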
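To illustrate why pre-segmentation lowers the cost, the sketch below groups points into fixed-size voxels and represents each group by its centroid. This is a deliberately simplified stand-in for supervoxel clustering, which also uses similarity cues such as color and normals. The function name `voxel_patches` and the parameter `voxel_size` are assumptions made for illustration and do not come from the paper.

```python
from collections import defaultdict
import math

def voxel_patches(points, voxel_size):
    """Group 3D points by the voxel they fall into; return {voxel_index: centroid}."""
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    centroids = {}
    for key, pts in buckets.items():
        n = len(pts)
        centroids[key] = tuple(sum(p[i] for p in pts) / n for i in range(3))
    return centroids

points = [(0.1, 0.1, 0.0), (0.2, 0.3, 0.1), (1.2, 0.1, 0.0), (1.4, 0.2, 0.1)]
patches = voxel_patches(points, voxel_size=1.0)
print(len(points), "points ->", len(patches), "patches")
```

Later steps then operate on the patches instead of on every individual point.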

The scene segmentation problem is then converted into a graph partitioning problem.

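Method 1: Mean-shift clustering based on the distance between nodes

A node is a patch, and a line connecting two nodes is the border shared by two adjacent patches;
The distance can be the Euclidean distance or the Mahalanobis distance;
For the Mean-shift algorithm, see "A Brief Introduction and Python Implementation" or "Simple Machine Learning Algorithms: the Mean-shift Algorithm".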

Drawback: the computational cost is too high.
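A minimal sketch of Mean-shift clustering applied to patch centroids, assuming a flat kernel and the Euclidean distance (the Mahalanobis variant is not shown). The names `mean_shift`, `label_modes`, `bandwidth`, and `merge_radius` are hypothetical and are not taken from the paper. Every iteration compares each mode against all points, which is where the high cost noted above comes from.

```python
import math

def mean_shift(points, bandwidth, iters=50, tol=1e-6):
    """Shift each point to the mean of its neighbours (flat kernel) until it stops moving."""
    modes = [tuple(p) for p in points]
    for _ in range(iters):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Neighbours are the original points within `bandwidth` of the current mode.
            nbrs = [p for p in points if math.dist(m, p) <= bandwidth]
            if not nbrs:  # defensive guard; keep the mode where it is
                new_modes.append(m)
                continue
            mean = tuple(sum(c) / len(nbrs) for c in zip(*nbrs))
            moved = max(moved, math.dist(m, mean))
            new_modes.append(mean)
        modes = new_modes
        if moved < tol:
            break
    return modes

def label_modes(modes, merge_radius):
    """Give the same label to modes that converged to nearly the same location."""
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= merge_radius:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

centroids = [(0, 0, 0), (0.2, 0, 0), (0.1, 0.1, 0), (5, 5, 0), (5.1, 5, 0)]
modes = mean_shift(centroids, bandwidth=1.0)
labels = label_modes(modes, merge_radius=0.5)
print(labels)  # [0, 0, 0, 1, 1]
```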

Method 2: cluster using the normal vectors of the patches. Normal vectors can express local convexity information.

Drawback: reliability drops when there is too much noise.
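One common way to read convexity from normals is to compare the difference of the two normals with the difference of the two patch centroids. The sketch below uses that test as an assumption; the notes do not state which criterion the paper applies. The helper names and the `tol` parameter are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def is_convex(c1, n1, c2, n2, tol=0.0):
    """True when the connection between two adjacent patches is convex (or flat).

    c1, c2: patch centroids; n1, n2: unit normals pointing out of the surface.
    """
    return dot(sub(n1, n2), sub(c1, c2)) >= -tol

# Top and side face of a box meeting at an outer edge: convex.
box_edge = is_convex((-1, 0, 0), (0, 0, 1), (0, 0, -1), (1, 0, 0))
# Floor meeting a wall at an inner corner: concave.
floor_wall = is_convex((1, 0, 0), (0, 0, 1), (0, 0, 1), (1, 0, 0))
print(box_edge, floor_wall)  # True False
```

Because the test depends directly on the estimated normals, noisy normals can flip its sign, which matches the drawback noted above.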

In the end, Method 2 is combined with reliable planes to perform the segmentation, and a graph cut produces the final partition.
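The sketch below shows how per-edge decisions on the patch graph can be turned into segment labels. It is not a graph cut: it simply merges patches across every edge marked as "keep" (for example, a convex connection) using union-find, as a simplified stand-in. The function name `partition` and the edge format are assumptions for illustration.

```python
def partition(num_patches, edges):
    """edges: iterable of (i, j, keep); keep=True means patches i and j belong together.

    Returns one segment label per patch.
    """
    parent = list(range(num_patches))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, keep in edges:
        if keep:
            parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive segment ids.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(num_patches)]

edges = [(0, 1, True), (1, 2, True), (2, 3, False), (3, 4, True)]
segments = partition(5, edges)
print(segments)  # [0, 0, 0, 1, 1]
```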

On 2D Object Detection and Semantic Segmentation

An essential component to get semantic information is object detection, which can localize object instances in images. Girshick et al. [21] presented R-CNN, which proposed to apply CNN to object detection. Other similar methods have been proposed in recent years, like Fast R-CNN [22], Faster R-CNN [23], Mask R-CNN [24] and YOLO [25-26]. R-CNN uses the selective search algorithm for generating region proposals, which runs very slowly. Faster R-CNN replaces the slow selective search algorithm with a fast neural net. Mask R-CNN improves the region of interest (ROI) pooling layer and extends Faster R-CNN to pixel-level image segmentation.
Semantic segmentation is to understand an image at the pixel level, labelling each pixel with a class identity. Similar to object detection, state-of-the-art semantic segmentation approaches also rely on CNN. FCN [4] by Long et al. is the first end-to-end system, which popularizes the CNN architecture for semantic segmentation. U-Net [5] is a popular encoder-decoder architecture which can make use of annotated samples more efficiently and achieve higher accuracy. SegNet [6] is a similar encoder-decoder architecture. SegNet copies indices from max-pooling for up-sampling, which makes it more memory efficient. RefineNet [7] proposes a RefineNet block which fuses both high-resolution and low-resolution features. It addresses the significant decrease in image resolution caused by repeated sub-sampling. PSPNet [8] introduces a pyramid pooling method to aggregate the context. DeepLab [9-11] utilizes dilated convolutions to increase the field of view.
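As a small worked example of how dilation increases the field of view, the sketch below computes the receptive field of a stack of stride-1 convolutions. A kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions. The layer configurations are illustrative only and are not taken from DeepLab.

```python
def effective_kernel(k, dilation):
    """Input span covered by a kernel of size k with the given dilation."""
    return k + (k - 1) * (dilation - 1)

def receptive_field(layers):
    """layers: list of (kernel_size, dilation) pairs, all with stride 1."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf

plain = receptive_field([(3, 1), (3, 1), (3, 1)])
dilated = receptive_field([(3, 1), (3, 2), (3, 4)])
print(plain, dilated)  # 7 15
```

Both stacks use the same number of weights, but the dilated one sees roughly twice as much of the input along each axis.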
