from Deep Learning Based Semantic Labelling of 3D Point Cloud in Visual SLAM
-
超体素方法进行预分割,将点云根据相似性变成表层面片(surface patches)降低计算复杂度。
将场景分割问题转换为图分割问题(graph partitioning problem)
- Method 1:Mean-shift聚类算法 计算node之间的距离
-
node指的是每个patch,连接node之间的line就是相邻patch的共边;
- 距离可以是欧氏距离,也可以是马氏距离;
- Mean-shift算法可见简单介绍及Python实现或者简单的机器学习算法Mean-shift算法
缺点:计算量太大
- Method 2:利用面片的法向量方法 聚类 法向量可以表示出局部凸性信息。
缺点:当noise太多的时候可靠性降低。
- 最终使用method 2 结合可靠性平面来做分割 最后使用图割法分割
关于2D Object Detection and Semantic Segmentation
An essential component to get semantic information is object detection, which can localize object instances in images. Girshick et al. [21] presented R-CNN, which proposed to apply CNN to object detection. Other similar methods have been proposed in recent years, like Fast R-CNN [22], Faster RCNN [23], Mask-RCNN [24] and YOLO [25-26]. R-CNN uses selective search algorithm for generating region proposals, which runs very slow. Faster R-CNN replaces the slow selective search algorithm with a fast neural net. Mask R-CNN improves the region of interest (ROI) pooling layer and extends Faster R-CNN to pixel-level image segmentation
Semantic segmentation is to understand an image at a pixel level, which can label each pixel with a class identity. Similar to object detection, state-of-the-art semantic segmentation approaches also rely on CNN. FCN [4] by Long et al. is the first end-to-end system, which popularizes CNN architecture for semantic segmentation. U-Net [5] is a popular encoder-decoder architecture which can make use of annotated samples more efficiently and have a higher accuracy. SegNet [6] is a similar encoderdecoder architecture. SegNet copies indices from max-pooling for up-sampling, which makes it more memory efficient. RefineNet [7] proposes a method called RefineNet block which fuses both high resolution and low resolution features. It solves the problem of significant decrease in image resolution when we repeat the sub-sampling operation. PSPNet [8] introduces a pyramid pooling method to aggregate the context. DeepLab [9-11] utilizes dilated convolutions to increase the field of view.