IROS 2019: Notes on Selected Papers (Part 2)

Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery

Abstract

To autonomously navigate and plan interactions in real-world environments, robots require the ability to robustly perceive and map complex, unstructured surrounding scenes. Besides building an internal representation of the observed scene geometry, the key insight toward a truly functional understanding of the environment is the usage of higher level entities during mapping, such as individual object instances.
(My understanding: beyond using the geometry of objects in the environment as sensor measurements for SLAM, the robot also needs a semantic understanding of its surroundings to reach a higher level of perception.)
This work presents an approach to incrementally build volumetric object-centric maps during online scanning with a localized RGB-D camera. First, a per-frame segmentation scheme combines an unsupervised geometric approach with instance-aware semantic predictions to detect both recognized scene elements as well as previously unseen objects.
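(Step one: each frame is segmented by combining an unsupervised, geometry-based method with instance-aware semantic predictions, so that both recognized elements and previously unseen elements are detected.)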
Next, a data association step tracks the predicted instances across the different frames.
(Step two: a data association step tracks the predicted instances across frames.)
Finally, a map integration strategy fuses information about their 3D shape, location, and, if available, semantic class into a global volume.
(Finally, the map integration strategy fuses each object's 3D shape, location, and semantic class into one global map.)
Evaluation on a publicly available dataset shows that the proposed approach for building instance-level semantic maps is competitive with state-of-the-art methods, while additionally able to discover objects of unseen categories. The system is further evaluated within a real-world robotic mapping setup, for which qualitative results highlight the online nature of the method.
Code is available at https://github.com/ethz-asl/voxblox-plusplus.

Main contributions

  1. A combined geometric-semantic segmentation scheme that extends object detection to novel, previously unseen categories.
  2. A data association strategy for tracking and matching instance predictions across multiple frames.
  3. Evaluation of the framework on a publicly available dataset and within an online robotic mapping setup.

Related work

  1. Object detection and segmentation
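    The recent Mask R-CNN architecture predicts a per-pixel semantic annotation mask for each detected instance. The main limitation of learning-based instance segmentation methods is that they need large amounts of training data, and annotating such data for every category that may be encountered in real scenes takes substantial human effort. Moreover, these algorithms can only recognize the fixed set of classes present in the training set, so they cannot correctly segment and classify objects of other, unseen categories.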
  2. Semantic object-level mapping
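    Recent progress in deep learning has made it possible to integrate rich semantic information into SLAM systems. [16] fuses the semantic predictions of a CNN into a dense map built with a SLAM framework. However, conventional semantic segmentation is unaware of object instances, i.e., it cannot disambiguate individual instances that belong to the same class. The method in [16] therefore provides no information about the geometry and relative placement of individual objects in the scene. Geometry-based methods, in turn, tend to over-segment articulated scene elements. Without instance-level information, a joint semantic-geometric segmentation is thus not enough to split the scene into distinct, separate objects. (This part of the paper analyzes and summarizes the current mainstream geometry-based and learning-based methods.)
    Closely related to this paper, [20] proposes an incremental geometry-based segmentation strategy coupled with YOLO v2 bounding boxes, which are used to classify and merge the geometric segments that belong to the same instance.
    In this paper, the volumetric TSDF-based representation does not discard valuable free-space information, and the 3D map explicitly distinguishes observed free space from unknown space. Compared with all previous methods, the approach incrementally provides a densely reconstructed volumetric map of the environment that contains shape and pose information about both known and unknown object elements in the scene.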

Method

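The proposed method consists of four main steps: (1) geometric segmentation, (2) semantic instance-aware segmentation refinement, (3) data association, and (4) map integration. First, the depth map is segmented with a convexity-based geometric approach, which yields segment contours that accurately describe real-world physical boundaries. Next, Mask R-CNN detects object instances in the RGB frame and produces a per-pixel semantic mask for each of them. Each detected instance is used to semantically label the corresponding depth segments, and segments that belong to the same geometrically over-segmented, non-convex object instance are merged. The data association strategy then matches the segments found in the current frame, together with their instances, to those already stored in the map. Finally, the segments are integrated into a dense 3D map, and the fusion strategy keeps track of the individual segments.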
(1) Geometric segmentation
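A surface normal is computed for each depth point, and the angle between neighboring normals is used to find region boundaries. Locations with a large depth discontinuity are additionally detected as boundary features.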
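As a rough illustration of this step, the following standard-library Python sketch back-projects a depth image, estimates a normal per pixel from forward differences, and flags a pixel as a boundary when the angle to a neighboring normal or the depth jump to a neighbor is too large. The function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and both thresholds are assumptions made for this example. The paper's actual criterion is convexity-based and is not reproduced here.

```python
import math

def backproject(depth, fx, fy, cx, cy):
    """Turn a depth image (rows of metres) into a grid of 3D points."""
    return [[((u - cx) * z / fx, (v - cy) * z / fy, z)
             for u, z in enumerate(row)]
            for v, row in enumerate(depth)]

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def unit_cross(a, b):
    """Normalized cross product; returns a zero vector for degenerate input."""
    c = (a[1] * b[2] - a[2] * b[1],
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0])
    n = math.sqrt(sum(x * x for x in c))
    return tuple(x / n for x in c) if n > 0 else (0.0, 0.0, 0.0)

def normals(points):
    """Per-pixel normal from forward differences (needs at least 2x2 pixels).

    Pixels in the last row/column reuse the differences of their neighbor.
    """
    h, w = len(points), len(points[0])
    out = [[None] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            vv, uu = min(v, h - 2), min(u, w - 2)
            dx = sub(points[vv][uu + 1], points[vv][uu])
            dy = sub(points[vv + 1][uu], points[vv][uu])
            out[v][u] = unit_cross(dx, dy)
    return out

def boundary_mask(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0,
                  max_angle_deg=30.0, max_jump=0.1):
    """True where the right or lower neighbor differs too much.

    Assumed thresholds: max_angle_deg between normals, max_jump in metres.
    Invalid depth values are not handled in this sketch.
    """
    nrm = normals(backproject(depth, fx, fy, cx, cy))
    cos_thr = math.cos(math.radians(max_angle_deg))
    h, w = len(depth), len(depth[0])
    mask = [[False] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            for dv, du in ((0, 1), (1, 0)):
                v2, u2 = v + dv, u + du
                if v2 >= h or u2 >= w:
                    continue
                cos_a = sum(a * b for a, b in zip(nrm[v][u], nrm[v2][u2]))
                jump = abs(depth[v][u] - depth[v2][u2])
                if cos_a < cos_thr or jump > max_jump:
                    mask[v][u] = True
    return mask
```

Connected regions of non-boundary pixels would then form the geometric segments; that labeling pass is left out to keep the sketch short.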
(2) Semantic instance-aware segmentation refinement
To complement the unsupervised geometric segmentation of each depth frame with semantic object instance information, the corresponding RGB images are processed with the Mask R-CNN framework [1]. (Mask R-CNN is what ties the unsupervised geometric segmentation of the depth frame to semantic information.)
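One way to picture the refinement is as an overlap test between geometric segments and predicted instance masks. The sketch below is illustrative only: the data layout (pixel sets per segment and per mask), the function names, and the `min_overlap` value are assumptions, not the paper's or Mask R-CNN's interface.

```python
from collections import defaultdict

def refine_segments(segment_pixels, instance_masks, min_overlap=0.8):
    """Attach at most one detected instance to every geometric segment.

    segment_pixels: {segment_id: set of (row, col)}
    instance_masks: list of (class_name, set of (row, col)), one per detection
    Returns {segment_id: (instance_index, class_name) or None}.
    """
    assignment = {}
    for seg_id, pixels in segment_pixels.items():
        if not pixels:
            assignment[seg_id] = None
            continue
        best_idx, best_frac = None, 0.0
        for idx, (_, mask) in enumerate(instance_masks):
            # Fraction of the segment that lies inside this mask.
            frac = len(pixels & mask) / len(pixels)
            if frac > best_frac:
                best_idx, best_frac = idx, frac
        if best_idx is not None and best_frac >= min_overlap:
            assignment[seg_id] = (best_idx, instance_masks[best_idx][0])
        else:
            assignment[seg_id] = None  # stays a purely geometric segment
    return assignment

def group_by_instance(assignment):
    """Collect the segments that were assigned to the same instance."""
    groups = defaultdict(list)
    for seg_id, inst in assignment.items():
        if inst is not None:
            groups[inst].append(seg_id)
    return {inst: sorted(ids) for inst, ids in groups.items()}

# A chair over-segmented into two parts, plus one unrecognized object.
segments = {1: {(0, 0), (0, 1)}, 2: {(1, 0), (1, 1)}, 3: {(5, 5), (5, 6)}}
masks = [("chair", {(0, 0), (0, 1), (1, 0), (1, 1)})]
assignment = refine_segments(segments, masks)
merged = group_by_instance(assignment)
```

Here segments 1 and 2 end up grouped under the single chair instance, while segment 3 keeps no semantic label and remains available as a candidate for a previously unseen object.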
(3) Data association
Because the frame-wise segmentation processes each incoming RGB-D image pair independently, it lacks any spatiotemporal information about corresponding segments and instances across the different frames. Specifically, this means that it does not provide an association between the set of predicted segments $S_t$ and the set of segments $S_{t+1}$. (Processing each RGB-D image pair independently provides no spatiotemporal association between frames. In addition, the same object may end up segmented into two different classes in different frames.)
A data association step is proposed here to track corresponding geometric segments and predicted object instances across frames.
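A minimal sketch of such an association, assuming that each frame segment can be looked up against the labels already stored in the map: every segment takes the persistent map label it overlaps most, provided the overlap is large enough, and otherwise receives a fresh label. The greedy matching, the voxel-key representation, and `min_overlap` are assumptions for illustration and may differ from what the paper does.

```python
from collections import Counter
from itertools import count

def associate(frame_segments, map_labels_at, next_label, min_overlap=3):
    """Match current-frame segments to persistent map labels.

    frame_segments: {frame_seg_id: list of voxel keys the segment touches}
    map_labels_at:  {voxel key: persistent label} for already labeled voxels
    next_label:     iterator that yields unused persistent labels
    Returns {frame_seg_id: persistent label}.
    """
    candidates = []
    for seg_id, voxels in frame_segments.items():
        overlap = Counter(map_labels_at[v] for v in voxels if v in map_labels_at)
        for label, n in overlap.items():
            if n >= min_overlap:
                candidates.append((n, seg_id, label))

    result, used = {}, set()
    # Greedy: strongest overlaps first; each map label is used once per frame.
    for n, seg_id, label in sorted(candidates, key=lambda c: -c[0]):
        if seg_id not in result and label not in used:
            result[seg_id] = label
            used.add(label)

    # Segments without a sufficiently overlapping map label are new.
    for seg_id in frame_segments:
        if seg_id not in result:
            result[seg_id] = next(next_label)
    return result

map_labels = {(0, 0, 0): 7, (0, 0, 1): 7, (0, 0, 2): 7, (1, 0, 0): 7}
frame = {
    "a": [(0, 0, 0), (0, 0, 1), (0, 0, 2)],  # re-observes map segment 7
    "b": [(9, 9, 9), (9, 9, 8), (9, 9, 7)],  # lands in unlabeled space
    "c": [(1, 0, 0)],                        # overlap too small to match
}
labels = associate(frame, map_labels, count(100))
```

Allowing each map label to be claimed only once per frame keeps two frame segments from collapsing into one map segment; whether that restriction is desirable depends on how over-segmentation is handled elsewhere in the pipeline.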
(4) Map integration
The 3D segments discovered in the current frame, including some which are enriched with class and instance information, are fused into a global volumetric map. To this end, the Voxblox [10] TSDF-based dense mapping framework is extended to additionally encode object segmentation information.
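To make the idea of a TSDF voxel that also encodes segmentation information concrete, here is a small Python sketch. It is not the Voxblox API (Voxblox is a C++ library); the class name `LabelVoxel`, its fields, and the count-based label vote are assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class LabelVoxel:
    """Hypothetical voxel: TSDF distance and weight plus label observations."""
    distance: float = 0.0   # truncated signed distance to the surface
    weight: float = 0.0     # 0 means the voxel was never observed
    label_counts: Counter = field(default_factory=Counter)

    def integrate(self, sdf, label, obs_weight=1.0, max_weight=100.0):
        """Fuse one observation: weighted running average plus a label vote."""
        total = self.weight + obs_weight
        self.distance = (self.distance * self.weight + sdf * obs_weight) / total
        self.weight = min(total, max_weight)
        self.label_counts[label] += 1

    @property
    def label(self):
        """Most frequently observed segment label, or None if unobserved."""
        if not self.label_counts:
            return None
        return self.label_counts.most_common(1)[0][0]

    @property
    def observed(self):
        """Observed voxels have positive weight; unknown space has none."""
        return self.weight > 0.0

voxel = LabelVoxel()
voxel.integrate(0.04, label=7)
voxel.integrate(0.02, label=7)
voxel.integrate(0.03, label=9)  # a conflicting observation is outvoted
```

Keeping per-label counts rather than a single label lets occasional mis-segmentations in individual frames be outvoted over time, and the weight field is what separates observed free space from space that was never seen.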

Reposted from blog.csdn.net/qq_38649880/article/details/103421816