LEARNING TO EXPLORE USING ACTIVE NEURAL SLAM Paper Reading

Paper information

Title : LEARNING TO EXPLORE USING ACTIVE NEURAL SLAM
Author : Devendra Singh Chaplot, Dhiraj Gandhi
Project address : https://devendrachaplot.github.io/projects/Neural-SLAM
Code address : https://github.com/devendrachaplot/Neural-SLAM
Source : ICLR
Time : 2020

Abstract

This work proposes a modular and hierarchical approach to learning policies for exploring 3D environments, called "Active Neural SLAM".

Our approach combines the strengths of classical and learning-based methods by using an analytical path planner with a learned SLAM module together with global and local policies.

The use of learning provides flexibility in input patterns (in the SLAM module), exploits the structural regularities of the world (in the global policy), and provides robustness against errors in state estimation (in the local policy).

The proposed model is also easily transferable to the PointGoal task and is the winning entry of the CVPR 2019 Habitat PointGoal Navigation Challenge.

Introduction

While using learning for exploration is well motivated, translating the exploration problem into an end-to-end learning problem has its own drawbacks. Learning mapping, state estimation, and path planning purely from data in an end-to-end manner can be very difficult. Thus, past end-to-end learning work for exploration by Chen et al. (2019) relies on imitation learning and millions of frames of experience, yet still performs worse than classical methods that require no training at all.

In this paper, we examine alternatives to end-to-end learned exploration that preserve the advantages learning has to offer without the drawbacks of a fully end-to-end approach. Our key conceptual insight is that learning can be used to exploit the structural regularities of indoor environments, provide robustness to state-estimation errors, and offer flexibility with respect to input modalities. These benefits arise at different time scales and can therefore be factored into separate modules.

Our proposed exploration architecture consists of a learned neural SLAM module, a global policy, and a local policy, which are connected by a map and an analytical path planner.

The learned neural SLAM module generates free-space maps and estimates the agent's pose from input RGB images and motion-sensor readings.
The global policy takes this free-space map and the agent pose as input and uses learning to exploit structural regularities in the layout of real-world environments to generate long-term goals.
These long-term goals are converted into short-term goals for the local policy using a geometric path planner.
The local policy uses learning to map raw RGB images directly to the navigational actions the agent should take.

Using learning in the SLAM module provides flexibility in terms of input modality, the learned global policy can exploit regularities in the real-world environment layout, and the learned local policy can use visual feedback to exhibit more robust behavior.

Related Work

Navigation Approaches

Classical navigation methods divide the problem into two parts: mapping and path planning.

Exploration in Navigation

While many works focus on passive map building, path planning, and goal-driven policy learning, a small number of works address the problem of active SLAM, i.e. how to actively control cameras to build maps.

Hierarchical and Modular Policies

Hierarchical reinforcement learning (Dayan and Hinton, 1993; Sutton et al., 1999; Barto and Mahadevan, 2003) is an active research area aimed at automatically discovering hierarchical structure to speed up learning. However, this has proven challenging, so most works resort to manually defined hierarchies. In navigation, for example, Bansal et al. (2019) and Kaufmann et al. (2019) design modular policies that combine learned policies with low-level feedback controllers. Hierarchical and modular policies have also been used for Embodied Question Answering (Das et al., 2018a; Gordon et al., 2018; Das et al., 2018b).

Task Setup

Actuation and Noise Model

We denote the agent's pose by $(x, y, o)$ and assume the agent starts at pose $p_0 = (0, 0, 0)$. Now suppose the agent takes an action. Each action is implemented as a control command on the robot.

Let the corresponding control command be $\Delta u_a = (x_a, y_a, o_a)$, and let the agent's pose after the action be $p_1 = (x^*, y^*, o^*)$. The actuation noise ($\epsilon_{act}$) is the difference between the actual agent pose after the action ($p_1$) and the intended agent pose ($p_0 + \Delta u$):

$\epsilon_{act} = p_1 - (p_0 + \Delta u) = (x^* - x_a, \; y^* - y_a, \; o^* - o_a)$

Mobile robots typically carry sensors that estimate the robot's pose as it moves. Let the sensor's estimate of the agent pose after the action be $p'_1 = (x', y', o')$. The sensor noise ($\epsilon_{sen}$) is the difference between the sensor's pose estimate ($p'_1$) and the actual agent pose ($p_1$):

$\epsilon_{sen} = p'_1 - p_1 = (x' - x^*, \; y' - y^*, \; o' - o^*)$

We use three default navigation actions: Forward (move forward 25 cm), Turn Right (rotate 10 degrees clockwise in place), and Turn Left (rotate 10 degrees counterclockwise in place). The corresponding control commands are:

$u_{Forward} = (0.25, 0, 0)$, $u_{Right} = (0, 0, -10\pi/180)$, $u_{Left} = (0, 0, 10\pi/180)$
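To make the noise model concrete, here is a minimal sketch (not the authors' code) that applies a control command with sampled actuation and sensor noise and recovers $\epsilon_{act}$ and $\epsilon_{sen}$ exactly as defined above; the Gaussian noise scales are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Control commands for the three default actions: (dx, dy, dtheta)
U_FORWARD = np.array([0.25, 0.0, 0.0])
U_RIGHT   = np.array([0.0, 0.0, -10 * np.pi / 180])
U_LEFT    = np.array([0.0, 0.0,  10 * np.pi / 180])

def step_with_noise(p0, du, act_sigma=(0.02, 0.02, 0.01), sen_sigma=(0.01, 0.01, 0.005)):
    """Simulate one action. Returns the true pose p1, the sensed pose p1',
    and the actuation/sensor noise terms as defined in the text."""
    rng = np.random.default_rng()
    eps_act = rng.normal(0.0, act_sigma)      # actuation noise (illustrative Gaussian)
    p1 = p0 + du + eps_act                    # actual pose after the action
    eps_sen = rng.normal(0.0, sen_sigma)      # sensor noise (illustrative Gaussian)
    p1_sensor = p1 + eps_sen                  # pose reported by the sensor
    # Recover the noise terms from the poses, matching the definitions above.
    assert np.allclose(eps_act, p1 - (p0 + du))
    assert np.allclose(eps_sen, p1_sensor - p1)
    return p1, p1_sensor, eps_act, eps_sen

p1, p1_sensor, eps_act, eps_sen = step_with_noise(np.zeros(3), U_FORWARD)
```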

Methods

"Active Neural SLAM". It consists of three components: neural SLAM module, global policy and local policy, as shown in Figure 1. The neural SLAM module predicts environment maps and agent poses based on current observations and previous predictions. The global policy uses the predicted map and agent poses to generate long-term goals. Use path planning to convert long-term goals into short-term ones. Local policies take navigational actions based on current observations to achieve short-term goals.
[Figure 1: Overview of the Active Neural SLAM model.]
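The overall control flow can be summarized in a few lines of Python; the module interfaces below are illustrative placeholders (the actual implementations are described in the following subsections and in the authors' repository), and the fixed long-term-goal interval is an assumption for the sketch.

```python
# High-level Active Neural SLAM loop (illustrative sketch; module internals are
# placeholders, and the global policy is queried at a fixed interval).
def explore_episode(env, slam_module, global_policy, planner, local_policy,
                    num_steps=1000, global_interval=25):
    obs, sensor_pose = env.reset()
    map_est, pose_est = slam_module.init()
    long_term_goal = None
    for t in range(num_steps):
        # 1. Update the map and pose estimate from the current observation.
        map_est, pose_est = slam_module.update(obs, sensor_pose, map_est, pose_est)
        # 2. Periodically pick a new long-term goal from the map.
        if t % global_interval == 0:
            long_term_goal = global_policy.select_goal(map_est, pose_est)
        # 3. Plan a short-term goal toward the long-term goal.
        short_term_goal = planner.plan(map_est, pose_est, long_term_goal)
        # 4. Take a low-level action toward the short-term goal.
        action = local_policy.act(obs, short_term_goal)
        obs, sensor_pose = env.step(action)
```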

Map Representation

The Active Neural SLAM model internally maintains a spatial map $m_t$ and the agent pose $x_t$. The spatial map $m_t$ is a $2 \times M \times M$ matrix, where $M \times M$ is the map size and each element corresponds to a cell of size 25 cm² (5 cm × 5 cm) in the physical world. Each element in the first channel represents the probability that an obstacle occupies the corresponding location, and each element in the second channel represents the probability that the location has been explored. A cell is considered explored when it is known to be either free space or an obstacle. The spatial map is initialized to all zeros at the beginning of an episode: $m_0 = [0]^{2 \times M \times M}$.

The pose $x_t \in \mathbb{R}^3$ denotes the agent's x and y coordinates and its orientation at time $t$. The agent always starts at the center of the map facing east at the beginning of an episode: $x_0 = (M/2, M/2, 0.0)$.
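As a concrete illustration of this representation, here is a minimal sketch; the map extent is an assumed value for illustration, while the 5 cm resolution and the initial map/pose follow the definitions above.

```python
import numpy as np

MAP_SIZE_CM = 2400          # assumed map extent for illustration (24 m x 24 m)
RESOLUTION_CM = 5           # each cell covers 5 cm x 5 cm
M = MAP_SIZE_CM // RESOLUTION_CM

# Spatial map m_t: channel 0 = obstacle probability, channel 1 = explored probability.
m0 = np.zeros((2, M, M), dtype=np.float32)

# Pose x_t = (x, y, orientation); the agent starts at the map center facing east.
x0 = np.array([M / 2, M / 2, 0.0], dtype=np.float32)
```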

Neural SLAM Module

The Neural SLAM module ($f_{SLAM}$) receives the current RGB observation $s_t$, the agent poses from the previous and current sensor readings $x'_{t-1:t}$, and the previous agent pose and map estimates $\hat{x}_{t-1}$ and $m_{t-1}$, and outputs the updated map $m_t$ and the current agent pose estimate $\hat{x}_t$ (see Figure 2):

$m_t, \hat{x}_t = f_{SLAM}(s_t, x'_{t-1:t}, \hat{x}_{t-1}, m_{t-1} \mid \theta_S)$

where $\theta_S$ denotes the trainable parameters of the Neural SLAM module.

[Figure 2: Overview of the Neural SLAM module.]
The module consists of two learned components: a Mapper and a Pose Estimator. The Mapper ($f_{Map}$) outputs an egocentric top-down 2D spatial map, $p^{ego}_t \in [0, 1]^{2 \times V \times V}$ (where $V$ is the field-of-view size), predicting the obstacles and explored area visible in the current observation. The Pose Estimator ($f_{PE}$) predicts the agent pose ($\hat{x}_t$) from the previous pose estimate ($\hat{x}_{t-1}$) and the last two egocentric map predictions ($p^{ego}_{t-1:t}$). It essentially compares the current egocentric map prediction with the previous one transformed into the current frame in order to predict the pose change between the two maps. Using the pose estimate given by the Pose Estimator, the egocentric map from the Mapper is transformed into geocentric coordinates and then aggregated with the previous spatial map ($m_{t-1}$) to obtain the current map ($m_t$).
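A schematic PyTorch sketch of how these pieces compose is given below. The network sizes, the simple additive pose update, the max-based map aggregation, and the `spatial_transform` helper are illustrative assumptions rather than the authors' exact architecture (the real model uses a differentiable spatial transformer and a learned update).

```python
import torch
import torch.nn as nn

class NeuralSLAM(nn.Module):
    def __init__(self, V=64, M=480):
        super().__init__()
        self.V, self.M = V, M
        # Mapper: RGB observation -> egocentric 2 x V x V map (obstacles, explored).
        self.mapper = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1), nn.Sigmoid(),
            nn.Upsample(size=(V, V), mode="bilinear", align_corners=False),
        )
        # Pose estimator: compares consecutive egocentric maps -> pose change (dx, dy, do).
        self.pose_estimator = nn.Sequential(
            nn.Flatten(), nn.Linear(4 * V * V, 256), nn.ReLU(), nn.Linear(256, 3),
        )

    def forward(self, s_t, p_ego_prev, x_hat_prev, m_prev):
        p_ego = self.mapper(s_t)                                   # B x 2 x V x V
        delta = self.pose_estimator(torch.cat([p_ego_prev, p_ego], dim=1))
        x_hat = x_hat_prev + delta                                 # updated pose estimate
        geo = spatial_transform(p_ego, x_hat, self.M)              # egocentric -> geocentric
        m_t = torch.max(m_prev, geo)                               # aggregate with previous map
        return m_t, x_hat, p_ego

def spatial_transform(p_ego, pose, M):
    """Placeholder: paste the egocentric map into an M x M geocentric canvas at `pose`
    (given in map cells). The real model uses a differentiable spatial transformer."""
    canvas = torch.zeros(p_ego.shape[0], 2, M, M)
    r, c = int(pose[0, 0].item()), int(pose[0, 1].item())
    V = p_ego.shape[-1]
    r0, c0 = max(0, r - V // 2), max(0, c - V // 2)
    canvas[:, :, r0:r0 + V, c0:c0 + V] = p_ego[:, :, : M - r0, : M - c0]
    return canvas
```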

Global Policy

The global policy takes $h_t \in [0, 1]^{4 \times M \times M}$ as input, where the first two channels of $h_t$ are the spatial map $m_t$, the third channel represents the current agent position estimated by the SLAM module, and the fourth channel represents the visited locations, i.e. for all $i, j \in \{1, 2, \ldots, M\}$:
[Equation from the paper defining the four channels of $h_t$ for each cell $(i, j)$.]
We apply two transformations to $h_t$ before passing it to the global policy model. The first transformation crops a window of size $4 \times G \times G$ around the agent from $h_t$. The second transformation max-pools $h_t$ down to size $4 \times G \times G$. The two transformed tensors are stacked to form an $8 \times G \times G$ tensor, which is the input to the global policy model. The global policy uses a convolutional neural network to predict a long-term goal $g^l_t$ in the $G \times G$ space: $g^l_t = \pi_G(h_t \mid \theta_G)$, where $\theta_G$ are the parameters of the global policy.
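A minimal sketch of these two input transformations follows; the value of $G$ and the padding used to keep the crop in bounds are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def global_policy_input(h_t, agent_rc, G=240):
    """h_t: 4 x M x M tensor; agent_rc: (row, col) of the agent in map cells.
    Returns an 8 x G x G tensor: [crop around the agent ; max-pooled global view]."""
    r, c = agent_rc
    # 1) Crop a 4 x G x G window centered on the agent (pad so the crop always fits).
    padded = F.pad(h_t, (G // 2, G // 2, G // 2, G // 2))
    local = padded[:, r:r + G, c:c + G]
    # 2) Max-pool the full map down to 4 x G x G.
    coarse = F.adaptive_max_pool2d(h_t.unsqueeze(0), output_size=(G, G)).squeeze(0)
    # Stack both views to form the 8 x G x G global policy input.
    return torch.cat([local, coarse], dim=0)

h_t = torch.zeros(4, 480, 480)
inp = global_policy_input(h_t, agent_rc=(240, 240))   # shape (8, 240, 240)
```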

Planner

The planner takes the long-term goal ($g^l_t$), the spatial obstacle map ($m_t$), and the agent pose estimate ($\hat{x}_t$) as input and computes the short-term goal $g^s_t$, i.e. $g^s_t = f_{Plan}(g^l_t, m_t, \hat{x}_t)$. It computes the shortest path from the current agent position to the long-term goal ($g^l_t$) using the Fast Marching Method (Sethian, 1996) on the current spatial map $m_t$. Unexplored regions are treated as free space for planning. The short-term goal is the farthest point on the planned path within $d_s$ (= 0.25 m) of the agent.
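A minimal planner sketch under the assumption that scikit-fmm (`skfmm`) is used for the fast-marching distance field; the authors' implementation differs in details, and the greedy descent used here to pick the short-term goal is a simplification.

```python
import numpy as np
import numpy.ma as ma
import skfmm  # pip install scikit-fmm

def plan_short_term_goal(obstacle_map, agent_cell, long_term_goal, ds_cells=5):
    """obstacle_map: H x W array with 1 = obstacle, 0 = free or unexplored.
    agent_cell, long_term_goal: (row, col) map cells.
    ds_cells: d_s in cells (0.25 m at 5 cm per cell = 5 cells)."""
    traversible = (obstacle_map == 0).astype(float)     # unexplored counts as free space
    phi = ma.masked_values(traversible, 0)              # mask out obstacle cells
    phi[long_term_goal] = 0                             # zero level set at the long-term goal
    dist = ma.filled(skfmm.distance(phi, dx=1), np.inf) # geodesic distance to the goal

    # Greedily descend the distance field from the agent for at most ds_cells steps;
    # the last cell reached is returned as the short-term goal g^s_t.
    cur = tuple(agent_cell)
    H, W = dist.shape
    for _ in range(ds_cells):
        r, c = cur
        nbrs = [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < H and 0 <= c + dc < W]
        nxt = min(nbrs, key=lambda p: dist[p])
        if dist[nxt] >= dist[cur]:
            break                                       # stuck (e.g., unreachable goal)
        cur = nxt
    return cur
```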

Local Policy

The local policy takes the current RGB observation ($s_t$) and the short-term goal ($g^s_t$) as input and outputs a navigational action: $a_t = \pi_L(s_t, g^s_t \mid \theta_L)$, where $\theta_L$ are the parameters of the local policy. The short-term goal coordinates are converted to a relative distance and angle from the agent's position before being passed to the local policy. The local policy is a recurrent neural network with a pretrained ResNet18 (He et al., 2016) as its visual encoder.
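A schematic sketch of such a local policy is shown below; the hidden sizes and the way the relative distance/angle are fused with the visual features are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class LocalPolicy(nn.Module):
    def __init__(self, num_actions=3, hidden_size=256):
        super().__init__()
        resnet = models.resnet18(weights=None)   # pretrained weights would be loaded in practice
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])  # 512-d visual features
        self.goal_embed = nn.Linear(2, 32)       # (relative distance, relative angle)
        self.rnn = nn.GRUCell(512 + 32, hidden_size)
        self.action_head = nn.Linear(hidden_size, num_actions)

    def forward(self, rgb, rel_goal, hidden):
        """rgb: B x 3 x H x W, rel_goal: B x 2, hidden: B x hidden_size."""
        visual = self.encoder(rgb).flatten(1)                  # B x 512
        goal = torch.relu(self.goal_embed(rel_goal))           # B x 32
        hidden = self.rnn(torch.cat([visual, goal], dim=1), hidden)
        logits = self.action_head(hidden)                      # forward / turn-left / turn-right
        return logits, hidden

policy = LocalPolicy()
h = torch.zeros(1, 256)
logits, h = policy(torch.zeros(1, 3, 128, 128), torch.zeros(1, 2), h)
action = logits.argmax(dim=1)
```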

Experiment

[Results figures and tables from the paper, comparing methods on the Exploration and PointGoal tasks.]

Conclusion and Future Work

We propose a modular navigation model that leverages the strengths of classical and learning-based navigation methods. We show that the proposed model outperforms previous methods on both the Exploration and PointGoal tasks and generalizes well across domains, goals, and tasks. In the future, the proposed model can be extended to complex semantic tasks such as semantic object navigation and Embodied Question Answering by using a semantic Neural SLAM module, which creates multi-channel maps capturing the semantic properties of objects in the environment. The model can also be combined with prior work on localization to re-localize in a previously created map for efficient navigation in subsequent episodes.
