Reproducing the Open Vocabulary Mobile Manipulation (OVMM) Challenge
1. Pulling the source code and obtaining the Docker image
git clone https://github.com/facebookresearch/home-robot.git --branch home-robot-ovmm-challenge-2023-v0.1.2
Change into the project directory:
cd projects/habitat_ovmm
Edit the Dockerfile. (If you also need to relocate Docker's data directory, see: https://cloud.tencent.com/developer/article/1806464?areaSource=102001.13&traceId=EKfzQGOFpKhiAkWHuAxSf)
vim projects/habitat_ovmm/docker/ovmm_baseline.Dockerfile
Change the base image in the first line to match your system; the author is on Ubuntu 20.04:
FROM fairembodied/habitat-challenge:homerobot-ovmm-challenge-2023-ubuntu20.04
# install dependencies in the home-robot conda environment
RUN /bin/bash -c "\
. activate home-robot \
&& pip install <some extra package> \
"
# add your agent code
ADD your_agent.py /home-robot/projects/habitat_ovmm/agent.py
# add submission script
ADD scripts/submission.sh /home-robot/submission.sh
# set evaluation type to remote
ENV AGENT_EVALUATION_TYPE remote
# run submission script
CMD /bin/bash -c "\
. activate home-robot \
&& cd /home-robot \
&& export PYTHONPATH=/evalai_remote_evaluation:$PYTHONPATH \
&& bash submission.sh \
"
Build the Docker image:
docker build . \
-f docker/ovmm_baseline.Dockerfile \
-t ovmm_baseline_submission \
--network host
During the run the author hit the following error:
[Core] ManagedContainerBase.h(329)::checkExistsWithMessage : ::getObjectByHandle : Unknown Lighting Layout managed object handle : . Aborting
2. habitat_baselines OVMM example
python projects/habitat_ovmm/eval_baselines_agent.py --env_config projects/habitat_ovmm/configs/env/hssd_demo.yaml
AttributeError: module 'PIL.Image' has no attribute 'LINEAR'
This is a known detectron2 incompatibility with newer Pillow (see https://github.com/facebookresearch/detectron2/issues/5010); installing detectron2 from a pinned commit resolves it:
python3 -m pip install -U 'git+https://github.com/facebookresearch/detectron2.git@ff53992b1985b63bd3262b5a36167098e3dada02'
Intermediate images from the run are saved under /home/moresweet/home-robot/datadump/images/eval_hssd.
3. Reinforcement learning DDPG demo
cd /path/to/home-robot/src/third_party/habitat-lab/
# create soft link to data/ directory
ln -s /path/to/home-robot/data data
On the author's machine the link is:
ln -s /home/moresweet/home-robot/data data
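The linking step above can be sketched in Python; the temporary directories below are stand-ins for the real home-robot paths:

```python
import os
import tempfile

# Temporary directories standing in for /path/to/home-robot/data and the
# habitat-lab checkout; only the linking logic is illustrated.
src = tempfile.mkdtemp()               # stands in for home-robot/data
habitat_lab_dir = tempfile.mkdtemp()   # stands in for habitat-lab/
link = os.path.join(habitat_lab_dir, "data")

# Equivalent of: ln -s /path/to/home-robot/data data
os.symlink(src, link)

print(os.path.islink(link))  # True
```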
vim run.sh
For single-machine training, use the following script content:
#!/bin/bash
export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet
set -x
python -u -m habitat_baselines.run \
--config-name ovmm/rl_skill.yaml \
habitat_baselines.evaluate=False benchmark/ovmm=gaze \
habitat_baselines.checkpoint_folder=data/new_checkpoints/ovmm/gaze
Alternatively, the generic form from the official README:
#!/bin/bash
export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet
set -x
python -u -m habitat_baselines.run \
--exp-config habitat-baselines/habitat_baselines/config/ovmm/rl_skill.yaml \
--run-type train benchmark/ovmm=<skill_name> \
habitat_baselines.checkpoint_folder=data/new_checkpoints/ovmm/<skill_name>
Replace <skill_name> with gaze, place, nav_to_obj, or nav_to_rec; the author uses gaze.
sudo chmod u+x run.sh
./run.sh
By default, TensorBoard logs are written to /home/moresweet/home-robot/src/third_party/habitat-lab/tb.
View them with the following commands:
cd /home/moresweet/home-robot/src/third_party/habitat-lab/tb
conda activate home-robot
tensorboard --logdir=./
4. Model evaluation
Create a new script eval.sh with the following content:
#!/bin/bash
python -u -m habitat_baselines.run \
--config-name ovmm/rl_skill.yaml \
habitat_baselines.evaluate=True
Edit rl_skill.yaml and set:
eval_ckpt_path_dir: "data/new_checkpoints/ovmm/gaze"
Process videos: alternatively, setting video_option in rl_skill.yaml to ["tensorboard"] lets you view the videos in TensorBoard as well; the difference is minor.
tensorboard --logdir=./
5. Troubleshooting
5.1 CUDA out of memory
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 94.00 MiB (GPU 0; 11.75 GiB total capacity; 5.26 GiB already allocated; 62.62 MiB free; 5.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception ignored in: <function VectorEnv.__del__ at 0x7f34f5cf4940>
Traceback (most recent call last):
File "/home/moresweet/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 615, in __del__
File "/home/moresweet/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 470, in close
File "/home/moresweet/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 131, in __call__
File "/home/moresweet/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 63, in send
File "/home/moresweet/.conda/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 200, in send_bytes
File "/home/moresweet/.conda/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 411, in _send_bytes
File "/home/moresweet/.conda/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 368, in _send
BrokenPipeError: [Errno 32] Broken pipe
The GPU ran out of memory. Fix it by editing the config file:
vim /home/moresweet/home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/config/ovmm/rl_skill.yaml
and lowering num_environments to a value your device can handle.
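The reasoning behind lowering num_environments can be sketched with a back-of-the-envelope check; all numbers below (per-environment and base memory cost) are hypothetical, chosen only to illustrate the linear scaling:

```python
# Hypothetical memory model: GPU usage grows roughly linearly with the
# number of parallel environments, so reducing num_environments frees
# memory proportionally. budget_mib ~ 11.75 GiB, as in the error above.
def fits(num_environments, per_env_mib=400, base_mib=2000, budget_mib=12032):
    return base_mib + num_environments * per_env_mib <= budget_mib

print(fits(32))  # False -> OOM
print(fits(8))   # True  -> a value the device can handle
```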
5.2 AttributeError: 'Box' object has no attribute 'n'
action_space.n + 1, self._n_prev_action
AttributeError: 'Box' object has no attribute 'n'
Change action_space.n + 1 to use action_space.shape[0]: a Box (continuous) action space has no .n attribute, which only exists on Discrete spaces.
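A minimal sketch of why the fix works, using stand-in classes rather than gym's real space types: Discrete spaces expose .n (a count of actions), while Box spaces only expose a .shape:

```python
# Stand-in classes (hypothetical, mirroring only the relevant attributes
# of gym's Discrete and Box action spaces).
class Discrete:
    def __init__(self, n):
        self.n = n  # number of discrete actions

class Box:
    def __init__(self, shape):
        self.shape = shape  # shape of the continuous action vector

def prev_action_dim(action_space):
    # The original code did `action_space.n + 1`, which raises
    # AttributeError for Box; branching on the attribute handles both.
    if hasattr(action_space, "n"):
        return action_space.n + 1
    return action_space.shape[0]

print(prev_action_dim(Discrete(4)))  # 5
print(prev_action_dim(Box((7,))))    # 7
```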
5.3 ValueError: The API of run.py has changed to be compatible with hydra
Hydra failed to read the config, which means the wrong config file was loaded. For evaluation we still load rl_skill: neither a yaml under projects, nor eval_ovmm.yaml under habitat-lab. Hydra's config search path on this machine is:
Config search path:
provider=hydra, path=pkg://hydra.conf
provider=main, path=file:///home/moresweet/home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/config
provider=habitat, path=pkg://habitat.config
provider=habitat, path=pkg://habitat_baselines.config
provider=schema, path=structured://
Per the message, modify the script as follows; if you did not hit this error, no change is needed:
#!/bin/bash
export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet
set -x
python -u -m habitat_baselines.run \
--config-name ovmm/rl_skill.yaml \
habitat_baselines.evaluate=False benchmark/ovmm=gaze \
habitat_baselines.checkpoint_folder=data/new_checkpoints/ovmm/gaze
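To make the script's overrides less opaque, here is a simplified sketch (not Hydra's actual implementation) of how a dotted override string such as habitat_baselines.evaluate=False maps onto a nested config:

```python
# Simplified stand-in for Hydra's command-line override handling: split a
# dotted key path on '.', walk/create nested dicts, and set the leaf value.
def apply_override(cfg, override):
    key, _, raw = override.partition("=")
    value = {"True": True, "False": False}.get(raw, raw)
    node = cfg
    *parents, leaf = key.split(".")
    for p in parents:
        node = node.setdefault(p, {})
    node[leaf] = value
    return cfg

cfg = {}
apply_override(cfg, "habitat_baselines.evaluate=False")
apply_override(cfg, "habitat_baselines.checkpoint_folder=data/new_checkpoints/ovmm/gaze")
print(cfg["habitat_baselines"]["evaluate"])  # False
```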
The official documentation only shows habitat_baselines being used for training; there is no evaluation example, even though our config directory does contain evaluation configs. The official tutorial instead evaluates with the eval program under projects, which does not line up at all. Comparing the configs under /home/moresweet/home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/config/ovmm with those under /home/moresweet/home-robot/projects/habitat_ovmm/configs shows very different styles, so the two sets of config files are certainly not interchangeable.
5.4 Evaluation hangs indefinitely
The program loops forever, so we stop it and inspect the stack trace. It is stuck at checkpoint loading, which means no checkpoint file was found. Printing the search path shows nothing obviously wrong; stepping through the source reveals that the code only scans one directory level, while our checkpoints live two levels down inside new_checkpoints. The directory therefore has to be specified precisely as new_checkpoints/ovmm/gaze.
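The behavior can be reproduced with a small stand-in for the checkpoint scan (temporary directories, hypothetical file names):

```python
import os
import tempfile

# A loader that, like the one in the source, only lists the top level of
# the checkpoint directory and therefore misses nested checkpoints.
def find_ckpts_one_level(d):
    return sorted(f for f in os.listdir(d) if f.endswith(".pth"))

root = tempfile.mkdtemp()  # stands in for data/new_checkpoints
gaze_dir = os.path.join(root, "ovmm", "gaze")
os.makedirs(gaze_dir)
open(os.path.join(gaze_dir, "ckpt.0.pth"), "w").close()

print(find_ckpts_one_level(root))      # [] -> evaluation waits forever
print(find_ckpts_one_level(gaze_dir))  # ['ckpt.0.pth'] -> point the config here
```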
5.5 RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy
RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
Missing key(s) in state_dict: "action_distribution.linear.weight", "action_distribution.linear.bias".
Unexpected key(s) in state_dict: "net.prev_action_embedding.bias", "net.state_encoder.rnn.weight_ih_l1", "net.state_encoder.rnn.weight_hh_l1", "net.state_encoder.rnn.bias_ih_l1", "net.state_encoder.rnn.bias_hh_l1", "action_distribution.std", "action_distribution.mu_maybe_std.weight", "action_distribution.mu_maybe_std.bias".
size mismatch for net.prev_action_embedding.weight: copying a param with shape torch.Size([32, 5]) from checkpoint, the shape in current model is torch.Size([6, 32]).
size mismatch for net.visual_encoder.backbone.conv1.0.weight: copying a param with shape torch.Size([32, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([32, 6, 7, 7]).
size mismatch for net.visual_encoder.compression.0.weight: copying a param with shape torch.Size([102, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([26, 256, 3, 3]).
size mismatch for net.visual_encoder.compression.1.weight: copying a param with shape torch.Size([102]) from checkpoint, the shape in current model is torch.Size([26]).
size mismatch for net.visual_encoder.compression.1.bias: copying a param with shape torch.Size([102]) from checkpoint, the shape in current model is torch.Size([26]).
size mismatch for net.visual_fc.1.weight: copying a param with shape torch.Size([512, 2040]) from checkpoint, the shape in current model is torch.Size([512, 2080]).
The checkpoint does not match the model being loaded; check the checkpoint path configured in the config file.
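The error reduces to a parameter-shape comparison; here is a pure-Python sketch (shapes copied from the log above, logic much simplified relative to what torch's load_state_dict actually does):

```python
# Expected shapes in the freshly built model vs. shapes stored in the
# checkpoint; a mismatch on any shared key aborts loading.
model_shapes = {"net.prev_action_embedding.weight": (6, 32)}
ckpt_shapes = {"net.prev_action_embedding.weight": (32, 5)}

mismatches = [k for k in model_shapes
              if k in ckpt_shapes and ckpt_shapes[k] != model_shapes[k]]
print(mismatches)  # ['net.prev_action_embedding.weight']
```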
Note: the filter here removes matching entries, rather than keeping the ones that satisfy the condition.
References
[1] Yenamandra S, Ramachandran A, Yadav K, et al. Homerobot: Open-vocabulary mobile manipulation[J]. arXiv preprint arXiv:2306.11565, 2023.
[2] home-robot homepage: https://github.com/facebookresearch/home-robot
[3] OVMM Challenge documentation: https://github.com/facebookresearch/home-robot/blob/main/docs/challenge.md
[4] home-robot OVMM example project: https://github.com/facebookresearch/home-robot/blob/main/projects/habitat_ovmm/README.md
[5] habitat-baselines documentation: https://github.com/facebookresearch/habitat-lab/tree/b727ca9f7123101aaedb737ca9ccc1b153525dd9/habitat-baselines
[6] habitat-lab hydra configuration documentation: https://github.com/facebookresearch/habitat-lab/blob/b727ca9f7123101aaedb737ca9ccc1b153525dd9/habitat-lab/habitat/config/README.md