Reproducing the Open Vocabulary Mobile Manipulation (OVMM) Challenge
1. Pulling the source code and obtaining the Docker image
git clone https://github.com/facebookresearch/home-robot.git --branch home-robot-ovmm-challenge-2023-v0.1.2
Change into the project directory:
cd projects/habitat_ovmm
Edit the Dockerfile. (If you also need to relocate Docker's data directory, see: https://cloud.tencent.com/developer/article/1806464?areaSource=102001.13&traceId=EKfzQGOFpKhiAkWHuAxSf)
vim projects/habitat_ovmm/docker/ovmm_baseline.Dockerfile
Change the base image in the first line to match your system; the author is on Ubuntu 20.04:
FROM fairembodied/habitat-challenge:homerobot-ovmm-challenge-2023-ubuntu20.04
# install dependencies in the home-robot conda environment
RUN /bin/bash -c "\
. activate home-robot \
&& pip install <some extra package> \
"
# add your agent code
ADD your_agent.py /home-robot/projects/habitat_ovmm/agent.py
# add submission script
ADD scripts/submission.sh /home-robot/submission.sh
# set evaluation type to remote
ENV AGENT_EVALUATION_TYPE remote
# run submission script
CMD /bin/bash -c "\
. activate home-robot \
&& cd /home-robot \
&& export PYTHONPATH=/evalai_remote_evaluation:$PYTHONPATH \
&& bash submission.sh \
"
Build the Docker image:
docker build . \
-f docker/ovmm_baseline.Dockerfile \
-t ovmm_baseline_submission \
--network host
During the run the author hit the following error:
[Core] ManagedContainerBase.h(329)::checkExistsWithMessage : ::getObjectByHandle : Unknown Lighting Layout managed object handle : . Aborting
2. habitat_baselines OVMM example
python projects/habitat_ovmm/eval_baselines_agent.py --env_config projects/habitat_ovmm/configs/env/hssd_demo.yaml
AttributeError: module 'PIL.Image' has no attribute 'LINEAR'
This is a known detectron2 incompatibility with newer Pillow (see https://github.com/facebookresearch/detectron2/issues/5010); installing detectron2 from a pinned commit resolves it:
python3 -m pip install -U 'git+https://github.com/facebookresearch/detectron2.git@ff53992b1985b63bd3262b5a36167098e3dada02'
Intermediate images from the run are saved under /home/moresweet/home-robot/datadump/images/eval_hssd.
3. Reinforcement learning DDPG demo
cd /path/to/home-robot/src/third_party/habitat-lab/
# create soft link to data/ directory
ln -s /path/to/home-robot/data data
On the author's machine the link is:
ln -s /home/moresweet/home-robot/data data
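The linking step above can be sketched in Python; the temporary directories below are stand-ins for the real home-robot paths:

```python
import os
import tempfile

# Temporary directories standing in for /path/to/home-robot/data and the
# habitat-lab checkout; only the linking logic is illustrated.
src = tempfile.mkdtemp()               # stands in for home-robot/data
habitat_lab_dir = tempfile.mkdtemp()   # stands in for habitat-lab/
link = os.path.join(habitat_lab_dir, "data")

# Equivalent of: ln -s /path/to/home-robot/data data
os.symlink(src, link)

print(os.path.islink(link))  # True
```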
vim run.sh
For single-machine training, use the following script content:
#!/bin/bash
export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet
set -x
python -u -m habitat_baselines.run \
--config-name ovmm/rl_skill.yaml \
habitat_baselines.evaluate=False benchmark/ovmm=gaze \
habitat_baselines.checkpoint_folder=data/new_checkpoints/ovmm/gaze
Alternatively, the generic form from the official README:
#!/bin/bash
export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet
set -x
python -u -m habitat_baselines.run \
--exp-config habitat-baselines/habitat_baselines/config/ovmm/rl_skill.yaml \
--run-type train benchmark/ovmm=<skill_name> \
habitat_baselines.checkpoint_folder=data/new_checkpoints/ovmm/<skill_name>
Replace <skill_name> with gaze, place, nav_to_obj, or nav_to_rec; the author uses gaze.
sudo chmod u+x run.sh
./run.sh
By default, TensorBoard logs are written to /home/moresweet/home-robot/src/third_party/habitat-lab/tb.
View them with the following commands:
cd /home/moresweet/home-robot/src/third_party/habitat-lab/tb
conda activate home-robot
tensorboard --logdir=./
4. Model evaluation
Create a new script eval.sh with the following content:
#!/bin/bash
python -u -m habitat_baselines.run \
--config-name ovmm/rl_skill.yaml \
habitat_baselines.evaluate=True
Edit rl_skill.yaml and set:
eval_ckpt_path_dir: "data/new_checkpoints/ovmm/gaze"
Process videos: alternatively, setting video_option in rl_skill.yaml to ["tensorboard"] lets you view the videos in TensorBoard as well; the difference is minor.
tensorboard --logdir=./
5. Troubleshooting
5.1 CUDA out of memory
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 94.00 MiB (GPU 0; 11.75 GiB total capacity; 5.26 GiB already allocated; 62.62 MiB free; 5.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception ignored in: <function VectorEnv.__del__ at 0x7f34f5cf4940>
Traceback (most recent call last):
File "/home/moresweet/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 615, in __del__
File "/home/moresweet/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 470, in close
File "/home/moresweet/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/core/vector_env.py", line 131, in __call__
File "/home/moresweet/home-robot/src/third_party/habitat-lab/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 63, in send
File "/home/moresweet/.conda/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 200, in send_bytes
File "/home/moresweet/.conda/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 411, in _send_bytes
File "/home/moresweet/.conda/envs/home-robot/lib/python3.9/multiprocessing/connection.py", line 368, in _send
BrokenPipeError: [Errno 32] Broken pipe
The GPU ran out of memory. Fix it by editing the config file:
vim /home/moresweet/home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/config/ovmm/rl_skill.yaml
and lowering num_environments to a value your device can handle.
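The reasoning behind lowering num_environments can be sketched with a back-of-the-envelope check; all numbers below (per-environment and base memory cost) are hypothetical, chosen only to illustrate the linear scaling:

```python
# Hypothetical memory model: GPU usage grows roughly linearly with the
# number of parallel environments, so reducing num_environments frees
# memory proportionally. budget_mib ~ 11.75 GiB, as in the error above.
def fits(num_environments, per_env_mib=400, base_mib=2000, budget_mib=12032):
    return base_mib + num_environments * per_env_mib <= budget_mib

print(fits(32))  # False -> OOM
print(fits(8))   # True  -> a value the device can handle
```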
5.2 AttributeError: 'Box' object has no attribute 'n'
action_space.n + 1, self._n_prev_action
AttributeError: 'Box' object has no attribute 'n'
Change action_space.n + 1 to use action_space.shape[0]: a Box (continuous) action space has no .n attribute, which only exists on Discrete spaces.
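A minimal sketch of why the fix works, using stand-in classes rather than gym's real space types: Discrete spaces expose .n (a count of actions), while Box spaces only expose a .shape:

```python
# Stand-in classes (hypothetical, mirroring only the relevant attributes
# of gym's Discrete and Box action spaces).
class Discrete:
    def __init__(self, n):
        self.n = n  # number of discrete actions

class Box:
    def __init__(self, shape):
        self.shape = shape  # shape of the continuous action vector

def prev_action_dim(action_space):
    # The original code did `action_space.n + 1`, which raises
    # AttributeError for Box; branching on the attribute handles both.
    if hasattr(action_space, "n"):
        return action_space.n + 1
    return action_space.shape[0]

print(prev_action_dim(Discrete(4)))  # 5
print(prev_action_dim(Box((7,))))    # 7
```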
5.3 ValueError: The API of run.py has changed to be compatible with hydra
Hydra failed to read the config, which means the wrong config file was loaded. For evaluation we still load rl_skill: neither a yaml under projects, nor eval_ovmm.yaml under habitat-lab. Hydra's config search path on this machine is:
Config search path:
provider=hydra, path=pkg://hydra.conf
provider=main, path=file:///home/moresweet/home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/config
provider=habitat, path=pkg://habitat.config
provider=habitat, path=pkg://habitat_baselines.config
provider=schema, path=structured://
Per the message, modify the script as follows; if you did not hit this error, no change is needed:
#!/bin/bash
export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet
set -x
python -u -m habitat_baselines.run \
--config-name ovmm/rl_skill.yaml \
habitat_baselines.evaluate=False benchmark/ovmm=gaze \
habitat_baselines.checkpoint_folder=data/new_checkpoints/ovmm/gaze
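To make the script's overrides less opaque, here is a simplified sketch (not Hydra's actual implementation) of how a dotted override string such as habitat_baselines.evaluate=False maps onto a nested config:

```python
# Simplified stand-in for Hydra's command-line override handling: split a
# dotted key path on '.', walk/create nested dicts, and set the leaf value.
def apply_override(cfg, override):
    key, _, raw = override.partition("=")
    value = {"True": True, "False": False}.get(raw, raw)
    node = cfg
    *parents, leaf = key.split(".")
    for p in parents:
        node = node.setdefault(p, {})
    node[leaf] = value
    return cfg

cfg = {}
apply_override(cfg, "habitat_baselines.evaluate=False")
apply_override(cfg, "habitat_baselines.checkpoint_folder=data/new_checkpoints/ovmm/gaze")
print(cfg["habitat_baselines"]["evaluate"])  # False
```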
The official documentation only shows habitat_baselines being used for training; there is no evaluation example, even though our config directory does contain evaluation configs. The official tutorial instead evaluates with the eval program under projects, which does not line up at all. Comparing the configs under /home/moresweet/home-robot/src/third_party/habitat-lab/habitat-baselines/habitat_baselines/config/ovmm with those under /home/moresweet/home-robot/projects/habitat_ovmm/configs shows very different styles, so the two sets of config files are certainly not interchangeable.
5.4 Evaluation hangs indefinitely
The program loops forever, so we stop it and inspect the stack trace. It is stuck at checkpoint loading, which means no checkpoint file was found. Printing the search path shows nothing obviously wrong; stepping through the source reveals that the code only scans one directory level, while our checkpoints live two levels down inside new_checkpoints. The directory therefore has to be specified precisely as new_checkpoints/ovmm/gaze.
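The behavior can be reproduced with a small stand-in for the checkpoint scan (temporary directories, hypothetical file names):

```python
import os
import tempfile

# A loader that, like the one in the source, only lists the top level of
# the checkpoint directory and therefore misses nested checkpoints.
def find_ckpts_one_level(d):
    return sorted(f for f in os.listdir(d) if f.endswith(".pth"))

root = tempfile.mkdtemp()  # stands in for data/new_checkpoints
gaze_dir = os.path.join(root, "ovmm", "gaze")
os.makedirs(gaze_dir)
open(os.path.join(gaze_dir, "ckpt.0.pth"), "w").close()

print(find_ckpts_one_level(root))      # [] -> evaluation waits forever
print(find_ckpts_one_level(gaze_dir))  # ['ckpt.0.pth'] -> point the config here
```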
5.5 RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy
RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
Missing key(s) in state_dict: "action_distribution.linear.weight", "action_distribution.linear.bias".
Unexpected key(s) in state_dict: "net.prev_action_embedding.bias", "net.state_encoder.rnn.weight_ih_l1", "net.state_encoder.rnn.weight_hh_l1", "net.state_encoder.rnn.bias_ih_l1", "net.state_encoder.rnn.bias_hh_l1", "action_distribution.std", "action_distribution.mu_maybe_std.weight", "action_distribution.mu_maybe_std.bias".
size mismatch for net.prev_action_embedding.weight: copying a param with shape torch.Size([32, 5]) from checkpoint, the shape in current model is torch.Size([6, 32]).
size mismatch for net.visual_encoder.backbone.conv1.0.weight: copying a param with shape torch.Size([32, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([32, 6, 7, 7]).
size mismatch for net.visual_encoder.compression.0.weight: copying a param with shape torch.Size([102, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([26, 256, 3, 3]).
size mismatch for net.visual_encoder.compression.1.weight: copying a param with shape torch.Size([102]) from checkpoint, the shape in current model is torch.Size([26]).
size mismatch for net.visual_encoder.compression.1.bias: copying a param with shape torch.Size([102]) from checkpoint, the shape in current model is torch.Size([26]).
size mismatch for net.visual_fc.1.weight: copying a param with shape torch.Size([512, 2040]) from checkpoint, the shape in current model is torch.Size([512, 2080]).
The checkpoint does not match the model being loaded; check the checkpoint path configured in the config file.
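The error reduces to a parameter-shape comparison; here is a pure-Python sketch (shapes copied from the log above, logic much simplified relative to what torch's load_state_dict actually does):

```python
# Expected shapes in the freshly built model vs. shapes stored in the
# checkpoint; a mismatch on any shared key aborts loading.
model_shapes = {"net.prev_action_embedding.weight": (6, 32)}
ckpt_shapes = {"net.prev_action_embedding.weight": (32, 5)}

mismatches = [k for k in model_shapes
              if k in ckpt_shapes and ckpt_shapes[k] != model_shapes[k]]
print(mismatches)  # ['net.prev_action_embedding.weight']
```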
Note: the filter here removes matching entries, rather than keeping the ones that satisfy the condition.
References
[1] Yenamandra S, Ramachandran A, Yadav K, et al. Homerobot: Open-vocabulary mobile manipulation[J]. arXiv preprint arXiv:2306.11565, 2023.
[2] home-robot homepage: https://github.com/facebookresearch/home-robot
[3] OVMM Challenge documentation: https://github.com/facebookresearch/home-robot/blob/main/docs/challenge.md
[4] home-robot OVMM example project: https://github.com/facebookresearch/home-robot/blob/main/projects/habitat_ovmm/README.md
[5] habitat-baselines documentation: https://github.com/facebookresearch/habitat-lab/tree/b727ca9f7123101aaedb737ca9ccc1b153525dd9/habitat-baselines
[6] habitat-lab hydra configuration documentation: https://github.com/facebookresearch/habitat-lab/blob/b727ca9f7123101aaedb737ca9ccc1b153525dd9/habitat-lab/habitat/config/README.md