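After an unexpected power outage in the office server room, the OpenStack lab cluster would no longer start normally, so I tried restarting the cluster with the kolla-ansible tool.
I. Environment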
[root@kolla-ansible-master ~]# cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)
[root@kolla-ansible-master ~]# ansible --version
ansible 2.7.18
[root@kolla-ansible-master ~]# pip list | grep kolla-ansible
kolla-ansible 7.2.2.dev9
[root@kolla-ansible-master ~]# openstack --version
openstack 5.2.1
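II. Notes
1. Status
After power was restored and the machines were rebooted, some OpenStack containers kept restarting abnormally and the cluster could not work properly, as the container listing below shows: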
kolla-ansible-master:4000/kolla/centos-source-heat-engine:rocky "dumb-init --single-…" 15 months ago Up About a minute heat_engine
c07e5d01adce kolla-ansible-master:4000/kolla/centos-source-heat-api-cfn:rocky "dumb-init --single-…" 15 months ago Restarting (1) 1 second ago heat_api_cfn
88b7a106dcd8 kolla-ansible-master:4000/kolla/centos-source-heat-api:rocky "dumb-init --single-…" 15 months ago Restarting (1) Less than a second ago heat_api
82b5983614e0 kolla-ansible-master:4000/kolla/centos-source-neutron-server:rocky "dumb-init --single-…" 15 months ago Up About a minute neutron_server
feaf96f16403 kolla-ansible-master:4000/kolla/centos-source-nova-compute-ironic:rocky "dumb-init --single-…" 15 months ago Up About a minute nova_compute_ironic
cb9184ff5506 kolla-ansible-master:4000/kolla/centos-source-nova-novncproxy:rocky "dumb-init --single-…" 15 months ago Up About a minute nova_novncproxy
17bf7758070d kolla-ansible-master:4000/kolla/centos-source-nova-consoleauth:rocky "dumb-init --single-…" 15 months ago Up About a minute nova_consoleauth
619d66b56612 kolla-ansible-master:4000/kolla/centos-source-nova-conductor:rocky "dumb-init --single-…" 15 months ago Up About a minute nova_conductor
249b423c2728 kolla-ansible-master:4000/kolla/centos-source-nova-scheduler:rocky "dumb-init --single-…" 15 months ago Up About a minute nova_scheduler
beace5f229e2 kolla-ansible-master:4000/kolla/centos-source-nova-api:rocky "dumb-init --single-…" 15 months ago Restarting (1) 5 seconds ago nova_api
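2. Diagnosing the problem
Checking the logs and the containers showed that nova-api was abnormal and restarting continuously ("Restarting (1) 5 seconds ago nova_api"), while the services underneath it were running normally.
3. Attempted fix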
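The check described above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not the exact procedure used here: the function name `restarting_containers` is hypothetical, and it assumes a listing in the same layout as the `docker ps -a` style output shown earlier, with the container name in the last column.

```shell
#!/bin/sh
# Hypothetical helper: read a container listing on stdin and print the
# names of containers whose status column says "Restarting (...)".
# Assumption: the container name is the last whitespace-separated field.
restarting_containers() {
    awk '/Restarting \(/ { print $NF }'
}

# Example usage on a host that runs the containers (not executed here):
#   docker ps -a | restarting_containers
#   docker logs --tail 50 nova_api
```

Filtering on the status text keeps the helper independent of the container runtime; only the listing format matters.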
3.1 Stop the virtual machine (server)
[root@kolla-ansible-master ~]# openstack server list
+--------------------------------------+-------+--------+-----------------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------+--------+-----------------------+--------+---------+
| b4634124-a315-4fd8-aa4a-3df8cade2335 | demo1 | ACTIVE | demo-net=192.168.19.8 | cirros | m1.tiny |
+--------------------------------------+-------+--------+-----------------------+--------+---------+
[root@kolla-ansible-master ~]# openstack server stop demo1
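Before touching the Nova services it may help to confirm that the instance has actually reached SHUTOFF. The following is a minimal sketch: the function name `wait_for_shutoff` and its retry defaults are assumptions for illustration, and it relies only on `openstack server show <server> -f value -c status`.

```shell
#!/bin/sh
# Hypothetical helper: poll until a server reports the SHUTOFF status.
# Usage: wait_for_shutoff <server> [max_attempts] [delay_seconds]
wait_for_shutoff() {
    wfs_server="$1"
    wfs_attempts="${2:-30}"   # assumed default: 30 polls
    wfs_delay="${3:-2}"       # assumed default: 2 seconds between polls
    wfs_i=0
    wfs_status=""
    while [ "$wfs_i" -lt "$wfs_attempts" ]; do
        wfs_status=$(openstack server show "$wfs_server" -f value -c status)
        if [ "$wfs_status" = "SHUTOFF" ]; then
            return 0
        fi
        sleep "$wfs_delay"
        wfs_i=$((wfs_i + 1))
    done
    echo "server $wfs_server did not reach SHUTOFF (last status: $wfs_status)" >&2
    return 1
}

# Example (not executed here):
#   openstack server stop demo1 && wait_for_shutoff demo1
```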
3.2 Stop the Nova services
[root@kolla-ansible-master ~]# kolla-ansible -i ./multinode05 stop --tags nova
Stop Kolla containers : ansible-playbook -i ./multinode05 -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla --tags nova /usr/share/kolla-ansible/ansible/stop.yml
PLAY [all] ******************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************************************************************************************
ok: [localhost]
ok: [compute01]
ok: [compute03]
ok: [compute02]
ok: [network01]
ok: [controller01]
PLAY RECAP ******************************************************************************************************************************************************************************************************
compute01 : ok=1 changed=0 unreachable=0 failed=0
compute02 : ok=1 changed=0 unreachable=0 failed=0
compute03 : ok=1 changed=0 unreachable=0 failed=0
controller01 : ok=1 changed=0 unreachable=0 failed=0
localhost : ok=1 changed=0 unreachable=0 failed=0
network01 : ok=1 changed=0 unreachable=0 failed=0
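A PLAY RECAP like the one above can be checked mechanically for hosts that failed or were unreachable. The helper below is a hedged sketch: `recap_problems` is a hypothetical name, and it assumes recap lines of the form `host : ok=N changed=N unreachable=N failed=N` as printed in this run. Since every host in the recap as shown reports ok=1 and changed=0, it may also be worth confirming with a container listing that the Nova containers were in fact stopped.

```shell
#!/bin/sh
# Hypothetical helper: read ansible output on stdin and print every host
# whose recap line has a non-zero "unreachable" or "failed" counter.
# Returns 1 if any such host is found, 0 otherwise.
recap_problems() {
    awk '
        /: +ok=[0-9]/ {
            bad = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^(unreachable|failed)=/ && $i !~ /=0$/) bad = 1
            }
            if (bad) { print $1; found = 1 }
        }
        END { exit found ? 1 : 0 }
    '
}

# Example (not executed here):
#   kolla-ansible -i ./multinode05 stop --tags nova | tee stop.log
#   recap_problems < stop.log || echo "some hosts need attention"
```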
3.3 Restart Nova
[root@kolla-ansible-master ~]# kolla-ansible -i ./multinode05 deploy --tags nova
PLAY RECAP ******************************************************************************************************************************************************************************************************
compute01 : ok=42 changed=0 unreachable=0 failed=0
compute02 : ok=42 changed=0 unreachable=0 failed=0
compute03 : ok=42 changed=0 unreachable=0 failed=0
controller01 : ok=56 changed=2 unreachable=0 failed=0
localhost : ok=2 changed=0 unreachable=0 failed=0
network01 : ok=2 changed=0 unreachable=0 failed=0
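After the deploy finishes, one way to verify the result is to look for Nova services that are not reported as up. This is a minimal sketch, assuming the three-column output of `openstack compute service list -f value -c Binary -c Host -c State`; the function name `down_services` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "binary host state" lines on stdin and print
# binary@host for every service whose state is not "up".
down_services() {
    awk 'NF >= 3 && $NF != "up" { print $1 "@" $2 }'
}

# Example (not executed here):
#   openstack compute service list -f value -c Binary -c Host -c State | down_services
#   docker ps -a --filter name=nova_api
```

An empty result from `down_services` suggests that all listed Nova services have checked in again.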
3.4 Restart the virtual machine
[root@kolla-ansible-master ~]# openstack server list
+--------------------------------------+-------+---------+-----------------------+--------+---------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------+---------+-----------------------+--------+---------+
| b4634124-a315-4fd8-aa4a-3df8cade2335 | demo1 | SHUTOFF | demo-net=192.168.19.8 | cirros | m1.tiny |
+--------------------------------------+-------+---------+-----------------------+--------+---------+
[root@kolla-ansible-master ~]# openstack server start demo1
[root@kolla-ansible-master ~]#
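The four steps of this section can be chained in a small script. This is only a sketch under stated assumptions: the names `run` and `recover_nova` and the `DRY_RUN` variable are hypothetical, and the commands are exactly the ones used above, with the inventory file and server name passed as arguments.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical function chaining the steps of section 3 for one
# inventory file and one server; stops at the first failing step.
# Usage: recover_nova <inventory> <server>
recover_nova() {
    rn_inventory="$1"
    rn_server="$2"
    run openstack server stop "$rn_server" &&
    run kolla-ansible -i "$rn_inventory" stop --tags nova &&
    run kolla-ansible -i "$rn_inventory" deploy --tags nova &&
    run openstack server start "$rn_server"
}

# Example (not executed here); set DRY_RUN=1 first to preview the commands:
#   DRY_RUN=1 recover_nova ./multinode05 demo1
```

Running once with `DRY_RUN=1` shows the exact command sequence before anything is changed on the cluster.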
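Closing note: in a production environment, an unexpected power loss is very unlikely, and day-to-day maintenance mostly consists of replacing an individual device or host. In a lab environment, redeploying from scratch is also perfectly acceptable; this post only records one approach to repairing a cluster.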