nova VirtualInterfaceCreateException (by quqi99)

Author: Zhang Hua  Published: 2022-09-01
Copyright: This article may be reproduced freely, provided the reproduction clearly marks the original source and author with a hyperlink and keeps this copyright notice.

The problem

VMs sometimes fail to boot with the following error:

nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
...
2022-07-19 13:50:32.084 147039 WARNING nova.virt.libvirt.driver [req-7b6da117-d40c-4bac-9f82-50c4266e1617 66d3188e9f24466f8d9c3905f178d12a ca6332100f1d42e4aa94aa2c37f243e4 - d476d9f579154d49961c29942a76d1c0 d476d9f579154d49961c29942a76d1c0] [instance: 6fb0fc5d-4fa3-435c-9f60-b9953d11adb7] Timeout waiting for [('network-vif-plugged', '864b25d9-276b-4644-916a-7637762b93e2')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds

Initial analysis

For a healthy VM, neutron-server logs both of the following lines:

Provisioning for port 2e96ade5-3a66-4045-b631-597995c07d5b completed by entity DHCP. provisioning_complete /usr/lib/python3/dist-packages/neutron/db/provisioning_blocks.py:133
Provisioning for port 2e96ade5-3a66-4045-b631-597995c07d5b completed by entity L2. provisioning_complete /usr/lib/python3/dist-packages/neutron/db/provisioning_blocks.py:133

A failing VM logs only the DHCP line; compared with the healthy case above, the 'completed by entity L2. provisioning_complete' entry is missing. For every VM that failed with the vif-plugged event timeout, the neutron-server log contains only:

Provisioning for port 864b25d9-276b-4644-916a-7637762b93e2 completed by entity DHCP. provisioning_complete /usr/lib/python3/dist-packages/neutron/db/provisioning_blocks.py:133

Without the L2 completion, the port is never transitioned to ACTIVE (see https://docs.openstack.org/neutron/latest/contributor/internals/provisioning_blocks.html), which is what eventually causes the timeout reported on the nova side:

Transition to ACTIVE for port object 864b25d9-276b-4644-916a-7637762b93e2 will not be triggered until provisioned by entity L2. add_provisioning_component /usr/lib/python3/dist-packages/neutron/db/provisioning_blocks.py:73
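
For context, here is a minimal sketch of how ML2 uses provisioning blocks (the functions and entity names are the real ones from neutron.db.provisioning_blocks, but the surrounding plugin machinery is simplified away): the port only goes ACTIVE once every registered entity reports completion, so a missing L2 completion leaves the port DOWN forever and nova never receives network-vif-plugged.

#A simplified sketch, not the actual ML2 plugin code
from neutron.db import provisioning_blocks
from neutron_lib.callbacks import resources

def simulate_port_provisioning(context, port_id):
    # on port create/update, ML2 registers one block per entity
    provisioning_blocks.add_provisioning_component(
        context, port_id, resources.PORT, provisioning_blocks.DHCP_ENTITY)
    provisioning_blocks.add_provisioning_component(
        context, port_id, resources.PORT, provisioning_blocks.L2_AGENT_ENTITY)
    # the DHCP agent reports ready -> "completed by entity DHCP" is logged,
    # but the port stays DOWN because the L2 block is still in place
    provisioning_blocks.provisioning_complete(
        context, port_id, resources.PORT, provisioning_blocks.DHCP_ENTITY)
    # only when the OVS agent finishes wiring the port is the last block
    # removed -> "completed by entity L2", the port goes ACTIVE, and nova
    # receives network-vif-plugged; in the failing case this never happens
    provisioning_blocks.provisioning_complete(
        context, port_id, resources.PORT, provisioning_blocks.L2_AGENT_ENTITY)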

The neutron-l2-agent sometimes takes as long as 47 minutes to complete a single rpc_loop iteration:

2022-08-04 19:24:53.758 95691 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5184b270-3ac4-4777-a974-8d629599cd88 - - - - -] Agent rpc_loop - iteration:2935733 started
2022-08-04 20:11:40.496 95691 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-5184b270-3ac4-4777-a974-8d629599cd88 - - - - -] Agent rpc_loop - iteration:2935733 completed. Processed ports statistics: {'regular': {'added': 0, 'updated': 2, 'removed': 2}}. Elapsed:2806.738

The following one-liner extracts the iteration times from all the neutron-l2-agent logs (only part of the output is shown; it confirms that some iterations take extremely long):

$ for i in `seq 2 22`; do echo "neutron-openvswitch-agent.log.$i.gz: " ; zgrep "iteration.* completed" neutron-openvswitch-agent.log.$i.gz | awk '{ print $(NF) }' | cut -d ":" -f2 | sort -gr | head -n 6; done
neutron-openvswitch-agent.log.2.gz:
...
neutron-openvswitch-agent.log.5.gz:
2806.738
2803.673
1410.152
1410.006
925.405
925.316

The reported timeouts all fall inside the same window, 2022-08-04 19:24:53 - 2022-08-04 20:11:40, which supports the theory that the slow rpc_loop iteration is what causes them.

$zgrep "Timeout waiting for \[('network-vif-plugged'" nova-compute.log.5.gz                 
2022-08-04 19:30:54.076 147039 WARNING nova.virt.libvirt.driver [req-acfb3bd2-565f-48e7-91e8-f109446895c1 f7ca0fcc30cf47c0b60b260bab9dd7bd e14e421176884fb492a71e1c9be13fa3 - 6113214bdbed491c880c23149c25b7cb 6113214bdbed491c880c23149c25b7cb] [instance: 94d4ab46-799d-44fc-9991-9a57ec18e950] Timeout waiting for [('network-vif-plugged', '43b7f269-0eae-44c4-990f-53c98743e12b')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds
...
2022-08-04 20:10:06.211 147039 WARNING nova.virt.libvirt.driver [req-5b035412-a797-453a-b3e8-098dead294dd f7ca0fcc30cf47c0b60b260bab9dd7bd e14e421176884fb492a71e1c9be13fa3 - 6113214bdbed491c880c23149c25b7cb 6113214bdbed491c880c23149c25b7cb] [instance: 19da6ffc-dcc1-4620-b3b2-2018a3355b8b] Timeout waiting for [('network-vif-plugged', '71190281-53ce-4558-82cb-0240e71b9245')] for instance with vm_state building and task_state spawning.: eventlet.timeout.Timeout: 300 seconds

Further analysis - focusing on the neutron-l2-agent side

So what was the neutron-l2-agent doing during that window? It was updating security groups (SG):

2022-07-28 21:10:34.988 95691 INFO neutron.agent.securitygroups_rpc [req-0be98a3b-1674-486c-aeb4-67d49e413757 f7ca0fcc30cf47c0b60b260bab9dd7bd e14e421176884fb492a71e1c9be13fa3 - - -] Security group member updated {'1f4da925-7e78-4e7c-ac07-8eae09fc2fda', '4de2b2bf-5302-4cf4-96de-949ea78ca8eb'}

We obtained a dump of the neutron DB as follows:

pass=`juju run --unit mysql/0 'leader-get root-password'`
juju run --unit mysql/0 "mysqldump -u root --password=$pass --single-transaction --skip-lock-tables --set-gtid-purged=OFF --databases neutron --quick --result-file=/tmp/neutron.sql"

Analysis of the dump shows 979 active load balancers, and a single security group with as many as 1994 active ports (security group 4d15a8fa-3adc-4750-87da-c3d26902dbca on network 4a57f171-1e6b-4ae4-9ec5-2203144817b7, lb-mgmt-net):

select sgb.security_group_id, p.network_id, n.name, count(sgb.port_id) from securitygroupportbindings sgb join ports p on p.id=sgb.port_id join networks n on n.id=p.network_id group by 1,2 order by 4 desc limit 10;
+--------------------------------------+--------------------------------------+-------------------+--------------------+
| security_group_id                    | network_id                           | name              | count(sgb.port_id) |
+--------------------------------------+--------------------------------------+-------------------+--------------------+
| 4d15a8fa-3adc-4750-87da-c3d26902dbca | 4a57f171-1e6b-4ae4-9ec5-2203144817b7 | lb-mgmt-net       |               1994 |
| 7f9e022e-f12d-4526-8e11-b09f5ed2e0e5 | 4a8c719c-04e7-490c-b7d2-b99e79e7e79f | network_lb_az3    |                241 |

select count(id) from load_balancer where provisioning_status='ACTIVE';
+-----------+
| count(id) |
+-----------+
|       979 |
+-----------+
...

The relevant code:

#https://github.com/openstack/neutron/blob/stable/ussuri/neutron/agent/securitygroups_rpc.py#L214
    def security_groups_member_updated(self, security_groups):
        LOG.info("Security group "
                 "member updated %r", security_groups)
        self._security_group_updated(
            security_groups,
            'security_group_source_groups',
            'sg_member')
            
#https://github.com/openstack/neutron/blob/stable/ussuri/neutron/agent/securitygroups_rpc.py#L206
    def security_groups_rule_updated(self, security_groups):
        LOG.info("Security group "
                 "rule updated %r", security_groups)
        self._security_group_updated(
            security_groups,
            'security_groups',
            'sg_rule')

The firewall RPC mechanism is documented here:
https://github.com/openstack/neutron/blob/stable/ussuri/doc/source/contributor/internals/openvswitch_firewall.rst#firewall-api-calls
https://github.com/openstack/neutron/blob/stable/ussuri/doc/source/contributor/internals/rpc_callbacks.rst

There are two main calls performed by the firewall driver in order to either create or update a port with security groups - prepare_port_filter and update_port_filter. Both methods rely on the security group objects that are already defined in the driver and work similarly to their iptables counterparts. The definition of the objects will be described later in this document. prepare_port_filter must be called only once during port creation, and it defines the initial rules for the port. When the port is updated, all filtering rules are removed, and new rules are generated based on the available information about security groups in the driver.

Security group rules can be defined in the firewall driver by calling update_security_group_rules, which rewrites all the rules for a given security group. If a remote security group is changed, then update_security_group_members is called to determine the set of IP addresses that should be allowed for this remote security group. Calling this method will not have any effect on existing instance ports. In other words, if the port is using security groups and its rules are changed by calling one of the above methods, then no new rules are generated for this port. update_port_filter must be called for the changes to take effect.

All the machinery above is controlled by security group RPC methods, which means the firewall driver doesn't have any logic of which port should be updated based on the provided changes; it only accomplishes actions when called from the controller.
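
To make the last paragraph concrete, here is a hedged sketch of the call order using the driver methods named in the doc (the wrapper function and its arguments are illustrative, not neutron code):

#Illustrative only: a member update alone changes nothing on the wire;
#update_port_filter() must run for the new conjunction IPs to take effect
def apply_sg_change(firewall, port, sg_id, new_rules, new_member_ips):
    firewall.update_security_group_rules(sg_id, new_rules)         # sg_rule path
    firewall.update_security_group_members(sg_id, new_member_ips)  # sg_member path
    firewall.update_port_filter(port)  # regenerate this port's flows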

A security-group refresher

The customer uses openvswitch rather than iptables as the firewall driver, so whether ipset is used is irrelevant here. Earlier blog posts covering SGs and flows:

  • OpenStack Security - https://zhhuabj.blog.csdn.net/article/details/78435072
  • Debug OpenvSwitch - https://blog.csdn.net/quqi99/article/details/111831695
    That post mentions lp bug https://bugs.launchpad.net/neutron/+bug/1907491: when an SG containing a remote-SG rule is applied to a VM, the conjunctive flows that match the remote group's member IPs are created, but when a fixed-ip port is later deleted those conjunctive flows are not removed (the patch https://review.opendev.org/c/openstack/neutron/+/766775/1/neutron/agent/linux/openvswitch_firewall/firewall.py matches the flows with ovs_lib.COOKIE_ANY so they can be found and deleted - see the sketch after the demo below).
# Run the following commands as demo user
# Create a server so get an active port associated with the default security group
$ openstack server list
+--------------------------------------+---------+--------+----------------------------------------------------------------------+--------------------------+----------+
| ID                                   | Name    | Status | Networks                                                             | Image                    | Flavor   |
+--------------------------------------+---------+--------+----------------------------------------------------------------------+--------------------------+----------+
| a07caed7-6cff-4e7c-bf6b-eef572934e55 | test-vm | ACTIVE | private=10.0.0.8, fdfd:3244:7253:0:f816:3eff:fe5c:4498, 192.168.0.83 | cirros-0.5.1-x86_64-disk | m1.small |
+--------------------------------------+---------+--------+----------------------------------------------------------------------+--------------------------+----------+

# Create ingress rules using remote-ip and default group as the remote-group
$ openstack security group rule create --remote-ip 8.8.8.8/32 --proto tcp --dst-port 80 default
$ openstack security group rule create --remote-group default --proto tcp --dst-port 443 default

$ openstack security group rule list
+--------------------------------------+-------------+-----------+------------+------------+-----------+--------------------------------------+----------------------+--------------------------------------+
| ID                                   | IP Protocol | Ethertype | IP Range   | Port Range | Direction | Remote Security Group                | Remote Address Group | Security Group                       |
+--------------------------------------+-------------+-----------+------------+------------+-----------+--------------------------------------+----------------------+--------------------------------------+
| 349fc8a0-ee67-4ab3-98c9-87091682fca2 | tcp         | IPv4      | 8.8.8.8/32 | 80:80      | ingress   | None                                 | None                 | 9508b291-181b-43ea-9635-dc293c0a2399 |
| 66bba154-a035-40a7-86f6-ddfbf772526b | tcp         | IPv4      | 0.0.0.0/0  | 443:443    | ingress   | 9508b291-181b-43ea-9635-dc293c0a2399 | None                 | 9508b291-181b-43ea-9635-dc293c0a2399 |
+--------------------------------------+-------------+-----------+------------+------------+-----------+--------------------------------------+----------------------+--------------------------------------+

# Create another port with a fixed ip and associate it with the default security group
$ openstack port create --network private --fixed-ip subnet=9d50b062-8699-4fc3-a250-1c0b4147357a test-port
$ openstack port show 7271f67c-cfac-42bc-abc1-388e8e4db3ab -c fixed_ips -c security_group_ids
+--------------------+-----------------------------------------------------------------------------------------------------+
| Field              | Value                                                                                               |
+--------------------+-----------------------------------------------------------------------------------------------------+
| fixed_ips          | ip_address='10.0.0.40', subnet_id='9d50b062-8699-4fc3-a250-1c0b4147357a'                            |
|                    | ip_address='fdfd:3244:7253:0:f816:3eff:fef3:ee1e', subnet_id='abfcc07a-6a61-4e09-8232-76d162f2b341' |
| security_group_ids | 9508b291-181b-43ea-9635-dc293c0a2399                                                                |
+--------------------+-----------------------------------------------------------------------------------------------------+

# verify the flows are generated for the ips:
sudo ovs-ofctl dump-flows br-int | grep "8.8.8.8"
 cookie=0x2677522778c1e14b, duration=324.328s, table=82, n_packets=0, n_bytes=0, idle_age=12295, priority=77,ct_state=+est-rel-rpl,tcp,reg5=0x16,nw_src=8.8.8.8,tp_dst=80 actions=output:22
 cookie=0x2677522778c1e14b, duration=324.328s, table=82, n_packets=0, n_bytes=0, idle_age=12295, priority=77,ct_state=+new-est,tcp,reg5=0x16,nw_src=8.8.8.8,tp_dst=80 actions=ct(commit,zone=NXM_NX_REG6[0..15]),output:22,resubmit(,92)

$ sudo ovs-ofctl dump-flows br-int | grep "10.0.0.40"
 cookie=0xde9c1cb2e3da8a74, duration=65.219s, table=82, n_packets=0, n_bytes=0, idle_age=65, priority=73,ct_state=+est-rel-rpl,ip,reg6=0x1,nw_src=10.0.0.40 actions=conjunction(38,1/2)
 cookie=0xde9c1cb2e3da8a74, duration=65.219s, table=82, n_packets=0, n_bytes=0, idle_age=65, priority=73,ct_state=+new-est,ip,reg6=0x1,nw_src=10.0.0.40 actions=conjunction(39,1/2)

# Delete the remote-ip rule and the flows are gone:
$ openstack security group rule delete 349fc8a0-ee67-4ab3-98c9-87091682fca2
$ sudo ovs-ofctl dump-flows br-int | grep "8.8.8.8"
(Empty)

# Unset the fixed-ip from the separate port and the flows are not deleted:
$ openstack port unset --fixed-ip ip-address='10.0.0.40',subnet=9d50b062-8699-4fc3-a250-1c0b4147357a 7271f67c-cfac-42bc-abc1-388e8e4db3ab
$ sudo ovs-ofctl dump-flows br-int | grep "10.0.0.40"
 cookie=0xde9c1cb2e3da8a74, duration=65.219s, table=82, n_packets=0, n_bytes=0, idle_age=65, priority=73,ct_state=+est-rel-rpl,ip,reg6=0x1,nw_src=10.0.0.40 actions=conjunction(38,1/2)
 cookie=0xde9c1cb2e3da8a74, duration=65.219s, table=82, n_packets=0, n_bytes=0, idle_age=65, priority=73,ct_state=+new-est,ip,reg6=0x1,nw_src=10.0.0.40 actions=conjunction(39,1/2)
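
For reference, a hedged sketch of the idea behind the patch mentioned above (the helper name and the flow-template dicts are assumptions; only ovs_lib.COOKIE_ANY and _delete_flows come from the real driver): deleting with cookie=ovs_lib.COOKIE_ANY matches flows installed under any, possibly stale, agent cookie, so the leftover flows shown above are found and removed.

#A sketch of the fix, not the exact patch
from neutron.agent.common import ovs_lib

def delete_conj_flows_for_removed_ip(driver, flow_templates):
    # flow_templates are assumed dicts like {'table': 82, 'nw_src': '10.0.0.40'}
    for flow in flow_templates:
        flow['cookie'] = ovs_lib.COOKIE_ANY  # match flows under any cookie
        driver._delete_flows(**flow)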

Possible culprit bugs

The fix for lp bug 1907491 above deletes each conjunctive flow by execve'ing ovs-ofctl, once per flow. With many ports there are many fixed IPs, hence many flows for the remote SG, so the deletion can take a very long time. That is what lp bug 1975674 (https://bugs.launchpad.net/neutron/+bug/1975674) addresses: it batches the conjunctive flow deletion (as sketched below), i.e.:

self._delete_flows(deferred=False, **flow)

has to become:

self._delete_flows(**flow)
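
To see why that one-line change matters, here is a standalone sketch (using plain ovs-ofctl rather than the neutron code path): with deferred=False every call spawns one ovs-ofctl process, so thousands of member IPs mean thousands of execve calls, while the deferred path queues the deletions and applies them in a single batch.

#Standalone illustration of per-flow vs batched deletion
import subprocess

def delete_flows_one_by_one(flow_matches):
    # deferred=False behaviour: one execve of ovs-ofctl per flow -> O(N) forks
    for match in flow_matches:
        subprocess.run(['ovs-ofctl', 'del-flows', 'br-int', match], check=True)

def delete_flows_batched(flow_matches):
    # deferred behaviour: one execve for the whole batch; with --bundle,
    # ovs-ofctl add-flows accepts "delete <match>" lines ("-" means stdin)
    mods = ''.join('delete %s\n' % m for m in flow_matches)
    subprocess.run(['ovs-ofctl', '--bundle', 'add-flows', 'br-int', '-'],
                   input=mods.encode(), check=True)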

This looks like a very strong candidate. The customer runs focal (neutron-common=2:16.4.2-0ubuntu1), i.e. ussuri, and the patch is in stable/ussuri but not in 16.4.2:

hua@t440p:/bak/openstack/neutron$ git tag --contains 30ef996f8aa0b0bc57a280690871f1081946ffee
hua@t440p:/bak/openstack/neutron$ git branch -r --contains 30ef996f8aa0b0bc57a280690871f1081946ffee
  origin/stable/ussuri

Another very plausible lp bug is https://bugs.launchpad.net/neutron/+bug/1813703, which tackles the more general problem of RPC timeouts between client and server.

Reproducing the problem

We attempted to reproduce the problem as follows. First, quickly set up an openstack environment:

./generate-bundle.sh --name focal --series focal --num-compute 1  --use-stable-charms --run
./tools/vault-unseal-and-authorise.sh
./configure
source novarc

Then, similar to how octavia fakes its o-hm0 port, we create ports with 'neutron port-create' instead of booting VMs, and plug them by hand with 'ovs-vsctl add-port'; the agent then installs flows for these ports based on the server-side data.

1, create 300 ports bound to one host

#openstack security group rule create --remote-ip 8.8.8.8/32 --proto tcp --dst-port 80 default
#openstack security group rule create --remote-group default --proto tcp --dst-port 443 default
#openstack security group rule list
HOST='juju-43efce-focal-9.cloud.sts'
NETWORK_ID=$(openstack network show private -cid -fvalue)
PROJECT_ID=$(openstack project show --domain admin_domain admin -f value -c id)
SECGRP_ID=$(openstack security group list --project ${PROJECT_ID} | awk '/default/ {print $2}')
openstack quota set $PROJECT_ID --ram 262144 --cores 200 --instances 100 --secgroup-rules 2000 --secgroups 200  --ports 500 --gigabytes 2000 --volumes 60
for i in {1..300}
do
	neutron port-create --name test-large-scale-port-$i \
	  --security-group $SECGRP_ID \
	  --device-owner testing:scale \
	  --binding:host_id=$HOST $NETWORK_ID
done

2, create 500 - 1000 security-group rules for group <security_group_id>

#https://specs.openstack.org/openstack/neutron-specs/specs/victoria/address-groups-support-in-security-group-rule.html
NEUTRON_IP=$(juju status neutron-api/0 |awk '/ACTIVE/ {print $3}')
SCHEMA="http"
export AUTH_TOKEN="$(openstack token issue -c id -f value)"
for i in {3000..4000}
do
cat << EOF |tee tmp.json
{
  "security_group_rule": {
    "direction": "ingress",
    "protocol": "tcp",
    "ethertype": "IPv4",
    "port_range_min": "3000",
    "port_range_max": "$i",
    "security_group_id": "$SECGRP_ID"
  }
}
EOF
curl -g -i -X POST ${SCHEMA}://${NEUTRON_IP}:9696/v2.0/security-group-rules \
  -H "User-Agent: python-neutronclient" -H "Content-Type: application/json" \
  -H "Accept: application/json" -H "X-Auth-Token: ${AUTH_TOKEN}" \
  -d @tmp.json
done

3, run the following commands on the compute host to plug the ports into the host <compute_node_host_name>

#openstack port list -fvalue |grep test-large-scale-port > port_list && juju scp ./port_list nova-compute/0:~
for p in `cat port_list |awk '{print $1}'`
do
    mac=`grep $p port_list |awk '{print $3}'`
    ip_addr=`grep $p port_list |awk '{print $7}' |awk -F\' '{print $2}'`
    dev_id=`echo $p |cut -b 1-11`
    dev_name="tp-$dev_id"
    echo "========"$mac"======"$ip_addr"======"$dev_id"======="$dev_name
    ovs-vsctl  --may-exist add-port br-int ${dev_name} -- set Interface ${dev_name} type=internal \
    -- set Interface ${dev_name} external-ids:attached-mac="${mac}" \
    -- set Interface ${dev_name} external-ids:iface-id="${p}" \
    -- set Interface ${dev_name} external-ids:iface-status=active
    sleep 0.2
    ip link set dev ${dev_name} address ${mac}
    ip addr add ${ip_addr} dev ${dev_name}
    ip link set ${dev_name} up
done

4, verify the flows

# ovs-ofctl -O OpenFlow13 dump-flows br-int |grep conjunction |grep 192 |wc -l
504
# ovs-ofctl -O OpenFlow13 dump-flows br-int |grep conjunction |grep 192 |head -n1
 cookie=0x556a0a63a181157b, duration=54.344s, table=82, n_packets=0, n_bytes=0, priority=70,ct_state=+est-rel-rpl,ip,reg6=0x1,nw_src=192.168.21.177 actions=conjunction(8,1/2)
# ovs-ofctl -O OpenFlow13 dump-flows br-int |grep '192.168.21.177'
 cookie=0x556a0a63a181157b, duration=80.843s, table=82, n_packets=0, n_bytes=0, priority=70,ct_state=+est-rel-rpl,ip,reg6=0x1,nw_src=192.168.21.177 actions=conjunction(8,1/2)
 cookie=0x556a0a63a181157b, duration=80.844s, table=82, n_packets=0, n_bytes=0, priority=70,ct_state=+new-est,ip,reg6=0x1,nw_src=192.168.21.177 actions=conjunction(9,1/2)
# ovs-ofctl -O OpenFlow13 dump-flows br-int |grep 'conjunction(8,2/2)' |head -n1
 cookie=0x556a0a63a181157b, duration=220.995s, table=82, n_packets=0, n_bytes=0, priority=70,ct_state=+est-rel-rpl,ip,reg5=0x8 actions=conjunction(8,2/2)

5, verify the agents - the compute node's agents are down

$ juju status nova-compute/0 |grep down
9        down   10.5.1.45  8b0ba05a-0438-46d0-a523-4000c30054ac  focal   nova  ACTIVE

#on the underlying cloud
$ source ~/novarc
$ nova list |grep juju-43efce-focal-9
| 8b0ba05a-0438-46d0-a523-4000c30054ac | juju-43efce-focal-9      | ACTIVE  | -          | Running     | zhhuabj_admin_net=10.5.1.45              |

$ openstack network agent list
+--------------------------------------+--------------------+-------------------------------+-------------------+-------+-------+---------------------------+
| ID                                   | Agent Type         | Host                          | Availability Zone | Alive | State | Binary                    |
+--------------------------------------+--------------------+-------------------------------+-------------------+-------+-------+---------------------------+
| 11f1ed0c-1bfc-4155-ad06-af96e78f53fe | L3 agent           | juju-43efce-focal-7           | nova              | :-)   | UP    | neutron-l3-agent          |
| 282206fe-2ee0-4946-9120-5792d65b5802 | DHCP agent         | juju-43efce-focal-7           | nova              | :-)   | UP    | neutron-dhcp-agent        |
| 34bdf0f7-fd63-4ef9-b499-3c9a4e427f7b | DHCP agent         | juju-43efce-focal-9.cloud.sts | nova              | XXX   | UP    | neutron-dhcp-agent        |
| 632c482d-94ff-4edd-a318-cefb43be12b0 | Metadata agent     | juju-43efce-focal-9.cloud.sts | None              | XXX   | UP    | neutron-metadata-agent    |
| 6c7fcd2b-9be5-4316-ae29-87767e0f348a | Open vSwitch agent | juju-43efce-focal-7           | None              | :-)   | UP    | neutron-openvswitch-agent |
| 96e0d8ba-f371-4b98-ba0f-0da0218531ee | Metadata agent     | juju-43efce-focal-7           | None              | :-)   | UP    | neutron-metadata-agent    |
| cff16c66-0ebf-488d-ad8b-a6bb9035e582 | Open vSwitch agent | juju-43efce-focal-9.cloud.sts | None              | XXX   | UP    | neutron-openvswitch-agent |
| f08b55d1-e177-4598-aaaf-0426ee5f1368 | Metering agent     | juju-43efce-focal-7           | None              | :-)   | UP    | neutron-metering-agent    |
+--------------------------------------+--------------------+-------------------------------+-------------------+-------+-------+---------------------------+


6, only after hard-rebooting the compute node (nova reboot --hard juju-43efce-focal-9) could we log in again, and we saw:

root@juju-43efce-focal-9:/home/ubuntu# zgrep "iteration.* completed" /var/log/neutron/neutron-openvswitch-agent.log* | awk '{ print $(NF) }' | cut -d ":" -f2 | sort -gr | head -n 6
191.011
2.566
1.528
0.856
0.166
0.156

Testing the patch

Modify the source directly:

#https://opendev.org/openstack/neutron/commit/30ef996f8aa0b0bc57a280690871f1081946ffee
#vim /usr/lib/python3/dist-packages/neutron/agent/linux/openvswitch_firewall/firewall.py
systemctl restart neutron-openvswitch-agent

Then re-run:

for p in `cat port_list |awk '{print $1}'`
do
    mac=`grep $p port_list |awk '{print $3}'`
    ip_addr=`grep $p port_list |awk '{print $7}' |awk -F\' '{print $2}'`
    dev_id=`echo $p |cut -b 1-11`
    dev_name="tp-$dev_id"
    echo "========"$mac"======"$ip_addr"======"$dev_id"======="$dev_name
    ovs-vsctl  --may-exist add-port br-int ${dev_name} -- set Interface ${dev_name} type=internal \
    -- set Interface ${dev_name} external-ids:attached-mac="${mac}" \
    -- set Interface ${dev_name} external-ids:iface-id="${p}" \
    -- set Interface ${dev_name} external-ids:iface-status=active
    sleep 0.2
    ip link set dev ${dev_name} address ${mac}
    ip addr add ${ip_addr} dev ${dev_name}
    ip link set ${dev_name} up
done

This time the compute node did not go down, but VMs could not be created normally with "./tools/instance_launch.sh 1 focal" (no IPs were left; a port had to be deleted first: openstack port delete test-large-scale-port-103), and neutron-server.log again showed the dhcp-agent as DOWN.

Reproducing with a different method

The method above did not reproduce the problem, so we kept trying to reproduce lp bug https://bugs.launchpad.net/neutron/+bug/1975674 with the following steps:

 Create a VM with security group A
  - Add a rule to security group A allowing access from a remote security group B
  - Add a large number of ports to security group B (e.g. 2000)
    - The respective ovs flows will be added
  - Delete the VM
    - The ovs flows will be removed

./generate-bundle.sh --name focal --series focal --num-compute 1 --use-stable-charms --run
neutron security-group-create ssh
neutron security-group-create web
neutron security-group-rule-create --direction ingress --protocol tcp  --port-range-min 22 --port-range-max 22 ssh
#Allow TCP port 80 access from IP addresses, specified as IP subnet 0.0.0.0/0 in CIDR notation.
neutron security-group-rule-create --direction ingress --protocol tcp  --port-range-min 80 --port-range-max 80 web
#Allow TCP ports 22 and 23 from members of the ssh security group to access the specified ports
neutron security-group-rule-create --direction ingress --protocol tcp  --port-range-min 22 --port-range-max 22 --remote-group-id ssh web
neutron security-group-rule-create --direction ingress --protocol tcp  --port-range-min 23 --port-range-max 23 --remote-group-id ssh web

#Add a large number of ports to security group ssh (e.g. 2000)
NEUTRON_IP=$(juju status neutron-api/0 |awk '/ACTIVE/ {print $3}')
SCHEMA="http"
export AUTH_TOKEN="$(openstack token issue -c id -f value)"
NETWORK_ID=$(openstack network show private -cid -fvalue)
PROJECT_ID=$(openstack project show --domain admin_domain admin -f value -c id)
SECGRP_ID=$(openstack security group list --project ${PROJECT_ID} | awk '/ssh/ {print $2}')
openstack quota set $PROJECT_ID --ram 262144 --cores 200 --instances 100 --secgroup-rules 2000 --secgroups 200  --ports 500 --gigabytes 2000 --volumes 60
for i in {2000..4000}
do
cat << EOF |tee tmp.json
{
  "security_group_rule": {
    "direction": "ingress",
    "protocol": "tcp",
    "ethertype": "IPv4",
    "port_range_min": "2000",
    "port_range_max": "$i",
    "security_group_id": "$SECGRP_ID"
  }
}
EOF
curl -g -i -X POST ${SCHEMA}://${NEUTRON_IP}:9696/v2.0/security-group-rules \
  -H "User-Agent: python-neutronclient" -H "Content-Type: application/json" \
  -H "Accept: application/json" -H "X-Auth-Token: ${AUTH_TOKEN}" \
  -d @tmp.json
done

#create a test VM with security group 'web'
openstack keypair create --public-key ~/.ssh/id_rsa.pub mykey
openstack server create --wait --image focal --flavor m1.small --key-name mykey --nic net-id=$NETWORK_ID --security-group web i1

Unfortunately this did not reproduce the bug either: when deleting the VM with 'openstack server delete i1', the following flows were also deleted successfully.

# ovs-ofctl -O OpenFlow13 dump-flows br-int |grep '192.168.21'
 cookie=0x1a6d745e5627d342, duration=90.519s, table=71, n_packets=6, n_bytes=252, priority=95,arp,reg5=0x4,in_port=4,dl_src=fa:16:3e:88:8c:d5,arp_spa=192.168.21.58 actions=resubmit(,94)
 cookie=0x1a6d745e5627d342, duration=90.519s, table=71, n_packets=328, n_bytes=38542, priority=65,ip,reg5=0x4,in_port=4,dl_src=fa:16:3e:88:8c:d5,nw_src=192.168.21.58 actions=ct(table=72,zone=NXM_NX_REG6[0..15])
 cookie=0x1a6d745e5627d342, duration=90.518s, table=71, n_packets=0, n_bytes=0, priority=80,udp,reg5=0x4,in_port=4,dl_src=fa:16:3e:88:8c:d5,nw_src=192.168.21.58,tp_src=68,tp_dst=67 actions=resubmit(,73)


Reposted from blog.csdn.net/quqi99/article/details/126643630