Description
Describe the bug
In an environment where OpenStack Neutron and Kube-OVN share the same OVN Northbound Database, restarting the kube-ovn-controller causes critical Neutron Security Group rules (ACLs) to be removed.
It appears that when kube-ovn-controller initializes, it performs a synchronization or garbage collection on OVN resources. During this process, it clears the acls list of Port Groups managed by Neutron (e.g., neutron_pg_drop), presumably because it does not recognize them as Kube-OVN managed resources.
As a result, OpenStack Security Groups stop functioning, and traffic filtering is no longer enforced.
To Reproduce
Steps to reproduce the behavior:
- Verify that Neutron ACLs exist in the neutron_pg_drop Port Group using ovn-nbctl.
- Restart the Kube-OVN controller: kubectl rollout restart deployment/kube-ovn-controller -n kube-system
- Check the neutron_pg_drop Port Group again.
- Observe that the acls field is now empty ([]).
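The reproduction steps can be made less error-prone by snapshotting the ACL references before and after the restart and comparing them, instead of reading the grep output by eye. The following is a minimal sketch, not part of the original report: the helper names `normalize_acls` and `lost_acls` and the snapshot file names are hypothetical, and it assumes `ovn-nbctl` can reach the shared Northbound Database.

```shell
# normalize_acls: read a space-separated list of ACL UUIDs on stdin
# (the shape printed by `ovn-nbctl --bare --columns=acls find ...`)
# and emit one UUID per line, sorted, so two snapshots can be compared.
normalize_acls() {
  tr -s ' ' '\n' | sed '/^$/d' | LC_ALL=C sort
}

# lost_acls BEFORE AFTER: print UUIDs present in BEFORE but absent from AFTER.
# Both files must already be sorted (normalize_acls takes care of that).
lost_acls() {
  LC_ALL=C comm -23 "$1" "$2"
}

# Example flow (hypothetical file names; needs NB database and cluster access):
#   ovn-nbctl --bare --columns=acls find Port_Group name=neutron_pg_drop | normalize_acls > before.txt
#   kubectl rollout restart deployment/kube-ovn-controller -n kube-system
#   kubectl rollout status deployment/kube-ovn-controller -n kube-system
#   ovn-nbctl --bare --columns=acls find Port_Group name=neutron_pg_drop | normalize_acls > after.txt
#   lost_acls before.txt after.txt
```

Any UUID printed by `lost_acls` was referenced by the Port Group before the restart and is no longer referenced afterwards. An empty result means the references were preserved.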
Observed Behavior (Logs)
Before Restart (Normal State)
The Port Group neutron_pg_drop contains ACL UUIDs managed by Neutron.
$ ovn-nbctl list port-group | grep drop -A 3 -B 4
_uuid : cb7e49e8-de3d-4416-b6c5-7fc0f9b2106a
acls : [27819530-5460-4c04-8fee-6c75725e9abb, b5bd7b84-a35a-4a59-ba78-c2c3ce6f566c]
external_ids : {}
name : neutron_pg_drop
ports : [5d07d35f-7ce6-4f87-b055-7efa12decee0, d794dc79-299a-43e4-a063-82e93a321d3f, fd136564-2bbc-413b-8d22-15136bf30abb]
The ACLs indicate they are Neutron-managed:
$ ovn-nbctl list acl | grep external | more
external_ids : {"neutron:security_group_rule_id"="99e5c663-98c5-47ea-9368-e77c11eef66c"}
external_ids : {"neutron:security_group_rule_id"="b9d11705-fbee-4ed6-bf66-3d0791b2ed23"}
external_ids : {"neutron:security_group_rule_id"="edc79494-8551-46e5-bd75-da79b14989a7"}
external_ids : {"neutron:security_group_rule_id"="cc0514d0-8d1c-46e3-ac5e-50a242efd6a9"}
external_ids : {"neutron:security_group_rule_id"="b8b07ce3-82ab-4448-a0c4-a9ffe4cb6e23"}
external_ids : {"neutron:security_group_rule_id"="5ab1853f-8eb8-4564-bc52-b2a1b85d7ec1"}
external_ids : {"neutron:security_group_rule_id"="c4450b66-1fd6-44b0-b08c-3444fe9aaa7e"}
external_ids : {"neutron:security_group_rule_id"="0ce808eb-b376-496a-8e00-927f04320b1f"}
external_ids : {"neutron:security_group_rule_id"="3eecef51-30eb-4dbf-af66-e73ea3c7eca0"}
external_ids : {"neutron:security_group_rule_id"="ab8c5897-2625-410f-a50b-f48cabfdacc2"}
...
Triggering Restart
$ kubectl rollout restart deployment/kube-ovn-controller -n kube-system
deployment.apps/kube-ovn-controller restarted
After Restart (Bugged State)
The acls list in the Port Group becomes empty. The Port Group itself remains, but its rules are stripped.
$ ovn-nbctl list port-group | grep drop -A 2 -B 4
_uuid : cb7e49e8-de3d-4416-b6c5-7fc0f9b2106a
acls : []
external_ids : {}
name : neutron_pg_drop
ports : [5d07d35f-7ce6-4f87-b055-7efa12decee0, d794dc79-299a-43e4-a063-82e93a321d3f, fd136564-2bbc-413b-8d22-15136bf30abb]
Also, some ACLs seem to be overwritten with Kube-OVN default metadata:
$ ovn-nbctl list acl | grep external | more
external_ids : {parent=ovn.sg.kubeovn_deny_all}
$
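To quantify how many ACLs each system still owns after the restart, the external_ids lines can be grouped by apparent owner. This is a rough sketch under stated assumptions: the function names `acl_owner` and `summarize_acl_owners` are hypothetical, and the ownership rules are inferred only from the outputs in this report (a `neutron:security_group_rule_id` key for Neutron, a `parent=ovn.sg.*` value for Kube-OVN), not from either project's documentation.

```shell
# acl_owner: classify one ACL external_ids line by its apparent owner.
# The two patterns are assumptions taken from the outputs in this report.
acl_owner() {
  case "$1" in
    *neutron:security_group_rule_id*) echo neutron ;;
    *parent=ovn.sg.*)                 echo kube-ovn ;;
    *)                                echo unknown ;;
  esac
}

# summarize_acl_owners: read `ovn-nbctl list acl` output on stdin and
# print a count per owner (uniq -c format: count, then owner).
summarize_acl_owners() {
  grep '^external_ids' | while IFS= read -r line; do
    acl_owner "$line"
  done | LC_ALL=C sort | uniq -c
}

# Usage (needs access to the shared NB database):
#   ovn-nbctl list acl | summarize_acl_owners
```

Running the summary before and after the restart shows whether the count of Neutron-tagged ACLs drops while Kube-OVN-tagged ones appear, which is the pattern described above.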
Server (please complete the following information):
- OS: Ubuntu 24.04
Additional context
I have tested this behavior across different versions and confirmed the following:
- v1.13.14 and earlier: This issue does not occur. Neutron ACLs are preserved correctly after restart.
- v1.13.15: This issue starts occurring.
This appears to be a regression introduced in v1.13.15.
Related bug: kubeovn/kube-ovn#5995