fix: populate per-network HCG for vrouters after 2026.1 upgrade #2073

Merged: skrobul merged 1 commit into main from ovn-router-fixes-post-2026.1 on Jun 16, 2026

Conversation

@skrobul skrobul commented Jun 16, 2026

Collaborator

After upgrading to neutron 28 (2026.1), attaching a subnet to a router whose external gateway is on a vxlan-type network leaves the per-network unified HA_Chassis_Group (neutron-<network_id>) empty. External (baremetal) ports with OVN LSP type=external on that network reference this HCG; with it empty, no chassis owns them and ARP/routing breaks for all baremetal nodes on the network.

Root cause: for a vxlan-type external gateway, neutron pins the Logical_Router to one chassis via options:chassis and creates a per-router HCG (neutron-<router_id>) carrying that chassis, but never sets ha_chassis_group on the gateway LRP. neutron's link_network_ha_chassis_group (called on internal LRP creation) bails at its `if not gw_lrps[0].ha_chassis_group: return` guard, so it never copies the chassis into neutron-<network_id>.
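To make the failure mode concrete, here is a minimal in-memory sketch of the guard. It is a toy model, not neutron's code: the `HAChassisGroup` and `LogicalRouterPort` dataclasses and the `link_network_hcg` function are hypothetical stand-ins, and only the guard condition itself is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HAChassisGroup:
    # Hypothetical stand-in for an OVN NB HA_Chassis_Group row.
    name: str
    ha_chassis: list = field(default_factory=list)


@dataclass
class LogicalRouterPort:
    # Hypothetical stand-in for an OVN NB Logical_Router_Port row.
    name: str
    ha_chassis_group: Optional[HAChassisGroup] = None


def link_network_hcg(gw_lrps, network_hcg):
    """Toy version of the guard quoted above (not neutron's actual code)."""
    if not gw_lrps[0].ha_chassis_group:
        return  # gateway LRP has no HCG set, so nothing is copied
    network_hcg.ha_chassis = list(gw_lrps[0].ha_chassis_group.ha_chassis)


# vxlan-type gateway: the router HCG carries a chassis, but the gateway LRP
# does not reference it.
router_hcg = HAChassisGroup("neutron-<router_id>", ["chassis-1"])
gw_lrp = LogicalRouterPort("lrp-gw")
network_hcg = HAChassisGroup("neutron-<network_id>")

link_network_hcg([gw_lrp], network_hcg)
print(network_hcg.ha_chassis)  # prints []
```

In this model the per-network group stays empty even though the router group has a chassis, which matches the symptom seen by the external ports.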

Fix: subscribe to ROUTER_INTERFACE/AFTER_CREATE at PRIORITY_DEFAULT+1000 (after OVN's handler has created the LRP) with a new handler link_vxlan_network_ha_chassis_group.
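The ordering argument can be illustrated with a small toy registry. This is a sketch under assumptions: `ToyRegistry`, the string resource/event names, and the placeholder value of `PRIORITY_DEFAULT` are all made up for illustration, and the sketch assumes that callbacks with a lower priority value run first, which is what the "+1000 runs after OVN's handler" reasoning above relies on.

```python
from collections import defaultdict

PRIORITY_DEFAULT = 100  # placeholder value for this sketch only


class ToyRegistry:
    """Hypothetical priority-ordered callback registry (not neutron_lib)."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def subscribe(self, callback, resource, event, priority=PRIORITY_DEFAULT):
        self._callbacks[(resource, event)].append((priority, callback))

    def publish(self, resource, event, payload):
        # Lower priority value runs first; sorted() keeps ties stable.
        ordered = sorted(self._callbacks[(resource, event)], key=lambda pc: pc[0])
        for _, callback in ordered:
            callback(payload)


calls = []


def ovn_creates_lrp(payload):
    # Stand-in for OVN's own handler, which creates the internal LRP.
    payload["lrp_exists"] = True
    calls.append("ovn")


def link_vxlan_network_ha_chassis_group(payload):
    # The new handler may rely on the LRP already being present.
    assert payload.get("lrp_exists")
    calls.append("link_vxlan_network_ha_chassis_group")


registry = ToyRegistry()
# Subscribe the new handler first to show that registration order is irrelevant.
registry.subscribe(link_vxlan_network_ha_chassis_group,
                   "router_interface", "after_create",
                   priority=PRIORITY_DEFAULT + 1000)
registry.subscribe(ovn_creates_lrp, "router_interface", "after_create")
registry.publish("router_interface", "after_create", {})
print(calls)  # prints ['ovn', 'link_vxlan_network_ha_chassis_group']
```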

The handler:

  1. Detects a vxlan-type gateway by checking that the per-router HCG (neutron-<router_id>) exists and has a non-empty ha_chassis; VLAN/FLAT-type gateways have no router HCG and continue to be handled by neutron unchanged (we do not use them in any case).
  2. Calls ovn_utils.sync_ha_chassis_group_network_unified to populate the per-network HCG (neutron-<network_id>) from the router HCG chassis — this is what fixes external/baremetal ports that already reference it.
  3. Anchors the internal router-interface LRP (lrp-<port_id>) to the same unified network HCG so the router port is also correctly owned.

Both operations run in a single OVN NB transaction. The create_port_postcommit hook is unaffected: the internal LRP does not exist at that point (it is created later, by OVN's handler for the same AFTER_CREATE event, which runs before ours).
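The three handler steps and the single-transaction behaviour can be modelled roughly as follows. This is an in-memory sketch, not the code from this PR: `ToyNB`, its `transact` method, and `link_vxlan_network_hcg` are hypothetical names, and the real handler goes through ovn_utils.sync_ha_chassis_group_network_unified and the OVN NB API instead of plain dicts.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNB:
    """Hypothetical in-memory stand-in for the NB rows the handler touches."""
    hcgs: dict = field(default_factory=dict)     # HCG name -> list of chassis
    lrp_hcg: dict = field(default_factory=dict)  # LRP name -> HCG name

    def transact(self, ops):
        # All-or-nothing: apply every op to copies, then swap them in.
        hcgs = {name: list(chassis) for name, chassis in self.hcgs.items()}
        lrp_hcg = dict(self.lrp_hcg)
        for op in ops:
            op(hcgs, lrp_hcg)
        self.hcgs, self.lrp_hcg = hcgs, lrp_hcg


def link_vxlan_network_hcg(nb, router_id, network_id, port_id):
    # Step 1: a vxlan-type gateway is recognised by a non-empty router HCG.
    router_chassis = nb.hcgs.get(f"neutron-{router_id}")
    if not router_chassis:
        return False  # VLAN/FLAT gateway: leave everything to neutron

    net_hcg = f"neutron-{network_id}"
    lrp = f"lrp-{port_id}"

    def sync_network_hcg(hcgs, lrp_hcg):
        # Step 2: populate the per-network HCG from the router HCG chassis.
        hcgs[net_hcg] = list(router_chassis)

    def anchor_lrp(hcgs, lrp_hcg):
        # Step 3: point the internal router-interface LRP at the network HCG.
        lrp_hcg[lrp] = net_hcg

    nb.transact([sync_network_hcg, anchor_lrp])
    return True


nb = ToyNB(hcgs={"neutron-r1": ["chassis-1"], "neutron-n1": []})
link_vxlan_network_hcg(nb, "r1", "n1", "p1")
print(nb.hcgs["neutron-n1"], nb.lrp_hcg["lrp-p1"])  # prints ['chassis-1'] neutron-n1
```

Grouping both writes in one `transact` call mirrors the intent stated above: the network HCG and the LRP anchor either both change or neither does.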

@skrobul skrobul force-pushed the ovn-router-fixes-post-2026.1 branch from 49aa5e6 to 71e7448 Compare June 16, 2026 16:53
@skrobul skrobul changed the title from "fix: workaround vxlan gateway HCG not linked to LRP after 2026.1 upgrade" to "fix: populate per-network HCG for vrouters after 2026.1 upgrade" Jun 16, 2026
@skrobul skrobul marked this pull request as ready for review June 16, 2026 17:00
@skrobul skrobul requested a review from a team June 16, 2026 17:00

@mfencik mfencik left a comment

Contributor

LGTM

@skrobul skrobul added this pull request to the merge queue Jun 16, 2026
Merged via the queue into main with commit b062e4b Jun 16, 2026
61 of 63 checks passed
@skrobul skrobul deleted the ovn-router-fixes-post-2026.1 branch June 16, 2026 17:30