I am glad that TAC managed to find a solution. Thanks for posting it; it may come in handy for similar issues with CX in the future.
------------------------------
Daniel Ruiz
-----------------------
Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.
If you have urgent issues, always contact your Aruba partner, distributor, or Aruba TAC Support.
Check https://www.arubanetworks.com/support-services/contact-support/ for how to contact Aruba TAC.
------------------------------
Original Message:
Sent: Jan 20, 2025 04:13 AM
From: tommyd
Subject: AOS-CX breaking changes between 10.11 and 10.13
I opened a support case for this, and they managed to find a solution that seems to be working (in a lab). I am fortunate enough to have access to another four 8325s waiting for deployment, so I could build a lab.
Their solution was to disable default route recursive lookup:
vrf prod
    no route recursive-lookup default-route ipv4
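For anyone who wants to check which VRFs already carry this workaround across several switches, here is a minimal Python sketch. It assumes the saved running-config shows the command as an indented line under its `vrf <name>` context, which I have not confirmed for every release. The function name and the sample config text are made up for illustration.

```python
def vrfs_with_lookup_disabled(config_text):
    """Return the set of VRF names whose block contains the workaround line."""
    target = "no route recursive-lookup default-route ipv4"
    current_vrf = None
    found = set()
    for raw in config_text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if not raw[0].isspace():
            # Top-level line: it either opens a VRF context or ends one.
            parts = raw.split()
            if len(parts) == 2 and parts[0] == "vrf":
                current_vrf = parts[1]
            else:
                current_vrf = None
        elif current_vrf and raw.strip() == target:
            found.add(current_vrf)
    return found


# Hypothetical config excerpt, used only to exercise the function.
SAMPLE = """vrf prod
    no route recursive-lookup default-route ipv4
vrf test
interface vlan 10
    vrf attach prod
"""

if __name__ == "__main__":
    print(vrfs_with_lookup_disabled(SAMPLE))
```

This only inspects text you already exported from the switch; it does not connect to the device or change anything.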
------------------------------
-- tommyd
Original Message:
Sent: Jan 17, 2025 12:41 PM
From: DavidS
Subject: AOS-CX breaking changes between 10.11 and 10.13
We had a similar experience when upgrading two 6410 switches with an EVPN connection between them from version 10.11.1021 to version 10.13.1060. Once we upgraded, replication connections between Domain Controllers across the EVPN started failing. Our Nimble storage partners were no longer in sync either. Pings were successful and the switch interfaces did not show any dropped packets or errors, but sporadic connection drops between servers and other devices were showing up in reports.
We rolled back to the 10.11.1021 version and all of the issues went away. I am curious if you found a fix or a newer 10.13 version that fixed your issues?
Original Message:
Sent: Sep 27, 2024 08:28 AM
From: tommyd
Subject: AOS-CX breaking changes between 10.11 and 10.13
I have two VSX pairs of 8325 switches working in two datacenters on OS version 10.11.0001. There is BGP EVPN running on them, several VLANs stretched between DCs, some servers (including ESXi hosts) and a bunch of external connections to WAN routers. Recently I tried to upgrade to 10.13.1040 and failed in some interesting ways.
After the upgrade, some random things lose communication. It seems that all ARP entries, MACs and required routes are present, both in the l2vpn evpn address family in the underlay and in overlay ipv4, but there is no communication between some random parts of the network. In one case I could not even ping the switch SVI from a VM despite the MAC and ARP entries being present on the switch. Rebooting the switches back to 10.11 restores everything.
Could you suggest some troubleshooting steps and ideas on what to try to fix the config for 10.13? I only have a couple of hours late at night every week or two to try something out.
Below is a simplified diagram with the switch connections and most of the external connection cases.
------------------------------
-- tommyd
------------------------------