Wired Intelligent Edge

  • 1.  AOS-CX breaking changes between 10.11 and 10.13

    Posted Sep 27, 2024 08:28 AM

    I have two VSX pairs of 8325 switches in two datacenters running OS version 10.11.0001. BGP EVPN runs on them, with several VLANs stretched between the DCs, some servers (including ESXi hosts), and a bunch of external connections to WAN routers. Recently I tried to upgrade to 10.13.1040, and it failed in some interesting ways.

    After the upgrade, some random things lose communication. All ARP entries, MAC entries and required routes seem to be present, both in the l2vpn evpn address family in the underlay and in the overlay IPv4 tables, yet some random parts of the network cannot communicate. In one case I could not even ping a switch SVI from a VM, despite the MAC and ARP entries being present on the switch. Rebooting the switches back into 10.11 restores everything.

    Could you suggest some troubleshooting steps and ideas on what to try to fix the config for 10.13? I only have a couple of hours late at night every week or two to try things out.

    Below is a simplified diagram showing the switch connections and most of the external connection types.

    [Image: simplified diagram]


    ------------------------------
    -- tommyd
    ------------------------------


  • 2.  RE: AOS-CX breaking changes between 10.11 and 10.13

    Posted Sep 27, 2024 10:13 AM

    How was the upgrade procedure done?
    Did you use the "vsx update-software" command? Did the upgrades of the VSX clusters (or single switches) in LAB1 and LAB2 overlap in time?

    The reason I'm asking:

    • The 10.13.1050 release notes list the following under 'Resolved issues':
      • Symptom: Traffic loss is observed in VXLAN tunnels.
      • Scenario: This issue is observed when the VSX partner and its directly connected VXLAN VTEP peer are rebooted simultaneously.

    Without the information requested above, and judging from your drawing, this might be what happened.
    In that case, I'd recommend re-running the upgrade (possibly to 10.13.1050) with "vsx update-software", one VSX cluster at a time.

    (It may be worth checking the VSX configuration beforehand: "vsx-config mate", "show vsx brief", "show vsx status", ...)
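    To make "one VSX cluster at a time" concrete, here is a minimal planning sketch, assuming a hypothetical four-switch topology (the names dc1-sw1 ... dc2-sw2 and the VTEP peer links are illustrative, not taken from the original diagram). It flags any reboot window in which both members of a VSX pair, or both ends of a directly connected VTEP peering, would go down together, which is the kind of overlap the release-note scenario describes. It is only a checklist aid, not an Aruba tool.

```python
from typing import Dict, List, Tuple

# Hypothetical topology: switch names and links are illustrative only,
# not taken from the original diagram. Replace them with your own.
VSX_PAIRS: List[Tuple[str, str]] = [("dc1-sw1", "dc1-sw2"), ("dc2-sw1", "dc2-sw2")]
# Assumed direct VXLAN VTEP peerings between the two datacenters.
VTEP_PEER_LINKS: List[Tuple[str, str]] = [("dc1-sw1", "dc2-sw1"), ("dc1-sw2", "dc2-sw2")]


def find_conflicts(schedule: Dict[str, int]) -> List[str]:
    """Return warnings for linked switches that would reboot in the same window.

    `schedule` maps each switch name to a reboot window number; switches
    sharing a number are assumed to reboot at the same time.
    """
    problems: List[str] = []
    for a, b in VSX_PAIRS:
        if schedule[a] == schedule[b]:
            problems.append(f"VSX pair {a}/{b} both reboot in window {schedule[a]}")
    for a, b in VTEP_PEER_LINKS:
        if schedule[a] == schedule[b]:
            problems.append(f"VTEP peers {a}/{b} both reboot in window {schedule[a]}")
    return problems


if __name__ == "__main__":
    # Overlapping upgrade: every switch reboots in the same window.
    overlapping = {sw: 0 for pair in VSX_PAIRS for sw in pair}
    # One VSX cluster at a time, one member at a time.
    staggered = {"dc1-sw1": 0, "dc1-sw2": 1, "dc2-sw1": 2, "dc2-sw2": 3}
    for name, plan in (("overlapping", overlapping), ("staggered", staggered)):
        issues = find_conflicts(plan)
        print(f"{name}: {len(issues)} conflict(s)")
        for issue in issues:
            print("  " + issue)
```

    With this hypothetical topology, the overlapping plan reports four conflicts and the staggered plan reports none.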

    Just in case, be prepared to capture the support files on all four 8325 switches while the error condition is present.

    That way you will have all the data TAC might ask for when you open a ticket.
    Given the complexity (yes, EVPN-VXLAN with VSX is a complex setup, even with only four switches), I'd suggest contacting TAC if the problem persists.

  • 3.  RE: AOS-CX breaking changes between 10.11 and 10.13

    Posted Jan 18, 2025 06:47 AM

    We had a similar experience. We upgraded two 6410 switches with an EVPN connection between them from 10.11.1021 to 10.13.1060. Once we upgraded, Domain Controller replication connections across the EVPN started failing, and our Nimble storage partners also fell out of sync. Pings succeeded and the switch interfaces showed no dropped packets or errors, but sporadic connection drops between servers and other devices were showing up in reports.

    We rolled back to 10.11.1021 and all of the issues went away. I am curious whether you found a fix, or a newer 10.13 version that resolved your issues.




  • 4.  RE: AOS-CX breaking changes between 10.11 and 10.13

    Posted Jan 20, 2025 04:13 AM

    I opened a support case for this, and TAC managed to find a solution that seems to work (in a lab). I am fortunate enough to have another four 8325s waiting for deployment, so I could build a lab.

    Their solution was to disable recursive lookup via the default route in the VRF:

    vrf prod
       no route recursive-lookup default-route ipv4
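    The thread does not explain why this helps. As a rough, generic illustration (a conceptual model, not AOS-CX internals), the sketch below shows what recursive next-hop resolution via the default route means: a route's next hop is resolved by longest-prefix match, and if 0.0.0.0/0 is allowed to take part, a next hop with no more specific route "resolves" through the default route (for example toward a WAN router) instead of staying unresolved. Judging by its name, the command above corresponds to the allow_default=False case. The route table, addresses and the resolve_next_hop function are hypothetical.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical VRF routing table (prefix -> path description); not from the thread.
ROUTES: Dict[str, str] = {
    "0.0.0.0/0": "default route via WAN router",
    "10.1.0.0/16": "connected SVI subnet",
    "192.168.255.0/24": "loopback range",
}


def resolve_next_hop(next_hop: str, routes: Dict[str, str],
                     allow_default: bool) -> Optional[str]:
    """Longest-prefix match for a next hop, optionally ignoring 0.0.0.0/0.

    Returns the path description of the best match, or None when the next
    hop is unresolved (a route relying on it would then not be usable).
    """
    addr = ipaddress.ip_address(next_hop)
    best_net = None
    best_path = None
    for prefix, path in routes.items():
        net = ipaddress.ip_network(prefix)
        if net.prefixlen == 0 and not allow_default:
            continue  # recursion via the default route is disabled
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_path = net, path
    return best_path


if __name__ == "__main__":
    for nh in ("10.1.2.3", "172.16.5.1"):
        for allow in (True, False):
            print(nh, "allow_default =", allow, "->", resolve_next_hop(nh, ROUTES, allow))
```

    In this model, 10.1.2.3 resolves to the SVI subnet either way, while 172.16.5.1 resolves via the WAN default route only when allow_default is True and is otherwise left unresolved.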


    ------------------------------
    -- tommyd
    ------------------------------



  • 5.  RE: AOS-CX breaking changes between 10.11 and 10.13

    Posted Jan 20, 2025 04:57 AM

    I am glad that TAC managed to find a solution. Thanks for posting it; it may come in handy for similar issues with CX in the future.



    ------------------------------
    Daniel Ruiz
    -----------------------
    Any opinions expressed here are solely my own and not necessarily that of Hewlett Packard Enterprise or Aruba Networks.
    If you have urgent issues, always contact your Aruba partner, distributor, or Aruba TAC Support.
    Check https://www.arubanetworks.com/support-services/contact-support/ for how to contact Aruba TAC.
    ------------------------------