Lose connection to ESXi if VSX Slave is being rebooted

  • 1.  Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 26, 2025 10:05 AM

    Hi,

    We have 2x 8320 (TL.10.15.1020) running as a VSX cluster.
    Each ESXi server is connected to both VSX switches through a standard vSwitch, without LACP or a LAG.
    If I shut down the port on the slave VSX switch that leads to the ESXi host, everything keeps running.
    But if I reboot the switch (or do a software upgrade), some VMs are offline until the switch is up again.

    I already tried this on the slave, without success:
    vsx shutdown-on-split

    Config of these ports:

    interface 1/1/8
        description xxx-esx01
        no shutdown
        mtu 9198
        no routing
        vlan trunk native 1
        vlan trunk allowed all
    

    Any idea how to solve this loss of connectivity?

    Thanks



  • 2.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 02:26 AM

    Hi

     

    I remember that I had a similar situation with a pair of 8320s, too.

     

    To make sure, can you please elaborate a bit more on the loss of traffic flow when rebooting?

    • When exactly do your VMs lose their network connection? Is it while the switch is shutting down, during the entire reboot (approx. 4-5 min), or only for about 30 seconds after the switch comes back? Please try to be very exact with your statement here.
    • What happens if you reboot the VSX primary? Is the behavior the same as with the reboot of VSX secondary?
    • How is your vSwitch / port group on the ESXi server configured? What algorithm did you select for traffic distribution and how are your adapters (vmnicX) configured on the port group (e.g. active, standby, unused, etc.)?

     

    Additional questions:

    • Do you have a VSX keepalive link configured?
    • Can you please show me the output of "show run vsx"?
    • Out of curiosity: Why do you run software 10.15.xxxx (SSR release)?

     

     

    Regards,

    Thomas

     

     






  • 3.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 02:37 AM

    Hi

    • When exactly do your VMs lose their network connection? Is it while the switch is shutting down, during the entire reboot (approx. 4-5 min), or only for about 30 seconds after the switch comes back? Please try to be very exact with your statement here.
      • As soon as the VSX connection goes down, so well before the slave has shut down and rebooted. Connectivity comes back when the VSX link is up and in sync.

    • What happens if you reboot the VSX primary? Is the behavior the same as with the reboot of VSX secondary?
      • Not sure anymore; I think the VMs lose connectivity then too.

    • How is your vSwitch / port group on the ESXi server configured? What algorithm did you select for traffic distribution and how are your adapters (vmnicX) configured on the port group (e.g. active, standby, unused, etc.)?

     

    Additional questions:

    • Do you have a VSX keepalive link configured?
      • Yes, there is a dedicated link for that.
    • Can you please show me the output of "show run vsx"?
      • vsx
            inter-switch-link lag 53
            role primary
            keepalive peer 1.1.1.2 source 1.1.1.1 vrf KA
        interface lag 53
            description VSX-SLAVE-SWITCH
            no shutdown
            no routing
            vlan trunk native 1 tag
            vlan trunk allowed all
            lacp mode active
        
        ... no further interface config shown here ...
    • Out of curiosity: Why do you run software 10.15.xxxx (SSR release)?
      • That's because it was the latest release we got, but it also happened with the previous 10.14 releases.



  • 4.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 03:03 AM

    Hmm, that does not sound very similar to the problem I once had. Mine occurred right after the reboot, when the links had already come back up but forwarding on the VSX secondary was not yet ready.

     

    So, my (wild) guess would be that you may have a general forwarding problem with your ESXi server which is not necessarily connected to the reboot of your switch.

     

    Therefore, I have additional questions:

    • Did you ever try to just disconnect/shutdown vmnic0 on the host in question? Did the VMs move to vmnic2 and continue to have access to your network?
    • Did you ever try to just disconnect/shutdown vmnic2 on the host in question? Did the VMs move to vmnic0 and continue to have access to your network?
    • Where are the layer 3 interfaces for the VLANs your VMs use? Can you ping the gateway address (active-gateway, HSRP, etc.) when the VSX secondary is down? Please test from a VM and, if possible, also from a physical host.

     

     

    Regards,

    Thomas

     

     






  • 5.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 03:09 AM
    • Did you ever try to just disconnect/shutdown vmnic0 on the host in question? Did the VMs move to vmnic2 and continue to have access to your network?
    • Did you ever try to just disconnect/shutdown vmnic2 on the host in question? Did the VMs move to vmnic0 and continue to have access to your network?
      • What I did was shut down the port on the slave VSX switch, and that caused no interruption (so the vmnic link was then down as well).
    • Where are the layer 3 interfaces for the VLANs your VMs use? Can you ping the gateway address (active-gateway, HSRP, etc.) when the VSX secondary is down? Please test from a VM and, if possible, also from a physical host.
      • They are on a firewall cluster. I still have to test whether I can ping the gateway.



  • 6.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 03:47 AM

    This sounds like you need to use the vsx shutdown-on-split configuration. However, you wrote you've already done that. 

    One thing I noticed in the shared VSX configuration is that I don't see the VSX system-mac configured. Is this not configured in the VSX context? If not, this could result in strange behaviors. Please have a look into the VSX best practices guide: https://support.hpe.com/hpesc/public/docDisplay?docId=a00094242en_us&docLocale=en_US

    Page 39/40: connecting server NICs

    Page 41/42: shutdown on split

    Page 46: Virtual MAC



    ------------------------------
    Willem Bargeman
    Systems Engineer Aruba
    ACEX #125
    ------------------------------



  • 7.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 04:03 AM

    Hi @willembargeman

    I looked into the system-mac here https://arubanetworking.hpe.com/techdocs/AOS-CX/10.07/HTML/5200-7888/Content/VSX_cmds/sys-mac-10.htm

    And it sounds like that could be the cause.

    What I can't tell from that page: I need different system MACs for each VSX pair, right?
    VSX1: system-mac 02:01:00:00:01:00
    VSX2: system-mac 02:01:00:00:02:00




  • 8.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 04:14 AM

    Hi @sysram,

    Within a VSX pair, both switches must use the same system MAC. For example, both VSX switches can use 02:00:00:00:01:00.

    A second VSX pair needs to use a different system MAC.
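
    As a rough sketch of what that looks like in the configuration: the line below would go into the vsx context on both members of the same pair. The MAC value is only the example from above, so pick your own locally administered address.

    ```
    vsx
        system-mac 02:00:00:00:01:00
    ```

    Afterwards, "show vsx status" on either switch should list the same System MAC in the Local and Peer columns.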



    ------------------------------
    Willem Bargeman
    Systems Engineer Aruba
    ACEX #125
    ------------------------------



  • 9.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 08:11 AM

    Hi @willembargeman

    I have now configured the virtual system-mac on both switches:

    star-sw02# show vsx status
    VSX Operational State
    ---------------------
      ISL channel             : In-Sync
      ISL mgmt channel        : operational
      Config Sync Status      : In-Sync
      NAE                     : peer_reachable
      HTTPS Server            : peer_reachable
    
    Attribute           Local               Peer
    ------------        --------            --------
    ISL link            lag53               lag53
    ISL version         2                   2
    System MAC          02:01:00:00:01:00   02:01:00:00:01:00
    Platform            8320                8320
    Software Version    TL.10.15.1020       TL.10.15.1020
    Device Role         secondary           primary
    

    and both switches have the shutdown-on-split config on the ESX port:

    interface 1/1/8
        description ram-esx01
        no shutdown 
        mtu 9198
        vsx shutdown-on-split
    

    Then I reloaded the secondary switch, and the VMs still lost connectivity shortly after the reboot command, until the switch was up again.

    I also checked the firewall ports where the gateway for the VMs is. They are on the same switches but are configured as a multi-chassis LAG with LACP, so they don't need the shutdown-on-split command.

    Any more ideas?




  • 10.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 08:23 AM

    Did you check on the ESXi side whether the physical port goes down? Are these blade servers or 'normal' servers directly connected to the switch? If the physical interface on the switch goes down, you should see the same on the ESXi side. If the server side doesn't detect the interface down state, traffic is still sent to that interface.



    ------------------------------
    Willem Bargeman
    Systems Engineer Aruba
    ACEX #125
    ------------------------------



  • 11.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 09:13 AM

    @willembargeman you nailed it!!

    Two old HPE ESXi servers -> NIC goes down
    The new one I was testing with -> NIC stays up... WTF, I never thought about that

    So in the end I changed this on the vSwitch, and now the connection stays alive.
    But before that I had the same settings as on the old servers, which were working.

    So thanks all for the help!!




  • 12.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 10:20 AM

    Be careful with beacon probing! You need at least 3 NICs to make this work well, see also:

    https://knowledge.broadcom.com/external/article/324536/what-is-beacon-probing.html

     

    So you would be better off finding a way to propagate the link status, so that the ESXi server knows when your switch is down. Do you have a blade chassis or similar?

     

     

    Regards,

    Thomas

     

     






  • 13.  RE: Lose connection to ESXi if VSX Slave is being rebooted

    Posted Jun 27, 2025 08:50 AM

    I wonder whether you see the failing VM's MAC address moving to the other switch's interface (e.g. int 1/1/8). Can you check that?
    So, look for the MAC address of a VM (usually starting with 00:50:56) on the VSX secondary's interface towards the ESXi server.

    Then reboot the VSX secondary and query the MAC address table of the VSX primary. Make sure you find the VM in question. It's probably advisable to run a continuous ping from the VM to an IP address in the network, so you have constant outbound traffic and a good chance of the switch learning the VM's MAC address quickly. Depending on how critical your production environment is, you could also power down the VSX secondary for a moment to get a bit more time.

    If you don't observe the MAC address moving to the other switch, you have a problem on your ESXi server or somewhere in between on another networking component.
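
    To make the check above concrete, here is a minimal sketch on the AOS-CX CLI. The MAC address is a made-up placeholder for the VM in question, so swap in the real one. Run it on the VSX secondary before the reboot, and again on the VSX primary while the secondary is down:

    ```
    show mac-address-table address 00:50:56:aa:bb:cc
    ```

    If the failover works, I'd expect the entry on the VSX primary to end up on its port towards the ESXi server (1/1/8 in this thread).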

     

    Regards,

    Thomas