
  • 1.  Aruba CX 8325: vsx config sync does not work any more

    Posted Apr 18, 2023 11:45 AM

    Hi Guys,

    On one of my VSX pairs, config sync no longer works.
    I can see this with the mclag-interfaces and snmp sync features: changes made on the primary do not reach the secondary.

    Is there any way to restart only the config sync process without interrupting service?

    VSX Pair affected:
    Primary:

    sdsw-rz-r08# sh debug
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    module           sub_module                      severity   vlan  port       ip                                            mac               instance          vrf                              table                            column                           client                           ubt_zone
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    ospfv2           all                             debug      ----- -----      -----                                         -----             -----             -----                            -----                            -----                            -----                            -----
    vsx              vsx_sync                        debug      ----- -----      -----                                         -----             -----             -----                            -----                            -----                            -----                            -----
    
    sdsw-rz-r08#
    2023-04-18T17:26:18.606074+0200 hpe-restd[3840] <INFO> Event|4602|LOG_INFO|AMM|-|Authentication succeeded for user manager in session ai5l0bu8YheLUEvMKQMW7w==
    2023-04-18T17:26:18.606709+0200 hpe-restd[3840] <INFO> Event|4655|LOG_INFO|AMM|-|User manager logged in from 172.31.200.250 through REST session
    2023-04-18T17:26:26.285970+0200 hpe-restd[3840] <INFO> Event|4608|LOG_INFO|AMM|-|Authorization allowed for user manager, for resource SessionMgmt, with action POST
    2023-04-18T17:26:26.286650+0200 hpe-restd[3840] <INFO> Event|4657|LOG_INFO|AMM|-|User manager logged out of REST session from 172.31.200.250
    
    sdsw-rz-r08# conf t
    sdsw-rz-r08(config)# int lag 35 multi-chassis
    sdsw-rz-r08(config-lag-if)# do sh run int lag 35
    interface lag 35 multi-chassis
        no shutdown
        description VMs-Server11
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        lacp fallback
        spanning-tree root-guard
        spanning-tree tcn-guard
        spanning-tree port-type admin-edge
        exit
    sdsw-rz-r08(config-lag-if)# do sh run int lag 35 vsx-peer
    interface lag 35 multi-chassis
        no shutdown
        description VMs-Server11
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        lacp fallback
        spanning-tree root-guard
        spanning-tree tcn-guard
        spanning-tree port-type admin-edge
        exit
    sdsw-rz-r08(config-lag-if)# interface lag 35 multi-chassis
    sdsw-rz-r08(config-lag-if)# description Blupp
    sdsw-rz-r08(config-lag-if)# do sh run int lag 35
    interface lag 35 multi-chassis
        no shutdown
        description Blupp
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        lacp fallback
        spanning-tree root-guard
        spanning-tree tcn-guard
        spanning-tree port-type admin-edge
        exit
    sdsw-rz-r08(config-lag-if)# do sh run int lag 35 vsx-peer
    interface lag 35 multi-chassis
        no shutdown
        description VMs-Server11
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        lacp fallback
        spanning-tree root-guard
        spanning-tree tcn-guard
        spanning-tree port-type admin-edge
        exit
    sdsw-rz-r08(config-lag-if)# do sh run int lag 35 vsx-peer
    interface lag 35 multi-chassis
        no shutdown
        description VMs-Server11
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        lacp fallback
        spanning-tree root-guard
        spanning-tree tcn-guard
        spanning-tree port-type admin-edge
        exit
    sdsw-rz-r08(config-lag-if)#     description VMs-Server11
    sdsw-rz-r08(config-lag-if)# end
    
    sdsw-rz-r08# ch dif run st
    No difference in configs.
    sdsw-rz-r08#
    
    sdsw-rz-r08# sh run vsx
    vsx
        system-mac 0a:00:0c:02:01:00
        inter-switch-link lag 256
        role primary
        keepalive peer 1.1.1.1 source 1.1.1.0 vrf KEEPALIVE
        vsx-sync aaa acl-log-timer arp-security bfd-global bgp control-plane-acls copp-policy dhcp-relay dhcp-server dns icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vsx-global
    
    sdsw-rz-r08# sh vsx brief
    ISL State                              : In-Sync
    Device State                           : Peer-Established
    Keepalive State                        : Keepalive-Established
    Device Role                            : Primary
    Number of Multi-chassis LAG interfaces : 34
    
    sdsw-rz-r08# sh vsx status
    VSX Operational State
    ---------------------
      ISL channel             : In-Sync
      ISL mgmt channel        : operational
      Config Sync Status      : In-Sync
      NAE                     : peer_reachable
      HTTPS Server            : peer_reachable
    
    Attribute           Local               Peer
    ------------        --------            --------
    ISL link            lag256              lag256
    ISL version         2                   2
    System MAC          0a:00:0c:02:01:00   0a:00:0c:02:01:00
    Platform            8325                8325
    Software Version    GL.10.09.1040       GL.10.09.1040
    Device Role         primary             secondary
    
    sdsw-rz-r08#
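
    The manual check above (compare "show run int lag 35" with "show run int lag 35 vsx-peer") can be sketched in Python. This is only an illustration that works on pasted CLI text; the function name config_drift and the shortened sample strings are assumptions, not part of any Aruba tooling.

```python
import difflib

# Shortened sample text modelled on the "show run int lag 35" outputs in this post.
LOCAL_CFG = """
interface lag 35 multi-chassis
    no shutdown
    description Blupp
    no routing
"""
PEER_CFG = """
interface lag 35 multi-chassis
    no shutdown
    description VMs-Server11
    no routing
"""


def config_drift(local_cfg, peer_cfg):
    """Return only the changed lines between local and peer config text."""
    local_lines = [ln.rstrip() for ln in local_cfg.strip().splitlines()]
    peer_lines = [ln.rstrip() for ln in peer_cfg.strip().splitlines()]
    diff = difflib.unified_diff(local_lines, peer_lines, lineterm="")
    # Keep "+"/"-" content lines; drop the "+++"/"---" file headers and "@@" hunk markers.
    return [
        ln for ln in diff
        if ln.startswith(("+", "-")) and not ln.startswith(("+++", "---"))
    ]


for line in config_drift(LOCAL_CFG, PEER_CFG):
    print(line)
```

    An empty result would mean the peer matches the local config for that interface; any "-"/"+" pair points at a setting that has not been synced.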
    

    Secondary:

    sdsw-rz-r09# sh debug
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    module           sub_module                      severity   vlan  port       ip                                            mac               instance          vrf                              table                            column                           client                           ubt_zone
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    vsx              vsx_sync                        debug      ----- -----      -----                                         -----             -----             -----                            -----                            -----                            -----                            -----
    sdsw-rz-r09#
    
    sdsw-rz-r09# sh run vsx
    vsx
        system-mac 0a:00:0c:02:01:00
        inter-switch-link lag 256
        role secondary
        keepalive peer 1.1.1.0 source 1.1.1.1 vrf KEEPALIVE
        vsx-sync aaa acl-log-timer arp-security bfd-global bgp control-plane-acls copp-policy dhcp-relay dhcp-server dns icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vsx-global
    
    sdsw-rz-r09# sh vsx brief
    ISL State                              : In-Sync
    Device State                           : Peer-Established
    Keepalive State                        : Keepalive-Established
    Device Role                            : Secondary
    Number of Multi-chassis LAG interfaces : 34
    
    sdsw-rz-r09# sh vsx status
    VSX Operational State
    ---------------------
      ISL channel             : In-Sync
      ISL mgmt channel        : operational
      Config Sync Status      : In-Sync
      NAE                     : peer_reachable
      HTTPS Server            : peer_reachable
    
    Attribute           Local               Peer
    ------------        --------            --------
    ISL link            lag256              lag256
    ISL version         2                   2
    System MAC          0a:00:0c:02:01:00   0a:00:0c:02:01:00
    Platform            8325                8325
    Software Version    GL.10.09.1040       GL.10.09.1040
    Device Role         secondary           primary
    
    sdsw-rz-r09#
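
    Note that in the outputs above both members report "Config Sync Status : In-Sync" although the description change did not reach the peer, so this field alone does not seem to be a sufficient health check. A minimal sketch for pulling the operational state out of pasted "show vsx status" text, assuming the layout shown in this post (the helper name parse_operational_state is made up for illustration):

```python
# Sample text copied from the "show vsx status" output in this post (table shortened).
SAMPLE_STATUS = """\
VSX Operational State
---------------------
  ISL channel             : In-Sync
  ISL mgmt channel        : operational
  Config Sync Status      : In-Sync
  NAE                     : peer_reachable
  HTTPS Server            : peer_reachable

Attribute           Local               Peer
------------        --------            --------
System MAC          0a:00:0c:02:01:00   0a:00:0c:02:01:00
"""


def parse_operational_state(output):
    """Map each 'key : value' line of the operational state section to a dict entry."""
    state = {}
    for line in output.splitlines():
        # Stop before the Local/Peer attribute table; its MAC addresses also contain ':'.
        if line.strip().startswith("Attribute"):
            break
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


state = parse_operational_state(SAMPLE_STATUS)
print(state["Config Sync Status"])
```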
    
    


    On a different but similarly configured VSX pair it works fine, and on that pair you can see a log entry on the secondary member.
    Primary:

    sdsw-bu-r02# sh debug
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    module           sub_module                      severity   vlan  port       ip                                            mac               instance          vrf                              table                            column                           client                           ubt_zone
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    ospfv2           all                             debug      ----- -----      -----                                         -----             -----             -----                            -----                            -----                            -----                            -----
    vsx              vsx_sync                        debug      ----- -----      -----                                         -----             -----             -----                            -----                            -----                            -----                            -----
    
    sdsw-bu-r02# conf t
    sdsw-bu-r02(config)# int lag 35 multi-chassis
    sdsw-bu-r02(config-lag-if)# do sh run int lag 35
    interface lag 35 multi-chassis
        no shutdown
        description VMs-Server11-ESXFTS0020
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        lacp fallback
        spanning-tree root-guard
        spanning-tree tcn-guard
        spanning-tree port-type admin-edge
        exit
    sdsw-bu-r02(config-lag-if)# do sh run int lag 35 vsx-peer
    interface lag 35 multi-chassis
        no shutdown
        description VMs-Server11-ESXFTS0020
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        lacp fallback
        spanning-tree root-guard
        spanning-tree tcn-guard
        spanning-tree port-type admin-edge
        exit
    sdsw-bu-r02(config-lag-if)# interface lag 35 multi-chassis
    sdsw-bu-r02(config-lag-if)# description Blupp
    sdsw-bu-r02(config-lag-if)# do sh run int lag 35 vsx-peer
    interface lag 35 multi-chassis
        no shutdown
        description Blupp
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        lacp fallback
        spanning-tree root-guard
        spanning-tree tcn-guard
        spanning-tree port-type admin-edge
        exit
    sdsw-bu-r02(config-lag-if)#     description VMs-Server11-ESXFTS0020
    sdsw-bu-r02(config-lag-if)# end
    sdsw-bu-r02# ch diff run startup-config
    No difference in configs.
    sdsw-bu-r02#
    

    Secondary:

    sdsw-bu-r03#
    2023-04-18T17:36:02.990077+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
    2023-04-18T17:36:07.592797+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
    2023-04-18T17:36:12.200009+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
    2023-04-18T17:36:16.788945+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
    
    sdsw-bu-r03#
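
    On the working pair, each change on the primary shows up as a "Configuration sync update" event from vsx-syncd on the secondary. A rough sketch for extracting those events from pasted log text and measuring the gaps between them; the timestamp format is taken from the lines above, while the function names are assumptions for illustration:

```python
from datetime import datetime

SYNC_MARKER = "Configuration sync update"

# Two of the log lines from the secondary of the working pair.
SAMPLE_LOG = """\
2023-04-18T17:36:02.990077+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
2023-04-18T17:36:07.592797+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
"""


def sync_update_times(log_text):
    """Return the timestamps of all vsx-syncd config sync update events."""
    times = []
    for line in log_text.splitlines():
        line = line.strip()
        if "vsx-syncd" in line and SYNC_MARKER in line:
            stamp = line.split()[0]
            times.append(datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z"))
    return times


def gaps_seconds(times):
    """Return the time difference in seconds between consecutive events."""
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


events = sync_update_times(SAMPLE_LOG)
print(len(events), gaps_seconds(events))
```

    On the affected pair the same search would be expected to return no events after a change on the primary, which matches the missing log entries described above.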
    


    Thanks and kind regards

    Robert



  • 2.  RE: Aruba CX 8325: vsx config sync does not work any more
    Best Answer

    Posted Apr 20, 2023 10:02 AM

    Hi Robert,

    I have seen some sync issues already and as far as I remember such issues were resolved in 10.09.1050. Here is what I have just found in Release Notes (bug ID 236992):
    https://www.arubanetworks.com/techdocs/AOS-CX/10.09/RN/rn_8320_10-09-1050.pdf

    You will probably find some related messages on the switch from the vsx-syncd daemon like: "Long time without updates, check if remote connection is stalled." (not sure if such messages are logged in the event log, but definitely in /var/log/messages)

    I am not going to dive into details how to implement the workaround as it requires some shell commands, but I will leave this public document here (shell needs to be used with appropriate privilege level):
    https://www.arubanetworks.com/techdocs/AOS-CX/Consolidated_RNs/HTML-6300-6400/Content/10_06/10_06_0160/10-06-0160_f.htm

    Another recommendation would be to upgrade the devices as version 10.09 is already considered unsupported:
    https://www.arubanetworks.com/support-services/end-of-life/#product=aos-cx-switching-software

    Regards
    Stanislav




  • 3.  RE: Aruba CX 8325: vsx config sync does not work any more

    Posted Apr 21, 2023 02:10 AM
    Edited by r.grossmann Apr 21, 2023 02:12 AM

    Hi Stanislav,

    thanks for your helpful response. 

    Stupid as I am I didn't review the resolved issues of the more recent firmware releases.

    The update to 10.10.xxxx is planned for June or July, as we have to restart all CX switches.
    We noticed a massive network outage during the last update in July 2022. Of all things, our main VSX pair (Core-Layer DC Location RZ) caused the problem (using the vsx upgrade script), interrupting services and reachability for around 20-30 seconds.

    As all Switches should be running on the same SW, I now used the linked workaround. Thanks for linking to the 6400 document, where the needed command is listed.

    start-shell
    sudo systemctl restart vsx-syncd 

    Afterwards, the changes to snmp and mclag-interfaces were propagated to the VSX secondary.




  • 4.  RE: Aruba CX 8325: vsx config sync does not work any more

    Posted Apr 21, 2023 02:56 AM
    Hi,

    "Of all things our main VSX pair (Core-Layer DC Location RZ) causes the problem (using the vsx upgrade script) interrupting services and reachability for around 20-30 Seconds."

    This is strange.

    In a properly configured environment where all connected peers are "dual homed" to both nodes of the VSX cluster (that is, a traditional LACP-based LAG on the peer connected to a VSX LAG, also known as a Multi-Chassis LAG, on the VSX pair), the peers stay in continuous communication with at least one VSX node at any time (no pings lost, reachability assured). That holds in general, and especially during a manual or automated VSX software update/upgrade procedure.

    During a software update/upgrade, the Secondary VSX node is expected to reboot first, followed by the Primary, so communication between upstream/downstream peers and each node is briefly interrupted while that node is unavailable (a node reboot generally lasts less than 4 minutes). That is normal, but since peers should generally be connected to both VSX nodes via LACP-based (or even non-protocol) aggregated links, traffic should keep flowing without interruption through the VSX node that is still running.

    Doesn't it happen in your case?





  • 5.  RE: Aruba CX 8325: vsx config sync does not work any more

    Posted Apr 21, 2023 03:43 AM

    All devices connected to the core layer (CSW = Core Switches) are dual-homed with LACP.
    Connected Devices: other VSX Pairs (SDSW = Server Distribution, CDSW = Client Distribution, Access = old Aruba-OS 5406 ESX Server Distribution, Checkpoint Firewall, Out of Band Management (routed))

    What I have noticed, and reported to Aruba support in another case, is that the spanning-tree counters on the main VSX pair look suspicious.
    I do not expect BPDU-Rx on the ISL lag256 on the primary, or BPDU-Tx on the secondary member.
    On all other VSX pairs, only the VSX primary transmits BPDUs over the ISL lag256.

    I'm not sure, but I guess the network went down for 20-30 s with the reboot of the secondary and for 8-10 s with the reboot of the primary.

    CSW-RZ is located in DC Location RZ and is the spanning-tree root (spanning-tree priority 1):

    Primary VSX Pair DC location RZ
    csw-rz-r08# sh spanning-tree |i lag256
    Port         Role           State      Cost           Priority   Type             BPDU-Tx    BPDU-Rx    TCN-Tx     TCN-Rx
    lag256       Designated     Forwarding 1              64         P2P              11474584   11463341   2          0
    
    csw-rz-r08# sh spanning-tree vsx-peer  |i lag256
    Port         Role           State      Cost           Priority   Type             BPDU-Tx    BPDU-Rx    TCN-Tx     TCN-Rx
    lag256       Designated     Forwarding 1              64         P2P              11474652   11463097   0          2


    CSW-BU is located in DC Location BU and is the secondary spanning-tree root (spanning-tree priority 2):

    csw-bu-r02# sh spanning-tree |i lag256
    Port         Role           State      Cost           Priority   Type             BPDU-Tx    BPDU-Rx    TCN-Tx     TCN-Rx
    lag256       Designated     Forwarding 1              64         P2P              11474207   14         2          0
    
    csw-bu-r02# sh spanning-tree vsx-peer |i lag256
    Port         Role           State      Cost           Priority   Type             BPDU-Tx    BPDU-Rx    TCN-Tx     TCN-Rx
    lag256       Designated     Forwarding 1              64         P2P              79         11463106   0          2
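
    To compare the ISL counters across many VSX pairs, the filtered "show spanning-tree | i lag256" lines can be parsed with a few lines of Python. This is a sketch under assumptions: the column order is the one shown in the header above, and the 0.5 ratio used to call a link "bidirectional" is an arbitrary illustrative threshold, not a documented limit. Which pattern is the normal one is still open in this thread.

```python
# Counter lines copied from the outputs above (primary of each core pair).
RZ_PRIMARY = "lag256       Designated     Forwarding 1              64         P2P              11474584   11463341   2          0"
BU_PRIMARY = "lag256       Designated     Forwarding 1              64         P2P              11474207   14         2          0"


def parse_port_line(line):
    """Extract the port name and the four trailing counters from one port line."""
    fields = line.split()
    bpdu_tx, bpdu_rx, tcn_tx, tcn_rx = (int(value) for value in fields[-4:])
    return {
        "port": fields[0],
        "bpdu_tx": bpdu_tx,
        "bpdu_rx": bpdu_rx,
        "tcn_tx": tcn_tx,
        "tcn_rx": tcn_rx,
    }


def is_bidirectional(counters, min_ratio=0.5):
    """True if BPDUs flow in both directions in roughly comparable numbers."""
    low, high = sorted((counters["bpdu_tx"], counters["bpdu_rx"]))
    return high > 0 and low / high >= min_ratio


for name, line in (("csw-rz-r08", RZ_PRIMARY), ("csw-bu-r02", BU_PRIMARY)):
    print(name, is_bidirectional(parse_port_line(line)))
```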
    



  • 6.  RE: Aruba CX 8325: vsx config sync does not work any more

    Posted Apr 21, 2023 09:22 AM

    The BPDU counts on our 8325 VSX pair look like those on your first switch. Don't mind the large TCN counts on the switches; I am still testing and this is not in production yet. Either we both have the same STP issue on our VSX pairs, or something is wrong with STP in the csw-bu-r VSX config.

    On your second pair of switches, which show different BPDU Tx/Rx counts, do you have vsx-sync mclag-interfaces or vsx-sync vsx-global enabled? What does the config for your interface lag256 look like? When running show spanning-tree detail, do both of your VSX switches have the same Bridge ID?

    S40-MDF-AGG-01# show spanning-tree
    Spanning tree status      : Enabled Protocol: MSTP
    
    MST0
      Root ID    Priority   : 0
                 MAC-Address: 02:00:00:00:10:00
                 This bridge is the root
                 Hello time(in seconds):2  Max Age(in seconds):20
                 Forward Delay(in seconds):15
    
      Bridge ID  Priority  : 0
                 MAC-Address: 02:00:00:00:10:00
                 Hello time(in seconds):2  Max Age(in seconds):20
                 Forward Delay(in seconds):15
    
    Port         Role           State      Cost           Priority   Type             BPDU-Tx    BPDU-Rx    TCN-Tx     TCN-Rx
    ------------ -------------- ---------- -------------- ---------- ---------------- ---------- ---------- ---------- ----------
    lag256       Designated     Forwarding 1              64         P2P              1283082    1283037    104        24
    
    Number of topology changes    : 269
    Last topology change occurred : 591282 seconds ago
    
    S40-MDF-AGG-01# show run vsx
    vsx
        system-mac 02:00:00:00:10:00
        inter-switch-link lag 256
        role primary
        keepalive peer 172.31.10.253 source 172.31.10.252 vrf KEEPALIVE
        linkup-delay-timer 600
        vsx-sync vsx-global
    interface lag 256
        description ISL
        no shutdown
        no routing
        vlan trunk native 2
        vlan trunk allowed all
        lacp mode active
    



  • 7.  RE: Aruba CX 8325: vsx config sync does not work any more

    Posted Apr 21, 2023 10:16 AM

    Hi MFlowers,

    We have 9 VSX pairs (2x CSW Core, 2x SDSW Server Distribution, 2x CDSW Client Distribution, 2x SASW Server Access, 1x WSW WAN).
    Only on the STP root pair is the BPDU Rx counter increasing on the primary member.

    Maybe that's normal behaviour, but it is the only thing I cannot explain.

    Base config for all VSX Pairs is the same, of course with different system-mac, priorities etc. per VSX Pair.
    Most vsx-sync including vsx-global and mclag-interfaces are enabled.

    VSX CSW-RZ (Rack 08 primary):

    csw-rz-r08# sh run int lag 255
    interface lag 255
        no shutdown
        vrf attach KEEPALIVE
        description VSX keepalive
        ip address 1.1.1.0/31
        lacp mode active
        exit
    csw-rz-r08# sh run int lag 256
    interface lag 256
        no shutdown
        description ISL link
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        exit
    csw-rz-r08# sh run int 1/1/12
    interface 1/1/12
        no shutdown
        mtu 9198
        description VSX keepalive physical link
        lag 255
        exit
    csw-rz-r08# sh run int 1/1/55
    interface 1/1/55
        no shutdown
        mtu 9198
        description ISL physical link
        lag 256
        exit
    csw-rz-r08# sh run int 1/1/56
    interface 1/1/56
        no shutdown
        mtu 9198
        description ISL physical link
        lag 256
        exit
    csw-rz-r08# sh spanning-tree |i MAC
                 MAC-Address: 0a:00:0c:01:01:00
                 MAC-Address: 0a:00:0c:01:01:00
    csw-rz-r08# sh spanning-tree |i lag256
    lag256       Designated     Forwarding 1              64         P2P              11486474   11475221   2          0
    csw-rz-r08# sh run vsx
    vsx
        system-mac 0a:00:0c:01:01:00
        inter-switch-link lag 256
        role primary
        keepalive peer 1.1.1.1 source 1.1.1.0 vrf KEEPALIVE
        vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vrrp vsx-global

    VSX CSW-RZ (Rack 09 secondary):

    csw-rz-r09# sh run int lag 255
    interface lag 255
        no shutdown
        vrf attach KEEPALIVE
        description VSX keepalive
        ip address 1.1.1.1/31
        lacp mode active
        exit
    csw-rz-r09# sh run int lag 256
    interface lag 256
        no shutdown
        description ISL link
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        exit
    csw-rz-r09# sh run int 1/1/12
    interface 1/1/12
        no shutdown
        mtu 9198
        description VSX keepalive physical link
        lag 255
        exit
    csw-rz-r09# sh run int 1/1/55
    interface 1/1/55
        no shutdown
        mtu 9198
        description ISL physical link
        lag 256
        exit
    csw-rz-r09# sh run int 1/1/56
    interface 1/1/56
        no shutdown
        mtu 9198
        description ISL physical link
        lag 256
        exit
    csw-rz-r09# sh spanning-tree |i MAC
                 MAC-Address: 0a:00:0c:01:01:00
                 MAC-Address: 0a:00:0c:01:01:00
    csw-rz-r09# sh spanning-tree |i lag256
    lag256       Designated     Forwarding 1              64         P2P              11486522   11474951   0          2
    csw-rz-r09# sh run vsx
    vsx
        system-mac 0a:00:0c:01:01:00
        inter-switch-link lag 256
        role secondary
        keepalive peer 1.1.1.0 source 1.1.1.1 vrf KEEPALIVE
        vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vrrp vsx-global


    VSX CSW-BU (Rack 02 primary):

    csw-bu-r02# sh run int lag255
    interface lag 255
        no shutdown
        vrf attach KEEPALIVE
        description VSX keepalive
        ip address 1.1.1.0/31
        lacp mode active
        exit
    csw-bu-r02# sh run int lag256
    interface lag 256
        no shutdown
        description ISL link
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        exit
    csw-bu-r02# sh run int 1/1/12
    interface 1/1/12
        no shutdown
        mtu 9198
        description VSX keepalive physical link
        lag 255
        exit
    csw-bu-r02# sh run int 1/1/55
    interface 1/1/55
        no shutdown
        mtu 9198
        description ISL physical link
        lag 256
        exit
    csw-bu-r02# sh run int 1/1/56
    interface 1/1/56
        no shutdown
        mtu 9198
        description ISL physical link
        lag 256
        exit
    csw-bu-r02# sh spanning-tree |i MAC
                 MAC-Address: 0a:00:0c:01:01:00
                 MAC-Address: 0a:00:0c:01:02:00
    csw-bu-r02# sh spanning-tree |i lag256
    lag256       Designated     Forwarding 1              64         P2P              11486032   14         2          0
    csw-bu-r02#
    csw-bu-r02# sh run vsx
    vsx
        system-mac 0a:00:0c:01:02:00
        inter-switch-link lag 256
        role primary
        keepalive peer 1.1.1.1 source 1.1.1.0 vrf KEEPALIVE
        vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vrrp vsx-global

    VSX CSW-BU (Rack 03 secondary):

    csw-bu-r03# sh run int lag255
    interface lag 255
        no shutdown
        vrf attach KEEPALIVE
        description VSX keepalive
        ip address 1.1.1.1/31
        lacp mode active
        exit
    csw-bu-r03# sh run int lag256
    interface lag 256
        no shutdown
        description ISL link
        no routing
        vlan trunk native 998 tag
        vlan trunk allowed all
        lacp mode active
        exit
    csw-bu-r03# sh run int 1/1/12
    interface 1/1/12
        no shutdown
        mtu 9198
        description VSX keepalive physical link
        lag 255
        exit
    csw-bu-r03# sh run int 1/1/55
    interface 1/1/55
        no shutdown
        mtu 9198
        description ISL physical link
        lag 256
        exit
    csw-bu-r03# sh run int 1/1/56
    interface 1/1/56
        no shutdown
        mtu 9198
        description ISL physical link
        lag 256
        exit
    csw-bu-r03# sh spanning-tree |i MAC
                 MAC-Address: 0a:00:0c:01:01:00
                 MAC-Address: 0a:00:0c:01:02:00
    csw-bu-r03#
    csw-bu-r03# sh spanning-tree |i lag256
    lag256       Designated     Forwarding 1              64         P2P              79         11474916   0          2
    csw-bu-r03# sh run vsx
    vsx
        system-mac 0a:00:0c:01:02:00
        inter-switch-link lag 256
        role secondary
        keepalive peer 1.1.1.0 source 1.1.1.1 vrf KEEPALIVE
        vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vrrp vsx-global


    We also see BPDU starvation on all VSX pairs except the STP root CSW-RZ, and I do not know why. All uplinks are clear and far from fully utilized.
    The starvation leads to STP root changes on the Aruba CX switches:

    csw-bu-r02# sh logg -r |i starv
    2023-04-20T12:02:55.739975+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2023-03-20T08:01:46.739926+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2023-02-27T15:06:39.739903+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2023-02-10T03:12:34.740079+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2023-01-11T11:19:23.739910+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-12-22T23:11:30.739982+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-12-07T19:07:17.739925+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-11-22T23:24:40.739820+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-11-19T12:49:29.739941+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-10-28T15:46:14.739884+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-10-13T23:15:17.739979+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-09-27T05:58:30.739904+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-09-22T09:34:59.739880+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-09-19T20:33:50.739909+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
    2022-09-13T23:38:29.739947+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
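
    To see whether the starvation events follow a pattern, the timestamps in the listing above can be turned into gaps in days. A minimal sketch that works on pasted "show logging" text; the function names are assumptions for illustration, and only two of the lines above are used as sample input:

```python
from datetime import datetime

STARVE_MARKER = "starved for a BPDU Rx"

# The two most recent starvation events from the listing above.
SAMPLE_LOG = """\
2023-04-20T12:02:55.739975+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2023-03-20T08:01:46.739926+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
"""


def starvation_times(log_text):
    """Return the starvation event timestamps, oldest first."""
    times = []
    for line in log_text.splitlines():
        line = line.strip()
        if STARVE_MARKER in line:
            # The first field is an ISO 8601 timestamp with a UTC offset.
            times.append(datetime.fromisoformat(line.split()[0]))
    return sorted(times)


def gaps_days(times):
    """Return the gap in days between consecutive events."""
    return [(later - earlier).total_seconds() / 86400 for earlier, later in zip(times, times[1:])]


print([round(gap, 1) for gap in gaps_days(starvation_times(SAMPLE_LOG))])
```

    Because the timestamps carry their UTC offset, the gaps stay correct across the switch between +01:00 and +02:00.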
    
    csw-bu-r02# sh logg -r |i Root
    2023-04-20T12:02:55.743663+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2023-04-20T12:02:55.740079+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2023-03-20T08:01:46.743644+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2023-03-20T08:01:46.740020+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2023-02-27T15:06:39.743832+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2023-02-27T15:06:39.740027+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2023-02-10T03:12:34.744337+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2023-02-10T03:12:34.740201+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2023-01-11T11:19:23.743424+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2023-01-11T11:19:23.740008+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-12-22T23:11:30.743856+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-12-22T23:11:30.740093+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-12-07T19:07:17.743600+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-12-07T19:07:17.740032+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-11-22T23:24:40.743153+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-11-22T23:24:40.739896+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-11-19T12:49:29.743782+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-11-19T12:49:29.740032+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-10-28T15:46:14.743883+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-10-28T15:46:14.739996+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-10-13T23:15:17.743835+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-10-13T23:15:17.740250+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-09-27T05:58:30.743275+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-09-27T05:58:30.740205+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-09-22T09:34:59.744184+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-09-22T09:34:59.740118+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-09-19T20:33:50.743507+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-09-19T20:33:50.740150+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
    2022-09-13T23:38:29.743378+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
    2022-09-13T23:38:29.740181+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
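
    The root changes above always come in pairs, a few milliseconds apart. A minimal sketch for pulling those pairs out of the `sh logg -r` output, assuming the log format stays exactly as pasted here (the function names `parse_root_changes` and `find_flaps` are made up for this example and are not part of AOS-CX):

```python
import re
from datetime import datetime

# Matches the Event 2006 "Root changed" lines exactly as pasted in this thread.
ROOT_RE = re.compile(
    r"^(\S+) \S+ hpe-mstpd\[\d+\]: Event\|2006\|.*Root changed from "
    r"(\d+): ([0-9a-f:]+) to (\d+): ([0-9a-f:]+)"
)

def parse_root_changes(lines):
    """Return (timestamp, old_root, new_root) tuples, oldest first."""
    events = []
    for line in lines:
        m = ROOT_RE.match(line.strip())
        if m:
            ts = datetime.fromisoformat(m.group(1))
            old_root = (int(m.group(2)), m.group(3))
            new_root = (int(m.group(4)), m.group(5))
            events.append((ts, old_root, new_root))
    # "sh logg -r" prints newest first, so sort into chronological order.
    return sorted(events, key=lambda e: e[0])

def find_flaps(events, max_gap_ms=1000.0):
    """Find consecutive changes where the root moves away and straight back."""
    flaps = []
    for first, second in zip(events, events[1:]):
        gap_ms = (second[0] - first[0]).total_seconds() * 1000.0
        went_back = first[1] == second[2] and first[2] == second[1]
        if went_back and gap_ms <= max_gap_ms:
            # (start time, temporary root, how long it was root in ms)
            flaps.append((first[0], first[2], gap_ms))
    return flaps

# Two lines copied from the log above (2023-04-20).
SAMPLE = [
    "2023-04-20T12:02:55.743663+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00",
    "2023-04-20T12:02:55.740079+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST  - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00",
]

if __name__ == "__main__":
    for start, temp_root, gap_ms in find_flaps(parse_root_changes(SAMPLE)):
        print(f"{start.isoformat()} root moved to {temp_root} for {gap_ms:.3f} ms")
```

    For the 2023-04-20 pair the gap between the two timestamps is about 3.6 ms, and each flap shares its second with a "CIST starved for a BPDU Rx" message on lag53. That fits a single missed BPDU rather than a lasting topology change, but it is only a reading of the timestamps, not a confirmed root cause.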

    There is a separate thread about the starvation messages, but it has no solution:
    https://community.arubanetworks.com/discussion/aruba-cx

    There is no physical loop anymore, with the exception of the WAN switches (WSW). This VSX pair is connected to all of the core switches, so there are two redundant paths to the core network, with the link to the CSW-BU blocking.




  • 8.  RE: Aruba CX 8325: vsx config sync does not work any more

    Posted Apr 21, 2023 05:03 PM

    Why is tcn-guard enabled on the MC-LAG between the WSW switch and your two core switches?

    interface lag 48 multi-chassis
        no shutdown
        description wsw
        no routing
        vlan trunk native 998
        vlan trunk allowed all
        lacp mode active
        spanning-tree tcn-guard


    I thought about the BPDU Rx/Tx messages and I think I understand why it is happening. The root bridge is the one that sends config BPDUs on the network. Both VSX switches (CSW-RZ) are the root bridge, and both will send BPDUs. This doesn't fully make sense, as there should be an STP operational primary and an STP operational secondary. Maybe the operational primary works correctly for non-root switches but not for the root bridge?

    https://www.arubanetworks.com/techdocs/AOS-CX/10.09/HTML/vsx/Content/Chp_STP/how-stp-wor-wit-vsx-10.htm