Hi MFlowers,
We do have 9 VSX Pairs (2x CSW Core, 2x SDSW Server Distribution, 2x CDSW Client Distribution, 2x SASW Server Access, 1x WSW WAN).
Only on the STP root pair is the BPDU-Rx counter increasing on the primary member.
Maybe that's normal behaviour, but it is the one point I can't explain.
Base config for all VSX Pairs is the same, of course with different system-mac, priorities etc. per VSX Pair.
Most vsx-sync features, including vsx-global and mclag-interfaces, are enabled.
VSX CSW-RZ (Rack 08 primary):
csw-rz-r08# sh run int lag 255
interface lag 255
no shutdown
vrf attach KEEPALIVE
description VSX keepalive
ip address 1.1.1.0/31
lacp mode active
exit
csw-rz-r08# sh run int lag 256
interface lag 256
no shutdown
description ISL link
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
exit
csw-rz-r08# sh run int 1/1/12
interface 1/1/12
no shutdown
mtu 9198
description VSX keepalive physical link
lag 255
exit
csw-rz-r08# sh run int 1/1/55
interface 1/1/55
no shutdown
mtu 9198
description ISL physical link
lag 256
exit
csw-rz-r08# sh run int 1/1/56
interface 1/1/56
no shutdown
mtu 9198
description ISL physical link
lag 256
exit
csw-rz-r08# sh spanning-tree |i MAC
MAC-Address: 0a:00:0c:01:01:00
MAC-Address: 0a:00:0c:01:01:00
csw-rz-r08# sh spanning-tree |i lag256
lag256 Designated Forwarding 1 64 P2P 11486474 11475221 2 0
csw-rz-r08# sh run vsx
vsx
system-mac 0a:00:0c:01:01:00
inter-switch-link lag 256
role primary
keepalive peer 1.1.1.1 source 1.1.1.0 vrf KEEPALIVE
vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vrrp vsx-global
VSX CSW-RZ (Rack 09 secondary):
csw-rz-r09# sh run int lag 255
interface lag 255
no shutdown
vrf attach KEEPALIVE
description VSX keepalive
ip address 1.1.1.1/31
lacp mode active
exit
csw-rz-r09# sh run int lag 256
interface lag 256
no shutdown
description ISL link
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
exit
csw-rz-r09# sh run int 1/1/12
interface 1/1/12
no shutdown
mtu 9198
description VSX keepalive physical link
lag 255
exit
csw-rz-r09# sh run int 1/1/55
interface 1/1/55
no shutdown
mtu 9198
description ISL physical link
lag 256
exit
csw-rz-r09# sh run int 1/1/56
interface 1/1/56
no shutdown
mtu 9198
description ISL physical link
lag 256
exit
csw-rz-r09# sh spanning-tree |i MAC
MAC-Address: 0a:00:0c:01:01:00
MAC-Address: 0a:00:0c:01:01:00
csw-rz-r09# sh spanning-tree |i lag256
lag256 Designated Forwarding 1 64 P2P 11486522 11474951 0 2
csw-rz-r09# sh run vsx
vsx
system-mac 0a:00:0c:01:01:00
inter-switch-link lag 256
role secondary
keepalive peer 1.1.1.0 source 1.1.1.1 vrf KEEPALIVE
vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vrrp vsx-global
VSX CSW-BU (Rack 02 primary):
csw-bu-r02# sh run int lag255
interface lag 255
no shutdown
vrf attach KEEPALIVE
description VSX keepalive
ip address 1.1.1.0/31
lacp mode active
exit
csw-bu-r02# sh run int lag256
interface lag 256
no shutdown
description ISL link
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
exit
csw-bu-r02# sh run int 1/1/12
interface 1/1/12
no shutdown
mtu 9198
description VSX keepalive physical link
lag 255
exit
csw-bu-r02# sh run int 1/1/55
interface 1/1/55
no shutdown
mtu 9198
description ISL physical link
lag 256
exit
csw-bu-r02# sh run int 1/1/56
interface 1/1/56
no shutdown
mtu 9198
description ISL physical link
lag 256
exit
csw-bu-r02# sh spanning-tree |i MAC
MAC-Address: 0a:00:0c:01:01:00
MAC-Address: 0a:00:0c:01:02:00
csw-bu-r02# sh spanning-tree |i lag256
lag256 Designated Forwarding 1 64 P2P 11486032 14 2 0
csw-bu-r02#
csw-bu-r02# sh run vsx
vsx
system-mac 0a:00:0c:01:02:00
inter-switch-link lag 256
role primary
keepalive peer 1.1.1.1 source 1.1.1.0 vrf KEEPALIVE
vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vrrp vsx-global
VSX CSW-BU (Rack 03 secondary):
csw-bu-r03# sh run int lag255
interface lag 255
no shutdown
vrf attach KEEPALIVE
description VSX keepalive
ip address 1.1.1.1/31
lacp mode active
exit
csw-bu-r03# sh run int lag256
interface lag 256
no shutdown
description ISL link
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
exit
csw-bu-r03# sh run int 1/1/12
interface 1/1/12
no shutdown
mtu 9198
description VSX keepalive physical link
lag 255
exit
csw-bu-r03# sh run int 1/1/55
interface 1/1/55
no shutdown
mtu 9198
description ISL physical link
lag 256
exit
csw-bu-r03# sh run int 1/1/56
interface 1/1/56
no shutdown
mtu 9198
description ISL physical link
lag 256
exit
csw-bu-r03# sh spanning-tree |i MAC
MAC-Address: 0a:00:0c:01:01:00
MAC-Address: 0a:00:0c:01:02:00
csw-bu-r03#
csw-bu-r03# sh spanning-tree |i lag256
lag256 Designated Forwarding 1 64 P2P 79 11474916 0 2
csw-bu-r03# sh run vsx
vsx
system-mac 0a:00:0c:01:02:00
inter-switch-link lag 256
role secondary
keepalive peer 1.1.1.0 source 1.1.1.1 vrf KEEPALIVE
vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vrrp vsx-global
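Because the vsx-sync lines above are long, one quick way to confirm that the feature lists really match between members or pairs is to compare them as sets. The following is only a minimal Python sketch, not an Aruba tool: the helper names `vsx_sync_features` and `sync_differences` are made up for illustration, and the input is the `sh run vsx` output as posted above.

```python
def vsx_sync_features(show_run_vsx):
    """Return the set of features on the 'vsx-sync' line of 'sh run vsx' output."""
    for line in show_run_vsx.splitlines():
        words = line.split()
        if words and words[0] == "vsx-sync":
            return set(words[1:])
    return set()  # no vsx-sync line found


def sync_differences(name_a, out_a, name_b, out_b):
    """Map each switch name to the vsx-sync features enabled only on that switch."""
    a, b = vsx_sync_features(out_a), vsx_sync_features(out_b)
    return {name_a: sorted(a - b), name_b: sorted(b - a)}


# vsx-sync line as posted for the CSW pairs.
CSW_SYNC = (
    "vsx-sync aaa acl-log-timer arp-security bfd-global control-plane-acls "
    "copp-policy dhcp-relay dhcp-server dns evpn icmp-tcp internal-vlan-range "
    "l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces "
    "mgmd-global neighbor policy-global qos-global route-map sflow-global snmp "
    "ssh stp-global time vrrp vsx-global"
)

csw_rz_r08 = (
    "vsx\nsystem-mac 0a:00:0c:01:01:00\ninter-switch-link lag 256\n"
    "role primary\n" + CSW_SYNC
)
csw_bu_r02 = (
    "vsx\nsystem-mac 0a:00:0c:01:02:00\ninter-switch-link lag 256\n"
    "role primary\n" + CSW_SYNC
)

differences = sync_differences("csw-rz-r08", csw_rz_r08, "csw-bu-r02", csw_bu_r02)
print(differences)
```

Two empty lists mean both switches sync the same feature set; any name that shows up is enabled on one side only.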
We also see BPDU starvation on all VSX pairs except the STP root CSW-RZ, and I do not know why. All uplinks are clean and far from fully utilized.
The Starvation leads to STP Root changes on the Aruba-CX Switches:
csw-bu-r02# sh logg -r |i starv
2023-04-20T12:02:55.739975+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2023-03-20T08:01:46.739926+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2023-02-27T15:06:39.739903+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2023-02-10T03:12:34.740079+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2023-01-11T11:19:23.739910+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-12-22T23:11:30.739982+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-12-07T19:07:17.739925+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-11-22T23:24:40.739820+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-11-19T12:49:29.739941+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-10-28T15:46:14.739884+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-10-13T23:15:17.739979+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-09-27T05:58:30.739904+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-09-22T09:34:59.739880+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-09-19T20:33:50.739909+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
2022-09-13T23:38:29.739947+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100
csw-bu-r02# sh logg -r |i Root
2023-04-20T12:02:55.743663+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2023-04-20T12:02:55.740079+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2023-03-20T08:01:46.743644+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2023-03-20T08:01:46.740020+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2023-02-27T15:06:39.743832+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2023-02-27T15:06:39.740027+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2023-02-10T03:12:34.744337+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2023-02-10T03:12:34.740201+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2023-01-11T11:19:23.743424+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2023-01-11T11:19:23.740008+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-12-22T23:11:30.743856+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-12-22T23:11:30.740093+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-12-07T19:07:17.743600+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-12-07T19:07:17.740032+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-11-22T23:24:40.743153+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-11-22T23:24:40.739896+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-11-19T12:49:29.743782+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-11-19T12:49:29.740032+01:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-10-28T15:46:14.743883+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-10-28T15:46:14.739996+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-10-13T23:15:17.743835+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-10-13T23:15:17.740250+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-09-27T05:58:30.743275+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-09-27T05:58:30.740205+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-09-22T09:34:59.744184+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-09-22T09:34:59.740118+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-09-19T20:33:50.743507+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-09-19T20:33:50.740150+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
2022-09-13T23:38:29.743378+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00
2022-09-13T23:38:29.740181+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00
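To put numbers on one of these incidents, the timestamps can be compared directly. This is a rough Python sketch under a few assumptions: it only understands the hpe-mstpd line format shown above, it treats event 2008 as the starvation and event 2006 as the root change (as in the posted log), and the helper names `parse` and `incident_timing` are hypothetical. It expects the lines of a single incident.

```python
import re
from datetime import datetime

# Timestamp, then event ID and message text of an hpe-mstpd event line.
EVENT = re.compile(
    r"^(\S+) \S+ hpe-mstpd\[\d+\]: Event\|(\d+)\|[^|]*\|[^|]*\|[^|]*\|(.*)$"
)
STARVED, ROOT_CHANGED = 2008, 2006  # event IDs as they appear in the log above


def parse(line):
    """Return (timestamp, event_id, text) or None if the line does not match."""
    m = EVENT.match(line.strip())
    if m is None:
        return None
    return datetime.fromisoformat(m.group(1)), int(m.group(2)), m.group(3)


def incident_timing(lines):
    """For one incident: delay from starvation to the first root change,
    and how long the root stayed changed before it flipped back."""
    events = sorted(e for e in map(parse, lines) if e is not None)
    starved = [t for t, eid, _ in events if eid == STARVED]
    roots = [t for t, eid, _ in events if eid == ROOT_CHANGED]
    return {"reaction": roots[0] - starved[0], "flap": roots[1] - roots[0]}


incident = [
    "2023-04-20T12:02:55.739975+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2008|LOG_INFO|AMM|1/1|CIST starved for a BPDU Rx on port lag53 from 4096:0a000c-010100",
    "2023-04-20T12:02:55.743663+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 8192: 0a:00:0c:01:02:00 to 4096: 0a:00:0c:01:01:00",
    "2023-04-20T12:02:55.740079+02:00 csw-bu-r02 hpe-mstpd[2128]: Event|2006|LOG_INFO|AMM|1/1|CST - Root changed from 4096: 0a:00:0c:01:01:00 to 8192: 0a:00:0c:01:02:00",
]

timing = incident_timing(incident)
print(timing["reaction"].microseconds, "us from starvation to root change")
print(timing["flap"].microseconds, "us until the root changed back")
```

For the 2023-04-20 incident this gives 104 microseconds from the starvation event to the root change and 3584 microseconds until the root changes back, so csw-bu-r02 considers itself root for only a few milliseconds each time.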
There is a separate thread for the starvation, but without any solution so far:
https://community.arubanetworks.com/discussion/aruba-cx
There is no physical loop anymore, with the exception of the WAN switches (WSW). This VSX pair is connected to all of the core switches, so there are two redundant paths to the core network, with the link to CSW-BU blocking.
Original Message:
Sent: Apr 21, 2023 09:21 AM
From: Mflowers@beta.team
Subject: Aruba CX 8325: vsx config sync does not work any more
The BPDU counts on our 8325 VSX pair look like those on your first switch. Don't mind the large TCN counts on the switches; I am still testing and this is not in production yet. Either we both have the same STP issue on our VSX pairs, or something is wrong with STP in the csw-bu-r VSX config.
On your second pair, the one with the different BPDU Tx/Rx counts, do you have vsx-sync mclag-interfaces or vsx-sync vsx-global enabled? What does the config for your interface lag256 look like? When running show spanning-tree detail, do both of your VSX switches have the same Bridge ID?
S40-MDF-AGG-01# show spanning-tree
Spanning tree status : Enabled Protocol: MSTP
MST0
Root ID Priority : 0
MAC-Address: 02:00:00:00:10:00
This bridge is the root
Hello time(in seconds):2 Max Age(in seconds):20
Forward Delay(in seconds):15
Bridge ID Priority : 0
MAC-Address: 02:00:00:00:10:00
Hello time(in seconds):2 Max Age(in seconds):20
Forward Delay(in seconds):15
Port Role State Cost Priority Type BPDU-Tx BPDU-Rx TCN-Tx TCN-Rx
------------ -------------- ---------- -------------- ---------- ---------------- ---------- ---------- ---------- ----------
lag256 Designated Forwarding 1 64 P2P 1283082 1283037 104 24
Number of topology changes : 269
Last topology change occurred : 591282 seconds ago
S40-MDF-AGG-01# show run vsx
vsx
system-mac 02:00:00:00:10:00
inter-switch-link lag 256
role primary
keepalive peer 172.31.10.253 source 172.31.10.252 vrf KEEPALIVE
linkup-delay-timer 600
vsx-sync vsx-global
interface lag 256
description ISL
no shutdown
no routing
vlan trunk native 2
vlan trunk allowed all
lacp mode active
Original Message:
Sent: Apr 21, 2023 03:42 AM
From: r.grossmann
Subject: Aruba CX 8325: vsx config sync does not work any more
All devices to the Core Layer (CSW = Core Switches) are connected Dual-Homed with LACP.
Connected Devices: other VSX Pairs (SDSW = Server Distribution, CDSW = Client Distribution, Access = old Aruba-OS 5406 ESX Server Distribution, Checkpoint Firewall, Out of Band Management (routed))
What I have noticed, and reported to Aruba support in another case, is that the spanning-tree counters on the main VSX pair look suspicious.
I do not expect BPDU-Rx on the ISL lag256 on the primary, or BPDU-Tx on the secondary member.
On all other VSX pairs, only the VSX primary transmits BPDUs over the ISL lag256.
I'm not sure, but I guess the network went down for 20-30 s with the reboot of the secondary and for 8-10 s with the reboot of the primary.
CSW-RZ is located in DC Location RZ and is the spanning-tree root (spanning-tree priority 1):
Primary VSX Pair DC location RZ:
csw-rz-r08# sh spanning-tree |i lag256
Port Role State Cost Priority Type BPDU-Tx BPDU-Rx TCN-Tx TCN-Rx
lag256 Designated Forwarding 1 64 P2P 11474584 11463341 2 0
csw-rz-r08# sh spanning-tree vsx-peer |i lag256
Port Role State Cost Priority Type BPDU-Tx BPDU-Rx TCN-Tx TCN-Rx
lag256 Designated Forwarding 1 64 P2P 11474652 11463097 0 2
CSW-BU is located in DC Location BU and is the secondary spanning-tree root (spanning-tree priority 2):
csw-bu-r02# sh spanning-tree |i lag256
Port Role State Cost Priority Type BPDU-Tx BPDU-Rx TCN-Tx TCN-Rx
lag256 Designated Forwarding 1 64 P2P 11474207 14 2 0
csw-bu-r02# sh spanning-tree vsx-peer |i lag256
Port Role State Cost Priority Type BPDU-Tx BPDU-Rx TCN-Tx TCN-Rx
lag256 Designated Forwarding 1 64 P2P 79 11463106 0 2
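The difference between the two pairs is easier to see when the ISL rows are classified by which BPDU counter is actually moving. Below is a small Python sketch of that comparison. It assumes the ten-column row layout shown above with single-word role and state values; the names `PortRow`, `parse_port_row` and `bpdu_pattern` and the 1 % cut-off are arbitrary choices for illustration, not anything AOS-CX defines.

```python
from collections import namedtuple

PortRow = namedtuple(
    "PortRow",
    "port role state cost priority type bpdu_tx bpdu_rx tcn_tx tcn_rx",
)


def parse_port_row(line):
    """Parse one port row of 'sh spanning-tree' (ten whitespace-separated fields)."""
    f = line.split()
    if len(f) != 10:
        raise ValueError("unexpected port row: %r" % line)
    return PortRow(f[0], f[1], f[2], int(f[3]), int(f[4]), f[5], *map(int, f[6:]))


def bpdu_pattern(row, ratio=0.01):
    """Classify a port by which BPDU counters are significant.

    A counter is ignored when it is below `ratio` times the larger counter
    (hypothetical threshold, meant to hide a handful of start-up BPDUs).
    """
    biggest = max(row.bpdu_tx, row.bpdu_rx)
    if biggest == 0:
        return "idle"
    tx = row.bpdu_tx > ratio * biggest
    rx = row.bpdu_rx > ratio * biggest
    if tx and rx:
        return "tx+rx"
    return "tx-only" if tx else "rx-only"


# lag256 rows as posted above.
rows = {
    "csw-rz-r08": "lag256 Designated Forwarding 1 64 P2P 11474584 11463341 2 0",
    "csw-rz-r09": "lag256 Designated Forwarding 1 64 P2P 11474652 11463097 0 2",
    "csw-bu-r02": "lag256 Designated Forwarding 1 64 P2P 11474207 14 2 0",
    "csw-bu-r03": "lag256 Designated Forwarding 1 64 P2P 79 11463106 0 2",
}

patterns = {name: bpdu_pattern(parse_port_row(row)) for name, row in rows.items()}
for name, pattern in patterns.items():
    print(name, pattern)
```

With the posted counters, both CSW-RZ members come out as `tx+rx`, while CSW-BU shows `tx-only` on the primary and `rx-only` on the secondary, which matches the behaviour described for all other pairs.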
Original Message:
Sent: Apr 21, 2023 02:55 AM
From: parnassus
Subject: Aruba CX 8325: vsx config sync does not work any more
Hi,
"Of all things our main VSX pair (Core-Layer DC Location RZ) causes the problem (using the vsx upgrade script) interrupting services and reachability for around 20-30 Seconds."
This is strange.
In a properly configured environment where all connected peers are properly "dual homed" against each node of the VSX cluster (I mean: a traditional LACP-based LAG on the peer <--> a VSX LAG, also known as Multi-Chassis LAG, on the VSX pair), the peers and at least one node of the VSX cluster are always in continuous communication (no ping lost, assured reachability). That holds in general, and especially during a manual or automated VSX software update/upgrade procedure.
During a software update/upgrade, the secondary VSX node is expected to reboot first and then the primary, so communication between the rebooting node and its upstream/downstream peers is briefly interrupted while that node is unavailable (a node reboot generally lasts less than 4 minutes). That's pretty normal, BUT since peers should generally be connected to both VSX nodes via LACP-based (or even non-protocol) aggregated links, it's also totally normal for traffic to keep flowing without interruption through the other, still running, VSX node.
Doesn't it happen in your case?
Original Message:
Sent: 4/21/2023 2:10:00 AM
From: r.grossmann
Subject: RE: Aruba CX 8325: vsx config sync does not work any more
Hi Stanislav,
thanks for your helpful response.
Stupid as I am, I didn't review the resolved issues of the more recent firmware releases.
The update to 10.10.xxxx is planned for June or July, as we have to restart all CX switches.
We did notice a massive network outage during the last update in July 2022. Of all things our main VSX pair (Core-Layer DC Location RZ) causes the problem (using the vsx upgrade script) interrupting services and reachability for around 20-30 Seconds.
As all Switches should be running on the same SW, I now used the linked workaround. Thanks for linking to the 6400 document, where the needed command is listed.
start-shell
sudo systemctl restart vsx-syncd
Afterwards, the changes to snmp and mclag were propagated to the VSX secondary.
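One way to double-check the propagation is to diff the output of `sh run int lag 35` against `sh run int lag 35 vsx-peer`, the same pair of commands used in my original post. The following is just a Python sketch of that comparison using the standard `difflib` module; the function name `config_drift` is made up, and the two sample configs are the out-of-sync state captured before the restart.

```python
import difflib


def config_drift(local_cfg, peer_cfg):
    """Return the lines that differ between local and vsx-peer config text.

    Lines prefixed with '-' exist only locally, '+' only on the peer.
    """
    local = [line.strip() for line in local_cfg.strip().splitlines()]
    peer = [line.strip() for line in peer_cfg.strip().splitlines()]
    diff = list(difflib.unified_diff(local, peer, "local", "vsx-peer", n=0, lineterm=""))
    # Skip the two file header lines and the '@@' hunk markers.
    return [d for d in diff[2:] if not d.startswith("@@")]


local_lag35 = """
interface lag 35 multi-chassis
no shutdown
description Blupp
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
"""

# Same interface as seen through 'vsx-peer' while sync was stalled.
peer_lag35 = local_lag35.replace("description Blupp", "description VMs-Server11")

drift = config_drift(local_lag35, peer_lag35)
print(drift)
```

An empty list means the secondary has caught up; here the stalled state yields `['-description Blupp', '+description VMs-Server11']`.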
Original Message:
Sent: Apr 20, 2023 10:01 AM
From: snaydenov
Subject: Aruba CX 8325: vsx config sync does not work any more
Hi Robert,
I have seen some sync issues already and as far as I remember such issues were resolved in 10.09.1050. Here is what I have just found in Release Notes (bug ID 236992):
https://www.arubanetworks.com/techdocs/AOS-CX/10.09/RN/rn_8320_10-09-1050.pdf
You will probably find some related messages on the switch from the vsx-syncd daemon like: "Long time without updates, check if remote connection is stalled." (not sure if such messages are logged in the event log, but definitely in /var/log/messages)
I am not going to go into detail on how to implement the workaround, as it requires some shell commands, but I will leave this public document here (the shell needs to be used with an appropriate privilege level):
https://www.arubanetworks.com/techdocs/AOS-CX/Consolidated_RNs/HTML-6300-6400/Content/10_06/10_06_0160/10-06-0160_f.htm
Another recommendation would be to upgrade the devices as version 10.09 is already considered unsupported:
https://www.arubanetworks.com/support-services/end-of-life/#product=aos-cx-switching-software
Regards
Stanislav
Original Message:
Sent: Apr 18, 2023 11:45 AM
From: r.grossmann
Subject: Aruba CX 8325: vsx config sync does not work any more
Hi Guys,
On one of my VSX pairs the config sync does not work any more.
I can see it with mclag interfaces and snmp.
Is there any way to restart only the config sync process without interrupting service?
VSX Pair affected:
Primary:
sdsw-rz-r08# sh debug
----------------------------------------------------------------------------------------------------
module sub_module severity vlan port ip mac instance vrf table column client ubt_zone
----------------------------------------------------------------------------------------------------
ospfv2 all debug ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
vsx vsx_sync debug ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
sdsw-rz-r08#
2023-04-18T17:26:18.606074+0200 hpe-restd[3840] <INFO> Event|4602|LOG_INFO|AMM|-|Authentication succeeded for user manager in session ai5l0bu8YheLUEvMKQMW7w==
2023-04-18T17:26:18.606709+0200 hpe-restd[3840] <INFO> Event|4655|LOG_INFO|AMM|-|User manager logged in from 172.31.200.250 through REST session
2023-04-18T17:26:26.285970+0200 hpe-restd[3840] <INFO> Event|4608|LOG_INFO|AMM|-|Authorization allowed for user manager, for resource SessionMgmt, with action POST
2023-04-18T17:26:26.286650+0200 hpe-restd[3840] <INFO> Event|4657|LOG_INFO|AMM|-|User manager logged out of REST session from 172.31.200.250
sdsw-rz-r08# conf t
sdsw-rz-r08(config)# int lag 35 multi-chassis
sdsw-rz-r08(config-lag-if)# do sh run int lag 35
interface lag 35 multi-chassis
no shutdown
description VMs-Server11
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
sdsw-rz-r08(config-lag-if)# do sh run int lag 35 vsx-peer
interface lag 35 multi-chassis
no shutdown
description VMs-Server11
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
sdsw-rz-r08(config-lag-if)# interface lag 35 multi-chassis
sdsw-rz-r08(config-lag-if)# description Blupp
sdsw-rz-r08(config-lag-if)# do sh run int lag 35
interface lag 35 multi-chassis
no shutdown
description Blupp
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
sdsw-rz-r08(config-lag-if)# do sh run int lag 35 vsx-peer
interface lag 35 multi-chassis
no shutdown
description VMs-Server11
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
sdsw-rz-r08(config-lag-if)# do sh run int lag 35 vsx-peer
interface lag 35 multi-chassis
no shutdown
description VMs-Server11
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
sdsw-rz-r08(config-lag-if)# description VMs-Server11
sdsw-rz-r08(config-lag-if)# end
sdsw-rz-r08# ch dif run st
No difference in configs.
sdsw-rz-r08#
sdsw-rz-r08# sh run vsx
vsx
system-mac 0a:00:0c:02:01:00
inter-switch-link lag 256
role primary
keepalive peer 1.1.1.1 source 1.1.1.0 vrf KEEPALIVE
vsx-sync aaa acl-log-timer arp-security bfd-global bgp control-plane-acls copp-policy dhcp-relay dhcp-server dns icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vsx-global
sdsw-rz-r08# sh vsx brief
ISL State : In-Sync
Device State : Peer-Established
Keepalive State : Keepalive-Established
Device Role : Primary
Number of Multi-chassis LAG interfaces : 34
sdsw-rz-r08# sh vsx status
VSX Operational State
---------------------
ISL channel : In-Sync
ISL mgmt channel : operational
Config Sync Status : In-Sync
NAE : peer_reachable
HTTPS Server : peer_reachable
Attribute Local Peer
------------ -------- --------
ISL link lag256 lag256
ISL version 2 2
System MAC 0a:00:0c:02:01:00 0a:00:0c:02:01:00
Platform 8325 8325
Software Version GL.10.09.1040 GL.10.09.1040
Device Role primary secondary
sdsw-rz-r08#
Secondary:
sdsw-rz-r09# sh debug
----------------------------------------------------------------------------------------------------
module sub_module severity vlan port ip mac instance vrf table column client ubt_zone
----------------------------------------------------------------------------------------------------
vsx vsx_sync debug ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
sdsw-rz-r09#
sdsw-rz-r09# sh run vsx
vsx
system-mac 0a:00:0c:02:01:00
inter-switch-link lag 256
role secondary
keepalive peer 1.1.1.0 source 1.1.1.1 vrf KEEPALIVE
vsx-sync aaa acl-log-timer arp-security bfd-global bgp control-plane-acls copp-policy dhcp-relay dhcp-server dns icmp-tcp internal-vlan-range l2-vlan-mac-cfg-mode lldp loop-protect-global mac-lockout mclag-interfaces mgmd-global neighbor policy-global qos-global route-map sflow-global snmp ssh stp-global time vsx-global
sdsw-rz-r09# sh vsx brief
ISL State : In-Sync
Device State : Peer-Established
Keepalive State : Keepalive-Established
Device Role : Secondary
Number of Multi-chassis LAG interfaces : 34
sdsw-rz-r09# sh vsx status
VSX Operational State
---------------------
ISL channel : In-Sync
ISL mgmt channel : operational
Config Sync Status : In-Sync
NAE : peer_reachable
HTTPS Server : peer_reachable
Attribute Local Peer
------------ -------- --------
ISL link lag256 lag256
ISL version 2 2
System MAC 0a:00:0c:02:01:00 0a:00:0c:02:01:00
Platform 8325 8325
Software Version GL.10.09.1040 GL.10.09.1040
Device Role secondary primary
sdsw-rz-r09#
On a different but similarly configured VSX pair it works fine, and on that VSX pair you can see a log entry on the secondary member.
Primary:
sdsw-bu-r02# sh debug
----------------------------------------------------------------------------------------------------
module sub_module severity vlan port ip mac instance vrf table column client ubt_zone
----------------------------------------------------------------------------------------------------
ospfv2 all debug ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
vsx vsx_sync debug ----- ----- ----- ----- ----- ----- ----- ----- ----- -----
sdsw-bu-r02# conf t
sdsw-bu-r02(config)# int lag 35 multi-chassis
sdsw-bu-r02(config-lag-if)# do sh run int lag 35
interface lag 35 multi-chassis
no shutdown
description VMs-Server11-ESXFTS0020
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
sdsw-bu-r02(config-lag-if)# do sh run int lag 35 vsx-peer
interface lag 35 multi-chassis
no shutdown
description VMs-Server11-ESXFTS0020
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
sdsw-bu-r02(config-lag-if)# interface lag 35 multi-chassis
sdsw-bu-r02(config-lag-if)# description Blupp
sdsw-bu-r02(config-lag-if)# do sh run int lag 35 vsx-peer
interface lag 35 multi-chassis
no shutdown
description Blupp
no routing
vlan trunk native 998 tag
vlan trunk allowed all
lacp mode active
lacp fallback
spanning-tree root-guard
spanning-tree tcn-guard
spanning-tree port-type admin-edge
exit
sdsw-bu-r02(config-lag-if)# description VMs-Server11-ESXFTS0020
sdsw-bu-r02(config-lag-if)# end
sdsw-bu-r02# ch diff run startup-config
No difference in configs.
sdsw-bu-r02#
Secondary:
sdsw-bu-r03#
2023-04-18T17:36:02.990077+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
2023-04-18T17:36:07.592797+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
2023-04-18T17:36:12.200009+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
2023-04-18T17:36:16.788945+0200 vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|Configuration sync update : VSX configuration-sync updated database
sdsw-bu-r03#
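These event 7602 lines are the sign of life I am missing on the affected pair. As a rough illustration only, the Python sketch below pulls the vsx-syncd 7602 timestamps out of a captured terminal session and lists the gaps between them. It assumes the debug line format shown above; `sync_update_times` is a hypothetical helper name, and an empty result would merely suggest that no sync update was logged in the capture.

```python
import re
from datetime import datetime

# Timestamp of a vsx-syncd "Configuration sync update" event (ID 7602).
SYNC_EVENT = re.compile(
    r"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d+[+-]\d{4}) vsx-syncd\[\d+\] <INFO> Event\|7602\|"
)


def sync_update_times(capture):
    """Return the timestamps of all event 7602 lines found in the captured text."""
    return [
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")
        for ts in SYNC_EVENT.findall(capture)
    ]


MESSAGE = (
    " vsx-syncd[10801] <INFO> Event|7602|LOG_INFO|AMM|-|"
    "Configuration sync update : VSX configuration-sync updated database"
)
# Timestamps as logged on sdsw-bu-r03 above.
capture = "\n".join(
    ts + MESSAGE
    for ts in (
        "2023-04-18T17:36:02.990077+0200",
        "2023-04-18T17:36:07.592797+0200",
        "2023-04-18T17:36:12.200009+0200",
        "2023-04-18T17:36:16.788945+0200",
    )
)

times = sync_update_times(capture)
gaps = [later - earlier for earlier, later in zip(times, times[1:])]
print(len(times), "sync updates, gaps in seconds:", [g.total_seconds() for g in gaps])
```

On the working pair this finds four updates roughly 4.6 seconds apart, one for each step of the description test on the primary.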
Thanks and kind regards
Robert