I don't know whether these will be useful to you, but below are my ESXi-side settings (VMware VSS "vSwitch1", which is the one carrying the VMs):
Properties
    Standard switch: vSwitch1
    MTU: 9000
Security
    Promiscuous mode: Reject
    MAC address changes: Reject
    Forged transmits: Reject
Traffic shaping
    Average bandwidth: --
    Peak bandwidth: --
    Burst size: --
Teaming and failover
    Load balancing: Route based on originating virtual port
    Network failure detection: Link status only
    Notify switches: Yes
    Failback: Yes
    Active adapters: vmnic2, vmnic5
    Standby adapters: --
    Unused adapters: --
Port Group <redacted> settings (VLAN ID 2030 in this specific case):
General
    Network label: <redacted>
    VLAN ID: 2030
Security
    Promiscuous mode: Reject
    MAC address changes: Reject
    Forged transmits: Reject
Traffic shaping
    Average bandwidth: --
    Peak bandwidth: --
    Burst size: --
Teaming and failover
    Load balancing: Route based on originating virtual port
    Network failure detection: Link status only
    Notify switches: Yes
    Failback: Yes
    Active adapters: vmnic2, vmnic5
    Standby adapters: --
    Unused adapters: --
Physical adapter settings (vmnic2):
Properties
    Adapter: Broadcom NetXtreme E-Series Advanced Dual-port 10Gb SFP+ Ethernet OCP 3.0 Adapter
    Name: vmnic2
    Location: PCI 0000:32:00.0
    Driver: bnxtnet
Status
    Status: Connected
    Actual speed, Duplex: 10 Gbit/s, Full Duplex
    Configured speed, Duplex: Auto negotiate
    Networks: <redacted>
SR-IOV
    Status: Not supported
Cisco Discovery Protocol
    Cisco Discovery Protocol is not available on this physical network adapter
Link Layer Discovery Protocol
    Link Layer Discovery Protocol is not available on this physical network adapter
Physical adapter settings (vmnic5), as above but with these differences:
Properties
    Adapter: Intel(R) Ethernet Controller X710 for 10GbE SFP+
    Name: vmnic5
    Location: PCI 0000:17:00.1
    Driver: i40en
These are the corresponding interface settings (1/1/28 on VSX Primary for vmnic2 and 1/1/28 on VSX Secondary for vmnic5):
VSX Primary:
interface 1/1/28
    description DELL-R750-<redacted>-esxi03-s1p1-vmnic2-vSwitch1-A03
    no shutdown
    mtu 9198
    no routing
    vlan trunk native <redacted> tag
    vlan trunk allowed <redacted>
    spanning-tree bpdu-guard
    spanning-tree port-type admin-edge
    spanning-tree tcn-guard
    loop-protect
    loop-protect vlan <redacted>

VSX Secondary:
interface 1/1/28
    description DELL-R750-<redacted>-esxi03-s2p2-vmnic5-vSwitch1-A04
    no shutdown
    mtu 9198
    no routing
    vlan trunk native <redacted> tag
    vlan trunk allowed <redacted>
    spanning-tree bpdu-guard
    spanning-tree port-type admin-edge
    spanning-tree tcn-guard
    loop-protect
    loop-protect vlan <redacted>
Cheers, Davide.
-------------------------------------------
Original Message:
Sent: Sep 12, 2025 01:45 AM
From: Neonium
Subject: Problem with SFP+ line card in VSX stack
Which firmware version are you using? I already wrote something about setting up the switches in another post.
I can't configure much in the VSS.
Security
    Promiscuous mode -> Reject
    MAC address changes -> Accept
    Forged transmits -> Accept
Traffic shaping is disabled
Teaming and failover
    Load balancing -> Route based on originating virtual port
    Network failure detection -> Link status only
    Notify switches -> Yes
    Failback -> Yes
What I don't think I mentioned is that the two fiber-optic cards each carry one iSCSI link and one LAN link, and each network card is patched to one of the switches. This has always ensured that a single failure could not take us down. With the 1G links as standby, a failure should actually have been impossible.
Original Message:
Sent: Sep 11, 2025 06:09 PM
From: parnassus
Subject: Problem with SFP+ line card in VSX stack
Hi! It's pretty strange that a VMware VSS (no LACP) with just two active-active standalone SFP+ uplinks to the VSX members (again, absolutely no LACP involved, due to VSS limitations) triggers this issue. We have two VSX pairs (based on 8320 and 8360 respectively) running in production with, if I understood correctly what you described as the potentially problematic scenario, a similar setup towards various ESXi hosts: each ESXi host owns many VSSes, and each VSS owns many Port Groups for VLAN tagging that match the corresponding tagging on the VSX downlinks. We have never experienced a failure in years, never (from AOS-CX 10.1, yes, on 8320 VSX starting in 2018, up to AOS-CX 10.13). Hundreds of VMs running concurrently, terabytes upon terabytes of traffic over those links, and never a single glitch.
It would be interesting to see the VMware VSS settings and the exact configuration of the corresponding interfaces on each VSX member.
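For reference, here is a minimal Python sketch of why "Route based on originating virtual port" normally keeps each VM's MAC on a single uplink. It assumes, as a simplification, that a virtual port is mapped to `active_uplinks[port_id % len(active_uplinks)]`; ESXi's actual selection logic is internal and may differ. The VM names and port IDs are hypothetical.

```python
# Simplified model of "Route based on originating virtual port" on a VSS.
# ASSUMPTION: uplink = active_uplinks[port_id % len(active_uplinks)].
# ESXi's real selection logic is internal and may differ.


def pick_uplink(port_id: int, active_uplinks: list[str]) -> str:
    """Return the single uplink a virtual port is pinned to."""
    if not active_uplinks:
        raise ValueError("no active uplinks")
    return active_uplinks[port_id % len(active_uplinks)]


def pin_ports(ports: dict[str, int], active_uplinks: list[str]) -> dict[str, str]:
    """Map each VM (keyed by a hypothetical name) to exactly one uplink."""
    return {vm: pick_uplink(pid, active_uplinks) for vm, pid in ports.items()}


if __name__ == "__main__":
    # Hypothetical VMs and virtual port IDs on vSwitch1.
    ports = {"vm-a": 10, "vm-b": 11, "vm-c": 12}
    print(pin_ports(ports, ["vmnic2", "vmnic5"]))
    # {'vm-a': 'vmnic2', 'vm-b': 'vmnic5', 'vm-c': 'vmnic2'}

    # When vmnic2 loses link, its ports move to the surviving uplink.
    print(pin_ports(ports, ["vmnic5"]))
    # {'vm-a': 'vmnic5', 'vm-b': 'vmnic5', 'vm-c': 'vmnic5'}
```

In this model a VM's MAC changes uplink (and therefore VSX member) only when the set of active uplinks changes, for example on a link failure or on failback.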
Original Message:
Sent: 9/11/2025 4:15:00 PM
From: muhittin
Subject: RE: Problem with SFP+ line card in VSX stack
Hello,
Thank you for sharing more details.
What are the possible triggers when LACP is not present?
1- Since the ESXi Standard vSwitch (VSS) does not support LACP, if you run two 10G uplinks simultaneously to two different 6400 switches (a VSX pair), no single "logical connection" forms on the physical side.
2- The vSwitch pins each VM's traffic to one uplink; however, when the same VLAN exits through two different chassis, physical switches may see the same MAC on two different ports and experience MAC flapping.
3- The MAC table is not shared between chassis in VSX; therefore, the effect of "the same host's MAC rapidly switching between two chassis" becomes more pronounced.
4- This abnormal flow is a known trigger class that can cause switchd_agent2 to assert and crash with SIGABRT (signal 6) in some AOS-CX versions (depending on the exact build).
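Point 2 can be pictured with a toy learning table: whenever a frame with an already-known source MAC arrives on a different port than the one recorded, the entry moves. The Python sketch below only counts such moves; the port labels, the MAC, and the flap threshold are hypothetical, and it does not model real AOS-CX/VSX behavior such as MACs learned across the ISL or MAC-move rate limiting.

```python
from collections import defaultdict


class LearningTable:
    """Toy L2 learning table that counts MAC moves between ports."""

    def __init__(self) -> None:
        self.port_of: dict[str, str] = {}  # MAC -> last ingress port
        self.moves: dict[str, int] = defaultdict(int)  # MAC -> move count

    def learn(self, src_mac: str, ingress_port: str) -> None:
        previous = self.port_of.get(src_mac)
        if previous is not None and previous != ingress_port:
            self.moves[src_mac] += 1  # same MAC now seen on a different port
        self.port_of[src_mac] = ingress_port

    def flapping(self, threshold: int) -> list[str]:
        """MACs that moved at least `threshold` times (hypothetical policy)."""
        return [mac for mac, count in self.moves.items() if count >= threshold]


if __name__ == "__main__":
    table = LearningTable()
    mac = "00:50:56:aa:bb:cc"  # hypothetical VM MAC

    # Stable pinning: every frame from this VM arrives on the same port.
    for _ in range(3):
        table.learn(mac, "primary:1/1/28")
    print(table.flapping(threshold=1))  # []

    # Churn: the same MAC alternates between the two chassis.
    for port in ["secondary:1/1/28", "primary:1/1/28", "secondary:1/1/28"]:
        table.learn(mac, port)
    print(table.flapping(threshold=3))  # ['00:50:56:aa:bb:cc']
```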
So why does the problem persistently occur on the Primary chassis?
1- In the VSX architecture, certain control-plane/statistics tasks may be more intensive on the primary; the combination of heavy MAC churn and the features enabled on it can more easily lock up the agent on the primary's line card.
You mentioned that you are not using LACP. If the "correct design" is not implemented, you may encounter the following:
1- A VSS connected to two different chassis with the same VLANs in active-active mode can spread BUM (broadcast/unknown-unicast/multicast) packets along both paths, even if it does not create an L2 loop. This can produce STP topology change notifications (TCNs) and bursts of flooding.
2- Even if the iSCSI vSwitch is separate, when a host comes out of maintenance, iSCSI session logins and LAN ARP traffic can spike simultaneously and overload the line card.
On the CX side, the same crash signature can persist for years across multiple minor branches and appear under specific triggers (VSX + MAC churn); the fix may be in a different version or another build of 10.15.
What can be done for a healthy design without LACP?
1- On the VMware side, do not use two active-active 10G uplinks to two chassis at once. For each Port Group:
- Active uplink: Only one 10G (going to Primary)
- Standby uplink: The other 10G (going to Secondary).
- Leave the 1G ports as standby.
- You can balance the load by reversing this order on other ESXi hosts.
- This setting prevents the same VLAN from being active on two chassis at the same time, significantly reducing MAC flapping and flooding.
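The per-host alternation suggested above could be scripted. This Python sketch only builds command strings for `esxcli network vswitch standard portgroup policy failover set`, making the 10G uplink toward the Primary active on even-numbered hosts and the one toward the Secondary active on odd-numbered hosts, with the other 10G and then the 1G ports as standby. The 1G uplink names, host names, and port group name are hypothetical; verify the options with `--help` on your ESXi build before running anything.

```python
import shlex

# From the posted setup: vmnic2 -> VSX Primary, vmnic5 -> VSX Secondary.
TEN_G_TO_PRIMARY = "vmnic2"
TEN_G_TO_SECONDARY = "vmnic5"
ONE_G_STANDBY = ["vmnic0", "vmnic1"]  # HYPOTHETICAL 1G uplink names


def failover_order(host_index: int) -> tuple[str, list[str]]:
    """Even hosts prefer the Primary-facing 10G, odd hosts the Secondary-facing one."""
    if host_index % 2 == 0:
        active, other = TEN_G_TO_PRIMARY, TEN_G_TO_SECONDARY
    else:
        active, other = TEN_G_TO_SECONDARY, TEN_G_TO_PRIMARY
    # The other 10G is the first standby; the 1G ports stay behind it.
    return active, [other, *ONE_G_STANDBY]


def esxcli_command(portgroup: str, host_index: int) -> str:
    """Build (but do not run) the esxcli command for one port group on one host."""
    active, standby = failover_order(host_index)
    return shlex.join([
        "esxcli", "network", "vswitch", "standard", "portgroup",
        "policy", "failover", "set",
        f"--portgroup-name={portgroup}",
        f"--active-uplinks={active}",
        f"--standby-uplinks={','.join(standby)}",
    ])


if __name__ == "__main__":
    for i, host in enumerate(["esxi01", "esxi02", "esxi03"]):  # hypothetical hosts
        print(host, esxcli_command("PG-2030", i))  # hypothetical port group name
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be reviewed before being pasted into each host's shell.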
Of course, we recommend using LACP if possible.
Regarding iSCSI:
1- A separate vSwitch with port binding is correct; pin each VMkernel adapter to a single physical NIC (for port binding, VMware expects the other uplink to be set to Unused rather than Standby).
2- Keep the iSCSI uplinks on separate VLANs/subnets and, if possible, on separate physical paths; LACP is not required.
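Point 1 lends itself to a quick sanity check. The Python sketch below works on a hypothetical, hand-written mapping of iSCSI VMkernel adapters to their active uplinks (it does not query ESXi) and flags any adapter without exactly one active uplink, or two adapters pinned to the same NIC. The vmk and vmnic names are placeholders.

```python
def binding_problems(active_by_vmk: dict[str, list[str]]) -> list[str]:
    """Flag VMkernel adapters that are not pinned to exactly one distinct NIC."""
    problems: list[str] = []
    owner: dict[str, str] = {}  # physical NIC -> vmk already pinned to it
    for vmk, active in active_by_vmk.items():
        if len(active) != 1:
            problems.append(f"{vmk}: {len(active)} active uplinks, expected 1")
            continue
        nic = active[0]
        if nic in owner:
            problems.append(f"{vmk}: shares {nic} with {owner[nic]}")
        else:
            owner[nic] = vmk
    return problems


if __name__ == "__main__":
    good = {"vmk1": ["vmnic3"], "vmk2": ["vmnic4"]}            # hypothetical
    bad = {"vmk1": ["vmnic3", "vmnic4"], "vmk2": ["vmnic4"]}   # hypothetical
    print(binding_problems(good))  # []
    print(binding_problems(bad))   # ['vmk1: 2 active uplinks, expected 1']
```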
I hope TAC resolves the case soon. Please share the outcome with us.
Original Message:
Sent: Sep 11, 2025 01:11 PM
From: Neonium
Subject: Problem with SFP+ line card in VSX stack
I had actually ruled out a software bug, but your explanation sounds very plausible. However, it happened on different versions. If that were the case, the bug would have to have been in the firmware for a very long time, since the error occurred in a 10.15.x version.
But to add some more information: we do not use LACP in VMware. We add two 10G ports to the vSwitch, plus two 1G ports as standby. In a separate vSwitch, we have two 10G ports for iSCSI. The first vSwitch carries the LAN VLANs.
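The LAN vSwitch layout described above (two active 10G ports, two 1G ports as standby) can be modeled in a few lines. This is a simplified Python sketch, assuming that with "Link status only" failure detection each active NIC that loses link is replaced by the next healthy standby NIC in list order; the NIC names are hypothetical and ESXi's exact promotion logic may differ.

```python
def effective_uplinks(active: list[str], standby: list[str], link_up: set[str]) -> list[str]:
    """Return the NICs carrying traffic under a simplified explicit failover order.

    ASSUMPTION: each active NIC that is down is replaced by the next standby
    NIC (in list order) that still has link.
    """
    in_use = [nic for nic in active if nic in link_up]
    spares = [nic for nic in standby if nic in link_up]
    missing = len(active) - len(in_use)
    return in_use + spares[:missing]


if __name__ == "__main__":
    active = ["10g-a", "10g-b"]  # hypothetical names for the two 10G ports
    standby = ["1g-a", "1g-b"]   # hypothetical names for the two 1G ports
    all_up = {"10g-a", "10g-b", "1g-a", "1g-b"}

    print(effective_uplinks(active, standby, all_up))              # ['10g-a', '10g-b']
    print(effective_uplinks(active, standby, all_up - {"10g-a"}))  # ['10g-b', '1g-a']
    print(effective_uplinks(active, standby, {"1g-a", "1g-b"}))    # ['1g-a', '1g-b']
```

In this model failover is triggered only by loss of link, so a switch-side fault that leaves the link up would not move traffic to a standby NIC.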
I opened a case earlier, but sometimes there are helpful tips in the community.