Eliminate the WAN connection from the troubleshooting. Place a second controller at the site, set up LMS/B-LMS, and perform the failover. If the issue doesn't present, then the WAN connection is almost certainly the issue.
FYI, that's the methodology that TAC should have presented in the first place. Remove the unsupported part of the deployment and see if the error occurs, then work from there if the issue still exists.
Original Message:
Sent: Feb 05, 2024 12:40 PM
From: victorlwt
Subject: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster
At the moment, they have 1 controller per site. They do plan to get another controller in the future. For now, their request is to have the controller at the other site serve as the backup controller for redundancy's sake.
The logs do show missed heartbeats, and rebootstraps are shown when I do a show ap debug counter. How can I convince the customer that it is an MTU issue, or troubleshoot what is causing the rebootstrap?
Original Message:
Sent: Feb 05, 2024 11:13 AM
From: chulcher
Subject: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster
No, the better option is to place controllers on the LAN with the APs. If local controllers aren't possible, look at Instant AP or an AOS 10 AP only deployment. Remote AP will behave better across a WAN, since that is designed for that operation, but is only meant for a single AP installation at the remote site as no RF or client management is enabled on Remote AP.
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Feb 05, 2024 10:19 AM
From: victorlwt
Subject: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster
Hi Carson,
Thanks for clearing up my doubts about LMS being the better option, and for the advice on the MTU.
Would it be better to adjust the MTU?
https://community.arubanetworks.com/discussion/aruba-version-8-ap-mtu-size-change
Or should I attempt to convert those campus APs to remote APs? I had another site with a similar issue, with APs rebooting randomly, and converting to remote APs actually fixed the problem. I did try to convert here, but the APs couldn't even come up, and I am still trying to figure out what is wrong. In the previous attempt I was using physical controllers and an MM, and I managed to get them up. Now I am using virtual controllers and MMs. I created the VPN pool, converted to Remote AP, and used PSK for the Remote AP provisioning.
Original Message:
Sent: Feb 05, 2024 08:54 AM
From: chulcher
Subject: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster
- Don't use clustering over L3; that recommendation shouldn't have been made. Clustering is meant to be used in a layer 2 environment. If LMS/B-LMS isn't working over your L3, then L3 clustering is probably going to be even worse.
- Campus AP connectivity over WAN isn't a supported deployment. The AP expects a full Ethernet (1518) MTU as a minimum all the way to the controller and would prefer even more (1578 minimum). If they are in fact providing the full 1518 then the issue sounds like an unreliable or oversubscribed link.
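One way to check whether the path really carries a full 1500-byte IP packet (a 1518-byte Ethernet frame is 1500 bytes of payload plus 14 bytes of header and 4 bytes of FCS) is a ping with Don't Fragment set. The sketch below only does the arithmetic and builds the command. It assumes a Linux host with iputils `ping` at the AP site and IPv4 without options; the function names and the 192.0.2.10 target are placeholders, not anything from this deployment.

```python
from typing import List

IPV4_HEADER = 20  # bytes, assuming an IPv4 header with no options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def icmp_payload_for_mtu(ip_mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = ip_mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU is smaller than the IPv4 + ICMP headers")
    return payload


def df_ping_command(host: str, ip_mtu: int, count: int = 5) -> List[str]:
    """Build a Linux iputils ping command with Don't Fragment set (-M do)."""
    size = icmp_payload_for_mtu(ip_mtu)
    return ["ping", "-M", "do", "-s", str(size), "-c", str(count), host]


# Placeholder controller address (documentation range), not a real device.
for mtu in (1500, 1400, 1300):
    print(mtu, " ".join(df_ping_command("192.0.2.10", mtu)))
```

If the 1500-byte test fails while a smaller size passes, something on the path is carrying less than a full Ethernet MTU. If all sizes pass but replies are dropped intermittently, that points more toward the unreliable or oversubscribed link described above.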
------------------------------
Carson Hulcher, ACEX#110
Original Message:
Sent: Feb 04, 2024 11:48 PM
From: victorlwt
Subject: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster
Hi all,
I was trying to set up a VMM with 2 physical MCs recently. As the MCs are at different locations, we decided to use the LMS/backup LMS method for AP failover. The AP was able to fail over from the primary LMS to the backup LMS, but the AP keeps rebootstrapping due to failed heartbeats. As the network between the 2 MCs is managed by another party, we could only seek their help to verify the MTU size. They replied that the MTU size is 1500.
We logged a case with Aruba TAC and we tried increasing the bootstrap threshold. The APs still rebootstrap. In the end, TAC advised us to do clustering between both controllers. Unfortunately, the tunnels do not seem to be able to negotiate.
Below are the logs captured when the controllers were unsuccessfully trying to form a cluster with one another.
===============================================================================
Feb 5 12:21:56 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646502 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:22:17 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646503 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:22:23 2024 cli[31060]: USER: admin has logged in from 10.0.0.100.
Feb 5 12:22:32 2024 cluster_mgr[5937]: <352302> <5937> <ERRS> |cluster_mgr| cluster_gsm_delete_target_from_cluster_rep_key, cluster_rep_key not present, target: X.X.X.X
Feb 5 12:22:32 2024 fpapps[5449]: <399838> <5451> <WARN> |fpapps| Received MAP_DEL from IKE for default-ha-ipsecmapX.X.X.X (gw X.X.X.X) mapid 0 vlanid 0 load-balance 0
Feb 5 12:22:32 2024 cluster_mgr[5937]: <352302> <5937> <ERRS> |cluster_mgr| cluster_gsm_delete_target_from_cluster_rep_key, cluster_rep_key not present, target: X.X.X.X
Feb 5 12:22:38 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646504 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:22:52 2024 KERNEL(TestAP@aruba-ap): [ 9126.643795] wlan: [0:E:NSS] [nss-wifili]: peer security message failed error = 63
Feb 5 12:22:52 2024 KERNEL(TestAP@aruba-ap): [ 9126.679146] wlan: [0:E:NSS] [nss-wifili]: peer security message failed error = 63
Feb 5 12:22:59 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646505 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:24:23 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646506 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:24:44 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646507 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:25:05 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646508 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:25:26 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646509 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:25:37 2024 cluster_mgr[5937]: <352302> <5937> <ERRS> |cluster_mgr| cluster_gsm_delete_target_from_cluster_rep_key, cluster_rep_key not present, target: X.X.X.X
Feb 5 12:25:37 2024 fpapps[5449]: <399838> <5451> <WARN> |fpapps| Received MAP_DEL from IKE for default-ha-ipsecmapX.X.X.X (gw X.X.X.X) mapid 0 vlanid 0 load-balance 0
Feb 5 12:25:37 2024 cluster_mgr[5937]: <352302> <5937> <ERRS> |cluster_mgr| cluster_gsm_delete_target_from_cluster_rep_key, cluster_rep_key not present, target: X.X.X.X
Feb 5 12:25:40 2024 cli[31471]: USER: admin has logged in from 10.0.0.100.
Feb 5 12:25:47 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646510 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:25:57 2024 KERNEL(TestAP@aruba-ap): [ 9311.694255] wlan: [0:E:NSS] [nss-wifili]: peer security message failed error = 63
Feb 5 12:25:57 2024 KERNEL(TestAP@aruba-ap): [ 9311.724514] wlan: [0:E:NSS] [nss-wifili]: peer security message failed error = 63
Feb 5 12:26:08 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646511 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
(SGRHWLC01) #
(WLC) #show lc-cluster group-membership
Cluster Enabled, Profile Name = "TestCluster"
Redundancy Mode On
Heartbeat Threshold = 900 msec
AP Load Balancing: Enabled
Active AP Rebalance Threshold = 20%
Active AP Unbalance Threshold = 5%
Active AP Rebalance AP Count = 50
Active AP Rebalance Timer = 1 minutes
Cluster Info Table
------------------
Type IPv4 Address Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
peer Y.Y.Y.Y 10 N/A SECURE-TUNNEL-NEGOTIATING
self X.X.X.X 5 N/A ISOLATED (Leader)
(WLC) #show lc-cluster group-membership
Cluster Enabled, Profile Name = "TestCluster"
Redundancy Mode On
Heartbeat Threshold = 900 msec
AP Load Balancing: Enabled
Active AP Rebalance Threshold = 20%
Active AP Unbalance Threshold = 5%
Active AP Rebalance AP Count = 50
Active AP Rebalance Timer = 1 minutes
Cluster Info Table
------------------
Type IPv4 Address Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
peer Y.Y.Y.Y 10 N/A SECURE-TUNNEL-NEGOTIATING
self X.X.X.X 5 N/A ISOLATED (Leader)
(WLC) #show lc-cluster group-membership
Cluster Enabled, Profile Name = "TestCluster"
Redundancy Mode On
Heartbeat Threshold = 900 msec
AP Load Balancing: Enabled
Active AP Rebalance Threshold = 20%
Active AP Unbalance Threshold = 5%
Active AP Rebalance AP Count = 50
Active AP Rebalance Timer = 1 minutes
Cluster Info Table
------------------
Type IPv4 Address Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
peer Y.Y.Y.Y 10 N/A SECURE-TUNNEL-NEGOTIATING
self X.X.X.X 5 N/A ISOLATED (Leader)
(WLC) #show lc-cluster group-membership
Cluster Enabled, Profile Name = "TestCluster"
Redundancy Mode On
Heartbeat Threshold = 900 msec
AP Load Balancing: Enabled
Active AP Rebalance Threshold = 20%
Active AP Unbalance Threshold = 5%
Active AP Rebalance AP Count = 50
Active AP Rebalance Timer = 1 minutes
Cluster Info Table
------------------
Type IPv4 Address Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
peer Y.Y.Y.Y 10 N/A DISCONNECTED
self X.X.X.X 5 N/A ISOLATED (Leader)
(WLC) #show lc-cluster group-membership
Cluster Enabled, Profile Name = "TestCluster"
Redundancy Mode On
Heartbeat Threshold = 900 msec
AP Load Balancing: Enabled
Active AP Rebalance Threshold = 20%
Active AP Unbalance Threshold = 5%
Active AP Rebalance AP Count = 50
Active AP Rebalance Timer = 1 minutes
Cluster Info Table
------------------
Type IPv4 Address Priority Connection-Type STATUS
---- --------------- -------- --------------- ------
peer Y.Y.Y.Y 10 N/A DISCONNECTED
self X.X.X.X 5 N/A ISOLATED (Leader)
===============================================================================
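To make the pattern in the log above easier to see, the IKE SA deletions can be pulled out and the gaps between them measured. This is a minimal sketch, not an Aruba tool: the regular expression and function names are my own, and the sample is three lines copied from the capture above.

```python
import re
from datetime import datetime

# Three lines copied from the capture above (addresses already masked).
SAMPLE = """\
Feb 5 12:21:56 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646502 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:22:17 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646503 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
Feb 5 12:22:23 2024 cli[31060]: USER: admin has logged in from 10.0.0.100.
Feb 5 12:22:38 2024 isakmpd[5465]: <103103> <5465> <WARN> |ike| IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646504 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
"""

# Timestamp, then an isakmpd line carrying an errcode field.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) isakmpd\[\d+\]:.*errcode:(\w+)"
)


def ike_deletions(text):
    """Return (timestamp, errcode) for every isakmpd line that has an errcode."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stamp = " ".join(match.group(1).split())  # normalise spacing
            when = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
            events.append((when, match.group(2)))
    return events


def gaps_seconds(events):
    """Seconds between consecutive events."""
    times = [when for when, _ in events]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


events = ike_deletions(SAMPLE)
print([code for _, code in events])
print(gaps_seconds(events))
```

A steady gap between ERR_IKESA_EXPIRED entries would be consistent with the IKE negotiation being retried and timing out each time, rather than an established tunnel dropping at random, though that is an interpretation and not something the log states outright.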
While TAC is going through the logs, I would like to ask the following.
1) Is there any way to fix the LMS failover, or at least diagnose the root cause?
2) Why does the cluster tunnel not form? Are there any other ways to troubleshoot? The APs are in bridged mode, so no VLANs other than the management VLAN are trunked from the controllers to the switches.
==============================================================
(WLC) #show lc-cluster vlan-probe status
Cluster VLAN Probe Status
-------------------------
Type IPv4 Address REQ-SENT REQ-FAIL ACK-SENT ACK-FAIL REQ-RCVD ACK-RCVD VLAN_FAIL CONN-TYPE START/STOP
---- --------------- -------- -------- -------- -------- -------- -------- --------- --------- ----------
peer Y.Y.Y.Y 0 0 0 0 0 0 0 N/A 0/ 49
(SGBLDWLC01) #show crypto isakmp sa
ISAKMP SA Active Session Information
------------------------------------
Initiator IP Responder IP Flags Start Time Private IP Peer ID
------------ ------------ ----- ---------- ---------- -------------
X.X.X.X VMM IP address i-v2-p Feb 5 11:42:39 - IPV4_ADDR:VMM IP address
Flags: i = Initiator; r = Responder
m = Main Mode; a = Agressive Mode; v2 = IKEv2; P = exchange PPK
p = Pre-shared key; c = Certificate/RSA Signature; e = ECDSA Signature
x = XAuth Enabled; y = Mode-Config Enabled; E = EAP Enabled
3 = 3rd party AP; C = Campus AP; R = RAP; Ru = Custom Certificate RAP; I = IAP
V = VIA; S = VIA over TCP; l = uplink load-balance
Total ISAKMP SAs: 1
(WLC) #show crypto ipsec sa
IPSEC SA (V2) Active Session Information
-----------------------------------
Initiator IP Responder IP SPI(IN/OUT) Flags Start Time Inner IP Ipsec-map
------------ ------------ ---------------- ----- --------------- -------- ---------
X.X.X.X VMM IP address 9ba24a00/2dcf8c00 UT2 Feb 5 11:42:40 - default-local-conductor-ipsecmap
Flags: T = Tunnel Mode; E = Transport Mode; U = UDP Encap
L = L2TP Tunnel; N = Nortel Client; C = Client; 2 = IKEv2
l = uplink load-balance
Total IPSEC SAs: 1
(WLC) #
==============================================================
Thanks.
Regards,
Victor