Wireless Access

  • 1.  ArubaAOS 8 AP LMS rebootstrapping and L3 cluster

    Posted Feb 04, 2024 11:49 PM

    hi all,

    I was recently trying to set up a VMM with 2 physical MCs. As the MCs are at different locations, we decided to use the LMS/backup-LMS method for AP failover. The AP was able to fail over from the primary LMS to the backup LMS, but it keeps rebootstrapping due to failed heartbeats. As the network between the 2 MCs is managed by another party, we could only ask them to verify the MTU size. They replied that the MTU size is 1500.

    We logged a case with Aruba TAC and tried increasing the bootstrap threshold, but the APs still rebootstrap. In the end, TAC advised us to cluster the two controllers. Unfortunately, the tunnels do not seem to be able to negotiate.

    Below are the logs captured when the controllers were unsuccessfully trying to form a cluster with one another.

    ===============================================================================

    Feb  5 12:21:56 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646502 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:22:17 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646503 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:22:23 2024  cli[31060]: USER: admin has logged in from 10.0.0.100.
    Feb  5 12:22:32 2024  cluster_mgr[5937]: <352302> <5937> <ERRS> |cluster_mgr|  cluster_gsm_delete_target_from_cluster_rep_key, cluster_rep_key not present, target: X.X.X.X
    Feb  5 12:22:32 2024  fpapps[5449]: <399838> <5451> <WARN> |fpapps|  Received MAP_DEL from IKE for default-ha-ipsecmapX.X.X.X (gw X.X.X.X) mapid 0 vlanid 0 load-balance 0
    Feb  5 12:22:32 2024  cluster_mgr[5937]: <352302> <5937> <ERRS> |cluster_mgr|  cluster_gsm_delete_target_from_cluster_rep_key, cluster_rep_key not present, target: X.X.X.X
    Feb  5 12:22:38 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646504 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:22:52 2024  KERNEL(TestAP@aruba-ap): [ 9126.643795] wlan: [0:E:NSS] [nss-wifili]: peer security message failed error = 63
    Feb  5 12:22:52 2024  KERNEL(TestAP@aruba-ap): [ 9126.679146] wlan: [0:E:NSS] [nss-wifili]: peer security message failed error = 63
    Feb  5 12:22:59 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646505 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:24:23 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646506 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:24:44 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646507 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:25:05 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646508 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:25:26 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646509 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:25:37 2024  cluster_mgr[5937]: <352302> <5937> <ERRS> |cluster_mgr|  cluster_gsm_delete_target_from_cluster_rep_key, cluster_rep_key not present, target: X.X.X.X
    Feb  5 12:25:37 2024  fpapps[5449]: <399838> <5451> <WARN> |fpapps|  Received MAP_DEL from IKE for default-ha-ipsecmapX.X.X.X (gw X.X.X.X) mapid 0 vlanid 0 load-balance 0
    Feb  5 12:25:37 2024  cluster_mgr[5937]: <352302> <5937> <ERRS> |cluster_mgr|  cluster_gsm_delete_target_from_cluster_rep_key, cluster_rep_key not present, target: X.X.X.X
    Feb  5 12:25:40 2024  cli[31471]: USER: admin has logged in from 10.0.0.100.
    Feb  5 12:25:47 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646510 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    Feb  5 12:25:57 2024  KERNEL(TestAP@aruba-ap): [ 9311.694255] wlan: [0:E:NSS] [nss-wifili]: peer security message failed error = 63
    Feb  5 12:25:57 2024  KERNEL(TestAP@aruba-ap): [ 9311.724514] wlan: [0:E:NSS] [nss-wifili]: peer security message failed error = 63
    Feb  5 12:26:08 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646511 errcode:ERR_IKESA_EXPIRED saflags:0x1 arflags:0x0
    (SGRHWLC01) #

    (WLC) #show lc-cluster group-membership

    Cluster Enabled, Profile Name = "TestCluster"
    Redundancy Mode On
    Heartbeat Threshold = 900 msec
    AP Load Balancing: Enabled
    Active AP Rebalance Threshold = 20%
    Active AP Unbalance Threshold = 5%
    Active AP Rebalance AP Count = 50
    Active AP Rebalance Timer = 1 minutes
    Cluster Info Table
    ------------------
    Type IPv4 Address    Priority Connection-Type STATUS
    ---- --------------- -------- --------------- ------
    peer     Y.Y.Y.Y       10             N/A SECURE-TUNNEL-NEGOTIATING
    self     X.X.X.X        5             N/A ISOLATED (Leader)

    (WLC) #show lc-cluster group-membership

    Cluster Enabled, Profile Name = "TestCluster"
    Redundancy Mode On
    Heartbeat Threshold = 900 msec
    AP Load Balancing: Enabled
    Active AP Rebalance Threshold = 20%
    Active AP Unbalance Threshold = 5%
    Active AP Rebalance AP Count = 50
    Active AP Rebalance Timer = 1 minutes
    Cluster Info Table
    ------------------
    Type IPv4 Address    Priority Connection-Type STATUS
    ---- --------------- -------- --------------- ------
    peer     Y.Y.Y.Y       10             N/A DISCONNECTED
    self     X.X.X.X        5             N/A ISOLATED (Leader)

    ===============================================================================
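
    One way to quantify the pattern in the isakmpd entries above is to measure the time between consecutive "IKE SA Deletion" events. The sketch below is only a log-reading aid: the function name and regular expression are illustrative assumptions written against the line format shown in this capture, and a regular retry cadence does not by itself identify the cause.

```python
import re
from datetime import datetime

# Captures the timestamp of an isakmpd "IKE SA Deletion" line in the
# format seen in this capture; other daemons' lines are ignored.
IKE_DEL = re.compile(
    r"^\s*(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+isakmpd\[\d+\]:.*IKE SA Deletion"
)


def deletion_gaps(lines):
    """Return the seconds between consecutive IKE SA deletion events."""
    stamps = []
    for line in lines:
        match = IKE_DEL.match(line)
        if match:
            # Collapse the double space used for single-digit days.
            text = " ".join(match.group(1).split())
            stamps.append(datetime.strptime(text, "%b %d %H:%M:%S %Y"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


# A few lines copied from the capture (message tails shortened after the errcode).
sample = [
    "Feb  5 12:21:56 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646502 errcode:ERR_IKESA_EXPIRED",
    "Feb  5 12:22:17 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646503 errcode:ERR_IKESA_EXPIRED",
    "Feb  5 12:22:23 2024  cli[31060]: USER: admin has logged in from 10.0.0.100.",
    "Feb  5 12:22:38 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646504 errcode:ERR_IKESA_EXPIRED",
    "Feb  5 12:22:59 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646505 errcode:ERR_IKESA_EXPIRED",
    "Feb  5 12:24:23 2024  isakmpd[5465]: <103103> <5465> <WARN> |ike|   IKE SA Deletion: IKE2_delSa peer:X.X.X.X:500 id:2730646506 errcode:ERR_IKESA_EXPIRED",
]

print(deletion_gaps(sample))  # [21, 21, 21, 84]
```

    On these lines the deletions recur every 21 seconds with one longer gap, which matches what the timestamps in the full capture show by eye.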

    While TAC is going through the logs, I would like to ask the following.

    1) Is there any way to fix the LMS failover, or at least diagnose the root cause?

    2) Why does the cluster tunnel not form? Are there any other ways to troubleshoot it? The APs are in bridge mode, so no VLANs other than the management VLAN are trunked from the controllers to the switches.

    ==============================================================
    (WLC) #show lc-cluster vlan-probe status

    Cluster VLAN Probe Status
    -------------------------
    Type IPv4 Address    REQ-SENT REQ-FAIL ACK-SENT ACK-FAIL REQ-RCVD ACK-RCVD VLAN_FAIL CONN-TYPE START/STOP
    ---- --------------- -------- -------- -------- -------- -------- -------- --------- --------- ----------
    peer     Y.Y.Y.Y        0        0        0        0        0        0         0       N/A     0/  49
    (SGBLDWLC01) #show crypto isakmp sa

    ISAKMP SA Active Session Information
    ------------------------------------
    Initiator IP                            Responder IP                            Flags     Start Time        Private IP                              Peer ID
    ------------                            ------------                            -----     ----------        ----------                              -------------
    X.X.X.X                                 VMM IP address                          i-v2-p    Feb  5 11:42:39     -                                     IPV4_ADDR:VMM IP address    

    Flags: i = Initiator; r = Responder
           m = Main Mode; a = Agressive Mode; v2 = IKEv2; P = exchange PPK
           p = Pre-shared key; c = Certificate/RSA Signature; e =  ECDSA Signature
           x = XAuth Enabled; y = Mode-Config Enabled; E = EAP Enabled
           3 = 3rd party AP; C = Campus AP; R = RAP;  Ru = Custom Certificate RAP; I = IAP
           V = VIA; S = VIA over TCP; l = uplink load-balance

    Total ISAKMP SAs: 1
    (WLC) #show crypto ipsec sa


    IPSEC SA (V2) Active Session Information
    -----------------------------------
    Initiator IP                              Responder IP                              SPI(IN/OUT)        Flags Start Time        Inner IP                                   Ipsec-map
    ------------                              ------------                              ----------------   ----- ---------------   --------                                   ---------
    X.X.X.X                                   VMM IP address                                9ba24a00/2dcf8c00  UT2   Feb  5 11:42:40     -                                        default-local-conductor-ipsecmap

    Flags: T = Tunnel Mode; E = Transport Mode; U = UDP Encap
           L = L2TP Tunnel; N = Nortel Client; C = Client; 2 = IKEv2
           l = uplink load-balance

    Total IPSEC SAs: 1
    (WLC) #

    ==============================================================

    Thanks.

    Regards,

    Victor



  • 2.  RE: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster

    Posted Feb 05, 2024 05:47 AM

    A couple of things I'd be checking: if this is an L3 cluster, is the RAP whitelist correct on both controllers? It may be configured on the Mobility Conductor, but is it actually populating to the MDs? What about all the perimeter devices and settings, such as NAT, routing, ACLs, etc.? Are these all correct?

    Have you also tried swapping the LMS/BLMS in the first instance, to see whether the RAPs appear on the 2nd cluster when it is defined as the LMS?




  • 3.  RE: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster

    Posted Feb 05, 2024 08:26 AM

    Hi Craig,

    Thanks for taking the time to reply.

    I am running Campus APs in bridge mode. When testing with LMS and backup LMS, I did make the backup LMS the primary LMS, and the AP still rebootstrapped. LMS/backup LMS is actually my preferred choice. Since it didn't work, I logged a case with TAC and was asked to try L3 clustering.


    When testing L3 clustering, both controllers just show "SECURE-TUNNEL-NEGOTIATING" and then "DISCONNECTED" when I run the verification command "show lc-cluster group-membership".

    It seems like neither method works for me.

    I just realized I typed in the wrong information about my MCs: both are virtual, not physical. I wonder if it has anything to do with certificates.




  • 4.  RE: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster

    Posted Feb 05, 2024 08:55 AM
    1. Don't use clustering over L3, that recommendation shouldn't have been made.  Clustering is meant to be used in a layer 2 environment.  If LMS/B-LMS isn't working over your L3 then L3 clustering is going to probably be even worse.
    2. Campus AP connectivity over WAN isn't a supported deployment.  The AP expects a full Ethernet (1518) MTU as a minimum all the way to the controller and would prefer even more (1578 minimum).  If they are in fact providing the full 1518 then the issue sounds like an unreliable or oversubscribed link.
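
    To relate the frame sizes in point 2 to the 1500 figure the network provider confirmed, here is a small arithmetic sketch. It assumes the 1518 and 1578 values refer to untagged Ethernet frame sizes (14-byte header plus 4-byte FCS around the IP packet); the helper names are illustrative only.

```python
ETH_HEADER = 14  # destination MAC + source MAC + EtherType, no VLAN tag
ETH_FCS = 4      # frame check sequence


def frame_size(ip_mtu):
    """Untagged Ethernet frame size that carries a full-size IP packet."""
    return ip_mtu + ETH_HEADER + ETH_FCS


def ip_mtu_for_frame(frame_bytes):
    """Largest IP packet that fits in an untagged frame of this size."""
    return frame_bytes - ETH_HEADER - ETH_FCS


print(frame_size(1500))        # 1518
print(ip_mtu_for_frame(1578))  # 1560
```

    Under that assumption, an IP MTU of 1500 corresponds to the 1518-byte minimum, and the preferred 1578-byte frame would need a path that carries 1560-byte IP packets.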


    ------------------------------
    Carson Hulcher, ACEX#110
    ------------------------------



  • 5.  RE: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster

    Posted Feb 05, 2024 10:19 AM

    Hi Carson,

    Thanks for clearing up my doubts about LMS being the better option, and for the advice on the MTU.

    Would it be better for me to adjust the MTU?

    https://community.arubanetworks.com/discussion/aruba-version-8-ap-mtu-size-change

    Or should I attempt to convert those campus APs to remote APs? I had another site with a similar issue, with APs rebooting randomly, and converting to remote APs actually fixed the problem. I did try to convert here, but the APs couldn't even come up. In the previous attempt I was using physical controllers and MM, and I managed to get them up. Now I am using virtual controllers and MMs throughout, and I am still trying to figure out what is wrong. I created the VPN pool, converted the APs to remote APs, and used PSK for the remote AP provisioning.




  • 6.  RE: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster

    Posted Feb 05, 2024 11:13 AM

    No, the better option is to place controllers on the LAN with the APs.  If local controllers aren't possible, look at Instant AP or an AOS 10 AP only deployment.  Remote AP will behave better across a WAN, since that is designed for that operation, but is only meant for a single AP installation at the remote site as no RF or client management is enabled on Remote AP.



    ------------------------------
    Carson Hulcher, ACEX#110
    ------------------------------



  • 7.  RE: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster

    Posted Feb 05, 2024 12:41 PM

    At the moment, they have 1 controller per site, and they plan to get another controller in the future. For now, their request is to have the controller at the other site serve as the backup controller for redundancy's sake.

    The logs do show missed heartbeats, and rebootstraps are shown when I do a show ap debug counter. How can I convince the customer that it is an MTU issue, or troubleshoot what is causing the rebootstrap?
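
    One possible way to gather evidence on the MTU question is to probe the path with Don't Fragment pings of varying size from a host on the AP management VLAN toward the remote controller. The sketch below is a minimal example under stated assumptions: it relies on the Linux iputils ping options (-M do, -s, -c, -W), the function names are hypothetical, and it only tests ICMP reachability at a given packet size, not the AP-to-controller tunnel itself. A single lost ping on a lossy link can also make the result look smaller than it is, so repeat the run before drawing conclusions.

```python
import subprocess

IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def df_ping(host, payload):
    """Send one ping with Don't Fragment set; True if a reply came back.

    Assumes Linux iputils ping; other platforms use different flags.
    """
    cmd = ["ping", "-M", "do", "-s", str(payload), "-c", "1", "-W", "1", host]
    return subprocess.run(cmd, capture_output=True).returncode == 0


def largest_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid got through, so look higher
        else:
            high = mid - 1  # mid was dropped, so look lower
    return low


def path_mtu(host, ceiling=1500):
    """Estimate the largest unfragmented IP packet size toward host."""
    best = largest_payload(
        lambda size: df_ping(host, size), high=ceiling - IP_ICMP_OVERHEAD
    )
    return None if best is None else best + IP_ICMP_OVERHEAD


# Example (replace Y.Y.Y.Y with the backup controller address):
# print(path_mtu("Y.Y.Y.Y"))
```

    If the estimate comes back at the full 1500, the path carries full-size packets at least for ICMP, and attention shifts to loss or latency on the link instead.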




  • 8.  RE: ArubaAOS 8 AP LMS rebootstrapping and L3 cluster

    Posted Feb 05, 2024 12:53 PM

    Eliminate the connection from the troubleshooting. Place a second controller at the site, set up LMS/B-LMS, and perform the failover. If the issue doesn't present, then the WAN connection is almost certainly the issue.

    FYI, that's the methodology that TAC should have presented in the first place. Remove the unsupported part of the deployment and see if the error occurs; work from there if the issue still exists.



    ------------------------------
    Carson Hulcher, ACEX#110
    ------------------------------