Hi
During the weekend I had an incident at one of my customers where the primary domain controller for LDAP didn't respond in a normal way.
The primary Active Directory domain controller didn't respond, and I couldn't search the OU structure, but ClearPass never failed over to the backup domain controller. When I manually changed the configuration to use the backup domain controller's FQDN as the primary server, function was restored.
The AD source has the default failover timeout of 10 seconds.
I have never seen this type of behavior on any other ClearPass cluster in more than 12 years of working with ClearPass across multiple deployments. Whenever the primary domain controller has become unavailable, the backup has been used instead without any issues. So I'm a bit confused about what happened here and why ClearPass didn't start using the second domain controller when no answers were received from the primary.
I don't have any pcap or DEBUG logs, as I focused on solving the issue instead of deeper root cause analysis of why the failover from ClearPass side didn't take place.
My questions are:
- If the primary domain controller is unavailable, will ClearPass wait for a reply on every request until the timeout has been reached, or does it work more like a switch, where a RADIUS server can be marked as dead and subsequent requests are sent directly to the secondary?
- For TACACS, it looks like at least Cisco switches have a TACACS timeout of 10 seconds. If the failover timeout is also 10 seconds, the switch will drop the request before ClearPass fails over to the secondary domain controller. Any suggestions for a good timeout towards Active Directory? In this environment I have separate clusters for TACACS and RADIUS, and normal processing time for TACACS is below 50 ms.
I'm thinking of adjusting the Active Directory server timeout from 10 seconds to maybe 3 seconds. Feedback and thoughts on this adjustment are appreciated.
- What are the exact triggers for a failover to the backup domain controllers?
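
To put rough numbers on the timeout question, here is a small sketch of the arithmetic. It assumes the worst case that the first question asks about, namely that ClearPass waits out the full AD server timeout on a request before it queries the backup; that is an assumption, not confirmed ClearPass behavior. The function names are made up for illustration, and the 10 second switch timeout and 50 ms processing time are the figures mentioned above.

```python
def worst_case_latency(ad_timeout_s, normal_processing_s):
    """Seconds to answer one request when the primary DC stays silent.

    Assumption: the full AD timeout elapses before the backup DC is
    queried, after which normal processing time applies.
    """
    return ad_timeout_s + normal_processing_s


def answers_before_client_gives_up(ad_timeout_s, normal_processing_s, client_timeout_s):
    """True if the worst-case answer arrives before the switch times out."""
    return worst_case_latency(ad_timeout_s, normal_processing_s) < client_timeout_s


SWITCH_TACACS_TIMEOUT_S = 10.0  # switch-side TACACS timeout from the post
NORMAL_PROCESSING_S = 0.05      # "below 50 ms" normal TACACS processing

for ad_timeout_s in (10.0, 3.0):
    in_time = answers_before_client_gives_up(
        ad_timeout_s, NORMAL_PROCESSING_S, SWITCH_TACACS_TIMEOUT_S
    )
    print(f"AD timeout {ad_timeout_s:g}s: answered before switch timeout = {in_time}")
```

Under that assumption, a 10 second AD timeout can never beat a 10 second switch timeout, while 3 seconds leaves several seconds of headroom. If ClearPass instead marks the primary as dead after the first failure, only the first request pays this penalty.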
------------------------------
Best Regards
Jonas Hammarbäck
MVP Guru, ACEX, ACDX #1600, ACCX #1335, ACX-Network Security
Aranya AB
If you find my answer useful, consider giving kudos and/or mark as solution
------------------------------