During configuration and testing of the vSphere Authentication Proxy, we discovered some problems that make the Authentication Proxy very unreliable in environments with multiple Domain Controllers. In our case we use the Authentication Proxy in conjunction with vSphere Auto Deploy, but the problem also occurs in scenarios without Auto Deploy.
Our tests show the following behavior in 9 out of 10 test runs:
- Applying the Host Profile (which specifies that the ESXi Host must be added to the domain using the Authentication Proxy) fails with the error message: “Cannot complete login due to an incorrect user name or password.”
- Manually adding the ESXi Host to the domain by entering the domain and the Authentication Proxy fails with the same error message.
As stated, the process fails most of the time, but sometimes it does work correctly. What’s going on?
Extensive troubleshooting revealed some interesting findings in Active Directory. In short, this is what happens:
- The vSphere Authentication Proxy account (CAM User) creates a Computer Account in Active Directory. This is done on a random DC;
- The vSphere Authentication Proxy gives the ESXi Host a Secure Channel Key;
- The ESXi Host requests a Kerberos ticket for authentication against Active Directory. This is also done on a random DC;
- The majority of our tests fail because the ESXi Host requests its Kerberos ticket from a different DC than the one on which the vSphere Authentication Proxy created the Computer Account.
By default, Microsoft sets the intra-site replication delay to 15 seconds. Completing the first three steps above takes only about 1 to 2 seconds, so the math confirms that most of the time the Authentication Proxy will fail: the ESXi Host asks for its ticket long before the Computer Account has replicated to the other DCs.
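To make that arithmetic concrete, here is a toy Python model of the race. It rests on our own simplifying assumptions, not on how Active Directory actually selects DCs or replicates: the proxy and the ESXi Host each pick one DC uniformly at random, and a new object becomes visible on every other DC after a single fixed delay of 15 seconds. Under those assumptions the join fails whenever the two picks differ and the host authenticates before the delay has passed. The function names are hypothetical.

```python
import random
from typing import Sequence

# Assumption (simplification): one fixed delay before a new object is visible
# on every other DC. Real AD replication depends on topology and partners.
REPLICATION_DELAY_S = 15.0


def account_visible(create_dc: str, auth_dc: str, elapsed_s: float,
                    delay_s: float = REPLICATION_DELAY_S) -> bool:
    """True if auth_dc can see an account created on create_dc elapsed_s seconds ago."""
    return auth_dc == create_dc or elapsed_s >= delay_s


def expected_failure_rate(num_dcs: int, elapsed_s: float,
                          delay_s: float = REPLICATION_DELAY_S) -> float:
    """Closed form: with uniform random picks, P(different DC) = (n - 1) / n."""
    if elapsed_s >= delay_s:
        return 0.0
    return (num_dcs - 1) / num_dcs


def simulate_failure_rate(dcs: Sequence[str], elapsed_s: float, runs: int = 10_000,
                          seed: int = 1, delay_s: float = REPLICATION_DELAY_S) -> float:
    """Monte Carlo estimate of the same failure rate."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(runs):
        create_dc = rng.choice(dcs)  # proxy creates the Computer Account here
        auth_dc = rng.choice(dcs)    # ESXi Host requests its Kerberos ticket here
        if not account_visible(create_dc, auth_dc, elapsed_s, delay_s):
            failures += 1
    return failures / runs
```

For example, assuming three DCs and a 2-second join, expected_failure_rate(3, 2.0) gives 2/3 and the simulation lands close to it; once the elapsed time reaches the 15-second delay, both drop to zero. It is a toy model of the timing argument, not a reproduction of the 9 out of 10 failures we measured.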
Some technical details on how we troubleshot this:
1 – First, add the ESXi Host to Active Directory using the Authentication Proxy.
This action will most likely result in the “Cannot complete login due to an incorrect user name or password” error.
2 – Checking the creation of the Computer Account.
Querying DC02 (the PDC) shows that the ESXI01 Computer Account was created on DC01:
c:\>repadmin /showobjmeta DC02 "CN=ESXI01,CN=Computers,DC=xxxxx,DC=lan"
Loc.USN   Originating DSA  Org.USN   Org.Time/Date        Ver  Attribute
48549544  FirstSite\DC01   49321521  2013-04-16 10:32:37  1    whenCreated
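When checking many objects or DCs, it can help to pull the Originating DSA out of this output programmatically. The Python sketch below is a hypothetical helper of our own; it assumes each data row has the column layout shown above (Loc.USN, Originating DSA, Org.USN, date, time, Ver, Attribute) and that dates are printed as YYYY-MM-DD, which may differ with other locales.

```python
import re
from typing import Optional

# One data row of `repadmin /showobjmeta` output:
# Loc.USN, Originating DSA, Org.USN, Org.Time/Date (date + time), Ver, Attribute.
# Assumption: the date is printed as YYYY-MM-DD; other locales may differ.
ROW = re.compile(
    r"^\s*(?P<loc_usn>\d+)\s+(?P<dsa>\S+)\s+(?P<org_usn>\d+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<ver>\d+)\s+(?P<attr>\S+)\s*$"
)


def originating_dsa(output: str, attribute: str = "whenCreated") -> Optional[str]:
    """Return the Originating DSA recorded for `attribute`, or None if no row matches."""
    for line in output.splitlines():
        match = ROW.match(line)
        if match and match.group("attr") == attribute:
            return match.group("dsa")
    return None
```

Fed the DC02 output above, originating_dsa returns FirstSite\DC01, the DC on which the Authentication Proxy created the account. Header lines and error messages do not match the row pattern, so they yield None.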
3 – Checking the existence of the Computer Account on another DC
Querying DC03 five seconds after the Computer Account was created on DC01 returns an error message:
c:\>repadmin /showobjmeta DC03 "CN=ESXI01,CN=Computers,DC=xxxxx,DC=lan"
DsReplicaGetInfo() failed with status 8333 (0x208d): Directory object not found.
This is also confirmed by the Security Log on DC03, which shows an Audit Failure for the ESXi Host.
The Result Code “0x6” refers to:
0x6 – KDC_ERR_C_PRINCIPAL_UNKNOWN: Client not found in Kerberos database
This error can occur if the domain controller cannot find the account name in Active Directory.
3 Common Issues and Potential Solutions:
1) The actual account does not exist.
Solution: Verify that the name is in the Active Directory and/or verify that Active Directory replication is current.
2) A new account has been created and has not yet replicated to the KDC that the client is using for authentication.
Solution: Force replication between the domain controllers.
3) The user’s account has expired and the Enforce user logon restrictions Group Policy object (GPO) setting is enabled.
Solution: Determine whether the account has expired, and change the expiration date as needed.
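When reading the Security Log, a small lookup table makes the Result Code easier to interpret. The sketch below lists a handful of Kerberos error codes defined in RFC 4120 (section 7.5.9); the selection and the function name decode_result_code are our own, and the table is deliberately incomplete.

```python
# A few Kerberos error codes from RFC 4120, section 7.5.9 (not a complete list).
KERBEROS_ERRORS = {
    0x6: "KDC_ERR_C_PRINCIPAL_UNKNOWN: client not found in Kerberos database",
    0x7: "KDC_ERR_S_PRINCIPAL_UNKNOWN: server not found in Kerberos database",
    0x12: "KDC_ERR_CLIENT_REVOKED: client's credentials have been revoked",
    0x17: "KDC_ERR_KEY_EXPIRED: password has expired",
    0x18: "KDC_ERR_PREAUTH_FAILED: pre-authentication information was invalid",
    0x25: "KRB_AP_ERR_SKEW: clock skew too great",
}


def decode_result_code(code: str) -> str:
    """Translate a hex Result Code such as "0x6" from the Security Log."""
    value = int(code, 16)  # int() accepts an optional "0x" prefix with base 16
    return KERBEROS_ERRORS.get(value, f"unknown Kerberos error code {code}")
```

decode_result_code("0x6") maps the code from the DC03 audit failure to KDC_ERR_C_PRINCIPAL_UNKNOWN, the "client not found" case described above.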
4 – Repeating step 3, 10 seconds later.
Repeating step 3 ten seconds later (roughly 15 seconds after the Computer Account was created) does give a successful result. This clearly indicates that timing is the issue here and that the Authentication Proxy probably isn’t aware that intra-site replication must finish before the Computer Account is visible on all Domain Controllers.
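Steps 3 and 4 can be automated to measure how long the Computer Account takes to reach each DC. The Python sketch below is a hypothetical helper of our own, not a VMware or Microsoft tool: it shells out to repadmin /showobjmeta (assumed to be on the PATH), treats the object as present when the output contains a whenCreated row and lacks the 0x208d "Directory object not found" status, and records the first time each DC returns it. The timeout and polling interval are illustrative assumptions.

```python
import subprocess
import time
from typing import Callable, Dict, Iterable

# Status seen in step 3 while the object has not replicated yet:
# "DsReplicaGetInfo() failed with status 8333 (0x208d): Directory object not found."
NOT_FOUND_MARKERS = ("(0x208d)", "Directory object not found")


def showobjmeta(dc: str, dn: str) -> str:
    """Run `repadmin /showobjmeta <dc> <dn>` and return stdout and stderr combined."""
    result = subprocess.run(["repadmin", "/showobjmeta", dc, dn],
                            capture_output=True, text=True)
    return result.stdout + result.stderr


def object_present(output: str) -> bool:
    """Require positive evidence (a whenCreated row) and no 'not found' status."""
    return "whenCreated" in output and not any(m in output for m in NOT_FOUND_MARKERS)


def wait_until_visible(dcs: Iterable[str], dn: str, timeout_s: float = 60.0,
                       interval_s: float = 1.0,
                       query: Callable[[str, str], str] = showobjmeta,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> Dict[str, float]:
    """Poll each DC until the object shows up; return {dc: seconds until first seen}.

    DCs that never show the object within timeout_s are left out of the result.
    query/clock/sleep are injectable so the loop can be tested without real DCs.
    """
    start = clock()
    pending = set(dcs)
    seen: Dict[str, float] = {}
    while pending and clock() - start < timeout_s:
        for dc in sorted(pending):
            if object_present(query(dc, dn)):
                seen[dc] = clock() - start
        pending.difference_update(seen)  # stop polling DCs that already have it
        if pending:
            sleep(interval_s)
    return seen
```

Started right after step 1, for example as wait_until_visible(["DC01", "DC02", "DC03"], "CN=ESXI01,CN=Computers,DC=xxxxx,DC=lan"), it should report the originating DC almost immediately and, based on what we observed, the other DCs only after the replication delay.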
Update 26-04-2013: Only two days after this article was written, VMware released vSphere 5.1 Update 1, which solves this problem as described in the release notes:
Attempt to join an ESXi host to a domain using vSphere Authentication Proxy service (CAM service) might fail for the first time due to delay in DC replication
The automated process to add an ESXi host for the first time using vSphere Authentication Proxy service (CAM service) might fail with an error message similar to the following:
Cannot complete login due to an incorrect user name or password
This issue occurs when the Authentication Proxy service creates the account on one Domain Controller and the ESXi host communicates with another Domain Controller resulting in a Domain Controller replication delay. Thus the ESXi host fails to connect to the Domain Controller (DC) with the newly created account.
This issue does not occur when the ESXi host is added directly to the Active Directory domain without using the Authentication Proxy service.
This issue is resolved in this release.