Hello,
We have been troubleshooting an issue that prevents our vCenter server from connecting to some of our remote hosts. It has affected two different vCenter servers, running 5.1 and 5.5 on Windows Server 2008 R2 and 2012 R2.
Process leading to the error
- We are able to add hosts to a data center after a host reboot or fresh vCenter install
- If our primary data center MPLS goes down (maintenance or otherwise), we lose connectivity to all remote hosts
- Only one data center, our secondary data center, is able to reconnect without issue
- No other remote sites are able to reconnect
Troubleshooting
- Disabled IPv6 across VMware infrastructure (Windows Servers, ESXi hosts)
- Increased handshakeTimeoutMs to 120000
- Restarted management network
- Cleared ARP table
- Confirmed lockdown mode is disabled
Notes
- We have a single ESX 4.1 host that is able to reconnect without issue (it has experienced only one disconnect, but came back without issue, unlike its 5.5 counterpart)
- We're able to connect to the hosts via vSphere console and SSH without issue
- The network team is troubleshooting the issue as well, but we've not been able to rule out VMware as the culprit
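Since SSH to the hosts works while vpxd times out on port 443, one check that may help split the problem between network and VMware is to compare plain TCP reachability per port from the vCenter server itself. This is only a minimal sketch, assuming Python is available on the vCenter server; check_port and check_host are hypothetical helper names, and the port list (22 for SSH, 443 as seen in the vpxd log, 902 for vCenter-to-host traffic) is an assumption about which ports matter here.

```python
import socket


def check_port(host, port, timeout=5.0):
    """Try a plain TCP connect and report 'open', 'timeout', or 'error: ...'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer within the timeout (packets dropped somewhere on the path).
        return "timeout"
    except OSError as exc:
        # Refused, unreachable, DNS failure, and so on.
        return f"error: {exc}"


def check_host(host, ports=(22, 443, 902), timeout=5.0):
    """Return {port: result} for each port; the port list is an assumption."""
    return {port: check_port(host, port, timeout) for port in ports}


# Example (placeholder host name taken from the log):
#   for port, result in check_host("xxxesxi01.xxx.com").items():
#       print(port, result)
```

If 22 comes back open but 443 times out from the same machine, that would point at something on the path treating the ports differently rather than at vpxd itself.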
Logs
vpxd
2014-09-24T14:00:14.785-05:00 [05920 warning 'Default'] Failed to connect socket; <io_obj p:0x000000000d10a128, h:3876, <TCP '0.0.0.0:0'>, <TCP '10.x.x.16:443'>>, e: system:10060(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)
2014-09-24T14:00:14.785-05:00 [05920 error 'HttpConnectionPool-000001'] [ConnectComplete] Connect failed to <cs p:000000000ee4c730, TCP:xxxesxi01.xxx.com:443>; cnx: (null), error: class Vmacore::SystemException(A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)
2014-09-24T14:00:14.785-05:00 [05852 error 'httphttpUtil' opID=6159800D-000000AB-d6] [HttpUtil::ExecuteRequest] Error in sending request - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
2014-09-24T14:00:14.785-05:00 [05852 error 'vpxdvpxdHostAccess' opID=6159800D-000000AB-d6] [VpxdHostAccess::Connect] Failed to discover version: vim.fault.HttpFault
2014-09-24T14:00:14.786-05:00 [05852 info 'commonvpxLro' opID=6159800D-000000AB-d6] [VpxLRO] -- FINISH task-internal-5070 -- datacenter-31 -- vim.Datacenter.queryConnectionInfo --
2014-09-24T14:00:14.786-05:00 [05852 info 'Default' opID=6159800D-000000AB-d6] [VpxLRO] -- ERROR task-internal-5070 -- datacenter-31 -- vim.Datacenter.queryConnectionInfo: vim.fault.NoHost:
--> Result:
--> (vim.fault.NoHost) {
--> dynamicType = <unset>,
--> faultCause = (vmodl.MethodFault) null,
--> name = "xxxesxi01.xxx.com",
--> msg = "",
--> }
--> Args:
-->
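The 10060 in the first vpxd line is the Winsock connection timeout (WSAETIMEDOUT), so it may be useful to count which targets hit it across the whole log. A rough sketch, assuming the other "Failed to connect socket" lines have the same layout as the one above; failed_targets is a hypothetical helper and the log path is a placeholder.

```python
import re
from collections import Counter

# Captures the remote endpoint (the second <TCP '...'>) and the system error code.
TARGET_RE = re.compile(
    r"Failed to connect socket;.*<TCP '([^']+)'>>, e: system:(\d+)"
)


def failed_targets(lines):
    """Count (endpoint, error_code) pairs from 'Failed to connect socket' lines."""
    counts = Counter()
    for line in lines:
        match = TARGET_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Example (placeholder path):
#   with open("vpxd.log", errors="replace") as fh:
#       for (endpoint, code), n in failed_targets(fh).most_common():
#           print(endpoint, code, n)
```

Comparing the endpoints that show up here against the sites that do reconnect might show whether the timeouts follow particular subnets.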
Connection error
Call "Datacenter.QueryConnectionInfo" for object "XXX" on vCenter Server "VCENTER" failed.
Thanks
Message was edited by: OptimalZ06 (removed network details)