I’ve been harping on about this for years, but a couple of recent customer situations have reinforced the importance of correct time/NTP configuration for all of your vSphere components.
In one case, incorrect time configuration almost completely broke a Nutanix storage cluster! Neither Nutanix nor VMware was at fault. Networking problems prevented both vSphere and the Nutanix Controller VMs (CVMs) from using NTP correctly, and fixing the situation required a complete shutdown of the cluster.
In another case, incorrect time on the ESXi Hosts stopped VMware Fault Tolerance (VMware FT, which now supports up to 8 vCPU and 128 GB vRAM per FT-protected Virtual Machine with Enterprise Plus) from working at all. It appears that even a small difference between the ESXi Hosts’ clocks and the datastore timestamps was enough to prevent FT failover.
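Lastly, with its default properties VMware Tools makes the ESXi Host the authoritative time source for each of its Virtual Machines! Even when periodic synchronization is turned off, VMware Tools resets the guest clock to the host’s time after events such as VMware Tools starting up, resuming from suspend, or reverting to a snapshot. The host’s current time takes precedence over Active Directory, NTP or any other time configuration unless you take specific steps to disable VMware Tools time synchronization. That means that if you have DRS enabled or use vMotion, and your hosts disagree on the time, the clock on your VMs will jump around and cause disconnections from transactional systems such as email and database servers.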
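Those specific steps usually come down to a handful of settings in the VM’s .vmx file. The helper below is a minimal Python sketch that forces them off, assuming the VM is powered off and you are editing a copy of its .vmx file. The key names are the ones commonly cited from VMware KB 1189, so check them against the KB for your vSphere and VMware Tools version; the sketch writes "FALSE", but follow whatever value form the KB shows. The function `disable_tools_timesync` and the script name in the usage comment are hypothetical illustrations, not a VMware tool, and the code only rewrites text without talking to vCenter.

```python
import re
import sys

# Settings commonly cited from VMware KB 1189 for disabling both periodic and
# event-driven VMware Tools time synchronization. Verify against the KB.
TIMESYNC_KEYS = (
    "tools.syncTime",
    "time.synchronize.continue",
    "time.synchronize.restore",
    "time.synchronize.resume.disk",
    "time.synchronize.shrink",
    "time.synchronize.tools.startup",
    "time.synchronize.tools.enable",
    "time.synchronize.resume.host",
)

# Match key names without regard to case, in case the file capitalizes them differently.
_CANONICAL = {key.lower(): key for key in TIMESYNC_KEYS}

# Matches a .vmx line of the form: key = "value"
_VMX_LINE = re.compile(r'^\s*([^\s=]+)\s*=\s*".*"\s*$')


def disable_tools_timesync(vmx_text: str) -> str:
    """Return a copy of vmx_text with every time-sync key set to "FALSE"."""
    out = []
    seen = set()
    for line in vmx_text.splitlines():
        match = _VMX_LINE.match(line)
        key = _CANONICAL.get(match.group(1).lower()) if match else None
        if key:
            # Overwrite the existing setting instead of adding a duplicate.
            out.append(f'{key} = "FALSE"')
            seen.add(key)
        else:
            out.append(line)
    # Append any time-sync keys the file did not define yet.
    out.extend(f'{key} = "FALSE"' for key in TIMESYNC_KEYS if key not in seen)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 disable_timesync.py dc01.vmx > dc01.vmx.new
    for path in sys.argv[1:]:
        with open(path) as fh:
            sys.stdout.write(disable_tools_timesync(fh.read()))
```

Running the function a second time on its own output changes nothing, so it is safe to re-apply. The same parameters can also be added through the VM’s advanced configuration parameters in the vSphere Client, which avoids hand-editing the file.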
Incorrect use or configuration of time in a vSphere Environment is probably one of the most common things I find when exposed to a new environment, so I have outlined the steps you need to take to avoid problems.
- Start by disabling VMware Tools time synchronization for your Domain Controllers and any other VMs that run an NTP server daemon. This lets those VMs synchronize accurately with the NTP servers you have specified.
- Here are the instructions from VMware: https://kb.vmware.com/s/article/1189
- Configure reliable NTP servers on your ESXi Hosts. Use either Stratum 1/2 NTP servers or a dedicated network time appliance such as the Sonoma CDMA Network Time Server.
- Test your ESXi time configuration. An ESXi Host will not report an error if you specify an invalid time source, if the time source is unreachable on the network, or if UDP port 123 is blocked by a firewall. In other words, you don’t know whether NTP is working unless you test it! To test NTP, open an SSH session to each ESXi Host and run the command `ntpq -p`. The output lists one row per configured NTP server.
Columns you should be aware of are:
| Column | What it tells you |
|---|---|
| st | Stratum of the NTP server you are actually connected to. Stratum 1 and 2 are considered ideal. |
| t | Type of association, most commonly `u` (unicast). |
| when | Seconds since a response was last received from that server. It should normally be less than the `poll` interval (also in seconds). `when` may occasionally exceed `poll` because of availability or network issues; however, if `when` exceeds `poll` for every time server, the NTP servers were resolvable in DNS but could not be reached on NTP port 123. |
| jitter | Variation, in milliseconds, between successive time samples from that server. Replace NTP servers that show high `jitter`. High `jitter` across all servers means that NTP is competing with another time synchronization method, for example a Linux VM that uses VMware Tools time synchronization and also has NTP servers configured. Use only one form of time synchronization at a time. |