Ntpd silently exiting if time is substantially off

dhatz

Apparently the "tinker panic 0" change does fix the issue after all:

ntpd no longer exits when the time difference exceeds 1000 s, and
it eventually re-syncs the time (I didn't monitor it closely enough to see how quickly).
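
For readers following along, the behaviour described above can be sketched as a small decision function. This is only an illustration and not ntpd's actual code: it assumes ntpd's documented defaults of a 128 ms step threshold and a 1000 s panic threshold, and the name classify_offset is made up for this example.

```python
STEP_THRESHOLD_S = 0.128    # ntpd's default step threshold, in seconds
PANIC_THRESHOLD_S = 1000.0  # ntpd's default panic threshold, in seconds


def classify_offset(offset_s, panic_threshold_s=PANIC_THRESHOLD_S):
    """Return 'slew', 'step' or 'exit' for a measured clock offset in seconds.

    Passing panic_threshold_s=0 models "tinker panic 0": the panic check
    is disabled, so even a very large offset is corrected instead of
    causing an exit.
    """
    magnitude = abs(offset_s)
    if panic_threshold_s > 0 and magnitude > panic_threshold_s:
        return "exit"
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"


if __name__ == "__main__":
    # Compare the default behaviour with "tinker panic 0" for a few offsets.
    for offset in (0.05, 30.0, 3499.2):
        print(offset, classify_offset(offset), classify_offset(offset, panic_threshold_s=0))
```

With the defaults, an offset of roughly an hour falls into the "exit" branch; with the panic check disabled, the same offset is handled as a step.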

jimp

I committed the fix, should be in the next snapshot.

gogol

I committed the fix, should be in the next snapshot.

My opinion is that NTP was not intended for use on a virtual machine, so this setting should be offered as an optional workaround.
I know NTP is still under development, but it is not very secure without "restrict" lines.

johnpoz

"My opinion is that NTP was not intended for use on a virtual machine"

Funny how this is not VMware's opinion ;)

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
Timekeeping best practices for Linux guests

Note: VMware recommends you to use NTP instead of VMware Tools periodic time synchronization. NTP is an industry standard and ensures accurate time keeping in your guest. You may have to open the firewall (UDP 123) to allow NTP traffic.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1318
Timekeeping best practices for Windows, including NTP

Windows Version   Recommended Time Sync Utility
Windows 2008      w32time or NTP
Windows Vista     w32time or NTP
Windows 2003      w32time or NTP
Windows XP        NTP
Windows 2000      NTP

As to security concerns: the configuration tab allows you to restrict which interfaces it will listen on. I don't see it as much of a concern if you let your time server serve time to your local LAN ;)

I am sure that as this feature matures, more detailed configuration such as specific restrict lines will follow. Worst case, you can always modify ntpd.conf in /var/etc if you're really paranoid.

gogol

@johnpoz:

Funny how this is not VMware's opinion ;)

That is understandable from their point of view.
I don't claim to have made a study of the NTP protocol, but I read the newsgroup and am a Pool member (so I want to serve time on WAN). As far as I understand it, NTP is designed for machines that run 24/7, even under VMware ;)
It is very good that pfSense switched to the latest version of NTP, because it is actively developed.

I switched to pfSense a week ago and am now studying the behavior of NTP on my box. It certainly behaves well!
I now know how to change the settings in /etc/inc/system.inc (not in /var/etc/ntpd.conf, which is overwritten when ntpd restarts), but these changes are not permanent either. That is how I have done it for now.

gerdesj

My personal experience of time sync on vmware and physical over several years has yielded the following:

Set the ESXi hosts to sync via NTP to five sources
Windows DCs - use vm guest tools to sync to the host they are on
Windows non DCs - leave at defaults, ie sync to the PDC emulator
Unix style systems (*BSD, Linux et al) - sync via ntpd to the hosts

I watch time sync on 500-odd systems around the country (UK) via Nagios, and they all agree on the time to within one or two milliseconds, depending on OS (Unix is best, Windows worst, if you count a millisecond of drift on a VM as "bad").

I have not had to restart either ntpd or "windows time" in a very long … time using these rules.

With a manually configured ntpd I use tinker panic 0 to avoid a 30 second drift being considered "insane". I also use iburst on the server lines to get a much quicker initial sync, and I see pfSense does as well.

I used to use three pool systems but found that after a few weeks/months time would start to drift. Since using five ({0,1,2,3}.pool.ntp.org and 0.uk.pool.ntp.org) I have not seen that behaviour on any system I manage in at least the last four years.

Cheers
Jon
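
As a rough illustration of the manual setup described in the post above, the sketch below renders an ntp.conf fragment with the panic check disabled and iburst on each of the five pool servers. The function name render_ntp_conf is hypothetical, and placing the tinker line ahead of the server lines is a choice made for this example; the directives themselves (tinker panic 0, server ... iburst) are the ones named in the post.

```python
def render_ntp_conf(servers, panic_disabled=True):
    """Build an ntp.conf fragment as a string.

    servers: iterable of host names to use as time sources.
    panic_disabled: when True, emit "tinker panic 0" before the server lines.
    """
    lines = []
    if panic_disabled:
        lines.append("tinker panic 0")
    # iburst speeds up the initial synchronisation with each server.
    lines.extend(f"server {host} iburst" for host in servers)
    return "\n".join(lines) + "\n"


# The five sources mentioned in the post.
POOL_SERVERS = [
    "0.pool.ntp.org",
    "1.pool.ntp.org",
    "2.pool.ntp.org",
    "3.pool.ntp.org",
    "0.uk.pool.ntp.org",
]

if __name__ == "__main__":
    print(render_ntp_conf(POOL_SERVERS), end="")
```

On pfSense a hand-written file like this is regenerated when ntpd restarts, as noted earlier in the thread, so this is mainly relevant to a manually managed ntpd.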

dhatz

I haven't yet found the time to monitor ntpd closely during a VM suspend cycle, but empirically I can say it can take quite a long time for ntpd to sync. For example, it has been over 1 hr since I resumed the suspended VM, yet ntpd still hasn't corrected the system time:

ntpq -p
remote           refid            st t when poll reach   delay   offset   jitter
cache.asda.gr    131.188.3.221     2 u  392  512   377  16.808  3499206  1870404
stitch.fr.zerol  192.93.2.20       2 u  399  512   377  72.440  3499206  1870404
noc.be.it2go.eu  193.190.230.65    2 u  160  512   377  89.826  3499206  1322575
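
ntpq reports offset in milliseconds, so the figures above are easier to judge once converted. The sketch below is a minimal parser written for this thread, not part of any NTP tooling: the function name peer_offsets_seconds is hypothetical, and it assumes the ten whitespace-separated columns shown in the listing.

```python
SAMPLE = """\
remote refid st t when poll reach delay offset jitter
cache.asda.gr 131.188.3.221 2 u 392 512 377 16.808 3499206 1870404
stitch.fr.zerol 192.93.2.20 2 u 399 512 377 72.440 3499206 1870404
noc.be.it2go.eu 193.190.230.65 2 u 160 512 377 89.826 3499206 1322575
"""


def peer_offsets_seconds(text):
    """Map each peer name to its offset in seconds (ntpq prints milliseconds)."""
    offsets = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # blank lines, separators, anything unexpected
        try:
            offset_ms = float(fields[8])
        except ValueError:
            continue  # the header row has text in the offset column
        # Drop a leading tally character such as '*' or '+' if present.
        name = fields[0].lstrip("*+-#")
        offsets[name] = offset_ms / 1000.0
    return offsets


if __name__ == "__main__":
    for peer, offset_s in peer_offsets_seconds(SAMPLE).items():
        print(f"{peer}: {offset_s:.1f} s ({offset_s / 60:.1f} min)")
```

For the listing above this works out to an offset of about 3499 s, a little over 58 minutes, for all three peers.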

jimp

We usually rely on ntpdate to make the big changes, and then let ntpd handle keeping the clock in line over time. If you restart ntpd (or save the settings, iirc) it will stop ntpd, run ntpdate, then restart ntpd.

It can take a long time for ntpd to recover from a large skew, since it will only step the clock by 0.128 second increments. This can be adjusted with the step parameter to the tinker config option, but I recall setting that larger had negative effects.

Though -x to ntpd might help…

-x Normally, the time is slewed if the offset is less than the step
threshold, which is 128 ms by default, and stepped if above the
threshold. This option sets the threshold to 600 s, which is
well within the accuracy window to set the clock manually. Note:
Since the slew rate of typical Unix kernels is limited to 0.5
ms/s, each second of adjustment requires an amortization interval
of 2000 s. Thus, an adjustment as much as 600 s will take almost
14 days to complete. This option can be used with the -g and -q
options. See the tinker command for other options. Note: The
kernel time discipline is disabled with this option.
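
The arithmetic in the man page excerpt can be checked with a few lines. This is only a worked calculation using the 0.5 ms/s slew rate quoted above; the function names are made up for the example.

```python
SLEW_RATE_MS_PER_S = 0.5  # maximum correction per second, from the excerpt above


def slew_duration_seconds(offset_s, slew_rate_ms_per_s=SLEW_RATE_MS_PER_S):
    """Seconds needed to slew away an offset at the given rate."""
    offset_ms = abs(offset_s) * 1000.0
    return offset_ms / slew_rate_ms_per_s


def as_days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / 86400.0


if __name__ == "__main__":
    for offset in (1, 600, 3499.206):
        duration = slew_duration_seconds(offset)
        print(f"{offset} s offset: {duration:.0f} s of slewing ({as_days(duration):.1f} days)")
```

One second of offset needs 2000 s, and 600 s needs 1,200,000 s, just under 14 days, matching the excerpt. An offset near 3499 s, like the one in the ntpq listing earlier in the thread, would take about 81 days to slew away, which is why stepping is the practical option at that size.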

dhatz

Btw ntpd finally sync'ed time, apparently in one big step, but it took ~1.5hr after the VM was resumed from yesterday's suspend. No message whatsoever in /var/log/ntpd.log

dhatz

As a test, I'm currently running ntpd with the following two lines added:

server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10

The server line says that the local system clock is a timeserver. The fudge line says that this server is stratum 10. If you are connected to the Internet, then you are likely using timeservers with a stratum lower than 10, and these servers are used because a lower stratum means a higher priority.

However, if you are disconnected from the Internet then they are unavailable and you're left with the local clock. Using fudge to say that the local clock is stratum 10 makes ntp use the local clock when no timeservers are available. This is good because it makes sure you can disconnect your box from the Internet without getting your clock screwed.

gogol

@dhatz:

As a test, I'm currently running ntpd with the following two lines added:

If you are trying to run ntpd isolated you should use "orphan mode" in this version of ntpd.

http://www.eecis.udel.edu/~mills/ntp/html/orphan.html
