NTP time sync issue
-
I find that unlikely; most likely it's a delay in where you're reading the time… Why don't you just check with w32tm, or ntp if that is what you're running on your laptop?
This will tell you exactly how far off your laptop is from its ntp server.
No delay in reading: the Windows time is displayed at the bottom right corner of the Windows taskbar, while the pfSense NTP status server time is displayed in the middle of the browser. I can see the two times at the same moment: 3 seconds difference.
William
-
dude.. Really…
How are you syncing time?? just windows, or are you running ntp actually...
-
I'm syncing Windows time with pfSense: I set the Windows Internet time server to 192.168.1.1, the pfSense LAN interface, and my pfSense box runs as an NTP server. From what I saw, Windows time is ahead of the pfSense NTP server time; not sure why. Anyway, here is:
-
Hmmm, all of a sudden Windows time is now synced with pfSense time. Very odd.
EDIT: that's because my wife stopped watching online movies.
-
Here is what I found, and it is reproducible. (I have Snort and pfBlockerNG installed on the pfSense box, by the way.) Whenever there is heavy internet activity, e.g. my wife watching online movies, the pfSense NTP server time is displayed a couple of seconds behind all the connected devices whose clocks are synced with it. That made me think the clocks were not synced with each other, until the heavy internet activity stopped.
This is not good. I thought the system was multi-threaded, so every service should keep working properly. A couple of seconds of delay is HUGE.
Is it just an NTP widget issue, or an issue with the entire NTP service?
-
The same thing happens here. I have Win 7 x64 with Internet time set to pfSense 2.2.6.
C:\>w32tm /stripchart /computer:172.24.xx.xxx
Tracking 172.24.xx.xxx [172.24.xx.xxx:123].
The current time is 2015-12-28 00:16:50.
00:16:50 d:+00.0008824s o:+01.9393273s [ | * ]
00:16:52 d:+00.0008465s o:+01.9389158s [ | * ]
00:16:54 d:-00.0000914s o:+01.9393138s [ | * ]
00:16:56 d:-00.0000847s o:+01.9394665s [ | * ]
00:16:58 d:-00.0000822s o:+01.9392438s [ | * ]
00:17:00 d:-00.0000751s o:+01.9392295s [ | * ]
00:17:02 d:-00.0000769s o:+01.9392350s [ | * ]
00:17:04 d:-00.0001045s o:+01.9392911s [ | * ]
00:17:06 d:-00.0001090s o:+01.9392888s [ | * ]
00:17:08 d:-00.0001038s o:+01.9392729s [ | * ]
00:17:10 d:-00.0001074s o:+01.9392196s [ | * ]
00:17:12 d:+00.0000000s o:+00.0000000s [ * ]
00:17:14 error: 0x800705B4
00:17:17 error: 0x800705B4
00:17:20 error: 0x800705B4
00:17:23 d:+00.0010001s o:-00.0005000s [ * ]
00:17:25 error: 0x800705B4
00:17:28 error: 0x800705B4
00:17:31 error: 0x800705B4
00:17:34 d:+00.0030002s o:-00.0015001s [ * ]
00:17:36 error: 0x800705B4
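To compare runs without eyeballing the chart, the stripchart lines can be summarised with a few lines of Python. This is only a sketch: the sample is a subset of the output above with the chart column dropped, and the function name is made up for illustration. It matches errors on the hex code only, because the label text depends on the Windows display language. (0x800705B4 is the Windows timeout error, i.e. the server did not answer in time.)

```python
import re

# Subset of the stripchart output above (chart column omitted).
SAMPLE = """\
00:16:50 d:+00.0008824s o:+01.9393273s
00:16:52 d:+00.0008465s o:+01.9389158s
00:17:12 d:+00.0000000s o:+00.0000000s
00:17:14 error: 0x800705B4
00:17:23 d:+00.0010001s o:-00.0005000s
"""

# time, d: = round-trip delay in seconds, o: = offset in seconds
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d) d:([+-]\d+\.\d+)s o:([+-]\d+\.\d+)s")
ERROR_RE = re.compile(r"0x[0-9A-Fa-f]{8}")


def parse_stripchart(text):
    """Return (samples, error_count); each sample is (time, delay_s, offset_s)."""
    samples, errors = [], 0
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            samples.append((match.group(1), float(match.group(2)), float(match.group(3))))
        elif ERROR_RE.search(line):
            errors += 1
    return samples, errors


samples, errors = parse_stripchart(SAMPLE)
offsets = [offset for _, _, offset in samples]
print(f"{len(samples)} samples, {errors} errors")
print(f"first offset {offsets[0]:+.4f}s, last offset {offsets[-1]:+.4f}s")
```

On the full output, this makes the jump at 00:17:12 obvious: the offset goes from about +1.94 s to about zero, which is what you would expect if the Windows clock was reset at that moment.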
-
For starters, Windows might not actually be trying to "sync" out of the box; it might just be setting the time every so often. If you want it to keep its clock in sync with an ntp server, then you need to make sure it's really doing that, or install ntp.
As to the clock being off when the machine is under heavy load: is pfSense a VM, or running on hardware?
You do understand ntp isn't actually the clock on the system? It just adjusts the clock to run faster or slower to keep accurate time based on another clock it checks against. Ntp checks the time against that other clock and tells the system clock "hey, you're running a bit fast" or "hey, you're running a bit slow", and makes tiny adjustments so you keep the same time as the really accurate clock it is checking against.
The ntp service checks this time every so often. When it first starts it checks more often than after it's been running a while. Notice the 1024 in my output: that is how often (in seconds) it's asking the time.
The stripchart you were running isn't any sort of sync; it just asks "what is the time on that computer compared to mine?" and shows you the offset and delay. That you were getting errors is a bad sign for sure… This other guy is just showing his clock is off by 1.9 seconds. You're saying that when pfSense gets busy its time is off? Maybe. Again, ntp is not the actual CLOCK of the system.
-
Are you (the folks having issues) comparing the Windows time to the "time" in the pfSense GUI, or are you actually running ntp commands on the pfSense box (console or ssh)? Things like ntpq -pn, ntpstat, etc.? It's highly likely that a GUI process has to poll periodically, so there could be a discrepancy there (a heavy load may slow down updates to the GUI). Then there is the question of how Windows actually updates/sets its time. It may not be as nice as NTP, so it could make bigger jumps less often.
-
By default, Windows only syncs once a week with the configured timeserver. That's a lot of time for drift and skew to make changes. When you said there was a 3 second offset, at that time, when was the last time your client was synced?
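To put a number on that, here is a rough back-of-the-envelope sketch. It assumes the client clock runs uncorrected between syncs at a constant frequency error, and it borrows the 21.851 ppm figure that ntpd reports as "frequency" for the pfSense box later in this thread; a Windows laptop's oscillator will have its own, different error.

```python
# Assumption: constant frequency error and no corrections between syncs.
# 21.851 ppm is the "frequency" value from the ntpq output later in this
# thread; it is used here only as a plausible order of magnitude.
PPM = 1e-6
SECONDS_PER_DAY = 86_400


def drift_seconds(freq_error_ppm, interval_s):
    """Accumulated clock error after interval_s at a constant frequency error."""
    return freq_error_ppm * PPM * interval_s


per_day = drift_seconds(21.851, SECONDS_PER_DAY)
per_week = drift_seconds(21.851, 7 * SECONDS_PER_DAY)
print(f"{per_day:.2f} s/day, {per_week:.2f} s/week")
```

So a clock of that quality left alone for a week could plausibly be off by far more than the 3 seconds being discussed.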
-
@mer:
Are you (the folks having issues) comparing the windows time to the "time" in the pfSense gui or are you actually doing ntp commands on the pfSense box (console or ssh)?
I am comparing the windows time to the "time" on the pfSense gui, which is Dashboard->NTP Status->Server time.
As to clock being off when machine is under heavy load? Is pfsense a VM? Or running on hardware?
pfSense is not a VM. It was streaming online movies from the outside world.
When you said there was a 3 second offset, at that time, when was the last time your client was synced?
I synced/updated the Windows time just a minute before. The sync succeeded, both times were the same, and they stayed the same for a while. But as soon as there is any heavy internet activity, the pfSense clock (Server time) in the webGUI is suddenly off, behind the clients' times, which makes it look like the two are no longer synced. Once the heavy load stops, the pfSense clock recovers on its own.
EDIT: Just minutes ago I re-opened the NTP status screen on the pfSense Dashboard, and the pfSense clock is now 2 seconds ahead of my Windows clients without any heavy load, so I'm really confused about which time is accurate.
-
From the console, or over ssh to the pfSense box, get the output of the command "ntpq -pn" (I think there is a way to enter commands from the WebGUI, I just don't have it in front of me right now).
That will give information on the state of NTP sync on the pfSense box: which peers are in play, offsets, jitter, etc. There may also be a command called "ntpstat" that gives a quick summary: whether it is synchronized, to what server, and how close the time is to that server.
If the pfSense NTP is synchronized to a server and the Windows box is showing a different time, I'd be inclined to believe the pfSense box is correct, but only based on what you see from the ntpq/ntpstat commands, not the GUI (to rule out any update delays in the GUI).
-
Using putty:
-
@mer:
from the console or ssh into the pfSense box get the output of the command "ntpq -pn" (I think there is a way to actually enter commands from the WebGui, I just don't have it in front of me right now).
I'd use:
ntpq -c 'rl' -wp
If that causes an error, leave out the w. It's best to paste the results as text using the 'Code' feature (the # above the editor). The values of sys_jitter, clk_jitter and clk_wander are of particular interest. It would also be useful to know where you are (roughly) and what WAN connection(s) you have.
Any time displayed via the dashboard plugin is subject to so many possible delays as to make it worthless for debugging purposes.
Windows' time synchronisation is designed for low load on the time servers, not a good quality lock to network time. Windows' internal timing is subject to some granularity issues (and therefore has a quality ceiling) prior to Windows 8.
One important issue is that the timing components in the machines we use are poorly temperature compensated. If a pfSense machine has fluctuating load, its internal temperature can fluctuate all over the place. Unless you have a pulse-per-second source feeding into the machine (such as a GPS receiver) or are synchronising every 16 seconds to a local Stratum 1 server (do not do this with a Stratum 1 on the Internet - it's extremely antisocial), it will take a long time for ntp's PLL to settle down after any temperature related disturbances.
Better quality timing components are available, but as most users don't care about the quality of their timing, there is no incentive for the manufacturers to pay the significant additional costs when building mass market hardware. The manufacturers care about every penny on the bill of materials, so are not interested in paying substantially more for temperature compensated or ovenised oscillators.
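If you want to track those values over time rather than read them off by hand, the name=value list that ntpq prints for 'rl' is easy to pick apart. A minimal sketch follows; the sample text is abbreviated from the output posted later in this thread, and parse_rl is just an illustrative name, not part of any ntp tooling.

```python
import re

# Abbreviated from the `ntpq -c rl` output posted later in this thread.
RL_OUTPUT = '''associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p4@1.3265-o Mon Oct 26 14:28:17 UTC 2015 (1)",
processor="amd64", system="FreeBSD/10.1-RELEASE-p25", leap=00, stratum=3,
precision=-22, rootdelay=65.209, rootdisp=77.293, refid=159.203.8.72,
tc=9, mintc=3, offset=-1.970223, frequency=21.851, sys_jitter=2.309164,
clk_jitter=2.943, clk_wander=0.201'''

# A value is either a quoted string or everything up to the next comma/space.
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')


def parse_rl(text):
    """Return the system variables as a dict of strings (quotes removed)."""
    return {name: value.strip('"') for name, value in PAIR_RE.findall(text)}


variables = parse_rl(RL_OUTPUT)
# offset and the jitter values are in milliseconds; clk_wander is in ppm.
for name in ("offset", "sys_jitter", "clk_jitter", "clk_wander"):
    print(f"{name:>10} = {float(variables[name]):.3f}")
```

Logging these four numbers every few minutes alongside the router's load would show directly whether the jitter tracks the heavy traffic.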
-
ntpq -c 'rl' -wp:
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p4@1.3265-o Mon Oct 26 14:28:17 UTC 2015 (1)",
processor="amd64", system="FreeBSD/10.1-RELEASE-p25", leap=00, stratum=3,
precision=-22, rootdelay=65.209, rootdisp=77.293, refid=159.203.8.72,
reftime=da2c0742.c3cc6660 Mon, Dec 28 2015 14:08:50.764,
clock=da2c08e2.c4fed519 Mon, Dec 28 2015 14:15:46.769, peer=59118, tc=9,
mintc=3, offset=-1.970223, frequency=21.851, sys_jitter=2.309164,
clk_jitter=2.943, clk_wander=0.201
 remote               refid           st t when poll reach   delay   offset  jitter
==============================================================================
*159.203.8.72         192.5.41.209     2 u  416  512   377  25.338   -1.970   2.309
+zero.gotroot.ca      30.114.5.31      2 u  272  512   377  61.560   -9.818   2.231
-ntp.tranzeo.com      206.108.0.132    2 u  342  512   377  20.040   -9.387   2.651
+penguin.hopcount.ca  200.98.196.212   2 u  385  512   377  15.905   -7.855   3.722
I'm in Toronto, Canada, on Rogers cable internet at 250/20 Mbps.
-
It looks as if you are using relatively local NTP servers to your physical location, which is good. You are using a cable modem, which can lead to high jitter, but c. 2.5ms system jitter isn't the end of the world.
Precision -22 suggests you might be using the on-processor TSC as your timing source.
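For anyone wondering what the -22 means: ntpd reports precision as a power of two in seconds, and the same log2 convention applies to the poll interval. A quick sketch of the conversion, using the precision=-22 and tc=9 values from the output above (that tc lines up with the 512 in the poll column is an observation from this output, not a general guarantee):

```python
def log2_seconds(exponent):
    """Convert an ntpd log2 value (precision, poll exponent) to seconds."""
    return 2.0 ** exponent


precision_s = log2_seconds(-22)  # precision=-22 from the output above
poll_s = log2_seconds(9)         # tc=9 from the same output

print(f"precision: {precision_s * 1e9:.0f} ns")
print(f"poll interval: {poll_s:.0f} s")
```

A precision of roughly a quarter of a microsecond means the clock can be read very quickly, which is what points towards the TSC.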
Post the results of:
sysctl kern.timecounter.choice kern.timecounter.hardware
which will show the timecounter choices and weights, and also the timecounter your system is currently using.
My experience of running NTP servers on bare metal FreeBSD installations is that the HPET is often a more stable timing source. If you want to give this a go, add a line to /boot/loader.conf.local that reads:
kern.timecounter.tc.HPET.quality=5000
You'll need to create /boot/loader.conf.local if it doesn't already exist. Once you've made the change, delete the ntpd.drift file (its contents are invalidated by the change of timing source) using:
pkill ntpd ; rm /var/db/ntpd.drift
then reboot. After the system has been running for at least 12 hours, re-run the command I gave in the previous post. Hopefully clk_jitter is significantly lower than the 2.9ms in your earlier output.
-
Post the results of: sysctl kern.timecounter.choice kern.timecounter.hardware
sysctl kern.timecounter.choice kern.timecounter.hardware
kern.timecounter.choice: TSC-low(1000) ACPI-safe(850) i8254(0) HPET(950) dummy(-1000000)
kern.timecounter.hardware: TSC-low
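The numbers in parentheses are the quality values, and as far as I know FreeBSD picks the counter with the highest one unless told otherwise. A small sketch that reads the two lines above and shows which counter wins; parse_choices is an illustrative helper, not a FreeBSD tool.

```python
import re

# The two sysctl lines as posted above.
CHOICE = ("kern.timecounter.choice: TSC-low(1000) ACPI-safe(850) "
          "i8254(0) HPET(950) dummy(-1000000)")
HARDWARE = "kern.timecounter.hardware: TSC-low"


def parse_choices(line):
    """Map each timecounter name to its quality value."""
    _, _, rest = line.partition(": ")
    return {name: int(quality)
            for name, quality in re.findall(r"(\S+)\((-?\d+)\)", rest)}


choices = parse_choices(CHOICE)
best = max(choices, key=choices.get)
current = HARDWARE.partition(": ")[2]

print(f"highest quality: {best} ({choices[best]})")
print(f"in use: {current}")
```

Here TSC-low (1000) beats HPET (950), which matches the counter reported as in use.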
-
-
delete the ntp.drift file
You meant ntpd.drift? That is the one I deleted.
That is the file I meant. I've edited the earlier post accordingly.
As I thought, your system had chosen TSC (well TSC-low, though the difference is not material here) as its timecounter. If you repeat that command having made the change I suggested to /boot/loader.conf.local, you should find the quality figure after HPET is now 5000 and that kern.timecounter.hardware is now HPET. It will be interesting to see whether that proves to have lower jitter (clk_jitter) and at least as good short-term stability (clk_wander) as TSC.
It may take 24 hours for things to settle down as ntpd had no drift file value to start from.
It is worth turning on pfSense's ntp RRD graphs in Services -> NTP, though I would strongly recommend you apply the patch in https://redmine.pfsense.org/issues/4423 first.
-
The offset spikes are correlated with changing system load.
time2.google.com seems to be going to crap.
$ ntpq -c 'rl' -wp
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p4@1.3265-o Mon Oct 26 14:28:17 UTC 2015 (1)",
processor="amd64", system="FreeBSD/10.1-RELEASE-p24", leap=00, stratum=3,
precision=-22, rootdelay=31.673, rootdisp=42.253, refid=216.239.38.15,
reftime=da2c4aef.ab39b7b0 Mon, Dec 28 2015 17:57:35.668,
clock=da2c5259.63466f54 Mon, Dec 28 2015 18:29:13.387, peer=10249, tc=9,
mintc=3, offset=0.429836, frequency=-25.102, sys_jitter=0.149028,
clk_jitter=0.241, clk_wander=0.005
 remote               refid           st t when poll reach   delay   offset  jitter
==============================================================================
+ra.steadfastdns.net  216.86.146.46    2 u  140  512   377  14.474    0.305   0.356
+rb.steadfastdns.net  216.86.146.46    2 u  254  512   377  14.848    0.267   0.442
-dns1.steadfast.net   216.86.146.46    2 u  299  512   377  15.048    0.428   0.346
+time1.google.com     120.249.107.194  2 u  290  512   377  24.103    0.586   0.206
-time2.google.com     217.167.3.118    2 u  527  512   377  34.978   -2.121   2.197
+time3.google.com     46.254.142.6     2 u  234  512   377  37.441    0.580   0.462
*time4.google.com     112.106.149.195  2 u  327  512   377  24.257    0.455   0.290
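To spot a misbehaving peer like time2.google.com without reading the table by eye, the peer lines can be compared against the system peer (the line marked with *). This is a sketch over a subset of the lines above; the 1 ms threshold and the function name are arbitrary choices for illustration, not anything ntpd itself uses.

```python
# Subset of the peer lines above. The first character is ntpq's tally code:
# '*' = system peer, '+' = candidate, '-' = discarded as an outlier.
PEERS = """\
+ra.steadfastdns.net 216.86.146.46 2 u 140 512 377 14.474 0.305 0.356
+time1.google.com 120.249.107.194 2 u 290 512 377 24.103 0.586 0.206
-time2.google.com 217.167.3.118 2 u 527 512 377 34.978 -2.121 2.197
*time4.google.com 112.106.149.195 2 u 327 512 377 24.257 0.455 0.290
"""

THRESHOLD_MS = 1.0  # hypothetical cut-off, chosen only for this example


def parse_peers(text):
    """Parse ntpq peer lines into dicts (delay, offset, jitter in ms)."""
    peers = []
    for line in text.splitlines():
        if not line.strip():
            continue
        tally, fields = line[0], line[1:].split()
        peers.append({
            "tally": tally,
            "remote": fields[0],
            "reach": int(fields[6], 8),  # octal bitmask of the last 8 polls
            "delay_ms": float(fields[7]),
            "offset_ms": float(fields[8]),
            "jitter_ms": float(fields[9]),
        })
    return peers


peers = parse_peers(PEERS)
sys_peer = next(p for p in peers if p["tally"] == "*")
suspects = [p for p in peers
            if abs(p["offset_ms"] - sys_peer["offset_ms"]) > THRESHOLD_MS]

for p in suspects:
    print(f"{p['remote']}: offset {p['offset_ms']:+.3f} ms, jitter {p['jitter_ms']:.3f} ms")
```

A reach of 377 (octal) means all of the last eight polls were answered, so time2 is reachable; it just disagrees with the others by a couple of milliseconds.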
-
If you repeat that command having made the change I suggested to /boot/loader.conf.local, you should find the quality figure after HPET is now 5000 and that kern.timecounter.hardware is now HPET.
There seems to be no change; it's the same as before, even after a reboot:
[2.2.6-RELEASE][root@router.home]/root: sysctl kern.timecounter.choice kern.timecounter.hardware
kern.timecounter.choice: TSC-low(1000) ACPI-safe(850) i8254(0) HPET(950) dummy(-1000000)
kern.timecounter.hardware: TSC-low
and my /boot/loader.conf.local:
ahci_load="YES"
kern.timecounter.tc.HPET.quality=5000
EDIT: I just checked, and ntpd.drift was automatically created again. Here is the output of ntpq -c 'rl' -wp:
[2.2.6-RELEASE][root@router.home]/root: ntpq -c 'rl' -wp
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.8p4@1.3265-o Mon Oct 26 14:28:17 UTC 2015 (1)",
processor="amd64", system="FreeBSD/10.1-RELEASE-p25", leap=00, stratum=2,
precision=-22, rootdelay=13.651, rootdisp=539.738, refid=206.108.0.131,
reftime=da2c62f2.f7edec27 Mon, Dec 28 2015 20:40:02.968,
clock=da2c667c.d9dcdc22 Mon, Dec 28 2015 20:55:08.851, peer=26673, tc=8,
mintc=3, offset=18.733259, frequency=14.160, sys_jitter=2.707991,
clk_jitter=4.381, clk_wander=0.158
 remote                       refid           st t when poll reach   delay   offset  jitter
==============================================================================
*ntp1.torix.ca                .PPS.            1 u  108  256   377  13.651   18.733   3.819
+ns509831.ip-167-114-101.net  192.95.25.79     3 u  247  256   377  38.462   21.008   2.539
+zero.gotroot.ca              30.114.5.31      2 u   45  256   377  63.321   18.919   5.488
-ntp3.torix.ca                .PPS.            1 u  250  256   377  12.037   17.734   3.188