OpenNTPD works well few days and then not sync
-
It is possible this is utterly irrelevant, but the "bad file descriptor" reports happen soon after the increasingly unlikely clock adjustment values start getting into territory where the number of seconds could be overflowing 32 bits.
The reported clock adjustments don't seem to be taking effect and there is no indication any of the ntpd processes notice.
May 12 09:57:36 pfsense2 ntpd[51007]: adjusting local clock by 1064411700.356701s
May 12 09:58:07 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1064411738.686966 delay 0.000582, next query 31s
May 12 09:58:38 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1064411738.697569 delay 0.000529, next query 31s
May 12 09:59:09 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1064411738.708118 delay 0.000622, next query 31s
May 12 09:59:40 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1064411738.718725 delay 0.000528, next query 33s
May 12 10:00:14 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1064411738.724996 delay 0.000608, next query 34s
May 12 10:00:14 pfsense2 ntpd[51007]: adjusting local clock by 1221456052.912002s
May 12 10:00:48 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1221456092.081884 delay 0.000559, next query 33s
May 12 10:01:21 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1221456092.093124 delay 0.000612, next query 31s
May 12 10:01:52 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1221456092.103718 delay 0.000561, next query 33s
May 12 10:02:25 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1221456092.107547 delay 0.000482, next query 30s
May 12 10:02:55 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1221456092.102751 delay 0.000563, next query 33s
May 12 10:03:29 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1221456092.092541 delay 0.000505, next query 32s
May 12 10:04:01 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -1221456092.094274 delay 0.000611, next query 31s
May 12 10:04:01 pfsense2 ntpd[51007]: adjusting local clock by 157044313.117055s
May 12 10:04:32 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -157044353.444915 delay 0.000585, next query 32s
May 12 10:05:04 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -157044353.455843 delay 0.000539, next query 31s
May 12 10:05:35 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -157044353.466427 delay 0.000568, next query 30s
May 12 10:06:05 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -157044353.476696 delay 0.000624, next query 31s
May 12 10:06:37 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset -157044353.482255 delay 0.000475, next query 32s
May 12 10:06:37 pfsense2 ntpd[51007]: adjusting local clock by -907367426.381670s
May 12 10:07:08 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset 907367385.221527 delay 0.000543, next query 31s
May 12 10:07:39 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset 907367385.210924 delay 0.000544, next query 33s
May 12 10:08:12 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset 907367385.199629 delay 0.000552, next query 33s
May 12 10:08:46 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset 907367385.187986 delay 0.000479, next query 30s
May 12 10:09:16 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset 907367385.177740 delay 0.000549, next query 31s
May 12 10:09:47 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset 907367385.167141 delay 0.000569, next query 32s
May 12 10:10:19 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset 907367385.156223 delay 0.000628, next query 31s
May 12 10:10:49 pfsense2 ntpd[51062]: reply from 192.168.211.173: offset 907367385.150583 delay 0.000552, next query 30s
May 12 10:10:49 pfsense2 ntpd[51007]: adjusting local clock by -2128823517.229217s
May 12 10:11:19 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: No such file or directory
May 12 11:04:27 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 11:58:35 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 12:52:54 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 13:44:00 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 14:37:21 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 15:32:00 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 16:25:03 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 17:19:13 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 18:13:09 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 19:03:46 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 19:53:51 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 20:47:39 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 21:39:25 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 22:31:07 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor
May 12 23:25:18 pfsense2 ntpd[51062]: recvmsg control format 192.168.211.173: Bad file descriptor -
As an extra data point, this is definitely an OpenNTPD issue as it's even happening on 2.0.2 images I'm building and testing. My ALIX has been up ~16hrs and is logging that its clock is being reset by around -1400000 seconds.
We're discussing dropping OpenNTPD in favor of FreeBSD's even though we've hesitated in the past. We'll have to work around its limitations in other ways, but it's better than having a wonky clock.
-
"We're discussing dropping OpenNTPD"
Sweet!!!ย No more having to edit the ntp.conf every update to pfsense and manually start ntpd.
-
Current snaps should be switched to ntpd, works fine for me so far on a vm.
Some notes that I mentioned to other devs about the switch:
1. The boot-time behavior needs some observation, not sure if what I'm doing there is sufficient. I pass -g to ntpd which allows a large first skew, but I'm not sure if it's fast enough to sync at boot. We may need to bring back the call to ntpdate in rc before launching.
2. FreeBSD's ntpd has no concept of selective binding/listening, it listens on all by default, and we can't use its restrict keywords because they are not compatible with *.pool.ntp.org - the restrict keywords can't handle multiple A records being returned. As such, I removed Services > OpenNTPD since the options were now irrelevant.
3. I added a status page at Status > NTP that parses the ntpq output
Adding on 3, I committed a fix today to clean the page up and make it a little better, but that's not in a snapshot yet.
-
That may be possible. If you pkg_add -r that and it works with those keywords, it would certainly be worth adding in.
FYI- checked in a bunch more fixes this afternoon for various issues I found, it's fairly solid now.
If we can get the interface bindings to go that'll be the icing on the cake.
As an added bonus, this ntpd support ipv6, openntpd did not.
-
Just a little html error need to be corrected in status_ntpd.php :
Tally code
Server
Ref ID
Stratum
Type
When
Poll
Reach
Delay
Offset
Jittershould be :
Tally code
Server
Ref ID
Stratum
Type
When
Poll
Reach
Delay
Offset
JitterBTW, ntpd works well and stay synchronized since one hour. I will check for the long term. Thanks !!!
-
That's already been corrected.
-
A question about a somewhat related topic: When there is a large time skew, is it possible to sync time without rebooting pfsense (which invokes ntpdate on startup) ?
I run pfsense in a VM and suspend/hibernate the host PC every evening, which results in several hours time diff when the PC wakes up the next morning. Unfortunately in this scenario ntpd won't sync to the correct time, even after pfsense has been running for several hours.
I could create a cron job to periodically run ntpdate -s xyz, but maybe pfsense offers a better way which I haven't found yet.
-
On new snaps (not out yet) or 2.0.2 you can run /usr/local/sbin/ntpdate_sync_once.sh
That will kill ntpd, run ntpdate to force a sync, then restart ntpd.I suppose I could add a button to Status > NTP to invoke that.
-
On latest Snap, nano i386, D510MO.ย Would not stay synced without use of JimP script.
Changed /var/etc/ntpd.conf.
On server lines, removed iburst, set minpoll 6 maxpoll 6.Staying within 1 ms for 12 hours now.
-
The values after the server line were pulled from FreeBSD's defaults. Not sure why yours needed to be changed that much.
What was the clock doing when you said it wouldn't stay in sync?
We may need to have a settings page to tweak those values if the defaults don't work for everyone.
-
maxpoll 6 is pretty aggressive.. This is talking to ntp server every 64 seconds.. This really should not be required, normally maxpoll will work its way up to 9, or 1024 seconds.
Normally you should not have to make changes to these settings.
-
OK, locking this topic for now so it doesn't attract more random ntp issues, this thread originally was for OpenNTP losing sync which is very far away from tweaking ntpd.
I may split some older posts from this topic off into separate threads since there are still interesting things in them, but they have diverged from the original topic.