NTP problems after upgrade to 2.2-RC (Jan 08)

skywalker

I have finally upgraded two boxes from 2.1.5 to 2.2-RC and everything works as expected except NTP.
I don't know what has changed with regard to NTP, but I can't get it working as before.
The ntp daemon is running and remotely reachable, but the clients don't accept the time.


$ ntpq -c peers 10.2.1.1
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+time.ostseehaie 131.188.3.221    2 u   10   64    1   25.972   98.203   2.218
*stratum2-3.ntp. 129.70.130.70    2 u   11   64    1   17.144   96.815   2.613
-minion.webershe 192.53.103.104   2 u   14   64    1   24.525   98.300   1.728
+foxtrot.zq1.de  122.227.206.195  3 u   14   64    1   24.565   94.514   1.454


$ ntpdate -q 10.2.1.1
server 10.2.1.1, stratum 16, offset -0.144872, delay 0.02753
 9 Jan 12:33:01 ntpdate[12523]: no server suitable for synchronization found

Not sure why it responds with stratum 16. I am not really an NTP expert, but it used to work before the upgrade.
Does anyone have a hint for how I could investigate this?

thanks, Till

charliem

Your machine cannot get the time from the peers: 'reach' is shown as 1, and for a properly running system reach should eventually move up to 377. Clients don't accept the time because the server really is at stratum 16.

http://www.ntp.org/ntpfaq/NTP-s-trouble.htm#Q-MON-REACH

How have you configured NTP on the pfSense web gui? Can you post contents of /var/etc/ntpd.conf? Anything interesting in the NTP syslogs? You can attach the output of "clog /var/log/ntpd.log | tail -100" if you aren't sure whether it's interesting or not :)

ntpd also has to run a while after starting or re-starting. Maybe your connection and setup are OK, and you are just looking at the server in the first minute or so after startup?

skywalker

Hi,

Thanks for your offer to help. I can't see any reason why it shouldn't be able to reach its peers.
I was expecting pfSense to retrieve the time from the pool servers, and all clients to then retrieve the time from pfSense.

regards, Till

/var/etc/ntpd.conf:


[2.2-RC][admin@pfsense6.middle.earth]/root: less /var/etc/ntpd.conf
# 
# pfSense ntp configuration file 
# 

tinker panic 0 
# Orphan mode stratum
tos orphan 12

# Upstream Servers
server 0.de.pool.ntp.org iburst maxpoll 9
server 1.de.pool.ntp.org iburst maxpoll 9
server 2.de.pool.ntp.org iburst maxpoll 9
server 3.de.pool.ntp.org iburst maxpoll 9

disable monitor
statsdir /var/log/ntp
logconfig =syncall +clockall
driftfile /var/db/ntpd.drift
restrict default kod limited nomodify nopeer notrap
restrict -6 default kod limited nomodify nopeer notrap

interface ignore all
interface listen vr0

Logs since last daemon restart:


Jan  9 12:32:28 pfsense6 ntpd[53450]: ntpd 4.2.8@1.3265-o Mon Dec 22 14:36:36 UTC 2014 (1): Starting
Jan  9 12:32:28 pfsense6 ntpd[53450]: Command line: /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
Jan  9 12:32:28 pfsense6 ntpd[53587]: proto: precision = 2.323 usec (-19)
Jan  9 12:32:28 pfsense6 ntpd[53587]: Listen and drop on 0 v6wildcard [::]:123
Jan  9 12:32:28 pfsense6 ntpd[53587]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Jan  9 12:32:28 pfsense6 ntpd[53587]: Listen normally on 2 vr0 10.2.1.1:123
Jan  9 12:32:28 pfsense6 ntpd[53587]: Listen normally on 3 vr0 [fe80::20d:b9ff:fe20:f8d0%1]:123
Jan  9 12:32:28 pfsense6 ntpd[53587]: setsockopt IPV6_MULTICAST_IF 0 for fe80::20d:b9ff:fe20:f8d0%1 fails: Can't assign requested address
Jan  9 12:32:28 pfsense6 ntpd[53587]: Listen normally on 4 lo0 127.0.0.1:123
Jan  9 12:32:28 pfsense6 ntpd[53587]: Listen normally on 5 lo0 [::1]:123
Jan  9 12:32:28 pfsense6 ntpd[53587]: Listening on routing socket on fd #26 for interface updates

skywalker

b.t.w. reach is not 1 anymore, but stratum is still 16:


[2.2-RC][admin@pfsense6.middle.earth]/root: ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+time.ostseehaie 131.188.3.221    2 u    3   64  377   20.961  325.163 113.461
*stratum2-3.NTP. 129.70.130.71    2 u   17   64  177   15.939  228.837  75.943
+minion.webershe 192.53.103.104   2 u   15   64  177   20.602  318.756 111.715
-foxtrot.zq1.de  122.227.206.195  3 u   12   64  377   21.298  165.432 111.714

charliem

@skywalker:

Your peer output looks OK to me: the second host is selected as the system peer, the first and third hosts are selected as candidates, and the last host is rejected. This should match what you see in the GUI under Status > NTP.
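
For anyone who wants to read that first column without the GUI, here is a minimal Python sketch that labels each peer line by its tally character. The sample text is the `ntpq -p` output posted above; the `TALLY` names, the `parse_peers` helper and the assumption of ten whitespace-separated columns are illustrative and not part of ntpq itself. Note that ntpq's own documentation calls the `-` mark an outlier rather than a reject, as far as I recall.

```python
# Tally characters in the first column of `ntpq -p` output.
# The names follow the ntpq documentation as I understand it (an assumption).
TALLY = {
    "*": "system peer",
    "+": "candidate",
    "-": "outlier (discarded)",
    "x": "falseticker",
    " ": "rejected",
}

# Peer billboard copied from the post above.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+time.ostseehaie 131.188.3.221    2 u    3   64  377   20.961  325.163 113.461
*stratum2-3.NTP. 129.70.130.71    2 u   17   64  177   15.939  228.837  75.943
+minion.webershe 192.53.103.104   2 u   15   64  177   20.602  318.756 111.715
-foxtrot.zq1.de  122.227.206.195  3 u   12   64  377   21.298  165.432 111.714
"""


def parse_peers(text):
    """Return (remote, role, reach) for each peer line of `ntpq -p` output."""
    peers = []
    for line in text.splitlines():
        fields = line[1:].split()
        # Skip blank lines, the column header and the ===== separator.
        if len(fields) < 10 or fields[0] == "remote":
            continue
        role = TALLY.get(line[0], "unknown")
        peers.append((fields[0], role, fields[6]))
    return peers


for remote, role, reach in parse_peers(SAMPLE):
    print(f"{remote:<18} {role:<20} reach={reach}")
```

Exactly one line should come back as the system peer; if none does, the daemon has not selected a time source yet.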

What's the output of the sysinfo command in ntpq? And of the rv command?

skywalker

see below for the output.
What really struck me here is that the reach goes up for a while and then falls back to 1.
Surprisingly the same configuration did work before the upgrade (well, at least none of my clients complained before).

thanks again for looking into this.

-Till


ntpq> sysinfo
associd=0 status=0613 leap_none, sync_ntp, 1 event, spike_detect,
system peer:        stratum2-3.NTP.TechFak.NET:123
system peer mode:   client
leap indicator:     00
stratum:            3
log2 precision:     -19
root delay:         18.136
root dispersion:    252.075
reference ID:       129.70.132.36
reference time:     d85b7f77.d360dde9  Sat, Jan 10 2015 11:37:43.825
system jitter:      42.708488
clock jitter:       26.742
clock wander:       1.427
broadcast delay:    0.000
symm. auth. delay:  0.000


ntpq> rv
associd=0 status=0613 leap_none, sync_ntp, 1 event, spike_detect,
version="ntpd 4.2.8@1.3265-o Mon Dec 22 14:36:36 UTC 2014 (1)",
processor="i386", system="FreeBSD/10.1-RELEASE-p3", leap=00, stratum=3,
precision=-19, rootdelay=18.136, rootdisp=252.630, refid=129.70.132.36,
reftime=d85b7f77.d360dde9  Sat, Jan 10 2015 11:37:43.825,
clock=d85b8011.78a8ff76  Sat, Jan 10 2015 11:40:17.471, peer=38903, tc=6,
mintc=3, offset=109.039377, frequency=-198.172, sys_jitter=42.814557,
clk_jitter=26.742, clk_wander=1.427
ntpq> ntpq> rv
associd=0 status=0613 leap_none, sync_ntp, 1 event, spike_detect,
version="ntpd 4.2.8@1.3265-o Mon Dec 22 14:36:36 UTC 2014 (1)",
processor="i386", system="FreeBSD/10.1-RELEASE-p3", leap=00, stratum=3,
precision=-19, rootdelay=18.136, rootdisp=252.780, refid=129.70.132.36,
reftime=d85b7f77.d360dde9  Sat, Jan 10 2015 11:37:43.825,
clock=d85b801a.dc1e50e8  Sat, Jan 10 2015 11:40:26.859, peer=38903, tc=6,
mintc=3, offset=109.039377, frequency=-198.172, sys_jitter=42.814557,
clk_jitter=26.742, clk_wander=1.427

charliem

@skywalker:

see below for the output.

The output looks OK to me: that machine is synced and operating as a stratum 3 server.
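
The `status=0613` word in that output can be unpacked by hand. The sketch below assumes the usual layout of the ntpd system status word (2 bits leap indicator, 6 bits sync source, 4 bits event count, 4 bits last event code); that layout is taken from the NTP documentation as I remember it, so treat it as an assumption. The name tables only contain the values that actually appear in this thread, and `decode_system_status` is a made-up helper name.

```python
# Only the codes seen in this thread are named; everything else is left numeric.
SOURCE_NAMES = {6: "sync_ntp"}
EVENT_NAMES = {3: "spike_detect"}


def decode_system_status(word):
    """Split a hex system status word such as '0613' into its four fields.

    Assumed layout, most significant bits first:
    leap (2 bits) | source (6 bits) | event count (4 bits) | event code (4 bits)
    """
    value = int(word, 16)
    source = (value >> 8) & 0x3F
    code = value & 0xF
    return {
        "leap": (value >> 14) & 0x3,
        "source": SOURCE_NAMES.get(source, source),
        "event_count": (value >> 4) & 0xF,
        "event_code": EVENT_NAMES.get(code, code),
    }


print(decode_system_status("0613"))
```

Under that assumed layout the fields line up with the text ntpq prints next to the word: `leap_none, sync_ntp, 1 event, spike_detect`.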

What really struck me here is that the reach goes up for a while and then falls back to 1.
Surprisingly the same configuration did work before the upgrade (well, at least none of my clients complained before).

'reach' is a living number, updated every time a response is expected from your system peer(s) (please revisit the link posted above). If it decreases from 377, then you are not getting valid replies from your system peer(s). Something may be blocking UDP port 123.
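
To make the 'reach' value less mysterious: it is an 8-bit shift register printed in octal. Each poll shifts it left by one, and the lowest bit is set when a valid reply arrived. A small Python sketch, with `reach_history` as a purely illustrative helper name, decodes the values seen in this thread:

```python
def reach_history(reach):
    """Decode the octal 'reach' column into the last eight polls, oldest first.

    True means a valid reply was received for that poll.
    """
    value = int(reach, 8) & 0xFF
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


for reach in ("1", "177", "377"):
    history = reach_history(reach)
    # Y = reply received, . = no reply; oldest poll on the left.
    print(reach.rjust(3), "".join("Y" if ok else "." for ok in history))
```

So 377 means all of the last eight polls were answered, 177 means only the oldest of the eight was missed, and 1 means only the most recent poll got a reply, which is also what you see right after ntpd starts.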