pfSense 2.2: ntpd keeps terminating and restarting

igormt

Good tip: every time, it is caused by the same thing:

php-fpm[61631]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 192.168.20.1 - Restarting packages.

And then there is a problem because ntp still sees its lock file (my guess is that the terminate and restart are happening too quickly).

Now, 192.168.20.1 is an OpenVPN tunnel network (there are two; sometimes it is one, sometimes the other, and sometimes both). The OpenVPN logs show the OpenVPN server doing SIGTERM[hard,] and restarting, though it is not yet clear to me why. Apparently that triggers a restart of all packages, and ntpd has trouble restarting because it still sees its lock file…

Now to figure out what is going on with OpenVPN. That said, testing confirms that the OpenVPN terminates/restarts don't visibly disrupt ongoing OpenVPN connections.
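If igormt's guess is right, whatever restarts ntpd needs to wait for the old daemon to go away before starting a new one. Below is a minimal Python sketch of such a wait. It assumes the "lock file" is a pidfile containing the old ntpd's PID. The function name and any path such as /var/run/ntpd.pid are illustrative assumptions, not pfSense's actual restart code.

```python
import os
import time


def wait_for_pidfile_release(pidfile, timeout=10.0, poll=0.1):
    """Return True once the process named in `pidfile` has exited (or the
    file is gone), False if it is still alive after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(pidfile) as fh:
                pid = int(fh.read().strip())
        except (FileNotFoundError, ValueError):
            # No pidfile, or an unreadable one: nothing left to wait for.
            return True
        try:
            os.kill(pid, 0)  # signal 0 only checks that the PID exists
        except ProcessLookupError:
            return True  # stale pidfile: the old process is gone
        except PermissionError:
            pass  # PID exists but belongs to another user; keep waiting
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A restart script would call this after sending SIGTERM and launch the new ntpd only once it returns True. A False return means the old instance is still running, and blindly starting another would reproduce the lock-file clash.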

doktornotor

@igormt:

Good tip: every time, it is caused by the same thing:
php-fpm[61631]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - -> 192.168.20.1 - Restarting packages.
And then there is a problem because ntp still sees its lock file (my guess is that the terminate and restart are happening too quickly).

Yours at least terminates gracefully under these circumstances. For a lot of people, it just crashes hard. https://redmine.pfsense.org/issues/4155

charliem

@doktornotor:

Yours at least terminates gracefully under these circumstances. For a lot of people, it just crashes hard. https://redmine.pfsense.org/issues/4155

Can you add the '-U 0' switch to the ntpd command to disable ntpd's dynamic interface scanning? You have to modify /etc/inc/system.inc, but that's been shown to work around the crash reported in bug 4155:
https://forum.pfsense.org/index.php?topic=78194.msg448222#msg448222

I suggest excluding ntpd from the package restart if possible, and instead rely on the dynamic interface scanning in ntpd for dealing with IP changes. Until then, the '-U 0' option may provide some relief.
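To make the '-U 0' suggestion concrete, here is a minimal Python sketch that builds an ntpd argument list with and without the option. In ntp 4.2.8, -U sets the interval in seconds between scans for new or dropped interfaces, and 0 disables scanning. The binary path and the -g/-c/-p flags are assumptions about a typical invocation, not copied from /etc/inc/system.inc, and the function name is hypothetical.

```python
import shlex


def ntpd_command(conf="/var/etc/ntpd.conf", pidfile="/var/run/ntpd.pid",
                 update_interval=None):
    """Build an ntpd argv list. update_interval maps to ntpd's -U option:
    0 disables the periodic scan for new or dropped interfaces."""
    # Binary path and the -g/-c/-p flags are assumptions about a typical
    # pfSense-style invocation, not taken from /etc/inc/system.inc.
    cmd = ["/usr/local/sbin/ntpd", "-g", "-c", conf, "-p", pidfile]
    if update_interval is not None:
        cmd += ["-U", str(int(update_interval))]
    return cmd


if __name__ == "__main__":
    # Print the command line as it would be typed in a shell.
    print(shlex.join(ntpd_command(update_interval=0)))
```

The option is left off entirely when no interval is given, so ntpd keeps its default scanning behaviour unless you explicitly opt out.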

doktornotor

@charliem:

Can you add the '-U 0' switch to the ntpd command to disable ntpd's dynamic interface scanning? You have to modify /etc/inc/system.inc, but that's been shown to work around the crash reported in bug 4155:
https://forum.pfsense.org/index.php?topic=78194.msg448222#msg448222

I could, yes… I'd somehow expect this to get fixed properly, though. Plus, this entire "canonical" NTP implementation is proving to be buggy Swiss-cheese crap with an idiotic design, and frankly I'm getting increasingly tired of it. I do not want it to listen on any WAN interface in the first place, so this should be a total non-issue, except for the braindead design.

charliem

@doktornotor:

I could, yes… I'd somehow expect this to get fixed properly, though. Plus, this entire "canonical" NTP implementation is proving to be buggy Swiss-cheese crap with an idiotic design, and frankly I'm getting increasingly tired of it. I do not want it to listen on any WAN interface in the first place, so this should be a total non-issue, except for the braindead design.

With the current method of restarting packages, ntpd will restart every time the openvpn interface flaps; no way around that unless you exclude ntpd from being restarted. That's what started this thread.

WRT ntpd in general, it has personally worked very well for my systems, pfSense included. That said, it's big, old and difficult to test, and some projects are looking for alternatives. Chrony is one alternative and is the default in many distros, including RHEL 7 and Fedora (one of the primary authors works for Red Hat). It's also in the FreeBSD ports collection. Chrony is reported to converge somewhat more quickly and to better handle service interruptions, even intermittent network connections of only a few minutes per day.
http://chrony.tuxfamily.org/ and
http://portsmon.freebsd.org/portoverview.py?category=net&portname=chrony

Longer term, Poul-Henning Kamp (a name familiar to both timenuts and FreeBSD old-timers) is writing a new implementation from scratch: Ntimed, in separate versions (client, slave and master). Though promising, it is clearly not a near-term option.
http://phk.freebsd.dk/time/index.html and
http://phk.freebsd.dk/_downloads/FOSDEM_2015.pdf

I would not consider going back to OpenNTPD for an instant.

If you are unhappy with ntpd and wish to contribute, you could bring up chrony on your pfSense box and report your findings. But until I have time to try chrony, I'll be sticking to the canonical ntpd.

stan-qaz

The Linux folks are sending some love (and money) to the NTP developers, and that is likely to transfer over to BSD fairly quickly. I'd rather see it fixed up than see a bunch of different options competing for funds and time.

http://www.networkworld.com/article/2358202/security/core-infrastructure-initiative-to-delve-into-security-of-openssl–openssh--network-time-pro.html

In addition to funding a code audit of the OpenSSL protocol, CII today also said it’s directing its security review efforts to two other widely-used protocols: OpenSSH and Network Time Protocol (NTP).

doktornotor

Would not hold my breath. I mean… how much funding do you need to stop doing braindead things like binding to each and every interface on the box? People have been complaining to and bugging upstream about this for ages.

charliem

@stan-qaz:

The Linux folks are sending some love (and money) to the NTP developers, and that is likely to transfer over to BSD fairly quickly. I'd rather see it fixed up than see a bunch of different options competing for funds and time.

The NTP developers that the LF is funding are PHK, mentioned above, and Harlan Stenn, the current canonical ntpd maintainer, and they concluded that starting over was the best approach, hence Ntimed. From the PHK link above:

I spent a couple of months going through a lot of the source code, and there is a lot of it: 100 KLOC, looking for bad news, and fortunately I only found a few minor nits, worrying, but not advisory material.

100.000 lines of code is insane for a program which basically steers your clock to some remote server, it can be done in 1000 lines if you really squeeze it.

At the end of my review, I concluded that trying to slim down the current monster would be a lot more work and effort than simply starting from scratch.
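To give a sense of the core arithmetic behind "steers your clock to some remote server", here is a minimal Python sketch of the standard NTP offset and round-trip delay computation from the four on-wire timestamps defined in RFC 5905. It covers only the measurement step. The filtering, source selection and clock discipline that real implementations build on top of it are left out, and the function name is illustrative.

```python
def ntp_offset_delay(t0, t1, t2, t3):
    """NTP on-wire calculation (RFC 5905), all times in seconds.

    t0: client transmit time    t1: server receive time
    t2: server transmit time    t3: client receive time
    Returns (offset, delay): a positive offset means the local clock is
    behind the server; delay is the round trip minus server processing.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay
```

For example, t0=0, t1=5, t2=6, t3=3 gives an offset of 4.0 seconds and a delay of 2 seconds. The offset estimate assumes the outbound and return paths take equal time.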

David_W

@doktornotor:

Would not hold my breath. I mean… how much funding do you need to stop doing braindead things like binding to each and every interface on the box? People have been complaining to and bugging upstream about this for ages.

In ntp 4.2.8 you can use 'interface' to specify which interfaces ntpd should bind to. I can't remember exactly when this was introduced, though pfSense has used it for some time. I haven't checked git, but I suspect pfSense has used it since openntpd was dropped. pfSense shipped an ntp 4.2.7 build for some time before 4.2.8 was released. Whilst 4.2.7 was nominally a development version, the long delay between 4.2.6 and 4.2.8 meant that the 4.2.7 development builds were the better choice for some time in the run-up to the 4.2.8 release.

Part of the problem for FreeBSD users is that the base system is doggedly sticking with the ancient ntp 4.2.4 plus security patches, so there is no support for 'interface' or 'pool'. ntp 4.2.8p1 is in ports (devel/ntp).

Though the fixes in 4.2.8p1 are minor (and I don't think the security fix is relevant to pfSense), it might be worth pfSense 2.2.1 updating to the latest release. It would also be good to see pfSense's configuration interface allow configuration using 'pool' as an alternative to 'server', as well as specifying a pool or server as -4 (use IPv4) or -6 (use IPv6).
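For illustration, here is a hypothetical Python sketch of what such a generator might emit for ntpd.conf. It covers 'server' or 'pool' lines, optionally with -4/-6 to force IPv4 or IPv6 DNS resolution. It shows the configuration syntax only. The function name and parameters are assumptions and do not correspond to pfSense's system_ntp_configure() code. 'pool' is not available in the 4.2.4 base-system ntpd mentioned above.

```python
def ntp_source_line(host, kind="server", family=None, iburst=True):
    """Render one ntpd.conf time-source line.

    kind:   "server" for a single host, "pool" for a DNS name that ntpd
            expands into several associations.
    family: None, 4 or 6; emitted as -4/-6 to force IPv4 or IPv6
            resolution of the host name.
    """
    if kind not in ("server", "pool"):
        raise ValueError("kind must be 'server' or 'pool'")
    parts = [kind]
    if family is not None:
        if family not in (4, 6):
            raise ValueError("family must be 4 or 6")
        parts.append(f"-{family}")
    parts.append(host)
    if iburst:
        parts.append("iburst")  # faster initial synchronisation
    return " ".join(parts)


if __name__ == "__main__":
    print(ntp_source_line("pool.ntp.org", kind="pool"))
    print(ntp_source_line("time.example.net", family=6))
```

The second call prints `server -6 time.example.net iburst`. time.example.net is a placeholder host name.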

As has been said, work is underway on a scratch ntp reimplementation by PHK, which will initially target stratum >1 servers. I'm not sure a final decision has been made on what to do with reference clock drivers, though the view has been expressed that the reference clock drivers should move out of the ntpd daemon.

Autokey is going to be replaced by something else - it was rarely implemented by server operators, is incompatible with NAT and there was no real concept of public trust roots. The crypto code in ntpd was often the source of security problems in ntp. I do use Autokey on my network, but it's a pain to configure correctly and requires any server using Autokey against a remote ntp server to have a no-NAT address (not a problem for IPv6, but tricky with IPv4 unless, like me, you have multiple IPv4 addresses or an IPv4 netblock).

It's possible that some current ntp methods will not be supported in PHK's implementation, though I haven't been following discussions closely enough to know the latest thinking here. In particular, I wonder whether the symmetric and broadcast modes are still needed, or whether manycast suffices these days. I find manycast works very well on IPv6 on my network, not least as IPv6 tends to do the right thing when it comes to multicast (which ntp's manycast functionality relies on).

doktornotor

@David_W:

In ntp 4.2.8 you can use 'interface' to specify which interfaces ntpd should bind to.

It sure as hell does nothing useful with regard to binding in any ISC ntpd version, and it sure as hell never worked.


$ ntpd --version
ntpd 4.2.8@1.3265-o Mon Dec 22 14:36:36 UTC 2014 (1)

$ grep interface /var/etc/ntpd.conf
interface ignore all
interface listen ste0

$ netstat -an | grep .123
udp6       0      0 ::1.123                *.*
udp4       0      0 127.0.0.1.123          *.*
udp6       0      0 2001:470:dead:beef:1.123   *.*
udp6       0      0 fe80::21f:c6ff:f.123   *.*
udp4       0      0 192.168.0.254.123      *.*
udp4       0      0 *.123                  *.*
udp6       0      0 *.123                  *.*
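The giveaway above is the pair of `*.123` lines: despite `interface ignore all`, ntpd still holds wildcard sockets on port 123. Below is a minimal Python sketch that picks those out of FreeBSD-style `netstat -an` output, where address and port are joined by a final '.'. Checking the port field exactly also avoids the false matches `grep .123` can produce, since the '.' in that pattern matches any character. The function name is illustrative.

```python
def ntp_sockets(netstat_text, port="123"):
    """Return (proto, local_address, is_wildcard) for every UDP socket
    bound to `port` in FreeBSD-style `netstat -an` output."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("udp"):
            continue
        # Local address is the 4th column, e.g. "127.0.0.1.123" or "*.123".
        addr, _, sock_port = fields[3].rpartition(".")
        if sock_port != port:
            continue
        found.append((fields[0], addr, addr == "*"))
    return found
```

Filtering the result on the third element gives the wildcard bindings that make the `interface` directives look ineffective.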

charliem

@David_W:

Though the fixes in 4.2.8p1 are minor (and I don't think the security fix is relevant to pfSense), it might be worth pfSense 2.2.1 updating to the latest release.

pfSense 2.2 is already on 4.2.8.

It would also be good to see pfSense's configuration interface allow configuration using 'pool' as an alternative to 'server', as well as specifying a pool or server as -4 (use IPv4) or -6 (use IPv6).

Support is there in ntpd, just not yet in the GUI or PHP code. To make experimental changes to your config behind the GUI, edit /etc/inc/system.inc, function system_ntp_configure, around line 1492; manual edits to /var/etc/ntpd.conf will be overwritten. Alternatively, you can open a feature request ticket in Redmine.

Autokey is going to be replaced by something else

IETF 'Network Time Security', work is in progress. Draft: https://tools.ietf.org/html/draft-ietf-ntp-network-time-security-06
But this is drifting away from pfSense; maybe better continued on the pfSense Development forum area.

stan-qaz

Looking at options, pool would be good, but how about peer as an additional choice? I'm no ntp expert, but I saw it when reading the man and web pages, and it seemed like something I could use here on my little network.

This is what I'm seeing on my 2.2 pfSense box, WAN, LAN and OPT1 ports are configured.


[2.2-RELEASE][root@pfSense.home]/root: ntpd --version
ntpd 4.2.8@1.3265-o Mon Dec 22 14:36:40 UTC 2014 (1)

[2.2-RELEASE][root@pfSense.home]/root: grep interface /var/etc/ntpd.conf
interface ignore all
interface listen em1

[2.2-RELEASE][root@pfSense.home]/root: netstat -an | grep .123
udp6       0      0 ::1.123                *.*
udp4       0      0 127.0.0.1.123          *.*
udp4       0      0 172.16.0.1.123         *.*
udp6       0      0 fe80::21b:21ff:f.123   *.*
udp4       0      0 *.123                  *.*
udp6       0      0 *.123                  *.*

kejianshi

I've been running the NTP server on pfSense for years (not sure whether because I need it so much or just like turning things on).
Anyway, it's never had a problem.

scurrier

I am experiencing this issue, also. Packages are restarted when OpenVPN hiccups. Not sure why NTP should care about this since it only listens on my local LAN.

charliem

@scurrier:

I am experiencing this issue, also. Packages are restarted when OpenVPN hiccups. Not sure why NTP should care about this since it only listens on my local LAN.

Please try the following patches: they simply remove the ntp reconfiguration and kill/restart from the files /etc/rc.newwanip and /etc/rc.newwanipv6. The packages will still be restarted. This lets ntpd use its own code for detecting interface changes. It should also help with https://redmine.pfsense.org/issues/4155 and https://forum.pfsense.org/index.php?topic=78194.0

I've tried to walk through the earlier revisions for these files to see when these lines were added, but couldn't find anything applicable. I suspect they date from when openntpd was being used, which did not handle dynamic interface scanning like the current ntpd does.

--- rc.newwanip.orig	2015-01-22 15:39:45.000000000 -0500
+++ rc.newwanip	2015-03-01 12:41:43.000000000 -0500
@@ -47,8 +47,6 @@
 	global $oldip, $curwanip, $g;

 	/* restart packages */
-	system_ntp_configure(false);
-	mwexec_bg("/usr/local/sbin/ntpdate_sync_once.sh", true);
 	log_error("{$g['product_name']} package system has detected an IP change or dynamic WAN reconnection - $oldip ->  $curwanip - Restarting packages.");
 	send_event("service reload packages");
 }
--- rc.newwanipv6.orig	2015-01-22 15:39:45.000000000 -0500
+++ rc.newwanipv6	2015-03-01 12:42:07.000000000 -0500
@@ -48,8 +48,6 @@
 	global $oldipv6, $curwanipv6, $g;

 	/* restart packages */
-	system_ntp_configure(false);
-	mwexec_bg("/usr/local/sbin/ntpdate_sync_once.sh", true);
 	log_error("{$g['product_name']} package system has detected an IP change or dynamic WAN reconnection - $oldipv6 -> $curwanipv6 - Restarting packages.");		
 	send_event("service reload packages");
 }
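Here is a hypothetical Python sketch for anyone who would rather script the same edit than apply the diff with patch(1). It drops exactly the two lines the diff removes and leaves everything else, including line endings, untouched. Because it matches on the stripped line content shown in the diff, it is a no-op on a file that has already been changed. Try it on copies of /etc/rc.newwanip and /etc/rc.newwanipv6 first. The function name is illustrative.

```python
def drop_ntp_restart(php_source):
    """Return php_source without the two lines removed by the patch above."""
    unwanted = (
        "system_ntp_configure(false);",
        'mwexec_bg("/usr/local/sbin/ntpdate_sync_once.sh", true);',
    )
    # keepends=True preserves the original line endings of kept lines.
    kept = [line for line in php_source.splitlines(keepends=True)
            if line.strip() not in unwanted]
    return "".join(kept)
```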

cmb

@charliem:

Please try the following patches: they simply remove the ntp reconfiguration and kill/restart from the files /etc/rc.newwanip and /etc/rc.newwanipv6. The packages will still be restarted. This lets ntpd use its own code for detecting interface changes. It should also help with https://redmine.pfsense.org/issues/4155 and https://forum.pfsense.org/index.php?topic=78194.0

Has anyone tried this and had it resolve their issue and not break anything? I'm pretty confident that's a fine change, and it seems like it should avoid the crash described here. I made this change in 2.3 as part of https://redmine.pfsense.org/issues/4155

doktornotor

@cmb:

I'm pretty confident that's a fine change, and it seems like it should avoid the crash described here. I made this change in 2.3 as part of https://redmine.pfsense.org/issues/4155

Yeah, I'm confident it's a fine change as well, though obviously the core issue is somewhere else (i.e., in the ntpd code). Unfortunately, unless you backport fixes to a usable pfSense branch, it's as if you did nothing.

charliem

@cmb:

Has anyone tried this and had it resolve their issue and not break anything? I'm pretty confident that's a fine change, and it seems like it should avoid the crash described here. I made this change in 2.3 as part of https://redmine.pfsense.org/issues/4155

Well, my machines have worked OK with that patch since before I posted it. No known breakage, but I guess I don't count as a second opinion :)

Looking forward to testing 2.3….

cmb

@charliem:

Well, my machines have worked OK with that patch since before I posted it. No known breakage, but I guess I don't count as a second opinion :)

Your first opinion is appreciated regardless. :) Wasn't clear from your earlier post if you were running it at all at the time, or were still running it now 6 months after the fact.