pfSense NTP server is very unstable.

johnpoz

@einsdisp said in pfSense NTP server is very unstable.:

0.0.0.0 061d 0d kern kernel time sync disabled

Well, that's not good. A huge time difference could be the cause of that.

If this is a VM, you probably want to make sure the VM isn't syncing time with the host if you want it to sync time via NTP.

stephenw10

Mmm, it's unlikely to sync to a single server that is showing a 46s offset.

If you add a pool there so it can see multiple servers and they are all showing close to the same offset I would expect it to sync.

Why is the offset so large initially anyway?

Steve

einsdisp

@johnpoz
@stephenw10

I already tried an NTP pool (rather than a single NTP server) before. No matter what I set, pfSense NTP only works fine for the first several hours, then stops working.

I just adjusted the system clock of the host OS and restarted pfSense VM. This time I set the external server to ntp.aliyun.com. Everything is fine now:

Jan 25 01:52:47 	ntpd 	34952 	ntpd 4.2.8p15@1.3728-o Thu Jun 24 21:53:38 UTC 2021 (1): Starting
Jan 25 01:52:47 	ntpd 	34952 	Command line: /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
Jan 25 01:52:47 	ntpd 	34952 	----------------------------------------------------
Jan 25 01:52:47 	ntpd 	34952 	ntp-4 is maintained by Network Time Foundation,
Jan 25 01:52:47 	ntpd 	34952 	Inc. (NTF), a non-profit 501(c)(3) public-benefit
Jan 25 01:52:47 	ntpd 	34952 	corporation. Support and training for ntp-4 are
Jan 25 01:52:47 	ntpd 	34952 	available at https://www.nwtime.org/support
Jan 25 01:52:47 	ntpd 	34952 	----------------------------------------------------
Jan 25 01:52:47 	ntpd 	35232 	proto: precision = 17.470 usec (-16)
Jan 25 01:52:47 	ntpd 	35232 	basedate set to 2021-06-12
Jan 25 01:52:47 	ntpd 	35232 	gps base set to 2021-06-13 (week 2162)
Jan 25 01:52:47 	ntpd 	35232 	Listen normally on 0 lo0 [::1]:123
Jan 25 01:52:47 	ntpd 	35232 	Listen normally on 1 lo0 127.0.0.1:123
Jan 25 01:52:47 	ntpd 	35232 	Listen normally on 2 bridge0 10.1.1.2:123
Jan 25 01:52:47 	ntpd 	35232 	Listening on routing socket on fd #23 for interface updates
Jan 25 01:52:47 	ntpd 	35232 	kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jan 25 01:52:47 	ntpd 	35232 	0.0.0.0 c01d 0d kern kernel time sync enabled
Jan 25 01:52:47 	ntpd 	35232 	kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jan 25 01:52:47 	ntpd 	35232 	0.0.0.0 c012 02 freq_set kernel -8.436 PPM
Jan 25 01:52:47 	ntpd 	35232 	0.0.0.0 c016 06 restart
Jan 25 01:52:53 	ntpd 	35232 	DNS ntp.aliyun.com -> 203.107.6.88
Jan 25 01:52:53 	ntpd 	35232 	203.107.6.88 8011 81 mobilize assoc 57546
Jan 25 01:52:54 	ntpd 	35232 	203.107.6.88 8014 84 reachable
Jan 25 01:53:00 	ntpd 	35232 	203.107.6.88 901a 8a sys_peer
Jan 25 01:53:00 	ntpd 	35232 	0.0.0.0 c61c 0c clock_step +0.236668 s
Jan 25 01:53:00 	ntpd 	35232 	0.0.0.0 c615 05 clock_sync
Jan 25 01:54:08 	ntpd 	35232 	0.0.0.0 c618 08 no_sys_peer
Jan 25 01:54:08 	ntpd 	35232 	203.107.6.88 8014 84 reachable
Jan 25 01:54:14 	ntpd 	35232 	203.107.6.88 901a 8a sys_peer
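
For readers following the log, here is a minimal sketch of how the four-hex-digit status word on the 0.0.0.0 (system) lines can be unpacked. It assumes the layout described in the ntpd status word documentation: 2 bits of leap indicator, 6 bits of clock source, 4 bits of event count, and 4 bits of event code. The function name decode_system_status is made up for illustration, and the event table lists only the codes that actually appear in the log above. Peer lines such as 203.107.6.88 8014 use a different layout and are not handled.

```python
# Decode the 16-bit system status word from ntpd "0.0.0.0 <word> ..." log lines.
# Assumed layout: leap (2 bits) | source (6 bits) | count (4 bits) | code (4 bits).

LEAP = {
    0: "no warning",
    1: "add second",
    2: "delete second",
    3: "alarm (unsynchronized)",
}

# Only the source values seen in this thread are named.
SOURCES = {0: "unspecified", 6: "ntp"}

# Only the event codes that appear in the log above are listed.
EVENTS = {
    0x2: "freq_set",
    0x5: "clock_sync",
    0x6: "restart",
    0x8: "no_sys_peer",
    0xC: "clock_step",
    0xD: "kern",
}


def decode_system_status(word_hex):
    """Split a hex status word such as 'c618' into its four fields."""
    word = int(word_hex, 16)
    leap = (word >> 14) & 0x3
    source = (word >> 8) & 0x3F
    count = (word >> 4) & 0xF
    code = word & 0xF
    return {
        "leap": LEAP[leap],
        "source": SOURCES.get(source, f"source {source}"),
        "count": count,
        "event": EVENTS.get(code, f"code {code:#x}"),
    }


for word in ("c01d", "c61c", "c615", "c618", "061d"):
    print(word, decode_system_status(word))
```

Read this way, the c618 line at 01:54:08 says the leap bits are in the alarm state and the event is no_sys_peer, which matches the text printed next to it in the log.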

But as expected, pfSense NTP will very likely fail some hours later. I will track its status periodically.

To figure out whether it is due to the VM, I ran an OpenWrt instance on the same host OS, using the same KVM/libvirt config, and enabled the NTP server in OpenWrt. It turns out OpenWrt NTP works fine so far. I will check its status as well.

johnpoz

@einsdisp said in pfSense NTP server is very unstable.:

NTP server in OpenWrt.

While that is a good test, a better test would be a FreeBSD VM.

Have you made sure to disable NTP sync with the host on the VM? I take it you're running the openvm tools package. It's been quite some time since I used that, but more than likely you want to disable its time sync function.

I could fire up the vm I have running under my nas virtual machine stuff, but I have never left it running for any length of time, and never even installed the vm tools package.

Edit: Seems I did have the openvm package installed.. So I have turned on graphing for ntp and will let this vm run for a day or so and see what it shows.

I just booted, and here is current status

edit2: here is current ntp graph

einsdisp

@johnpoz

My host OS is Linux, and it was previously set to sync its time from the pfSense VM. As you suggested, I disabled NTP in the host OS just now: sudo timedatectl set-ntp false.
My virtualization software is KVM+QEMU+libvirt. "openvm-tools" is VMware stuff. The KVM equivalent is qemu-guest-agent, which does the host-VM time sync job. But the pfSense VM apparently does not have anything like that. There are no such "tools" running in the pfSense VM that sync the VM time to the host.
My current NTP graph:

johnpoz

@einsdisp Well, if you've got it turned off on the host, let's see if that has any effect on the issue you were seeing.

The openvm tools package doesn't really have a setting for that either, as far as I can see in the GUI, so I'm guessing it might just be disabled by default.

But from this - I take it the qemu package is available

https://forum.netgate.com/post/995504

That is if you're running 2.6 or 22.01, I take it. I am looking forward to it myself for my VMs, since my NAS virtual machine setup is QEMU based. I'm not sure why I had those openvm tools installed; it might have been habit from when I ran ESXi ;) I have now removed it. I think I might update that VM to 2.6 to try out the QEMU tools. Maybe now my dashboard will show the IP of pfSense, and I will be able to shut it down instead of having to halt the system from inside the VM.

edit:
Well I updated to 2.6, and installed the package and then ran it and now I see IPs on that vm

On my VM dashboard on my nas..

edit: just to update, it's been hours and hours now and it's working as it should.

einsdisp

@stephenw10 @johnpoz

After more testing over the last few days, I finally figured out the cause of this issue: the host OS should not be set to sync its time from the pfSense VM. In my tests, if I stopped NTP on the host OS, everything ran fine. If I enabled host OS NTP, the host RTC gained about 3 seconds on the real-world clock every 5 minutes. The accumulated error is about 10 minutes per day.

I am not a KVM expert, but my guess is that, by default (or with my VM config), KVM may adjust the tick rate of the VM's RTC when the host time changes. So if the host OS uses the pfSense VM as its NTP server, the two may end up in a "dead loop" where each keeps chasing the other.
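
As a rough sanity check on those figures, the sketch below converts "3 seconds every 5 minutes" into a daily drift and a rate in parts per million. The helper names are invented for this example. Taking the 3 s per 5 min figure literally gives roughly 14 minutes per day, which is the same order of magnitude as the "about 10 minutes" observed, and a rate far above the 500 PPM that ntpd will correct by slewing.

```python
# Convert an observed clock gain into drift per day and into PPM.
SECONDS_PER_DAY = 24 * 60 * 60
NTPD_MAX_SLEW_PPM = 500  # ntpd's maximum slew rate


def drift_per_day_s(gain_s, interval_s):
    """Seconds gained per day if the clock gains gain_s every interval_s."""
    return gain_s * SECONDS_PER_DAY / interval_s


def drift_ppm(gain_s, interval_s):
    """The same drift expressed in parts per million."""
    return gain_s * 1_000_000 / interval_s


per_day = drift_per_day_s(3, 5 * 60)
ppm = drift_ppm(3, 5 * 60)

print(f"drift per day: {per_day:.0f} s ({per_day / 60:.1f} min)")
print(f"drift rate: {ppm:.0f} PPM")
print(f"within slew range: {ppm <= NTPD_MAX_SLEW_PPM}")
```

A drift this large cannot be disciplined by frequency adjustment alone, which fits the symptom of sync being lost again after a while.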

My original VM config related to clocking:

<clock offset="utc">
  <timer name="rtc" tickpolicy="catchup"/>
  <timer name="pit" tickpolicy="delay"/>
  <timer name="hpet" present="yes"/>
</clock>

I guess (but haven't tested) that changing <timer name="rtc" tickpolicy="catchup"/> to <timer name="rtc" tickpolicy="delay" track="guest"/> may tell KVM to keep the VM's RTC ticking normally when the host time changes, as if the host time had not changed, thus breaking the "dead loop".

But a simpler solution is to disable host OS NTP, or to point the host OS at an external NTP server rather than at the pfSense VM.
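
To make the proposed (and still untested) change concrete, here is a small sketch that applies it to the clock element quoted above using Python's standard xml.etree.ElementTree. The function name retarget_rtc is made up for illustration. It only rewrites the XML fragment; it does not talk to libvirt, and whether the new timer settings actually fix the drift remains an open assumption.

```python
import xml.etree.ElementTree as ET

# The <clock> element from the original VM config in this post.
ORIGINAL_CLOCK = """<clock offset="utc">
  <timer name="rtc" tickpolicy="catchup"/>
  <timer name="pit" tickpolicy="delay"/>
  <timer name="hpet" present="yes"/>
</clock>"""


def retarget_rtc(clock_xml):
    """Return the clock XML with the rtc timer set to delay and track=guest."""
    clock = ET.fromstring(clock_xml)
    for timer in clock.findall("timer"):
        if timer.get("name") == "rtc":
            timer.set("tickpolicy", "delay")
            timer.set("track", "guest")
    return ET.tostring(clock, encoding="unicode")


print(retarget_rtc(ORIGINAL_CLOCK))
```

The pit and hpet timers are left untouched, so the only difference from the original config is the rtc line.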

einsdisp

@stephenw10

My final question regarding pfSense itself:

it's unlikely to sync to a single server that is showing a 46s offset.

In my test, if pfSense VM RTC clock differs from remote NTP server by a large amount, pfSense refuses to sync time.

How can I force pfSense to trust the time of a single remote server when the offset is very large? I already checked the "prefer" checkbox in the NTP server settings, but it did not help.

If there is no way to do this with a single remote server, then how many servers are needed at a minimum?

bingo600

https://www.ntp.org/ntpfaq/NTP-s-trbl-general.htm#AEN5162
NTP will reject a peer that is roughly 20 or more minutes off.

http://www.ntp.org/ntpfaq/NTP-s-algo.htm
And it will consider a 128ms diff enough to be "unsync'ed"
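
A minimal sketch of how those two thresholds interact, assuming ntpd's documented defaults of 0.128 s for the step threshold and 1000 s for the panic threshold (about 17 minutes, in line with the "roughly 20 minutes" above). The function name and return strings are invented for this example. It deliberately ignores the stepout timer and the server selection algorithm, so it says nothing about whether a single server will be accepted as sys_peer in the first place.

```python
# Classify a clock offset against ntpd's default thresholds (assumed values).
STEP_THRESHOLD_S = 0.128    # offsets above this are corrected with a step
PANIC_THRESHOLD_S = 1000.0  # offsets above this normally make ntpd give up


def expected_action(offset_s, first_adjustment_with_g=False):
    """Return 'slew', 'step' or 'panic' for a given offset in seconds.

    first_adjustment_with_g models the -g flag, which lets the first
    adjustment after startup exceed the panic threshold once.
    """
    magnitude = abs(offset_s)
    if magnitude <= STEP_THRESHOLD_S:
        return "slew"
    if magnitude <= PANIC_THRESHOLD_S or first_adjustment_with_g:
        return "step"
    return "panic"


for offset in (0.05, 0.236668, 46.0, 1200.0):
    print(offset, expected_action(offset))
print(1200.0, expected_action(1200.0, first_adjustment_with_g=True))
```

The 0.236668 s case corresponds to the clock_step line in the log earlier in the thread, and the ntpd command line shown there already includes -g.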

@einsdisp said in pfSense NTP server is very unstable.:

How to force pfSense to believe remote time of a single server, in case the offset is very large?

ntpdate will "step the time", but it requires the NTP daemon to have released its binding to UDP port 123, i.e. the daemon is "usually" not running.

/Bingo
