NTP / Clock Unsynchronized error after KVM kernel upgrade
-
Greetings,
I'll be appreciated for your help with the following issue. We have latest pfSense VM in HA configuration works as NPT client (of public pools) and server (for LAN nodes). We use KVM CentOS to run those VMs and after the latest upgrade from kernel version 3.10.0-957.el7.x86_64 to **3.10.0-1160.11.1.**el7.x86_64 clock synchronisation on pfSense VMs stopped working.
My ntp settings:
[2.4.5-RELEASE][admin@]/root: cat /var/etc/ntpd.conf # # pfSense ntp configuration file # tinker panic 0 # Orphan mode stratum tos orphan 2 # Upstream Servers pool 0.pfsense.pool.ntp.org iburst maxpoll 9 prefer pool 1.pfsense.pool.ntp.org iburst maxpoll 9 pool 2.pfsense.pool.ntp.org iburst maxpoll 9 enable stats statistics clockstats loopstats peerstats statsdir /var/log/ntp logconfig =syncall +clockall +peerall +sysall driftfile /var/db/ntpd.drift restrict default kod limited nomodify nopeer notrap restrict -6 default kod limited nomodify nopeer notrap restrict source kod limited nomodify notrap interface ignore all interface ignore wildcard interface listen vtnet1.1 [2.4.5-RELEASE][admin@]/root:
NTP client sends/receives data from public pool:
[2.4.5-RELEASE][admin@]/root: ntpdate -dv 0.pfsense.pool.ntp.org 7 Jan 14:03:14 ntpdate[7583]: ntpdate 4.2.8p14@1.3728-o Thu Mar 5 17:03:56 UTC 2020 (1) transmit(34.194.39.113) receive(34.194.39.113) transmit(204.2.134.163) transmit(208.67.72.50) receive(208.67.72.50) transmit(172.104.226.77) receive(172.104.226.77) transmit(34.194.39.113) receive(34.194.39.113) transmit(204.2.134.163) transmit(208.67.72.50) receive(208.67.72.50) transmit(172.104.226.77) receive(172.104.226.77) transmit(34.194.39.113) receive(34.194.39.113) transmit(204.2.134.163) receive(204.2.134.163) transmit(208.67.72.50) receive(208.67.72.50) transmit(172.104.226.77) receive(172.104.226.77) transmit(34.194.39.113) receive(34.194.39.113) transmit(204.2.134.163) receive(204.2.134.163) transmit(208.67.72.50) receive(208.67.72.50) transmit(172.104.226.77) receive(172.104.226.77) 204.2.134.163: Server dropped: strata too high server 34.194.39.113, port 123 stratum 3, precision -24, leap 00, trust 000 refid [209.51.161.238], root delay 0.068146, root dispersion 0.036453 reference time: e3a1d513.7527d506 Thu, Jan 7 2021 13:44:35.457 originate timestamp: e3a1d978.b29aedfb Thu, Jan 7 2021 14:03:20.697 transmit timestamp: e3a1d978.b93f4f65 Thu, Jan 7 2021 14:03:20.723 filter delay: 0.03386 0.03378 0.03374 0.03383 ---- ---- ---- ---- filter offset: -0.030041 -0.030121 -0.030056 -0.030072 ---- ---- ---- ---- delay 0.03374, dispersion 0.00003, offset -0.030056 server 204.2.134.163, port 123 stratum 16, precision -23, leap 11, trust 000 refid [STEP], root delay 0.000000, root dispersion 0.000168 reference time: (no time) originate timestamp: e3a1d978.ed08c46c Thu, Jan 7 2021 14:03:20.925 transmit timestamp: e3a1d978.ec6f5c6f Thu, Jan 7 2021 14:03:20.923 filter delay: 0.00000 0.00000 0.09340 0.09364 ---- ---- ---- ---- filter offset: +0.000000 +0.000000 -0.031582 -0.031698 ---- ---- ---- ---- delay 0.09340, dispersion 24.00006, offset -0.031582 server 208.67.72.50, port 123 stratum 2, precision -24, leap 00, trust 000 refid [128.227.205.3], root delay 0.018417, root dispersion 0.028046 reference time: e3a1d6ed.b837f301 Thu, Jan 7 2021 13:52:29.719 originate timestamp: e3a1d979.1c527622 Thu, Jan 7 2021 14:03:21.110 transmit timestamp: e3a1d979.1fa30981 Thu, Jan 7 2021 14:03:21.123 filter delay: 0.06474 0.06458 0.06453 0.06467 ---- ---- ---- ---- filter offset: -0.032544 -0.032647 -0.032585 -0.032579 ---- ---- ---- ---- delay 0.06453, dispersion 0.00003, offset -0.032585 server 172.104.226.77, port 123 stratum 2, precision -24, leap 00, trust 000 refid [129.6.15.28], root delay 0.091034, root dispersion 0.025314 reference time: e3a1d938.3674b9be Thu, Jan 7 2021 14:02:16.212 originate timestamp: e3a1d979.5585ba71 Thu, Jan 7 2021 14:03:21.334 transmit timestamp: e3a1d979.52d82d82 Thu, Jan 7 2021 14:03:21.323 filter delay: 0.10706 0.10814 0.10812 0.10759 ---- ---- ---- ---- filter offset: -0.030471 -0.031088 -0.031042 -0.030539 ---- ---- ---- ---- delay 0.10706, dispersion 0.00024, offset -0.030471 7 Jan 14:03:21 ntpdate[7583]: adjust time server 208.67.72.50 offset -0.032585 sec [2.4.5-RELEASE][admin@]/root:
When I'm trying to sync from pfSense as its client:
[jamal@jamalungma ~]$ ntpdate -dv <pfSense IP - 10.0.11.3> 7 Jan 17:02:17 ntpdate[2192]: ntpdate 4.2.8p15@1.3728-o Tue Jun 23 00:00:00 UTC 2020 (1) Looking for host 10.0.11.3 and service ntp host found : 10.0.11.3 transmit(10.0.11.3) receive(10.0.11.3) transmit(10.0.11.3) receive(10.0.11.3) transmit(10.0.11.3) receive(10.0.11.3) transmit(10.0.11.3) receive(10.0.11.3) 10.0.11.3: Server dropped: Leap not in sync server 10.0.11.3, port 123 stratum 2, precision -18, leap 11, trust 000 refid [163.237.218.18], root delay 0.020844, root dispersion 0.017929 reference time: e3a1dd82.34b2cb6d Thu, Jan 7 2021 19:20:34.205 originate timestamp: e3a1dfea.c453e0bc Thu, Jan 7 2021 19:30:50.766 transmit timestamp: e3a1dfea.6eb420bb Thu, Jan 7 2021 19:30:50.432 filter delay: 0.16284 0.18611 0.16600 0.16328 ---- ---- ---- ---- filter offset: +19.497126 +19.508922 +19.497515 +19.495252 ---- ---- ---- ---- delay 0.16284, dispersion 0.00249, offset +19.497126 7 Jan 17:02:23 ntpdate[2192]: no server suitable for synchronization found [jamal@jamalungma ~]$
I searched for some errors in ntpd logs and I got this precisely after KVM kernel upgrade, a lot of these messages:
Jan 7 12:18:08 pfSenseB ntpd[75414]: 0.0.0.0 c01d 0d kern kernel time sync disabled Jan 7 12:18:08 pfSenseB ntpd[45063]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
One thing I also noticed that pfSense use ACPI clock source and KVM uses TSC
[2.4.5-RELEASE][admin@]/root: sysctl kern.timecounter kern.timecounter.tsc_shift: 1 kern.timecounter.smp_tsc_adjust: 0 kern.timecounter.smp_tsc: 0 kern.timecounter.invariant_tsc: 0 kern.timecounter.fast_gettime: 1 kern.timecounter.tick: 1 kern.timecounter.choice: i8254(0) ACPI-fast(900) TSC-low(-100) dummy(-1000000) kern.timecounter.hardware: ACPI-fast kern.timecounter.alloweddeviation: 5 kern.timecounter.timehands_count: 2 kern.timecounter.stepwarnings: 0 kern.timecounter.tc.i8254.quality: 0 kern.timecounter.tc.i8254.frequency: 1193182 kern.timecounter.tc.i8254.counter: 1077 kern.timecounter.tc.i8254.mask: 65535 kern.timecounter.tc.ACPI-fast.quality: 900 kern.timecounter.tc.ACPI-fast.frequency: 3579545 kern.timecounter.tc.ACPI-fast.counter: 4144087 kern.timecounter.tc.ACPI-fast.mask: 16777215 kern.timecounter.tc.TSC-low.quality: -100 kern.timecounter.tc.TSC-low.frequency: 1149932904 kern.timecounter.tc.TSC-low.counter: 2121589485 kern.timecounter.tc.TSC-low.mask: 4294967295 [2.4.5-RELEASE][admin@]/root:
I changed different parameters in NTP configuration page first, restarted ntp server via gui bunch of times. A day later I reboot VM few times, changing pfSense sysctl variable kern.timecounter.hardware from ACPI-fast to TSC-low . After a few reboots of pfSense VMs NTP magically synchronised.
After the last reboot kern.timecounter.hardware value is ACPI-fast even though I put it like:
[2.4.5-RELEASE][admin@]/root: sysctl kern.timecounter.hardware=TSC-low kern.timecounter.hardware: ACPI-fast -> TSC-low [2.4.5-RELEASE][admin@]/root: sysctl kern.timecounter.hardware kern.timecounter.hardware: TSC-low
but time was finally synchronised:
[2.4.5-RELEASE][admin@]/root: ntptime ntp_gettime() returns code 0 (OK) time e3a1de64.2b2a0664 Thu, Jan 7 2021 14:24:20.168, (.168610934), maximum error 591133 us, estimated error 343 us, TAI offset 0 ntp_adjtime() returns code 0 (OK) modes 0x0 (), offset -10.219 us, frequency 0.116 ppm, interval 4 s, maximum error 591133 us, estimated error 343 us, status 0x2001 (PLL,NANO), time constant 7, precision 0.001 us, tolerance 496 ppm, pps frequency 0.000 ppm, stability 0.000 ppm, jitter 0.000 us, intervals 0, jitter exceeded 0, stability exceeded 0, errors 0. [2.4.5-RELEASE][admin@]/root:
The second pfSense VM is still not synchronised, NTP configuration files are the same on both VMs, but ntptime shows OK on first one and ERROR on the other. I don't know what is changed after manipulations with first pfSense VM, but all parameters from NTP settings to sysctl variables remains the same as they were before KVM kernel upgrade.
Please suggest me what should be checked in addition.
-
ntp can take quite some time before it sees the source as valid and syncs.
In a VM the emulated hardware can appear to be far less stable as the hypervisor switches CPU time between VMs.
Normally ACPI-Fast would be a far better timeounter to use the TSC-slow. You can see the time quality listed. Even i8254 shows as better.
What is a the error message actually seen when trying to sync against the secondary?
Steve
-
Hi @stephenw10, thanks for answer
When I'm trying to sync from the second pfSense VM, here what I get from client:
[jamal@jamalungma ~]$ ntpdate -dv 10.0.11.254 16 Jan 13:22:08 ntpdate[2300]: ntpdate 4.2.8p15@1.3728-o Tue Jun 23 00:00:00 UTC 2020 (1) Looking for host 10.0.11.254 and service ntp host found : 10.0.11.254 transmit(10.0.11.254) transmit(10.0.11.254) receive(10.0.11.254) transmit(10.0.11.254) receive(10.0.11.254) transmit(10.0.11.254) receive(10.0.11.254) 10.0.11.254: Server dropped: server has gone too long without sync server 10.0.11.254, port 123 stratum 2, precision -16, leap 00, trust 000 refid [127.0.0.1], root delay 0.000000, root dispersion 0.000000 reference time: (no time) originate timestamp: e3ad6707.80b783ad Sat, Jan 16 2021 13:22:15.502 transmit timestamp: e3ad6706.5257518f Sat, Jan 16 2021 13:22:14.321 filter delay: 0.00000 0.16133 0.15317 0.30644 ---- ---- ---- ---- filter offset: +0.000000 +1.117263 +1.112640 +1.040580 ---- ---- ---- ---- delay 0.15317, dispersion 8.02031, offset +1.112640 16 Jan 13:22:14 ntpdate[2300]: no server suitable for synchronization found
The state of pfSense VM:
[2.4.5-RELEASE][admin@]/root: ntptime ntp_gettime() returns code 5 (ERROR) time e3ad673b.c86f9000 Sat, Jan 16 2021 8:23:07.782, (.453782952), maximum error 16540500 us, estimated error 16000000 us, TAI offset 0 ntp_adjtime() returns code 5 (ERROR) modes 0x0 (), offset 0.000 us, frequency 0.000 ppm, interval 4 s, maximum error 16540500 us, estimated error 16000000 us, status 0x41 (PLL,UNSYNC), time constant 3, precision 0.000 us, tolerance 496 ppm, pps frequency 0.000 ppm, stability 0.000 ppm, jitter 0.000 us, intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.
/var/log/ntpd.log:
Jan 16 08:05:06 pfSenseC ntpd[11280]: ntpd 4.2.8p14@1.3728-o Thu Mar 5 17:03:54 UTC 2020 (1): Starting Jan 16 08:05:06 pfSenseC ntpd[11280]: Command line: /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid Jan 16 08:05:06 pfSenseC ntpd[11280]: ---------------------------------------------------- Jan 16 08:05:06 pfSenseC ntpd[11280]: ntp-4 is maintained by Network Time Foundation, Jan 16 08:05:06 pfSenseC ntpd[11280]: Inc. (NTF), a non-profit 501(c)(3) public-benefit Jan 16 08:05:06 pfSenseC ntpd[11280]: corporation. Support and training for ntp-4 are Jan 16 08:05:06 pfSenseC ntpd[11280]: available at https://www.nwtime.org/support Jan 16 08:05:06 pfSenseC ntpd[11280]: ---------------------------------------------------- Jan 16 08:05:06 pfSenseC ntpd[11442]: proto: precision = 12.572 usec (-16) Jan 16 08:05:06 pfSenseC ntpd[11442]: basedate set to 2020-02-22 Jan 16 08:05:06 pfSenseC ntpd[11442]: gps base set to 2020-02-23 (week 2094) Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 0 lo0 [::1]:123 Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 1 lo0 127.0.0.1:123 Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 2 vtnet1.42 [fe80::5054:ff:fe64:8034%9]:123 Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 3 vtnet1.42 10.0.11.254:123 Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 4 vtnet1.42 10.0.11.3:123 Jan 16 08:05:06 pfSenseC ntpd[11442]: Listening on routing socket on fd #25 for interface updates Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 8811 81 mobilize assoc 45286 Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 8811 81 mobilize assoc 45287 Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 8811 81 mobilize assoc 45288 Jan 16 08:05:06 pfSenseC ntpd[11442]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 c01d 0d kern kernel time sync enabled Jan 16 08:05:06 pfSenseC ntpd[11442]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
ntpq:
[2.4.5-RELEASE][admin@]/root: ntpq -p remote refid st t when poll reach delay offset jitter ============================================================================== 0.pfsense.pool. .POOL. 16 p - 64 0 0.000 +0.000 0.015 1.pfsense.pool. .POOL. 16 p - 64 0 0.000 +0.000 0.015 2.pfsense.pool. .POOL. 16 p - 64 0 0.000 +0.000 0.015
Any suggestions?
-
Hmm, well ntpd won't respond if it thinks it is not sync'd itself, which looks like the case here.
What if you try to manually sync the Secondary from the command line? Against the Primary maybe? Or some external time source?
Steve