NTP / Clock Unsynchronized error after KVM kernel upgrade


  • Greetings,

    I'll be appreciated for your help with the following issue. We have latest pfSense VM in HA configuration works as NPT client (of public pools) and server (for LAN nodes). We use KVM CentOS to run those VMs and after the latest upgrade from kernel version 3.10.0-957.el7.x86_64 to **3.10.0-1160.11.1.**el7.x86_64 clock synchronisation on pfSense VMs stopped working.

    My ntp settings:

    [2.4.5-RELEASE][admin@]/root: cat /var/etc/ntpd.conf 
    # 
    # pfSense ntp configuration file 
    # 
    
    tinker panic 0 
    # Orphan mode stratum
    tos orphan 2
    
    
    # Upstream Servers
    pool 0.pfsense.pool.ntp.org iburst maxpoll 9 prefer
    pool 1.pfsense.pool.ntp.org iburst maxpoll 9
    pool 2.pfsense.pool.ntp.org iburst maxpoll 9
    
    
    enable stats
    statistics clockstats loopstats peerstats
    statsdir /var/log/ntp
    logconfig =syncall +clockall +peerall +sysall
    driftfile /var/db/ntpd.drift
    restrict default kod limited nomodify nopeer notrap
    restrict -6 default kod limited nomodify nopeer notrap
    restrict source kod limited nomodify notrap
    interface ignore all
    interface ignore wildcard
    interface listen vtnet1.1
    [2.4.5-RELEASE][admin@]/root: 
    

    NTP client sends/receives data from public pool:

    [2.4.5-RELEASE][admin@]/root: ntpdate -dv 0.pfsense.pool.ntp.org
     7 Jan 14:03:14 ntpdate[7583]: ntpdate 4.2.8p14@1.3728-o Thu Mar  5 17:03:56 UTC 2020 (1)
    transmit(34.194.39.113)
    receive(34.194.39.113)
    transmit(204.2.134.163)
    transmit(208.67.72.50)
    receive(208.67.72.50)
    transmit(172.104.226.77)
    receive(172.104.226.77)
    transmit(34.194.39.113)
    receive(34.194.39.113)
    transmit(204.2.134.163)
    transmit(208.67.72.50)
    receive(208.67.72.50)
    transmit(172.104.226.77)
    receive(172.104.226.77)
    transmit(34.194.39.113)
    receive(34.194.39.113)
    transmit(204.2.134.163)
    receive(204.2.134.163)
    transmit(208.67.72.50)
    receive(208.67.72.50)
    transmit(172.104.226.77)
    receive(172.104.226.77)
    transmit(34.194.39.113)
    receive(34.194.39.113)
    transmit(204.2.134.163)
    receive(204.2.134.163)
    transmit(208.67.72.50)
    receive(208.67.72.50)
    transmit(172.104.226.77)
    receive(172.104.226.77)
    204.2.134.163: Server dropped: strata too high
    
    server 34.194.39.113, port 123
    stratum 3, precision -24, leap 00, trust 000
    refid [209.51.161.238], root delay 0.068146, root dispersion 0.036453
    reference time:      e3a1d513.7527d506  Thu, Jan  7 2021 13:44:35.457
    originate timestamp: e3a1d978.b29aedfb  Thu, Jan  7 2021 14:03:20.697
    transmit timestamp:  e3a1d978.b93f4f65  Thu, Jan  7 2021 14:03:20.723
    filter delay:  0.03386    0.03378    0.03374    0.03383   
                   ----       ----       ----       ----      
    filter offset: -0.030041  -0.030121  -0.030056  -0.030072 
                   ----       ----       ----       ----      
    delay 0.03374, dispersion 0.00003, offset -0.030056
    
    server 204.2.134.163, port 123
    stratum 16, precision -23, leap 11, trust 000
    refid [STEP], root delay 0.000000, root dispersion 0.000168
    reference time:      (no time)
    originate timestamp: e3a1d978.ed08c46c  Thu, Jan  7 2021 14:03:20.925
    transmit timestamp:  e3a1d978.ec6f5c6f  Thu, Jan  7 2021 14:03:20.923
    filter delay:  0.00000    0.00000    0.09340    0.09364   
                   ----       ----       ----       ----      
    filter offset: +0.000000  +0.000000  -0.031582  -0.031698 
                   ----       ----       ----       ----      
    delay 0.09340, dispersion 24.00006, offset -0.031582
    
    server 208.67.72.50, port 123
    stratum 2, precision -24, leap 00, trust 000
    refid [128.227.205.3], root delay 0.018417, root dispersion 0.028046
    reference time:      e3a1d6ed.b837f301  Thu, Jan  7 2021 13:52:29.719
    originate timestamp: e3a1d979.1c527622  Thu, Jan  7 2021 14:03:21.110
    transmit timestamp:  e3a1d979.1fa30981  Thu, Jan  7 2021 14:03:21.123
    filter delay:  0.06474    0.06458    0.06453    0.06467   
                   ----       ----       ----       ----      
    filter offset: -0.032544  -0.032647  -0.032585  -0.032579 
                   ----       ----       ----       ----      
    delay 0.06453, dispersion 0.00003, offset -0.032585
    
    server 172.104.226.77, port 123
    stratum 2, precision -24, leap 00, trust 000
    refid [129.6.15.28], root delay 0.091034, root dispersion 0.025314
    reference time:      e3a1d938.3674b9be  Thu, Jan  7 2021 14:02:16.212
    originate timestamp: e3a1d979.5585ba71  Thu, Jan  7 2021 14:03:21.334
    transmit timestamp:  e3a1d979.52d82d82  Thu, Jan  7 2021 14:03:21.323
    filter delay:  0.10706    0.10814    0.10812    0.10759   
                   ----       ----       ----       ----      
    filter offset: -0.030471  -0.031088  -0.031042  -0.030539 
                   ----       ----       ----       ----      
    delay 0.10706, dispersion 0.00024, offset -0.030471
    
     7 Jan 14:03:21 ntpdate[7583]: adjust time server 208.67.72.50 offset -0.032585 sec
    [2.4.5-RELEASE][admin@]/root: 
    

    When I'm trying to sync from pfSense as its client:

    [jamal@jamalungma ~]$ ntpdate -dv <pfSense IP - 10.0.11.3>
     7 Jan 17:02:17 ntpdate[2192]: ntpdate 4.2.8p15@1.3728-o Tue Jun 23 00:00:00 UTC 2020 (1)
    Looking for host 10.0.11.3 and service ntp
    host found : 10.0.11.3
    transmit(10.0.11.3)
    receive(10.0.11.3)
    transmit(10.0.11.3)
    receive(10.0.11.3)
    transmit(10.0.11.3)
    receive(10.0.11.3)
    transmit(10.0.11.3)
    receive(10.0.11.3)
    10.0.11.3: Server dropped: Leap not in sync
    server 10.0.11.3, port 123
    stratum 2, precision -18, leap 11, trust 000
    refid [163.237.218.18], root delay 0.020844, root dispersion 0.017929
    reference time:      e3a1dd82.34b2cb6d  Thu, Jan  7 2021 19:20:34.205
    originate timestamp: e3a1dfea.c453e0bc  Thu, Jan  7 2021 19:30:50.766
    transmit timestamp:  e3a1dfea.6eb420bb  Thu, Jan  7 2021 19:30:50.432
    filter delay:  0.16284    0.18611    0.16600    0.16328   
                   ----       ----       ----       ----      
    filter offset: +19.497126 +19.508922 +19.497515 +19.495252
                   ----       ----       ----       ----      
    delay 0.16284, dispersion 0.00249, offset +19.497126
    
     7 Jan 17:02:23 ntpdate[2192]: no server suitable for synchronization found
    [jamal@jamalungma ~]$
    

    I searched for some errors in ntpd logs and I got this precisely after KVM kernel upgrade, a lot of these messages:

    Jan  7 12:18:08 pfSenseB ntpd[75414]: 0.0.0.0 c01d 0d kern kernel time sync disabled
    Jan  7 12:18:08 pfSenseB ntpd[45063]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
    

    One thing I also noticed that pfSense use ACPI clock source and KVM uses TSC

    [2.4.5-RELEASE][admin@]/root: sysctl kern.timecounter
    kern.timecounter.tsc_shift: 1
    kern.timecounter.smp_tsc_adjust: 0
    kern.timecounter.smp_tsc: 0
    kern.timecounter.invariant_tsc: 0
    kern.timecounter.fast_gettime: 1
    kern.timecounter.tick: 1
    kern.timecounter.choice: i8254(0) ACPI-fast(900) TSC-low(-100) dummy(-1000000)
    kern.timecounter.hardware: ACPI-fast
    kern.timecounter.alloweddeviation: 5
    kern.timecounter.timehands_count: 2
    kern.timecounter.stepwarnings: 0
    kern.timecounter.tc.i8254.quality: 0
    kern.timecounter.tc.i8254.frequency: 1193182
    kern.timecounter.tc.i8254.counter: 1077
    kern.timecounter.tc.i8254.mask: 65535
    kern.timecounter.tc.ACPI-fast.quality: 900
    kern.timecounter.tc.ACPI-fast.frequency: 3579545
    kern.timecounter.tc.ACPI-fast.counter: 4144087
    kern.timecounter.tc.ACPI-fast.mask: 16777215
    kern.timecounter.tc.TSC-low.quality: -100
    kern.timecounter.tc.TSC-low.frequency: 1149932904
    kern.timecounter.tc.TSC-low.counter: 2121589485
    kern.timecounter.tc.TSC-low.mask: 4294967295
    [2.4.5-RELEASE][admin@]/root:
    

    I changed different parameters in NTP configuration page first, restarted ntp server via gui bunch of times. A day later I reboot VM few times, changing pfSense sysctl variable kern.timecounter.hardware from ACPI-fast to TSC-low . After a few reboots of pfSense VMs NTP magically synchronised.

    After the last reboot kern.timecounter.hardware value is ACPI-fast even though I put it like:

    [2.4.5-RELEASE][admin@]/root: sysctl kern.timecounter.hardware=TSC-low
    kern.timecounter.hardware: ACPI-fast -> TSC-low
    [2.4.5-RELEASE][admin@]/root: sysctl kern.timecounter.hardware
    kern.timecounter.hardware: TSC-low
    

    but time was finally synchronised:

    [2.4.5-RELEASE][admin@]/root: ntptime 
    ntp_gettime() returns code 0 (OK)
      time e3a1de64.2b2a0664  Thu, Jan  7 2021 14:24:20.168, (.168610934),
      maximum error 591133 us, estimated error 343 us, TAI offset 0
    ntp_adjtime() returns code 0 (OK)
      modes 0x0 (),
      offset -10.219 us, frequency 0.116 ppm, interval 4 s,
      maximum error 591133 us, estimated error 343 us,
      status 0x2001 (PLL,NANO),
      time constant 7, precision 0.001 us, tolerance 496 ppm,
      pps frequency 0.000 ppm, stability 0.000 ppm, jitter 0.000 us,
      intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.
    [2.4.5-RELEASE][admin@]/root: 
    

    The second pfSense VM is still not synchronised, NTP configuration files are the same on both VMs, but ntptime shows OK on first one and ERROR on the other. I don't know what is changed after manipulations with first pfSense VM, but all parameters from NTP settings to sysctl variables remains the same as they were before KVM kernel upgrade.

    Please suggest me what should be checked in addition.

  • Netgate Administrator

    ntp can take quite some time before it sees the source as valid and syncs.

    In a VM the emulated hardware can appear to be far less stable as the hypervisor switches CPU time between VMs.

    Normally ACPI-Fast would be a far better timeounter to use the TSC-slow. You can see the time quality listed. Even i8254 shows as better.

    What is a the error message actually seen when trying to sync against the secondary?

    Steve


  • Hi @stephenw10, thanks for answer

    When I'm trying to sync from the second pfSense VM, here what I get from client:

    [jamal@jamalungma ~]$ ntpdate -dv 10.0.11.254
    16 Jan 13:22:08 ntpdate[2300]: ntpdate 4.2.8p15@1.3728-o Tue Jun 23 00:00:00 UTC 2020 (1)
    Looking for host 10.0.11.254 and service ntp
    host found : 10.0.11.254
    transmit(10.0.11.254)
    transmit(10.0.11.254)
    receive(10.0.11.254)
    transmit(10.0.11.254)
    receive(10.0.11.254)
    transmit(10.0.11.254)
    receive(10.0.11.254)
    10.0.11.254: Server dropped: server has gone too long without sync
    
    server 10.0.11.254, port 123
    stratum 2, precision -16, leap 00, trust 000
    refid [127.0.0.1], root delay 0.000000, root dispersion 0.000000
    reference time:      (no time)
    originate timestamp: e3ad6707.80b783ad  Sat, Jan 16 2021 13:22:15.502
    transmit timestamp:  e3ad6706.5257518f  Sat, Jan 16 2021 13:22:14.321
    filter delay:  0.00000    0.16133    0.15317    0.30644   
                   ----       ----       ----       ----      
    filter offset: +0.000000  +1.117263  +1.112640  +1.040580 
                   ----       ----       ----       ----      
    delay 0.15317, dispersion 8.02031, offset +1.112640
    
    16 Jan 13:22:14 ntpdate[2300]: no server suitable for synchronization found
    

    The state of pfSense VM:

    [2.4.5-RELEASE][admin@]/root: ntptime
    ntp_gettime() returns code 5 (ERROR)
      time e3ad673b.c86f9000  Sat, Jan 16 2021  8:23:07.782, (.453782952),
      maximum error 16540500 us, estimated error 16000000 us, TAI offset 0
    ntp_adjtime() returns code 5 (ERROR)
      modes 0x0 (),
      offset 0.000 us, frequency 0.000 ppm, interval 4 s,
      maximum error 16540500 us, estimated error 16000000 us,
      status 0x41 (PLL,UNSYNC),
      time constant 3, precision 0.000 us, tolerance 496 ppm,
      pps frequency 0.000 ppm, stability 0.000 ppm, jitter 0.000 us,
      intervals 0, jitter exceeded 0, stability exceeded 0, errors 0.
    

    /var/log/ntpd.log:

    Jan 16 08:05:06 pfSenseC ntpd[11280]: ntpd 4.2.8p14@1.3728-o Thu Mar  5 17:03:54 UTC 2020 (1): Starting
    Jan 16 08:05:06 pfSenseC ntpd[11280]: Command line: /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
    Jan 16 08:05:06 pfSenseC ntpd[11280]: ----------------------------------------------------
    Jan 16 08:05:06 pfSenseC ntpd[11280]: ntp-4 is maintained by Network Time Foundation,
    Jan 16 08:05:06 pfSenseC ntpd[11280]: Inc. (NTF), a non-profit 501(c)(3) public-benefit
    Jan 16 08:05:06 pfSenseC ntpd[11280]: corporation.  Support and training for ntp-4 are
    Jan 16 08:05:06 pfSenseC ntpd[11280]: available at https://www.nwtime.org/support
    Jan 16 08:05:06 pfSenseC ntpd[11280]: ----------------------------------------------------
    Jan 16 08:05:06 pfSenseC ntpd[11442]: proto: precision = 12.572 usec (-16)
    Jan 16 08:05:06 pfSenseC ntpd[11442]: basedate set to 2020-02-22
    Jan 16 08:05:06 pfSenseC ntpd[11442]: gps base set to 2020-02-23 (week 2094)
    Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 0 lo0 [::1]:123
    Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 1 lo0 127.0.0.1:123
    Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 2 vtnet1.42 [fe80::5054:ff:fe64:8034%9]:123
    Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 3 vtnet1.42 10.0.11.254:123
    Jan 16 08:05:06 pfSenseC ntpd[11442]: Listen normally on 4 vtnet1.42 10.0.11.3:123
    Jan 16 08:05:06 pfSenseC ntpd[11442]: Listening on routing socket on fd #25 for interface updates
    Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 8811 81 mobilize assoc 45286
    Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 8811 81 mobilize assoc 45287
    Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 8811 81 mobilize assoc 45288
    Jan 16 08:05:06 pfSenseC ntpd[11442]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
    Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 c01d 0d kern kernel time sync enabled
    Jan 16 08:05:06 pfSenseC ntpd[11442]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
    Jan 16 08:05:06 pfSenseC ntpd[11442]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
    

    ntpq:

    [2.4.5-RELEASE][admin@]/root: ntpq -p
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
     0.pfsense.pool. .POOL.          16 p    -   64    0    0.000   +0.000   0.015
     1.pfsense.pool. .POOL.          16 p    -   64    0    0.000   +0.000   0.015
     2.pfsense.pool. .POOL.          16 p    -   64    0    0.000   +0.000   0.015
    

    Any suggestions?

  • Netgate Administrator

    Hmm, well ntpd won't respond if it thinks it is not sync'd itself, which looks like the case here.

    What if you try to manually sync the Secondary from the command line? Against the Primary maybe? Or some external time source?

    Steve