Netgate Discussion Forum

NTPD losing contact with servers: "Unexpected origin timestamp"

General pfSense Questions
    clough42
    last edited by Jul 19, 2018, 5:48 PM

    I've been struggling to get NTP to sync up and stay connected to its servers. It seems no matter how I configure it, it connects to the configured pools and servers, soon gets to reach "377" on everything, and then after a while loses contact with everything, going to status "Unreach/Pending". When I look in the log, I see several error messages like this:

    "Jul 19 10:44:21	ntpd	76676	receive: Unexpected origin timestamp 0xdefb3e65.c7971dcc does not match aorg 0000000000.00000000 from server@173.255.206.154 xmt 0xdefb3e65.8b6be764"
    

    If I restart ntpd, everything connects again, and some time later (an hour or two, maybe) it loses contact with everything. Sometimes, it seems to start over and connect to everything again. Sometimes it stays stuck in this state, with no contact.
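The hex values in the log line above are NTP timestamps: the digits before the dot count whole seconds since 1900-01-01 UTC, and the digits after it are a 32-bit binary fraction of a second. The following is a minimal Python sketch for decoding them by hand. The function name `ntp_hex_to_datetime` is hypothetical (not part of ntpd or pfSense), and the sketch assumes NTP era 0, i.e. dates before 2036. Decoded this way, 0xdefb3e65.c7971dcc comes out as 2018-07-19 16:44:21 UTC. That lines up with the 10:44:21 log time if the firewall's local zone is UTC-6, which is an assumption since the post doesn't state it. The all-zero aorg value is simply a zero timestamp, which the sketch reports as None.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# NTP era 0 starts at 1900-01-01 00:00:00 UTC.
NTP_EPOCH = datetime(1900, 1, 1, tzinfo=timezone.utc)


def ntp_hex_to_datetime(stamp: str) -> Optional[datetime]:
    """Decode an ntpd log timestamp such as '0xdefb3e65.c7971dcc' to UTC.

    Hex digits before the dot are whole seconds since NTP_EPOCH; the digits
    after it are a 32-bit binary fraction of a second. Assumes era 0 (before
    2036). Returns None for the all-zero timestamp.
    """
    seconds_hex, fraction_hex = stamp.strip().lower().removeprefix("0x").split(".")
    seconds = int(seconds_hex, 16)
    fraction = int(fraction_hex, 16) / 2**32
    if seconds == 0 and fraction == 0:
        return None
    return NTP_EPOCH + timedelta(seconds=seconds + fraction)


if __name__ == "__main__":
    # Values copied from the "Unexpected origin timestamp" log line.
    for label, stamp in [
        ("origin", "0xdefb3e65.c7971dcc"),
        ("aorg", "0000000000.00000000"),
        ("xmt", "0xdefb3e65.8b6be764"),
    ]:
        print(label, ntp_hex_to_datetime(stamp))
```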

    I've seen suggestions that there was a bug in ntpd that caused this a while back, but as far as I can tell, pfSense has a new enough version of ntpd that this should be fixed.

    Am I missing something obvious?

    I'm running the latest stable 2.4.x version, in a Hyper-V virtual machine on Windows Server 2012, Datacenter Edition. It's configured with two virtual networks, tied to different VLANs, configured in Hyper-V. Everything else appears to be working, and performance is way better than I was getting previously with Untangle NGFW.


    I've tried several combinations of different pools and individual hosts. Right now, this is what I have configured for testing:

    [Screenshot: NTP server configuration]

    Now, after a couple of hours, everything has gone "Unreach/Pending":

    [Screenshot: NTP status, all servers Unreach/Pending]

    Logs for this time period:

    [Screenshot: NTP log entries]

    And, for good measure, the actual output of "ntpq -p":

    [Screenshot: ntpq -p output]

      clough42
      last edited by Jul 20, 2018, 5:07 PM

      I'm still fighting this. I've tried several different server combinations, including a single server, and it doesn't seem to make much difference. The daemon connects and appears to be working. I even saw the poll get up to 512 last night on everything, but when I checked this morning, reach was back to 1 and the offsets had increased from around 35ms to around 200ms.
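For reference, the "reach" column in `ntpq -p` is an 8-bit shift register shown in octal: each poll shifts it left by one bit, and a valid reply sets the lowest bit. So 377 means the last eight polls were all answered, while a reach of 1 means only the most recent poll was answered. Either the association was reset (for example after a clock step) or the seven polls before it went unanswered. Below is a small Python sketch to decode the value; the helper name `reach_history` is hypothetical.

```python
from typing import List


def reach_history(reach_octal: str) -> List[bool]:
    """Decode ntpq's octal 'reach' value into the last 8 poll results.

    The register shifts left on every poll and sets bit 0 when a valid
    reply arrives, so the list is ordered oldest poll first.
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError(f"reach value out of range: {reach_octal!r}")
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


if __name__ == "__main__":
    for reach in ("377", "1", "17"):
        answered = sum(reach_history(reach))
        print(f"reach {reach:>3}: {answered} of the last 8 polls answered")
```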

      It goes from this, which looks kind of okay:

      [Screenshot]

      back to this, 30 minutes later:

      [Screenshot]

      I have Hyper-V time synchronization turned off now, but it doesn't seem to make a ton of difference. I did check, and the system is using the Hyper-V-TSC timecounter. I don't know if this is the issue, or if I should try something else, or if I can even do that with pfSense.

        clough42
        last edited by Jul 20, 2018, 5:46 PM

        I do see a recurring pattern where about half the servers have a small offset (~30ms) and the other half have a very different offset (~-200ms). That seems pretty odd.

        I'm wondering if my underlying hardware clock (or the implementation in Hyper-V) is misbehaving.

          clough42
          last edited by Jul 22, 2018, 2:50 AM

          I think I finally figured it out.

          It appears that the clock was drifting enough that every half hour (or even more often) the NTP server was stepping the clock and then starting over, contacting the pool and connecting to new servers.

          Jitter and offsets were also all over the map: see the jitter values above of well over 100ms and the huge spread of offsets.

          However, another VM (Ubuntu LTS) on the same host with the same settings and the same clock source was working fine.

          I ultimately discovered that the ntpd.drift file contained a value well over 600. I cleared the file and restarted ntpd, and everything is working fine now.
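A quick way to spot a value like this is to read the drift file and compare it against ntpd's frequency tolerance. Below is a minimal Python sketch. It assumes the file holds a single frequency value in PPM and uses the /var/db/ntpd.drift location given later in this thread. The 500 PPM limit is stock ntpd's frequency tolerance; it is an assumption that the ntpd shipped with pfSense uses the same limit. The helper names are hypothetical, and the sketch only reads the file: it doesn't clear it or restart ntpd.

```python
from pathlib import Path
from typing import Optional

# Stock ntpd treats +/-500 PPM as its frequency tolerance. Assumption: the
# ntpd shipped with pfSense uses the same limit.
MAX_FREQ_PPM = 500.0

# Drift file location reported later in this thread for pfSense.
DEFAULT_DRIFT_FILE = Path("/var/db/ntpd.drift")


def read_drift_ppm(path: Path = DEFAULT_DRIFT_FILE) -> Optional[float]:
    """Return the frequency offset stored in an ntpd drift file, in PPM.

    Returns None if the file is missing or empty (e.g. after clearing it).
    """
    try:
        text = path.read_text().strip()
    except FileNotFoundError:
        return None
    if not text:
        return None
    # The value is the first (normally only) token in the file.
    return float(text.split()[0])


def drift_looks_suspect(ppm: Optional[float], limit: float = MAX_FREQ_PPM) -> bool:
    """True if a stored drift value is larger in magnitude than `limit`."""
    return ppm is not None and abs(ppm) > limit


if __name__ == "__main__":
    value = read_drift_ppm()
    if value is None:
        print("no drift value stored")
    elif drift_looks_suspect(value):
        print(f"drift {value:+.3f} PPM exceeds +/-{MAX_FREQ_PPM:.0f} PPM")
    else:
        print(f"drift {value:+.3f} PPM looks plausible")
```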

          [Screenshot]

          Offsets and jitter are now single digits, and the drift file shows about 25PPM, the same as the other VM.
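To put those drift numbers in perspective: 1 PPM is one microsecond of error per second of elapsed time, so converting to seconds per day shows how far apart a ~25 PPM clock and a 600+ PPM stored correction really are. Below is a tiny Python sketch; the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def ppm_to_seconds_per_day(ppm: float) -> float:
    """Clock error per day for a frequency offset given in parts per million.

    1 PPM is 1e-6 seconds of error per second of elapsed time.
    """
    return ppm * 1e-6 * SECONDS_PER_DAY


if __name__ == "__main__":
    # 25 PPM: the settled value in this thread; 600 PPM: roughly the bad drift file.
    for ppm in (25, 600):
        print(f"{ppm:>4} PPM ~ {ppm_to_seconds_per_day(ppm):.2f} s/day")
```

With these numbers, 25 PPM works out to about 2.16 s/day and 600 PPM to about 51.84 s/day.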

          My theory is that the drift file got screwed up because I was originally running both Hyper-V time synchronization and NTPD. (And the host clock was off by about 20 seconds.) Even after I turned off the Hyper-V time sync, the value in the drift file was causing the crazy issues I was seeing. The value was coming down very slowly (50PPM/day?). Emptying it and letting NTPD start over without the Hyper-V sync solved the problem.

            johnpoz LAYER 8 Global Moderator
            last edited by johnpoz Jul 22, 2018, 8:59 AM

            Thanks for the great follow up to this - this for sure might help someone in the future!

            I am a bit curious why you are using 2 different pools that really point to the same place. Or did you remove the pfSense vanity/vendor-based one, and now you're using just pool.ntp.org?

            While your offsets are way better now, your delay in the 105 range is a bit high. What part of the world are you in? You should probably use a pool with servers in your region. They have them based on country and/or region. Or just call out specific NTP servers in your region. The closer the better, to minimize delay and jitter in their responses.

            An intelligent man is sometimes forced to be drunk to spend time with his fools
            If you get confused: Listen to the Music Play
            Please don't Chat/PM me for help, unless mod related
            SG-4860 25.07.1 | Lab VMs 2.8.1, 25.07.1

              clough42
              last edited by Jul 22, 2018, 12:52 PM

              I was desperately trying all kinds of things: multiple pools, lists of specific servers, a single server, etc. Right now, I have it set to just pool.ntp.org. It's been running for about 12 hours, and it's looking pretty good:

              [Screenshot]

              I'm in the US, so I just restarted it with us.pool.ntp.org, but the delays are about the same. About half of the servers are under 45ms. Compared to the crazy time swings I was seeing before, this makes me quite happy. NTP has now fallen into the "seems fine; time to worry about something else" category.

                johnpoz LAYER 8 Global Moderator
                last edited by Jul 22, 2018, 1:04 PM

                Thanks for coming back and reporting on what you found. The next guy with such an odd issue may love you for it. Yeah, running the VM time sync to the host and NTP on the VM at the same time can cause all kinds of weirdness, to be sure.


                  ntranx
                  last edited by Aug 14, 2020, 1:59 AM

                  @clough42 said in NTPD losing contact with servers: "Unexpected origin timestamp":

                  ntpd.drift

                  Do you have any idea where the ntpd.drift file is located?

                    provels @ntranx
                    last edited by Aug 14, 2020, 10:27 AM

                    @ntranx
                    /var/db

                    Peder

                    MAIN - pfSense+ 25.07.1-RELEASE - Adlink MXE-5401, i7, 16 GB RAM, 64 GB SSD. 500 GB HDD for SyslogNG
                    BACKUP - pfSense+ 23.01-RELEASE - Hyper-V Virtual Machine, Gen 1, 2 v-CPUs, 3 GB RAM, 8GB VHDX (Dynamic)

                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.