NTP Frustrations



  • Hello,

    For the last few months I have been having horrible pains with NTP in pfSense.  For the longest time I had my pfSense server as the NTP server for my entire domain, but for the last few months the NTP server would lose track of time about once a week.  I would sign in and the entire domain would be out of sync by minutes or hours, sometimes by as much as a day or two.

    I have very reliable time servers set in pfSense and have even tried changing them once to see if that was the issue.

    I have tried everything I could find in the forum, including unbinding from all interfaces, which seemed to help a bit.

    In the NTP status section, the status for each server usually says 'Unreach/Pending'; however, for the first few minutes after restarting the NTP server they show as Active and other statuses.  (screenshot attached)

    I even went so far as to set up a new time server for the domain, but I can't figure out how to disable NTP inside of pfSense.  (VPN users experience connectivity issues because of the time difference.)

    Any ideas would be appreciated.

    Ryan

    ![2015-03-11 21_45_25-pfsense - Status_ NTP.png](/public/imported_attachments/1/2015-03-11 21_45_25-pfsense - Status_ NTP.png)


  • LAYER 8 Netgate

    Many admins are making bone-headed decisions trying to mitigate all the NTP DDoS.  Maybe you're a victim?



  • Are you using an embedded pfSense or a desktop installation?



  • So it was stable, then went to crap with no changes on the pfSense end?  Any rules changes that would block udp port 123?

    Can you post your /var/etc/ntpd.conf?  And the result of 'clog /var/log/ntpd.log | tail -50'?



    Could be almost anything, given the amount of info provided.

    Is it a physical machine or VM?

    Is it synched to other NTP servers or GPS or what?


  • LAYER 8 Netgate

    @Derelict:

    Many admins are making bone-headed decisions trying to mitigate all the NTP DDoS.  Maybe you're a victim?

    Yeah, I was talking about something bone-headed upstream, if it just stopped working and you didn't add any UDP/123 rules or otherwise make changes.

    Running ntpdate from the pfSense command line should tell you what's up.

    ntpdate -d 0.pfsense.pool.ntp.org
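
For readers without shell access to the firewall, the same kind of check can be roughed out in Python. This is only a sketch of a bare SNTP exchange, not what ntpdate does internally: the function names are made up for illustration, and the offset it returns ignores network delay, so it is only good for spotting errors of seconds or more. A timeout here points the same way a silent `ntpdate -d` does, toward UDP/123 being blocked somewhere.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request() -> bytes:
    """Return a minimal 48-byte client request.

    First byte 0x1B = leap indicator 0, version 3, mode 3 (client).
    """
    return b"\x1b" + 47 * b"\x00"


def transmit_time_unix(packet: bytes) -> float:
    """Extract the server's transmit timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def rough_offset(host: str, timeout: float = 5.0) -> float:
    """Server time minus local time, in seconds (network delay ignored)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # raises socket.timeout if nothing comes back
        sock.sendto(build_sntp_request(), (host, 123))
        packet, _ = sock.recvfrom(512)
    return transmit_time_unix(packet) - time.time()
```

Calling `rough_offset("0.pfsense.pool.ntp.org")` from a machine behind the firewall would show whether replies come back at all and roughly how far the local clock is off.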



  • Oh my, thanks for all the replies. :)

    After I posted I realized I had a spare motherboard and thought, why not switch them out?  Perhaps it was the clock that was going bad.  I replaced the motherboard, and after boot all my peer statuses went back to looking normal; since then the clock hasn't wavered one bit.

    It almost seemed like the clock was randomly losing 5-40 minutes an hour, so I am guessing there was something wrong with the hardware clock.  It did seem like a gradual failure: I had some trouble in December, a bit more in January, and then it couldn't keep any time at all for the last two weeks.

    Thank you for the replies, I'll speak up if the problems resume.

    Ryan

    To answer some of the questions:

    Desktop install on a physical machine.
    Synced with NTP servers.
    I don't have any rules that should impact port 123.

    /var/etc/ntpd.conf below

    # pfSense ntp configuration file

    tinker panic 0

    # Orphan mode stratum

    tos orphan 12

    # Upstream Servers

    server time-a.nist.gov iburst maxpoll 9 prefer
    server 0.us.pool.ntp.org iburst maxpoll 9 noselect
    server 1.us.pool.ntp.org iburst maxpoll 9 noselect
    server ntp.okstate.edu iburst maxpoll 9
    server navobs1.wustl.edu iburst maxpoll 9
    server navobs1.oar.net iburst maxpoll 9
    server tick.gatech.edu iburst maxpoll 9
    server clock.sjc.he.net iburst maxpoll 9
    server 0.pfsense.pool.ntp.org iburst maxpoll 9 noselect

    disable monitor
    enable stats
    statistics clockstats loopstats peerstats
    statsdir /var/log/ntp
    logconfig =syncall +clockall +peerall +sysall
    driftfile /var/db/ntpd.drift
    restrict default kod limited nomodify
    restrict -6 default kod limited nomodify

    interface ignore all
    interface listen em2
    interface listen em3
    interface listen em4
    interface listen em5
    interface listen em2_vlan2
    interface listen em2_vlan3
    interface listen em2_vlan4
    interface listen em2_vlan9


  • LAYER 8 Netgate

    Wow. I've never seen that happen.  Batteries, sure…



  • Batteries should only affect time when the computer is powered off. At least in my experience as IT, you could remove the battery from a computer and have it keep perfect time as long as it remained plugged into AC. I've done my fair share of battery replacement and bios time fixing.


  • LAYER 8 Netgate

    Right.  Never seen a running clock go bad like that.


  • Banned

    Hi!

    Would be nice to know: WHICH board? ;-)


  • LAYER 8 Global Moderator

    Are you sure it was the board, and not just that the time was so far off that ntp had not corrected it yet?  The screenshot the OP posted showed a large offset, and the reach values were not even at 377 yet, so it looks like ntpd had only just been started.  I'd probably run ntpdate first before restarting ntpd to get the time close, so that ntpd has an easier time starting.
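
The "377" mentioned above is the reach column from the peer status display: an 8-bit shift register, printed in octal, with one bit per poll and the newest poll in the lowest bit. A small sketch of how to read it (the helper name `decode_reach` is made up for illustration):

```python
def decode_reach(reach_octal: str) -> list[bool]:
    """Decode an ntpq-style reach value into the last eight poll results.

    Index 0 is the most recent poll; True means a reply was received.
    """
    value = int(reach_octal, 8)
    if not 0 <= value <= 0o377:
        raise ValueError("reach must be between 0 and 377 (octal)")
    return [bool((value >> bit) & 1) for bit in range(8)]


for reach in ("1", "17", "377"):
    polls = decode_reach(reach)
    print(f"reach {reach:>3}: {sum(polls)} of the last 8 polls answered")
```

A value of 377 means all of the last eight polls were answered, so anything lower shortly after a restart simply means fewer than eight polls have completed.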



    Sorry, I had restarted NTP just before posting.  Needless to say, it was off for days.  Since replacing the board it has been perfect.


  • LAYER 8 Global Moderator

    Depending on the offset, ntpd will not fix it if it is too far out.  Pretty sure the default max offset is 1000 seconds.

    You can use -g to have it set the time no matter what, I believe.  I have not looked too deeply into all the new features of ntp on pfSense, since while I have it sync its time to my ntp server, it is not an ntp server for my network.

    As others have stated, I have never seen a clock fail in such a way that even with ntp running it loses so much time.  But sure, if the clock is so bad that even with ntp running it can't keep decent time, at some point its offset could exceed what ntp will fix.

    Seems you're all good now, but setting the time via ntpdate before starting ntpd is something I would have done when seeing such a large offset, and then kept an eye on how it was drifting.  You could have enabled the graphs, which are new in 2.2, and kept an eye on it that way - see attached.

    My pfSense is a VM, so you cannot expect its time to be dead nuts on.




  • If you want to have an ugly NTP graphs competition, mine is worse - haha.

    But yeah - not soooooooooo far off.



  • @johnpoz:

    setting the time via ntpdate before starting ntpd is something I would have done when seeing such a large offset.

    That's already how it is normally done, at least during boot and restarting services on a WAN change.  Take a look at /usr/local/sbin/ntpdate_sync_once.sh for details.  You should get a log notice for both sync failure and sync success before the ntpd daemon is run.

    Not sure how ntp is restarted via the status_services page or services widget.


  • LAYER 8 Global Moderator

    I thought there was a ntpdate done before - but what if that fails?  And time is way off?



  • Hmm, it looks like I have always had the graph on.  Attached is the graph from my machine for the last week.

    I am by no means an NTP expert and really don't know what any of this means.

    You can see where Wednesday night I changed the board and the graph looks different.  (That was the only thing I changed.)

    For some reason there are a lot of blank spots, I don't know why because the machine is always on.

    Ryan



  • LAYER 8 Global Moderator

    I would think the blank spots are when ntpd was offline, or was so far out of whack that it was out of sync.  What does the ntpd log show for these times?



    NTP can crash, and without a watchdog service restarting it, it would probably stay down until reboot.



  • @johnpoz:

    I thought there was a ntpdate done before - but what if that fails?  And time is way off?

    If sync'ing fails, a message is logged but pfSense still tries to start the ntp daemon.  If the time is off by more than 1000 seconds at that point, ntpd should promptly exit with a message saying so, and should remove its pid file (not sure it always does).  pfSense carries on, assuming that ntpd is running, AFAIK.
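
The startup behaviour described above can be modelled roughly as follows. This is a deliberately simplified sketch, not ntpd's actual code: the function and parameter names are made up, and the real daemon also applies timers before stepping. The two thresholds are ntpd's defaults (1000 s panic, 0.128 s step). Note that the ntpd.conf posted earlier in the thread contains `tinker panic 0`, which turns the panic check off.

```python
STEP_THRESHOLD = 0.128    # seconds; larger offsets are stepped, not slewed
PANIC_THRESHOLD = 1000.0  # seconds; ntpd's default sanity limit


def startup_action(offset_seconds: float,
                   panic_threshold: float = PANIC_THRESHOLD,
                   allow_first_big_step: bool = False) -> str:
    """Return 'exit', 'step' or 'slew' for the first measured offset.

    panic_threshold=0 models 'tinker panic 0' (check disabled).
    allow_first_big_step=True models starting ntpd with -g.
    """
    magnitude = abs(offset_seconds)
    panic_enabled = panic_threshold > 0 and not allow_first_big_step
    if panic_enabled and magnitude > panic_threshold:
        return "exit"   # ntpd logs the offset and quits
    if magnitude > STEP_THRESHOLD:
        return "step"   # set the clock in one jump
    return "slew"       # adjust gradually


for label, kwargs in [("defaults", {}),
                      ("tinker panic 0", {"panic_threshold": 0}),
                      ("-g", {"allow_first_big_step": True})]:
    print(f"{label:>15}: offset 5000 s -> {startup_action(5000, **kwargs)}")
```

Under this model, a box that is more than 1000 seconds off only ends up with ntpd exiting when neither `-g` nor `tinker panic 0` is in effect.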



  • My NTP gets a watchdog restart pretty regularly.



  • Blank spots are negative values being clipped off. More here: https://forum.pfsense.org/index.php?topic=76620.msg482370#msg482370

