NTP Frustrations

Derelict

Many admins are making bone-headed decisions trying to mitigate all the NTP DDoS attacks. Maybe you're a victim?

Yeah, I was talking about something bone-headed upstream, assuming it just stopped working and you didn't add any UDP/123 rules or otherwise make changes.

Running ntpdate from the pfSense command line should tell you what's up.

ntpdate -d 0.pfsense.pool.ntp.org

vc6SfV8

Oh my, thanks for all the replies. :)

After I posted, I realized I had a spare motherboard and thought, why not swap them out? Perhaps it was the clock that was going bad. I replaced the motherboard, and after boot all my peer statuses went back to looking normal. Since then the clock hasn't wavered one bit.

It almost seemed like the clock was randomly losing 5-40 minutes an hour, so I am guessing there was something wrong with the hardware clock. It did seem like a gradual failure: I had some trouble in December, a bit more in January, and then it couldn't keep any time at all for the last two weeks.
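
To put those numbers in perspective, here is a rough back-of-the-envelope sketch (not from the original post) that converts the reported loss into parts per million. ntpd can only correct a frequency error of up to 500 ppm, so a clock losing minutes per hour is far beyond anything it can discipline. The function name `drift_ppm` is made up for illustration.

```shell
# drift_ppm LOST_SECONDS INTERVAL_SECONDS
# Prints the clock error in parts per million (integer, truncated).
# Hypothetical helper for illustration only.
drift_ppm() {
    echo $(( $1 * 1000000 / $2 ))
}

# ntpd's maximum frequency correction, in ppm.
NTPD_MAX_PPM=500

for lost in 300 2400; do          # 5 minutes and 40 minutes lost per hour
    ppm=$(drift_ppm "$lost" 3600)
    if [ "$ppm" -gt "$NTPD_MAX_PPM" ]; then
        echo "$lost s/hour = $ppm ppm: beyond what ntpd can correct"
    else
        echo "$lost s/hour = $ppm ppm: within ntpd's range"
    fi
done
```

Even the low end of the reported range works out to tens of thousands of ppm, which fits with the problem disappearing only after a hardware swap.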

Thank you for the replies, I'll speak up if the problems resume.

Ryan

–

To answer some of the questions:

Desktop install on a physical machine.
Synced with NTP servers.
I don't have any rules that should impact port 123.

/var/etc/ntpd.conf below

# pfSense ntp configuration file

tinker panic 0

# Orphan mode stratum
tos orphan 12

# Upstream Servers

server time-a.nist.gov iburst maxpoll 9 prefer
server 0.us.pool.ntp.org iburst maxpoll 9 noselect
server 1.us.pool.ntp.org iburst maxpoll 9 noselect
server ntp.okstate.edu iburst maxpoll 9
server navobs1.wustl.edu iburst maxpoll 9
server navobs1.oar.net iburst maxpoll 9
server tick.gatech.edu iburst maxpoll 9
server clock.sjc.he.net iburst maxpoll 9
server 0.pfsense.pool.ntp.org iburst maxpoll 9 noselect

disable monitor
enable stats
statistics clockstats loopstats peerstats
statsdir /var/log/ntp
logconfig =syncall +clockall +peerall +sysall
driftfile /var/db/ntpd.drift
restrict default kod limited nomodify
restrict -6 default kod limited nomodify

interface ignore all
interface listen em2
interface listen em3
interface listen em4
interface listen em5
interface listen em2_vlan2
interface listen em2_vlan3
interface listen em2_vlan4
interface listen em2_vlan9

Derelict

Wow. I've never seen that happen. Batteries, sure…

Harvy66

Batteries should only affect time when the computer is powered off. At least in my experience in IT, you could remove the battery from a computer and have it keep perfect time as long as it remained plugged into AC. I've done my fair share of battery replacement and BIOS time fixing.

Derelict

Right. Never seen a running clock go bad like that.

2chemlud

Hi!

Would be nice to know: WHICH board? ;-)

johnpoz

Are you sure it was the board, and not just that the time was so far off that ntp had not corrected it yet? The screenshot the OP posted showed a large offset, and the reach values were not even at 377 yet, so it looks like ntpd had only just been started. It's probably best to run ntpdate before restarting ntpd to get the time close, so that ntpd has an easier time getting started.
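
For anyone unfamiliar with the reach column: it is an 8-bit shift register printed in octal, with one bit per poll, so 377 means the last eight polls all got a reply. A minimal sketch for decoding it (the helper name `reach_count` is hypothetical, not an ntp tool):

```shell
# reach_count REACH
# REACH is the octal value from the "reach" column of the peer status.
# Prints how many of the last 8 polls were answered.
reach_count() {
    value=$(( 0$1 ))              # a leading 0 makes the shell read it as octal
    count=0
    while [ "$value" -gt 0 ]; do
        count=$(( count + (value & 1) ))   # add the lowest bit
        value=$(( value >> 1 ))            # move on to the next poll
    done
    echo "$count"
}

reach_count 377   # all eight polls answered
reach_count 17    # only the four most recent polls answered
```

A freshly restarted ntpd starts with an empty register, so low reach values right after a restart are expected and do not by themselves indicate a problem.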

vc6SfV8

Sorry, I had restarted NTP just before posting. Needless to say, it was off for days. Since replacing the board it has been perfect.

johnpoz

Depending on the offset, ntpd will not fix it if the time is too far out. Pretty sure the default max offset is 1000 seconds.

I believe you can use -g to have it set the time no matter what. I have not looked too deeply into all the new NTP features in pfSense, since while I have it sync its time to my NTP server, it is not an NTP server for my network.

As others have stated, I have never seen a clock fail in such a way that, even with ntp running, it loses so much time. But sure, if the clock is so bad that even with ntp running it can't keep decent time, at some point its offset could exceed what ntp will fix.

Seems you're all good now, but setting the time via ntpdate before starting ntpd would have been something I would have done when seeing such a large offset, and then keeping an eye on how it was drifting. You could have enabled the graphs, which are new in 2.2, and kept an eye on it. See attached.

My pfSense is a VM, so you cannot expect its time to be dead nuts on.
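
The 1000-second figure is ntpd's default panic threshold. As a rough sketch of that check in whole seconds (the function `exceeds_panic` is made up here and is not part of ntpd or pfSense), with a threshold of 0 treated as "check disabled", which is what the `tinker panic 0` line in the configuration posted above requests:

```shell
# exceeds_panic OFFSET_SECONDS [THRESHOLD_SECONDS]
# Succeeds (exit status 0) if the absolute offset is over the threshold.
# The threshold defaults to 1000; 0 means the check is disabled.
# Integer seconds only. Hypothetical helper for illustration.
exceeds_panic() {
    offset=${1#-}                 # absolute value: strip a leading minus sign
    threshold=${2:-1000}
    [ "$threshold" -ne 0 ] && [ "$offset" -gt "$threshold" ]
}

if exceeds_panic -2400; then
    echo "offset too large: set the clock first (ntpdate, or ntpd -g)"
fi
```

Only the comparison is modelled here. How ntpd actually steps or slews the clock is more involved than this.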

[Attachment: ntpdgraphs.png]

kejianshi

If you want to have an ugly NTP graphs competition, mine is worse - haha.

But yeah - not soooooooooo far off.

charliem

@johnpoz:

setting the time via ntpdate before starting ntpd would have been something I would have done when seeing such a large offset.

That's already how it is normally done, at least during boot and restarting services on a WAN change. Take a look at /usr/local/sbin/ntpdate_sync_once.sh for details. You should get a log notice for both sync failure and sync success before the ntpd daemon is run.

Not sure how ntp is restarted via the status_services page or services widget.

johnpoz

I thought there was an ntpdate done before - but what if that fails? And time is way off?

vc6SfV8

Hmm, it looks like I have always had the graph on. Attached is the graph from my machine for the last week.

I am by no means an NTP expert and really don't know what any of this means.

You can see where Wednesday night I changed the board and the graph looks different. (That was the only thing I changed.)

For some reason there are a lot of blank spots. I don't know why, because the machine is always on.

Ryan

[Attachment: status_rrd_graph_img.png]

johnpoz

I would think the blank spots are when ntpd was offline, or was so far out of whack that it was out of sync. What does the ntpd log show for these times?

kejianshi

NTP can crash, and without a watchdog service restarting it, it would probably stay down until reboot.

charliem

@johnpoz:

I thought there was an ntpdate done before - but what if that fails? And time is way off?

If sync'ing fails, a message is logged but pfSense still tries to start the ntp daemon. If the time is off by more than 1000 seconds at that point, ntpd should promptly exit with a message saying so, and should remove its pid file (not sure it always does). pfSense carries on, assuming that ntpd is running, AFAIK.
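
Since a leftover pid file does not prove the daemon is alive, one way to check is to test the recorded pid with `kill -0`, which sends no signal and only reports whether the process exists. A minimal sketch, assuming the pid file lives at /var/run/ntpd.pid (an assumption, check your install); `daemon_alive` is a hypothetical helper, not a pfSense script:

```shell
# daemon_alive PIDFILE
# Succeeds only if PIDFILE is readable and the pid in it is a live process.
daemon_alive() {
    [ -r "$1" ] || return 1               # no pid file: treat as not running
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# The path below is an assumption, not taken from the thread.
if daemon_alive /var/run/ntpd.pid; then
    echo "ntpd is running"
else
    echo "ntpd is not running (missing or stale pid file)"
fi
```

Run it as root, since `kill -0` can also fail with a permission error for processes owned by another user.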

kejianshi

My NTP gets a watchdog restart pretty regularly.

fragged

Blank spots are negative values being clipped off. More here: https://forum.pfsense.org/index.php?topic=76620.msg482370#msg482370