Ntpd / gps need some love part II
-
I just backed up the configuration and will probably reinstall pfSense on another drive.
Looks like it's a hardware issue?
Yes, it's hardware; backing up now and reinstalling somewhere else is the right move. It looks like something rrdtool reads from disk failed, the same block each time. Is this an SSD?
-
Thank-you charliem.
It's just an old IDE drive. It actually failed a few minutes ago. No boot at all.
Geez, it looked new, as it was just a TiVo backup drive.
I did back up right before. Switched to another temporary IDE drive.
I do have a configured CF card slot and am wondering whether to go with that or just an SSD.
I did a new build of 2.1.3 just now and restored from backup. It looks like it's working.
Rebooted and restoring packages.
Need to do the change on the serial port and I think I will be good to go until I change drives.
Something else is going on. I just used the new drive from my second test machine for the above rebuild. It worked for some 10 minutes and then started again with similar errors.
I've been testing with 2.2 just fine with this drive.
Swapped cables; same error. Might fall back to the Smoothwall box... odd stuff... my pfSense box has now spilled its guts, with parts out of it, and it's still on the rack... ugly picture.
-
Just shut off the RRD service and the above mentioned errors went away. Very odd.
I was watching the console and while navigating the GUI the same error would come up.
After shutting off the RRD service the errors have gone away.
Looks to be running fine now.
Still running fine after about 30 minutes. Put it all back together again as I didn't like seeing its guts all over the rack.
-
Try NanoBSD from a CF card. I've been using nano for several years now. Most reliable IMHO; the whole system runs from RAM.
-
Thanks Robi, will do. What is the recommended size for the CF card? I read somewhere that 2 GB should work.
Can I install with the pfSense ISO?
-
-
Thank-you for the pointers Robi.
-
Once I got the GPS sending only the $GPGGA sentence, I haven't had any spikes (so far). If I was ambitious, I'd go back in my clockstats file, and see what the strings looked like around the times the spikes were detected. Hopefully it's fixed though.
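For anyone who wants to check which NMEA sentences actually ended up in clockstats, here is a minimal sketch. It assumes each clockstats line contains the raw sentence (for example one starting with $GPGGA,). The helper name count_sentences is hypothetical, and the clockstats path depends on your ntpd configuration.

```shell
#!/bin/sh
# Hypothetical helper: count NMEA sentence types found in a clockstats file.
# Assumes the raw sentence (e.g. "$GPGGA,...") appears somewhere on each line.
count_sentences() {
    # $1 = path to the clockstats file
    awk '
        # Find the first "$XXXXX," token on the line and tally its name
        match($0, /\$[A-Z]+,/) {
            count[substr($0, RSTART + 1, RLENGTH - 2)]++
        }
        END { for (s in count) print s, count[s] }
    ' "$1" | sort
}
```

Calling count_sentences on the clockstats file prints one line per sentence type with its count. If the GPS really sends only $GPGGA, a single GPGGA line is expected.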
I spoke (wrote?) too soon! My spikes are back, so I've got to do more digging; will update if I find anything.
Just an update: It seems the spikes are an artifact.
I've modified /var/db/rrd/updaterrd.sh to keep the output of every 'ntpq -c rv', as well as the values parsed for rrd. I see spikes suddenly, with no corresponding upsets in the ntp clockstats or loopstats files. So, it's an ntp issue, not a pfSense issue.
For reference, if anyone else faces a similar problem, my debugging mods to updaterrd.sh are like so:
LOGPATH=/tmp
LOGFILE=$LOGPATH/`date +'%y.%m.%d_%H:%M:%S'`
PERM_LOG=$LOGPATH/ntp_stuff.log
/usr/local/sbin/ntpq -c rv | /usr/bin/awk 'BEGIN{ RS=","}{ print }' >> $LOGFILE
NOFFSET=`grep offset $LOGFILE | awk 'BEGIN{FS="="}{print $2}'`
NFREQ=`grep frequency $LOGFILE | awk 'BEGIN{FS="="}{print $2}'`
NSJIT=`grep sys_jitter $LOGFILE | awk 'BEGIN{FS="="}{print $2}'`
NCJIT=`grep clk_jitter $LOGFILE | awk 'BEGIN{FS="="}{print $2}'`
NWANDER=`grep clk_wander $LOGFILE | awk 'BEGIN{FS="="}{print $2}'`
NDISPER=`grep rootdisp $LOGFILE | awk 'BEGIN{FS="="}{print $2}'`
/usr/bin/nice -n20 /usr/local/bin/rrdtool update /var/db/rrd/ntpd.rrd \N:${NOFFSET}:${NSJIT}:${NCJIT}:${NWANDER}:${NFREQ}:${NDISPER}
echo $NOFFSET : $NSJIT : $NCJIT : $NWANDER : $NFREQ : $NDISPER >> $PERM_LOG
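To pick the spikes out of the permanent log afterwards, something like the following sketch may help. It assumes the log lines look exactly like what the echo above writes (offset first, fields separated by " : "). The helper name find_spikes and the threshold are hypothetical and not part of pfSense.

```shell
#!/bin/sh
# Hypothetical helper: print log entries whose offset magnitude exceeds a limit.
# Assumes each line is "offset : sjit : cjit : wander : freq : disp".
find_spikes() {
    # $1 = log file, $2 = threshold in the same units as the logged offset
    awk -v limit="$2" '
        {
            off = $1 + 0                 # first field is the offset
            if (off < 0) off = -off      # compare on absolute value
            if (off > limit) print NR ": " $0
        }
    ' "$1"
}
```

For example, find_spikes /tmp/ntp_stuff.log 1 would list every sample whose offset is further than 1 from zero, prefixed with its line number so it can be matched against the timestamped per-run files.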
-
Thanks charliem!
So, it's an ntp issue, not a pfSense issue.
So is it an NTP issue with BSD, an NTP issue related to using a GPS with PPS for time sync, or just an NTP issue related to using NTP servers on the internet or a GPS/PPS?
Decided to start from scratch (hardware) on the other pfSense build.
-
The patch file ntpd_love_patch_213d.txt from here applies cleanly to pfSense 2.1.4 also.
Tested on x64 NanoBSD, should work on full install too.
-
Thank-you robi!
Updated pfSense today to 2.1.4 along with adding the NTP stuff.
Works great. Still building box #2.
-
It's for a GPS with a uBlox chipset; I mentioned it in my first post about GPS bugs here:
https://forum.pfsense.org/index.php?topic=67189.msg367460#msg367460
I believe JimP later posted that it was added because it was funded by a customer. IMHO it should not be there by default.
Yes, apparently a uBlox. And a poor command set at that, probably found on some webpage somewhere. But somebody did pay some cash to have it added to pfSense, which then became the basis to further develop it into something that became far more useful to many others. That is the reason it is the "default", with the caveat that it is not recommended. Far from ideal, but unless the person who originally paid for the feature says that the newer uBlox config is just as good or better, the original should remain.
BTW charliem, nice find on the rrd graphs, I never had enough time to figure out why the DB didn't survive a reboot. :(
@pete: I run off a 16GB SLC CF card, using the standard version but with all the options to run as much as possible in RAM set. It's plenty big enough, but I don't run anything like Squid.
-
To get the patch to work in 2.1.5, I had to edit one services_ntpd.php hunk in Robi's 2.1.3 patch, with a change according to this commit:
https://github.com/pfsense/pfsense/commit/88c24958a9625d2daa55adb2bb685c70ec9d6eba
-
Don't know if this has been mentioned, but the NTP RRD graphs should scale from negative to positive as offset can have a negative value. Now it's clipped off as the graph is shown from 0 up.
-
To get the patch to work in 2.1.5, I had to edit one services_ntpd.php hunk in Robi's 2.1.3 patch, with a change according to this commit:
https://github.com/pfsense/pfsense/commit/88c24958a9625d2daa55adb2bb685c70ec9d6eba
Yes indeed. Please find attached the patch.
-
Now that 2.2-RELEASE is out, which includes all this, a quick note with good news for the people who had to tweak serial ports on their motherboard.
-
To make the ntp rrd graphs look nicer, I made some changes on my system to allow full scaling from negative to positive values, and I made the graph scale a bit nicer (in my opinion).
Modify rrd.inc to allow negative values. I'm not sure if all of these can actually go negative, but this was a lazy initial edit. I'm sure none of the values should ever get anywhere close to 1000 though.
$rrdcreate .= "DS:offset:GAUGE:$ntpdvalid:-1000:1000 ";
$rrdcreate .= "DS:sjit:GAUGE:$ntpdvalid:-1000:1000 ";
$rrdcreate .= "DS:cjit:GAUGE:$ntpdvalid:-1000:1000 ";
$rrdcreate .= "DS:wander:GAUGE:$ntpdvalid:-1000:1000 ";
$rrdcreate .= "DS:freq:GAUGE:$ntpdvalid:-1000:1000 ";
$rrdcreate .= "DS:disp:GAUGE:$ntpdvalid:-1000:1000 ";
Modify the status_rrd_graph_img.php file to scale things better for the actual graph. Another part to touch in this file would be the COMMENT/GPRINT part for ntp graph to tweak the number of decimals etc.
$graphcmd .= "--alt-autoscale ";
$graphcmd .= "--alt-y-grid ";
$graphcmd .= "--units-exponent 0 ";
$graphcmd .= "--rigid ";
I've never touched any of the pfSense code before and I don't have a GitHub account nor have I signed the CLA. I would very much appreciate it if someone could take a look at this, make the actual changes needed and post a pull request.
-
To make the ntp rrd graphs look nicer, I made some changes on my system to allow full scaling from negative to positive values, and I made the graph scale a bit nicer (in my opinion).
Can you post a diff? Would be easier for others to test.
Modify rrd.inc to allow negative values. I'm not sure if all of these can actually go negative, but this was a lazy initial edit. I'm sure none of the values should ever get anywhere close to 1000 though.
As far as I know, only offset and frequency can take negative values. Jitter and wander are calculated as RMS averages and so should always be positive. Dispersion is related to delay and latency measurements, both of which should be positive (unless you live in a TARDIS ...)
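Following that reasoning, the data source bounds could be narrowed so that only offset and frequency accept negative samples. The sketch below is not the rrd.inc code (which is PHP). It is a hypothetical shell helper, ntp_ds_args, that prints the DS arguments in the form "rrdtool create" expects, DS:name:GAUGE:heartbeat:min:max, keeping the same field order as the update line in updaterrd.sh. The bounds of 0 and 1000 are assumptions carried over from the posts above.

```shell
#!/bin/sh
# Hypothetical helper: print DS arguments for "rrdtool create", giving a
# negative lower bound only to the fields that can go negative.
ntp_ds_args() {
    # $1 = heartbeat in seconds (stands in for $ntpdvalid in rrd.inc)
    heartbeat="$1"
    # Order matters: it must match N:offset:sjit:cjit:wander:freq:disp
    for name in offset sjit cjit wander freq disp; do
        case "$name" in
            offset|freq) min=-1000 ;;   # signed quantities
            *)           min=0 ;;       # jitter, wander, dispersion
        esac
        printf 'DS:%s:GAUGE:%s:%s:1000\n' "$name" "$heartbeat" "$min"
    done
}
```

rrdtool treats a sample outside a data source's min/max range as unknown, which is why a minimum of 0 on offset leaves holes in the graph. Note that an existing ntpd.rrd keeps the bounds it was created with, so new bounds only take effect on a freshly created (or retuned) database.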
-
Plotting negative offset seems like a good idea. What does it show now if it can't go negative? Worst case it shows zero offset unrealistically.
Steve
-
Negative values are clipped off completely, resulting in gaps in the graph.