[Solved] 2.3.1 upgrade really botched my Nagios monitoring config. :-(



  • Well, this is pretty frustrating. I moved from Linux to pfSense to get away from exactly this sort of disaster. :-( Sigh.

    Anyway… I had a wonderfully functioning 2.2.something router running on ALIX hardware, doing everything I could ever need it to do splendidly. It took me some time to get it configured the way I wanted, but once it was there, it was 100% solid.

    Today I logged in to the web UI and saw "an update is available". I clicked the button to update, and with no further warning it cheerfully downloaded and installed the update and rebooted. That was when I discovered that the Nagios monitoring of all my internal network services was broken. Logging in to the web UI again, I saw an alert to the effect of "The NRPEv2 package does not exist". After some research via Google and the forums here, I found that, yup, the NRPEv2 package was removed in 2.3.1. Great. I'll try not to go off on a rant about how the update process should WARN THE USER before proceeding if they have installed packages that the update is known to break, but I digress. Perhaps I will file a bug report on that.

    In any event, the "NRPE" package (note the slightly different name; it's no longer "NRPEv2") has apparently been ported to 2.3.1 (https://forum.pfsense.org/index.php?topic=110732). Checking the web UI again, I couldn't find that package, but I discovered that the auto-update had actually only taken me to 2.3, not 2.3.1. There was an option to update again, so I did. After that update and a reboot, I went into the package manager in the web UI, installed NRPE, and rebooted again. At this point everything seemed to be working: all of my monitored services in Nagios went back to green, so NRPE was clearly installed and running again.

    This is where things get weird, though. My Services menu in the web UI still lists "NRPEv2", and if I select it, I get a page that says "No valid package defined." I've found a thread here that explains how to clean up old package info (it looks like a painful and risky job of hand-editing the XML config file and reloading the config, ugh: https://forum.pfsense.org/index.php?topic=110096), but I haven't tried it yet because (a) I don't see "NRPE" on the Services menu at all, and I don't believe clearing the old config will fix that, and (b) NRPE seems to be using my existing config from before the upgrade; I just can't edit it. So I really don't want to wipe out that config and have to start over. It seems like the two packages may be conflicting with one another.

    How do I fix this? Do I really need to remove the NRPE package, use the above hack to fix my config file and reload it, reinstall the NRPE package (which will hopefully give me a web UI this time), and then manually re-enter all of my services (which will probably take an hour)? Or is there a way to back up just the NRPE part of my config, somehow restore the old functionality, and then restore the NRPE configuration?

    I'm really tempted to just flash back to whatever the latest version of 2.2 was and restore my config from backup, but then I'll be stuck on an older version indefinitely, which eventually won't be supported, yadda, yadda...

    Really, really disappointing that a "recommended", one-click, automatic update process would break a properly functioning system so badly. :-(



  • Well, since I have backups of everything anyway, I decided to hack at it…

    I first removed the new "NRPE" package. My plan was to then hand-edit the XML file to remove any traces of NRPE/NRPEv2, restore the edited file, reinstall the new NRPE package, and then, if possible, copy and paste the relevant part of my old NRPEv2 configuration back in, or failing that, re-enter everything via the GUI (ugh). However, simply removing the NRPE package actually cleared up the broken NRPEv2 menu items. I then backed up the XML file and hand-edited it. The only NRPE-related thing I could still see in the XML file (other than firewall rules I had created) was the orphaned configuration for the NRPEv2 service, which was exactly the part I wanted to save, specifically:

    pfsense -> installedpackages -> nrpe2 -> config

    So I carefully cut the "nrpe2" branch (and everything below it) out of the XML file and reloaded it. Then I reinstalled the NRPE package using the package manager, backed up the XML file again, and looked at it to see what changes the install had made. It turns out the new NRPE package uses the same "nrpe2" stanza in the config file, pfsense -> installedpackages -> nrpe2 -> config (so I'm not sure why it wasn't working before). This time I replaced that branch of the XML with the copy I had saved earlier, saved it, rebooted yet again, and now everything seems to be back to normal. So something elsewhere in the XML must have been gummed up, and installing, removing, and then reinstalling the NRPE package fixed it. Sigh.

    Interestingly, the new NRPE package is actually referred to as "NRPEv2" in the menus and such, so... shrug. I have no idea why all this happened. It seems to me that even if the package were removed and later reinstalled, the XML shouldn't get bungled up like this.

    Annoying to have to go to this much work to clean up after an update, but at least my router is working again!
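The branch swap described above (saving the pfsense -> installedpackages -> nrpe2 branch from the pre-upgrade backup and splicing it into a fresh backup taken after reinstalling NRPE) can also be scripted rather than cut and pasted by hand. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the backup is a single `<pfsense>` document with `nrpe2` directly under `installedpackages`, as in the path above. The function names are hypothetical. Round-tripping through ElementTree drops XML comments and rewrites CDATA sections as escaped text, so diff the result against the original backup before restoring it.

```python
import xml.etree.ElementTree as ET

# Declaration prepended to the output; assumed to match what the backup starts with.
XML_DECL = '<?xml version="1.0"?>\n'


def extract_branch(xml_text, path="installedpackages/nrpe2"):
    """Return the element at `path` (relative to the <pfsense> root)
    serialized as a string, or None if it is not present."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    if node is None:
        return None
    node.tail = None  # drop trailing whitespace so only the branch itself is kept
    return ET.tostring(node, encoding="unicode")


def replace_branch(xml_text, branch_xml, parent_path="installedpackages"):
    """Put `branch_xml` under `parent_path`, replacing an existing element
    with the same tag in place, or appending it if none exists."""
    root = ET.fromstring(xml_text)
    parent = root.find(parent_path)
    if parent is None:
        raise ValueError(f"no <{parent_path}> element in config")
    new_node = ET.fromstring(branch_xml)
    old_node = parent.find(new_node.tag)
    if old_node is None:
        parent.append(new_node)
    else:
        # Keep the old branch's position and trailing whitespace.
        new_node.tail = old_node.tail
        index = list(parent).index(old_node)
        parent.remove(old_node)
        parent.insert(index, new_node)
    return XML_DECL + ET.tostring(root, encoding="unicode")
```

Usage, with hypothetical file names: read the pre-upgrade backup into `old_text` and the post-reinstall backup into `new_text`, call `replace_branch(new_text, extract_branch(old_text))`, write the result to a new file, and restore that file once the diff looks right. Matching on the tag of the saved branch means the same helper works for any other package stanza under `installedpackages`.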