Runaway notification emails
-
Hi,
For the second time, I've somehow managed to trigger a case where notify_monitor.php seems to send me the same notification email in an infinite loop. The only way I found to break the loop was to reboot the device.
Some relevant config details:
- pfSense 2.6.0
- I have a multi-WAN setup using the secondary WAN as a fallback. I'm using 8.8.8.8 as the gateway monitor address because the Comcast gateway won't respond to pings.
- I have Service Watchdog set up to restart unbound and dhcpd (they've mysteriously died in the past).
It looks as though my WAN failed over to the backup, and then the main WAN came back at about 2 AM. The WAN reload seemed to cause all of the packages to restart. In the midst of this, I think the service watchdog triggered on unbound.
In any case, I started getting repeated emails with the following:
2:05:59 MONITOR: WANW_DHCP is available now, adding to routing group rtr_ipv4
8.8.8.8|MY_IP_REDACTED|WANW_DHCP|16.467ms|4.112ms|0.0%|online|none
2:06:00 Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
This same email (with the 2:05:59 timestamp) is still being sent hours later.
System.log contains evidence of repeated sending:
Dec 8 02:07:09 pfSense php[54791]: notify_monitor.php: Message sent to EMAIL@REDACTED.COM OK
Dec 8 02:07:21 pfSense php[54791]: notify_monitor.php: Message sent to EMAIL@REDACTED.COM OK
The system generated a few hundred lines of logs in 20 seconds or so - I can pull more if needed.
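To gauge how fast the loop is sending, one option is to count the "Message sent" entries per minute and per PHP process ID. The following is only a rough sketch in Python, assuming the log lines have the same shape as the two quoted above; the function name count_sends and the pattern are illustrative and not part of pfSense.

```python
import re
from collections import Counter

# Matches lines shaped like the ones quoted in this post, e.g.
#   Dec 8 02:07:09 pfSense php[54791]: notify_monitor.php: Message sent to EMAIL@REDACTED.COM OK
# The pattern is an assumption based on those two lines only.
SENT_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+php\[(?P<pid>\d+)\]:\s+"
    r"notify_monitor\.php: Message sent to \S+ OK"
)


def count_sends(lines):
    """Count 'Message sent' entries, keyed by (minute, PID)."""
    counts = Counter()
    for line in lines:
        match = SENT_RE.match(line.strip())
        if match:
            counts[(match.group("minute"), match.group("pid"))] += 1
    return counts


sample = [
    "Dec 8 02:07:09 pfSense php[54791]: notify_monitor.php: Message sent to EMAIL@REDACTED.COM OK",
    "Dec 8 02:07:21 pfSense php[54791]: notify_monitor.php: Message sent to EMAIL@REDACTED.COM OK",
]

for (minute, pid), sent in sorted(count_sends(sample).items()):
    print(minute, "pid", pid, "sent", sent)
```

A single PID with a steadily growing count would suggest one notify_monitor.php process resending the same message, rather than many separate processes each sending it once.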
Is there any way to manually flush the notify_monitor.php queue, other than rebooting? Any other ideas?
-
You should see anything that is still queued in /var/db/notifyqueue.messages.
We have seen a few reports of this but I've never been able to replicate it.
This one, for example: https://redmine.pfsense.org/issues/13224
-
@stephenw10
I have seen it once ....
That pfSense went bananas, and sent the same email in a loop.
I didn't know where to look for the mail queue, so I ended up doing a "reroot" and it went away. The reroot didn't even affect "uptime".
/Bingo
-
@stephenw10 I've just encountered the same issue as reported in the example you give (https://redmine.pfsense.org/issues/13224) where I received an alert from NUT that my UPS was not responding and then a barrage of 'is available now...' notifications for my secondary WAN connection.
There was also a second notification from NUT when network connectivity was re-established to the UPS (only two notifications from NUT, the down notification and the up notification).
I wasn't at liberty to take any action that might cause network downtime (a reboot, etc.), but clearing out the repeating notification from /var/db/notifyqueue.messages worked to put a stop to it. Not sure what, if anything, NUT has to do with it, but it seemed interesting that the circumstances I encountered matched what was reported there.
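For anyone who wants to try the same thing without losing the evidence, here is a minimal sketch in Python that moves the queue file aside instead of editing it by hand. It rests on some assumptions: that Python is available on the box, that dropping every queued notice (not just the repeating one) is acceptable, and that the notifier copes with the file being absent. The function name set_aside_queue and the backup naming are hypothetical, and the file's internal format is not parsed.

```python
import shutil
import time
from pathlib import Path


def set_aside_queue(queue_path="/var/db/notifyqueue.messages"):
    """Move the queue file to a timestamped backup next to it.

    Returns the backup path, or None if there was no queue file.
    Assumption: removing the file drops all queued notices.
    """
    queue = Path(queue_path)
    if not queue.exists():
        return None
    # Keep a copy so the stuck notification can be attached to a bug report.
    backup = queue.with_name(queue.name + time.strftime(".%Y%m%d-%H%M%S.bak"))
    shutil.move(str(queue), str(backup))
    return backup
```

Calling set_aside_queue() with no arguments would act on the path mentioned in this thread. Whether this also stops a notify_monitor.php process that is already looping is not something the sketch can guarantee.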