Annoying Notifications in Normal Operation
-
Hi,
I have 2 XG-7100 boxes set up in HA.
I also have Service Watchdog notifications set up.
Today, I forced a failover so I could do some package and system updates, and for over an hour while doing this I got hundreds of notifications about services not running and being restarted, but they were coming from the slave router.
I understand that the service is doing what it's supposed to do, but shouldn't it also take note of whether it is the slave or the master router, and only send a notification if it is the master?
I had my bosses jumping up and down about all the notifications coming through. All I was doing was some upgrades, and I don't care if HAProxy and the VPN aren't working on the slave. They're not supposed to be.
-
@eangulus I would think most people would want the same packages running on the slave so failover keeps everything working. That's generally how we've set up HA router pairs, though we haven't done so with a proxy. The docs refer to "XMLRPC Sync", which is usually used to keep the slave config in sync; most of the more complex packages I've seen can sync their config to the slave automatically.
-
You could always just disable notifications during known maintenance.
Knowing the state of the secondary is important. You don't want to failover to it and find things are broken there.
Steve
-
@stephenw10 having to disable notifications first not only introduces extra workflow, but also leaves open the chance of them being forgotten and never turned back on.
More importantly, the services in question are actually never running on the slave. Certain services do not run on the slave and only start once a failover is triggered. For instance, the HAProxy service is stopped on the slave; if a failover occurs, HAProxy starts on the slave.
It's exactly because of this that I am having the issue with notifications.
My particular scenario:
Master and slave pfSense. The HAProxy and OpenVPN services run on the master; the slave's copies are stopped (only some services stay running on the slave).
I update packages and pfSense on the slave, with no interruption to users. I then force a failover.
The slave takes over and starts HAProxy and OpenVPN. Most users don't see any issues; HAProxy and OpenVPN just have a momentary switchover.
Now the master (running as the slave) stops its HAProxy and OpenVPN services while I go and do updates. In the meantime, every minute I get several emails complaining about the services that have stopped.
But I DON'T CARE. They are running on the slave (currently the master). I only care if the services stop on the current master, or if HA sync is broken.
It doesn't make sense to me. And currently the only reason the notifications stopped when I switched back is that I don't have notifications turned on on the slave.
-
The Service Watchdog package is not that smart, and has no knowledge of HA or even if services are disabled.
All it does is check whether a service is running and start it if it isn't.
The package is a kludge and not intended for that kind of "serious" role. If things are configured and working properly, it is almost always unnecessary.
-
@jimp But that is my point.
It's quite obvious that it isn't that smart and does not seem to be HA-aware, but why not?
I'm not even expecting anything for free; we pay for our support and we run on Netgate hardware.
Our issue is this: I am the sole IT person for our business, but my boss likes to get all the notifications in case I am off work and don't see them.
I am getting very tired of having to explain every little thing to my boss because he got a notification and is now questioning why.
Why can't the monitoring just match the behavior of HAProxy, for instance? HAProxy only runs on the master in our HA setup, so why can't the monitoring also know that? Even if it had to be configured manually, why couldn't it be set up to notify only when the router is the master, for example, with an option to notify when a service is stopped on the master, the slave, or both?
I believe some work needs to be done in this area.
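The master/slave/both option proposed above comes down to a small decision rule. Below is a minimal, hypothetical sketch of that rule; it is not part of the Service Watchdog package. It assumes the node's current HA role is already known and passed in (a real implementation would have to read the CARP state from the firewall, which this sketch does not do). `Role`, `NotifyPolicy`, and `should_notify` are illustrative names, and "backup" is the CARP term for the node this thread calls the slave.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical HA roles; CARP calls the standby node BACKUP.
    MASTER = "master"
    BACKUP = "backup"


class NotifyPolicy(Enum):
    # Hypothetical per-service option: where a stopped service is worth reporting.
    MASTER_ONLY = "master"
    BACKUP_ONLY = "backup"
    BOTH = "both"


def should_notify(service_running: bool, role: Role, policy: NotifyPolicy) -> bool:
    """Return True if a stopped service on this node should trigger a notification."""
    if service_running:
        return False  # nothing to report
    if policy is NotifyPolicy.BOTH:
        return True
    if policy is NotifyPolicy.MASTER_ONLY:
        return role is Role.MASTER
    return role is Role.BACKUP


if __name__ == "__main__":
    # HAProxy stopped on the standby node during maintenance: stay quiet.
    print(should_notify(False, Role.BACKUP, NotifyPolicy.MASTER_ONLY))
    # HAProxy stopped on the active node: alert.
    print(should_notify(False, Role.MASTER, NotifyPolicy.MASTER_ONLY))
```

With `MASTER_ONLY` set for HAProxy and OpenVPN, the maintenance scenario described earlier would produce no emails from the standby node, while a genuine outage on the active node would still be reported.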
-
Your use case doesn't align with the intended use case of the package, and limiting it in the way you describe would also limit its usefulness for others.
My point is that it's not necessary to run the watchdog package in nearly all cases. If you feel that the watchdog package is necessary, there is almost certainly something else wrong with your setup that makes it appear to be what you want, when it is likely masking the real problem.
The watchdog package is unlikely to gain additional features because it really shouldn't exist at all; it is only a workaround for other bugs people have hit over the years.