Praising Service Watchdog !!
-
I am testing a beta version of a pfSense package, and it has a bug where the unbound service process keeps disappearing.
As you can imagine, this brings my whole network to its knees: everything is basically frozen until unbound is restarted.
Service Watchdog has been a life saver and works as it is supposed to.
Thanks to those who brought this service to life !
-
Not sure why you paged me to this?
What "beta" package are you testing? Shouldn't this be in the package section (in the subsection for the package you're beta testing)?
-
Disappearing?
That would be "getting killed" or signalled to stop (SIGHUP).
Instead of electroshocking the wrong patient, I would deal with the root cause: remove the beta or, better, find and fix the reason why it stops the DNS resolver (unbound).
-
Well, the goal of the post was to say that Watchdog is doing its job.
As far as "getting killed" goes, here is what I see:
May 2 07:14:15 kernel pid 22129 (unbound), uid 59: exited on signal 11
What would you do to figure out why it's going on ?
I know for sure that it happens when DNSBL in pfBlockerNG has Unbound python mode enabled.
-
@chudak said in Praising Service Watchdog !!:
signal 11
This is a segfault aka general protection fault aka memory access violation. Anything else in the log around the same time or just before?
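For anyone who wants to check the number-to-name mapping themselves, here is a minimal sketch using Python's standard `signal` module. The helper name `signal_name` is made up for illustration. Signal numbers are platform-specific, but on FreeBSD (which pfSense is based on) and on Linux, 11 should map to SIGSEGV.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on the current platform."""
    # signal.Signals is an enum of the signals the running OS defines.
    return signal.Signals(number).name


if __name__ == "__main__":
    # The kernel log line says "exited on signal 11".
    print(signal_name(11))
```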
-
May 2 07:15:02 dhcpleases kqueue error: unkown
May 2 07:15:01 dhcpleases Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
May 2 07:15:01 dhcpleases /etc/hosts changed size from original!
May 2 07:15:00 php-cgi servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
May 2 07:14:15 kernel pid 22129 (unbound), uid 59: exited on signal 11
May 2 06:13:02 dhcpleases kqueue error: unkown
May 2 06:13:01 dhcpleases Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
May 2 06:13:01 dhcpleases /etc/hosts changed size from original!
May 2 06:13:00 php-cgi servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
May 2 06:12:52 kernel pid 33726 (unbound), uid 59: exited on signal 11
-
Euh .....
Read the log file: it shows what is going on:
The watchdog finds unbound dead.
One second later, dhcpleases finds "/etc/hosts changed size from original!" and wants to restart unbound as well... [edit: if you are using pfBlocker and the like, this will take some time...]
For the possibly related "dhcpleases kqueue error: unkown", see, for example, https://forum.netgate.com/topic/112302/dhcpleases-unbound-errors-in-the-logs
[edit: dhcpleases probably does this too early... unbound is about to be started, the pid file is not yet created => things get messy now]
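To put a number on how long DNS stays down each time, here is a rough sketch that pairs every "exited on signal" line with the next Service Watchdog line and reports the gap in seconds. It only uses the four relevant lines from the log posted above. The function names and the fixed year are assumptions made for illustration, since syslog timestamps carry no year.

```python
import re
from datetime import datetime

# The crash and watchdog lines copied from the log above (newest first).
LOG = """\
May 2 07:15:00 php-cgi servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
May 2 07:14:15 kernel pid 22129 (unbound), uid 59: exited on signal 11
May 2 06:13:00 php-cgi servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
May 2 06:12:52 kernel pid 33726 (unbound), uid 59: exited on signal 11
"""

STAMP = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) ")


def parse_time(line):
    """Parse the leading syslog timestamp; the year is an arbitrary assumption."""
    match = STAMP.match(line)
    return datetime.strptime("2024 " + match.group(1), "%Y %b %d %H:%M:%S")


def downtimes(log_text):
    """Return seconds between each unbound crash and the following watchdog restart."""
    # Sort oldest first so every crash is followed by its restart.
    events = sorted(
        (parse_time(line), line)
        for line in log_text.splitlines()
        if STAMP.match(line)
    )
    gaps = []
    crash_at = None
    for when, line in events:
        if "(unbound)" in line and "exited on signal" in line:
            crash_at = when
        elif "Service Watchdog detected service unbound stopped" in line:
            if crash_at is not None:
                gaps.append(int((when - crash_at).total_seconds()))
                crash_at = None
    return gaps


if __name__ == "__main__":
    print(downtimes(LOG))
```

In this log both watchdog entries are stamped at second :00, so the gap seems to depend on where in the minute unbound happens to crash.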
Btw: a process that goes flat out with a
@KOM said in Praising Service Watchdog !!:
segfault aka general protection fault aka memory access violation
should not be restarted with the watchdog.
The problem should be solved.
@chudak said in Praising Service Watchdog !!:
What would you do to figure out why it's going on ?
By applying the one big advantage of open-source software: look at the code, and you can see what happens yourself ;)