Praising Service Watchdog !!



  • I am testing a beta version of a pfSense package, and it has a bug where the unbound service process keeps disappearing.

    As you can imagine, this brings my whole network to its knees; everything is basically frozen until unbound is restarted.

    Service Watchdog has been a lifesaver and works as it is supposed to.

    Thanks to those who brought this service to life !

    @johnpoz


  • LAYER 8 Global Moderator

    Not sure why you paged me to this?

    What "beta" package are you testing? Shouldn't this be in the packages section (in the subsection for the package you're beta testing)?



  • Disappearing?

    That would be "getting killed", or being signalled to stop (SIGHUP).
    Instead of electroshocking the wrong patient, I would deal with the root cause: remove the beta or, better, find and fix the reason why it stops the DNS resolver (unbound).



  • @Gertjan

    Well the goal of the post was to say that Watchdog is doing its job.

    As for "getting killed", here is what I see:

    May 2 07:14:15  kernel      pid 22129 (unbound), uid 59: exited on signal 11
    

    What would you do to figure out why it's going on ?
    I know for sure that it happens when DNSBL in pfBlockerNG has Unbound Python mode enabled.



  • @chudak said in Praising Service Watchdog !!:

    signal 11

    This is a segfault aka general protection fault aka memory access violation. Anything else in the log around the same time or just before?
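
    For reference, the signal number in a kernel line like the one quoted can be mapped to its name with Python's standard `signal` module. This is only a minimal sketch: the regular expression and the function name `describe_exit` are illustrative assumptions, not something pfSense ships.

```python
import re
import signal

# Matches kernel messages such as:
#   pid 22129 (unbound), uid 59: exited on signal 11
EXIT_RE = re.compile(r"pid (\d+) \((\S+)\), uid (\d+): exited on signal (\d+)")


def describe_exit(line):
    """Return (pid, process name, signal name) for a kernel exit line, or None."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    pid, name, _uid, signo = m.groups()
    # signal.Signals maps the numeric value to the platform's signal name.
    return int(pid), name, signal.Signals(int(signo)).name


print(describe_exit(
    "May 2 07:14:15  kernel      pid 22129 (unbound), uid 59: exited on signal 11"
))
```

    For the line from the log, signal 11 comes back as SIGSEGV, which is the segfault described above.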



  • @KOM

    May 2 07:15:02  dhcpleases      kqueue error: unkown
    May 2 07:15:01  dhcpleases      Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    May 2 07:15:01  dhcpleases      /etc/hosts changed size from original!
    May 2 07:15:00  php-cgi     servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
    May 2 07:14:15  kernel      pid 22129 (unbound), uid 59: exited on signal 11
    May 2 06:13:02  dhcpleases      kqueue error: unkown
    May 2 06:13:01  dhcpleases      Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process.
    May 2 06:13:01  dhcpleases      /etc/hosts changed size from original!
    May 2 06:13:00  php-cgi     servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
    May 2 06:12:52  kernel      pid 33726 (unbound), uid 59: exited on signal 11
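
    To get a feel for how long DNS was down each time, the gap between each kernel exit line and the following Service Watchdog line can be computed from the excerpt above. A rough sketch only: the parsing is an illustration that assumes the timestamp format shown here, and because the log carries no year, an arbitrary one is pinned.

```python
from datetime import datetime

# The crash and watchdog lines from the excerpt, newest first as logged.
LOG = """\
May 2 07:15:00  php-cgi     servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
May 2 07:14:15  kernel      pid 22129 (unbound), uid 59: exited on signal 11
May 2 06:13:00  php-cgi     servicewatchdog_cron.php: Service Watchdog detected service unbound stopped. Restarting unbound (DNS Resolver)
May 2 06:12:52  kernel      pid 33726 (unbound), uid 59: exited on signal 11
"""


def parse_ts(line, year=2000):
    """Parse the leading 'May 2 07:14:15' timestamp (the year is an assumption)."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def outages(log_text):
    """Return the seconds between each unbound crash and the next watchdog restart."""
    gaps = []
    crashed_at = None
    # The excerpt is newest-first, so walk it in reverse (oldest first).
    for line in reversed(log_text.strip().splitlines()):
        if "(unbound)" in line and "exited on signal" in line:
            crashed_at = parse_ts(line)
        elif "Service Watchdog detected service unbound stopped" in line and crashed_at:
            gaps.append(int((parse_ts(line) - crashed_at).total_seconds()))
            crashed_at = None
    return gaps


print(outages(LOG))
```

    For the two crashes shown, the gaps are 8 and 45 seconds. Both watchdog lines are logged at :00 seconds, which is consistent with a check that runs once a minute from cron, so the wait before a restart even begins could be up to about a minute per crash.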
    


  • Euh ...

    Read the log file, it shows what goes on:
    The watchdog finds unbound dead and restarts it.
    One second later, dhcpleases finds "/etc/hosts changed size from original!" and wants to signal unbound (HUP) as well ...

    [edit: if you are using pfBlockerNG and the like, the restart will take some time ...]

    For the possibly related "dhcpleases kqueue error: unkown", see for example https://forum.netgate.com/topic/112302/dhcpleases-unbound-errors-in-the-logs

    [edit: dhcpleases probably does this too early ... unbound is still being started, the pid file is not yet created => things get messy now]
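
    The race in that last edit (a HUP sent before the pid file exists) is commonly avoided by waiting briefly for the pid file before signalling. A minimal sketch of the idea, not how dhcpleases is actually implemented: `wait_for_pidfile`, `send_hup` and their parameters are hypothetical names, and only the path /var/run/unbound.pid comes from the log.

```python
import os
import signal
import time


def wait_for_pidfile(path, timeout=10.0, interval=0.25):
    """Poll until `path` contains a pid; return it, or None on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(path) as fh:
                text = fh.read().strip()
            if text.isdigit():
                return int(text)
        except FileNotFoundError:
            pass  # the daemon has not written its pid file yet
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


def send_hup(path="/var/run/unbound.pid", timeout=10.0):
    """Send SIGHUP to the daemon named in the pid file, if it appears in time."""
    pid = wait_for_pidfile(path, timeout=timeout)
    if pid is None:
        return False  # still no pid file: give up instead of logging an error
    try:
        os.kill(pid, signal.SIGHUP)
    except ProcessLookupError:
        return False  # stale pid file: the process is already gone
    return True
```

    With something like this, the "pidfile does not exist" message would only appear if unbound really failed to come back within the timeout.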

    Btw: a process that goes flat out with a

    @KOM said in Praising Service Watchdog !!:

    segfault aka general protection fault aka memory access violation

    should not simply be restarted by the watchdog.
    The underlying problem should be solved.

    @chudak said in Praising Service Watchdog !!:

    What would you do to figure out why it's going on ?

    Use the one big advantage of open-source software: look at the code, and you can see for yourself what happens ;)

