Netgate Discussion Forum

    CPU 100%, unbound and dhcpd restarting whenever the filter reloads

    19 Posts 5 Posters 942 Views
    • U
      Uglybrian
      last edited by

      I would try turning off watchdog. As I understand it, watchdog is used for developers not production.

      • GertjanG
        Gertjan @pfuser23984
        last edited by

        @pfuser23984 said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

        figure out the logic of what this "watchdog" is (it does not appear to be a "service" watchdog),

        This one :

        Jan 23 06:45:00 	php-cgi 	81580 	servicewatchdog_cron.php: Service Watchdog detected service radiusd stopped. Restarting radiusd (FreeRADIUS Server)
        

        Is a pfSense package you've installed.


        It installs itself with a cron task that runs every x minutes and checks whether the selected packages are still 'up and running'.
        If one isn't, it restarts it.
        No other tests are done, so mostly this will create even more problems and obfuscate the original problem.
        (Yeah, you've noticed, I'm biased: I 'hate' this package, and of course I've never used it myself.)

        @pfuser23984 said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

        kernel re1: watchdog timeout

        That's another watchdog. That one comes from the driver itself, telling you things are not going well.
        But what can I say, it's a Realtek NIC.

        Throw "Should I use Realtek NICs?" into Google and read the first 5 titles that pop up.
        Pick a YouTube video from a big, known, quality author and you're good: you wouldn't even send a Realtek NIC to your worst enemy, so why use them yourself?
        Realtek is a known brand and you see them a lot in 'end user devices' where price is more important than quality. They exist, so maybe the NICs behave well under Microsoft OSs? You'll never see them in switches, routers, servers, or anything else that should work 'all the time'.

        @pfuser23984 said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

        pfSense package system has detected an IP change or dynamic WAN reconnection - 10.0.10.1 -> 10.0.10.1 - Restarting packages. is not right. Why would it decide to restart all packages because of an IP that HAS NOT changed?

        And you're right. I have, actually had, the very same thoughts.

        But the change of an IP, or not, wasn't what triggered the event. It's just a side effect.
        The event that triggered the whole package, firewall rule, whatever reloading was the NIC down-up event.
        One of the resulting actions is "applying an IP", and this could be the same IP.
        You can see that event with your own eyes: look at the RJ45 plugs, there are LEDs, and they went "off" for a moment and came back. So, your mission is: who pulled the plug?
        A Realtek NIC can do this by itself when it thinks it needs to reset because of an issue, like "kernel re1: watchdog timeout". (I think; I stopped using these NICs decades ago.)

        No "help me" PM's please. Use the forum, the community will thank you.
        Edit : and where are the logs ??

        • bmeeksB
          bmeeks @pfuser23984
          last edited by

          @pfuser23984 said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

          @bmeeks

          Thanks. I installed realtek-re-kmod and rebooted, but no change in the behavior. It isn't random, it is being triggered. I'm trying to figure out the logic of what this "watchdog" is (it does not appear to be a "service" watchdog), and why it is timing out on the re1 LAN interface... which seems to be causing the cascade of hotplug event detections and link state changes.

          pfsense2.log.txt:
          Jan 23 07:16:15 kernel re1: watchdog timeout

          It is also different behavior when this is triggered by rc.newwanip starting ovpns2 (pfsense1.log.txt) when OpenVPN server is started. But similarly...
          pfSense package system has detected an IP change or dynamic WAN reconnection - 10.0.10.1 -> 10.0.10.1 - Restarting packages. is not right. Why would it decide to restart all packages because of an IP that HAS NOT changed?

          I do not have any gateway monitoring actions enabled either.

          Certain events detected by the pfSense subsystem automatically kick off scripts to adjust for the event. One of those is when the system believes the NIC has disconnected and then reconnected. The automatic assumption is the IP configuration is likely to have changed.

          Your Realtek NIC is flapping, and that is causing pfSense to trigger its automatic scripts. You can't stop that. Instead, you must correct the flapping of the NIC interface.

          The "watchdog" mentioned in the log message is not the Service Watchdog package someone referred to earlier. Instead, that is a built-in hardware thingy in the Realtek NIC. You must have a newer Realtek NIC that is not compatible with the FreeBSD version used in pfSense. You have a NIC driver compatibility issue, and the only solution is to change the NIC driver.

          You can change the NIC driver most easily by replacing the Realtek NIC with something better supported like Intel. If you have space in the hardware, buy a cheap Intel NIC off Amazon or elsewhere and install that. Remove the Realtek NIC or disable it if it is an onboard option.
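To put a number on the flapping described above, a saved system log can be scanned for the two kernel messages quoted in this thread. This is only a rough sketch: the function name `count_flaps` is made up for illustration, and it assumes the log has been exported to a plain text file (like the pfsense2.log.txt mentioned in the thread).

```shell
#!/bin/sh
# Count NIC flap indicators for one interface in a saved system log.
# usage: count_flaps <logfile> <interface>
# (count_flaps is a hypothetical helper name, not a pfSense tool)
count_flaps() {
    log=$1
    nic=$2
    # grep -c prints the number of matching lines; -F treats the pattern as a
    # fixed string. "|| true" keeps the script going when the count is 0.
    timeouts=$(grep -cF "${nic}: watchdog timeout" "$log" || true)
    downs=$(grep -cF "${nic}: link state changed to DOWN" "$log" || true)
    echo "${nic}: watchdog_timeouts=${timeouts} link_down=${downs}"
}

# Example: count_flaps pfsense2.log.txt re1
```

A VLAN child interface such as re1.800 is counted separately, since its log lines start with "re1.800:" rather than "re1:".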

          • pfuser23984P
            pfuser23984 @Gertjan
            last edited by pfuser23984

            @Uglybrian said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

            I would try turning off watchdog. As I understand it, watchdog is used for developers not production.

            Okay, I've uninstalled the Service Watchdog package. Re-testing to see if it changes the initial behavior. UPDATE: No change in behavior.

            @Gertjan
            The "Realtek is bad, don't use it" sentiment is understandable, but unhelpful. I've been using this hardware for 5 years without issue.
            I am trying to find the root cause. And it's the NIC down-up events that seem to come AFTER the initial issue of a watchdog or some detection event that is triggering the down-up to happen.

            Also, pfsense1.log.txt shows the same issue, but NOT showing the interfaces flapping. Just the newwanip script detecting something that makes it want to restart all packages.

            • pfuser23984P
              pfuser23984 @Uglybrian
              last edited by

              @Uglybrian said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

              I would try turning off watchdog. As I understand it, watchdog is used for developers not production.

              After uninstalling the Service Watchdog... the issue still happens, but while dhcpd comes back up... unbound stays off, and I need to manually restart the service.

              Watchdog was NOT the root cause, but was working to recover.

              • pfuser23984P
                pfuser23984 @bmeeks
                last edited by pfuser23984

                This still does not make any sense.
                How can hardware be to blame for something that has been working for years?
                I would understand if this was a new install problem. But it's not.

                Also, again. The pfsense1.log.txt shows the same issue, but NOT showing the interfaces flapping until much later. Just the newwanip script detecting IP changes that makes it want to restart all packages.

                Jan 23 17:46:25 	kernel 		re1: link state changed to DOWN
                Jan 23 17:46:25 	kernel 		re1: watchdog timeout
                Jan 23 17:46:25 	check_reload_status 	80356 	Linkup starting re1
                Jan 23 17:46:21 	php-fpm 	80230 	/rc.newwanip: Removing static route for monitor 1.1.1.1 and adding a new route through x.x.x.x
                Jan 23 17:46:18 	php-fpm 	80230 	/rc.newwanip: rc.newwanip: on (IP address: 10.0.10.1) (interface: VPN[opt6]) (real interface: ovpns2).
                Jan 23 17:46:18 	php-fpm 	80230 	/rc.newwanip: rc.newwanip: Info: starting on ovpns2.
                Jan 23 17:46:17 	check_reload_status 	80356 	rc.newwanip starting ovpns2
                Jan 23 17:46:17 	kernel 		ovpns2: link state changed to UP
                Jan 23 17:46:17 	check_reload_status 	80356 	Reloading filter
                Jan 23 17:46:17 	php-fpm 	80178 	OpenVPN PID written: 59515 
                

                All I did was turn on the OpenVPN server.
                Look at the timestamps. The newwanip stuff starts up seconds before the NIC link state changes. And it looks like the watchdog is CAUSING the link state changes to happen, not the other way around.
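The timing argument above can be checked mechanically. The following is a minimal sketch, assuming a saved log file whose third whitespace-separated field is the HH:MM:SS timestamp (as in the excerpts in this thread) and that all lines fall on the same day. The function name `reload_to_timeout` is hypothetical.

```shell
#!/bin/sh
# For every "watchdog timeout" line, report how many seconds earlier the most
# recent "Reloading filter" line was logged. Works whether the file is sorted
# newest-first or oldest-first, because it compares timestamps, not positions.
# usage: reload_to_timeout <logfile>
reload_to_timeout() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,  p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }
        /Reloading filter/ { reload[++nr] = secs($3) }
        /watchdog timeout/ { wd[++nw] = secs($3); wdt[nw] = $3 }
        END {
            for (i = 1; i <= nw; i++) {
                best = -1
                for (j = 1; j <= nr; j++)
                    if (reload[j] <= wd[i] && reload[j] > best) best = reload[j]
                if (best >= 0)
                    printf "%s watchdog timeout %ds after filter reload\n", wdt[i], wd[i] - best
                else
                    printf "%s watchdog timeout with no earlier filter reload in this file\n", wdt[i]
            }
        }' "$1"
}
```

A consistently small gap supports the observation that the timeout follows a filter reload. It does not say whether the root cause is the reload or the driver's reaction to it.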

                • S
                  SteveITS Galactic Empire @pfuser23984
                  last edited by

                  @pfuser23984 said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

                  How can hardware be to blame for something that has been working for years?

                  pfSense upgrades the FreeBSD OS over time which could well include different drivers. Other than many comments here over the past 10 years or so about Realtek and FreeBSD I don't have much experience except on a CE install and that was fine for years unless we enabled Suricata Inline mode in which case (IIRC) port forwards would stop working after a time.

                  Jan 23 06:44:41 	php-fpm 	28016 	/rc.newwanip: rc.newwanip: on (IP address: 10.0.10.1) (interface: VPN[opt6]) (real interface: ovpns2).
                  Jan 23 06:44:41 	php-fpm 	28016 	/rc.newwanip: rc.newwanip: Info: starting on ovpns2.
                  ...
                  Jan 23 06:44:41 	php-cgi 	20458 	pfSsh.php: Configuration Change: (system): WAN to VPN
                  Jan 23 06:44:41 	php-cgi 	20458 	pfSsh.php: New alert found: Enabling Rule - WAN to VPN for x.x.x.x on port 443
                  Jan 23 06:44:40 	check_reload_status 	28303 	rc.newwanip starting ovpns2
                  Jan 23 06:44:40 	kernel 			ovpns2: link state changed to UP
                  

                  Is your VPN disconnecting/reconnecting? At least, at 6:44?

                  "rc.newwanip" is the name of a script and that name can be confusing because AFAIK it's run on any IP change (or add or subtract). Unbound, nginx, and other services may need to bind to the new IP so pfSense restarts many services. It might be clearer if it was renamed "rc.newipdetected" or similar.
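To see how often the script restarts packages without an actual address change, the "old -> new" pair can be pulled out of the log message quoted earlier in the thread. A small sketch, assuming the message wording is exactly as quoted and the log has been saved to a file; `same_ip_restarts` is a made-up name.

```shell
#!/bin/sh
# List every "Restarting packages" event and say whether the IP really changed.
# usage: same_ip_restarts <logfile>
same_ip_restarts() {
    # Extract the two addresses around "->" from the rc.newwanip message.
    sed -n 's/.*dynamic WAN reconnection - \([^ ]*\) -> \([^ ]*\) - Restarting packages.*/\1 \2/p' "$1" |
    while read -r old new; do
        if [ "$old" = "$new" ]; then
            echo "unchanged: $old"
        else
            echo "changed: $old -> $new"
        fi
    done
}
```

Lines reported as "unchanged" are the interface events (link up, VPN start) rather than real address changes.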

                  "link state changed to DOWN" is the port detecting (reporting) no connection.

                  Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
                  When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
                  Upvote 👍 helpful posts!

                  • pfuser23984P
                    pfuser23984 @SteveITS
                    last edited by pfuser23984

                    It is common for hammers to see only nails. I could tell this community desires to be very helpful, but for many, seeing anything to do with Realtek NICs may cause flashbacks and a deep focus on blaming them.

                    I am manually toggling the VPN. I have some automation that will do this, but it's just an example. For that first log file I used a pfSsh.php script to do it. For this latest log post, I just tested it from the Services page in the UI.
                    But ANY change to the firewall configuration that causes a filter reload will usually trigger this cascade.
                    I've been on pfSense 2.7.2 since April or May 2024, no issues. This just started in December 2024.
                    I am going to check any packages or patches installed or updated around that time.

                    • pfuser23984P
                      pfuser23984 @SteveITS
                      last edited by pfuser23984

                      @SteveITS

                      Jan 23 18:23:27 	kernel 		re1.800: link state changed to DOWN
                      Jan 23 18:23:27 	kernel 		re1: link state changed to DOWN
                      Jan 23 18:23:27 	kernel 		re1: watchdog timeout
                      Jan 23 18:23:27 	check_reload_status 	79149 	Linkup starting re1
                      Jan 23 18:23:20 	check_reload_status 	79149 	Reloading filter
                      Jan 23 18:23:20 	check_reload_status 	79149 	Syncing firewall
                      Jan 23 18:23:20 	php-fpm 	35087 	/system_advanced_misc.php: Configuration Change: user@10.0.1.10 (RADIUS/FreeRADIUS): Miscellaneous Advanced Settings saved 
                      

                      Even benign changes to the configuration in the UI, like "Do NOT send Netgate Device ID with user agent", trigger a filter reload, which triggers the watchdog timeout.

                      My question is this...
                      If this were a Realtek related issue, on Layer 2 or Layer 1... wouldn't I expect to see this happen randomly when I am not changing the firewall config or reloading the filter?

                      • bmeeksB
                        bmeeks @pfuser23984
                        last edited by

                        @pfuser23984
                        When you installed the latest kmod driver, did you follow the steps outlined in this post:
                        https://forum.netgate.com/topic/160529/realtek-nic-and-watchdog-timeout/13?

                        If not, the new driver is likely not being used. The current driver for pfSense 2.7.2 CE is I believe v1.98 instead of the v1.96 mentioned in the linked post.
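One way to check whether the package's module is actually loaded is to look for if_re.ko in the `kldstat` output. This is a sketch under an assumption not stated in this thread: that the stock re(4) driver is built into the kernel, so it never appears as a separate if_re.ko entry. The helper name `re_kmod_loaded` is hypothetical.

```shell
#!/bin/sh
# Reads `kldstat` output on stdin and succeeds only if if_re.ko is listed.
# Assumption: the built-in re(4) driver does not show up as if_re.ko.
re_kmod_loaded() {
    grep -q 'if_re\.ko'
}

# On the firewall:
#   if kldstat | re_kmod_loaded; then
#       echo "if_re.ko is loaded"
#   else
#       echo "if_re.ko is not loaded; the built-in driver is probably in use"
#   fi
```

If the module is not listed after a reboot, the package was installed but nothing told the boot loader to load it.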

                        Don't focus on the rc.newwanip script. That is getting triggered by your VPN interface coming up. As mentioned by another reply, anything that alters interface connectivity triggers that script so that all other firewall processes can be notified of a potential change (such as a newly configured interface, a change in IP address on an existing interface, deletion of an interface, etc.).

                        I found a number of Google search hits on Realtek NICs triggering their internal watchdog timeout when a VPN is brought online and is crossing the NIC. No, I don't have an immediate explanation for why it worked for 5 years as you say. But apparently you have either updated something on the firewall to cause this now or your NIC is glitching from some new hardware anomaly and heading towards failure.

                        Do you have more VLANs defined now than in the past on this port? Is the traffic across the NIC heavier than in the past (ISP speeds increased, more users, etc.)?

                        From the logs, unbound (the DNS Resolver) is not starting because it seems to already be running (or at least something is occupying the port it wants to use as it is logging "port already in use"). That could be the result of the machine gun burst of "restart all packages" commands happening.

                        • bmeeksB
                          bmeeks @pfuser23984
                          last edited by bmeeks

                          @pfuser23984 said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

                          I've been on pfSense 2.7.2 since April or May 2024, no issues. This just started in December 2024.
                          I am going to check any packages or patches installed or updated around that time.

                          If you can definitely pinpoint the start of the issue as December 2024, then certainly you need to start by looking at all changes made to the firewall since that date. You can examine the configuration changes by looking at the diffs of the automatic config.xml backups under DIAGNOSTICS.

                          Just don't automatically discount the NIC, though. As mentioned, the Realtek devices can work okay and then start to get flaky when traffic loads increase. Lots of Google search results detailing that.

                          • pfuser23984P
                            pfuser23984 @bmeeks
                            last edited by

                            @bmeeks said in CPU 100%, unbound and dhcpd restarting whenever the filter reloads:

                            When you installed the latest kmod driver, did you follow the steps outlined in this post:
                            https://forum.netgate.com/topic/160529/realtek-nic-and-watchdog-timeout/13?

                            SOMMOMMA!@##@!!

                            That did it.
                            I am used to Linux, where loading kernel drivers is easy to do and easy to verify. I did the install with pkg install realtek-re-kmod and rebooted... but the echo 'if_re_load="YES"' >> /boot/loader.conf.local was needed to load the new driver. Not really an intuitive process.

                            I ran through my tests, and the problem is gone now. I've even restored gateway monitoring, patches and watchdog. The rc.newwanip still does its thing, but the re1 NIC no longer flaps, the dhcpd / unbound services no longer crash, the CPU no longer spikes making the system unusable until php-fpm is restarted.
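For anyone repeating the fix, the steps described above can be sketched as follows. The `ensure_line` helper is made up for illustration; it only avoids appending the same line twice if the commands are re-run. The package name and the `if_re_load` line are taken from this thread. The `if_re_name` line is an assumption that is not confirmed here, so check the linked post before using it.

```shell
#!/bin/sh
# Append a line to a config file only if that exact line is not already there.
# usage: ensure_line <file> <line>   (hypothetical helper)
ensure_line() {
    file=$1
    line=$2
    touch "$file"
    # -x: match the whole line, -F: fixed string, -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Steps as reported in the thread (run as root on the firewall, then reboot):
#   pkg install realtek-re-kmod
#   ensure_line /boot/loader.conf.local 'if_re_load="YES"'
# Assumption, not stated in this thread: a second line may also be needed,
#   ensure_line /boot/loader.conf.local 'if_re_name="/boot/modules/if_re.ko"'
```

Running the helper twice leaves a single copy of the line, unlike a bare `echo ... >>`, which appends a duplicate each time.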

                            Thank you so much.

                            • bmeeksB
                              bmeeks @pfuser23984
                              last edited by

                              Glad that fixed it for you 🙂.

                              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.