Netgate Discussion Forum

100% /usr/local/sbin/check_reload_status after gateway down

Official Netgate® Hardware
54 Posts 10 Posters 7.4k Views
  • stephenw10 Netgate Administrator
    last edited by Sep 26, 2023, 12:43 PM

    If you kill that process does it respawn?

    Is there any reason you're not running 23.05.1?

    • adamw @stephenw10
      last edited by Sep 27, 2023, 10:41 AM

      @stephenw10

      I had to use "kill -9" to terminate it.
      I waited 5 minutes and didn't see it respawn.
      Since I was onsite, I decided to do a full reboot anyway.
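The kill-and-wait check described above can be scripted so that it is repeatable. This is a minimal sketch. It assumes the `pgrep` and `pkill` utilities from the FreeBSD base system. The function name `kill_and_watch` and its return codes are illustrative and are not part of pfSense.

```shell
#!/bin/sh
# kill_and_watch NAME [WAIT_SECONDS]
# Hypothetical helper: SIGKILL every process whose name is exactly NAME,
# wait, then report whether a process with that name is running again.
# Return codes: 0 = respawned, 1 = did not respawn, 2 = was not running.
kill_and_watch() {
    name=$1
    wait_s=${2:-300}    # default of 5 minutes, matching the wait described above

    if ! pgrep -x "$name" >/dev/null 2>&1; then
        echo "$name: not running"
        return 2
    fi

    pkill -9 -x "$name"    # -x: exact name match only
    sleep "$wait_s"

    if pgrep -x "$name" >/dev/null 2>&1; then
        echo "$name: respawned"
        return 0
    fi
    echo "$name: did not respawn after ${wait_s}s"
    return 1
}

# Example (as root): kill_and_watch check_reload_status 300
```

Matching with `-x` avoids killing unrelated processes whose names merely contain the target string.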

      Regarding pfSense versions:
      I have three Netgate 3100 appliances: two live and one spare. One of the live ones is located in a distant datacenter, so upgrading it remotely is too risky.
      Typically I upgrade all three firewalls only about once a year, when I have other reasons to travel to the DC. I import the config into the spare, physically swap the units, and then do some testing. If anything goes wrong, I just swap them back.

      • adamw @adamw
        last edited by Sep 27, 2023, 11:24 AM

        BTW: after the reboot I could SSH to the firewall but could no longer elevate to root as my user: "xxx is not in the sudoers file".

        I checked everything I could think of (sudo installed, user a member of admins, sudo config) and couldn't see why. Sudo worked fine just before the reboot, with no changes made.

        Luckily my web GUI access still worked, so I added my user explicitly under /pkg_edit.php?xml=sudo.xml, which allowed me to become root in the shell.

        • stephenw10 Netgate Administrator
          last edited by Sep 27, 2023, 12:47 PM

          Hmm, that was after upgrading? Or has it worked since then and just stopped now?
          I'm not aware of any issues with sudoers in 23.05(.1).

          • adamw @stephenw10
            last edited by Sep 27, 2023, 1:30 PM

            @stephenw10

            What I did last time (about 3 months ago):

            • installed a vanilla 22.01 image from USB (offline),

            • went online,

            • upgraded to 23.05 (2 stages if I remember correctly),

            • restored config.xml (previously exported from 22.01).

            Once the firewall booted up, everything was working (including sudo), and I hadn't restarted it until today.

            • stephenw10 Netgate Administrator
              last edited by Sep 27, 2023, 1:32 PM

              Hmm, odd. Your user was in the admins group and that group was enabled in sudo for the command you were trying to run?

              • adamw @stephenw10
                last edited by Sep 27, 2023, 1:40 PM

                @stephenw10

                I don't know for sure what sudoers looked like before (I was unable to read it), but this is what it looks like after two users were explicitly added from the web GUI:

                cat /usr/local/etc/sudoers

                root ALL=(root) ALL
                admin ALL=(root) ALL
                %admins ALL=(root) ALL
                myuser ALL=(root) ALL
                someotheradminuser ALL=(root) ALL

                id myuser

                uid=2000(myuser) gid=65534(nobody) groups=65534(nobody),1999(admins)
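To check the same thing from the shell, the sketch below looks in the sudoers file for a rule covering the user. It accepts either a direct entry or a `%group` entry for one of the user's groups. It is a hypothetical helper that only understands simple one-line rules of the form `name ALL=(root) ALL`, as in the file above. It ignores aliases, include directives and negations. Running `sudo -l -U myuser` as root remains the authoritative answer.

```shell
#!/bin/sh
# sudoers_grant_reason USER [SUDOERS_FILE]
# Hypothetical helper: report whether SUDOERS_FILE has a simple rule for USER,
# either directly ("USER ...") or through one of USER's groups ("%group ...").
# Only plain one-line rules are recognised; aliases and includes are ignored.
sudoers_grant_reason() {
    user=$1
    file=${2:-/usr/local/etc/sudoers}   # path shown in the output above

    # Direct rule: line starts with the user name followed by whitespace.
    if grep -Eq "^${user}[[:space:]]" "$file"; then
        echo "direct rule for $user"
        return 0
    fi

    # Group rules: check every group the user belongs to, as reported by id.
    for g in $(id -Gn "$user" 2>/dev/null); do
        if grep -Eq "^%${g}[[:space:]]" "$file"; then
            echo "rule via group $g"
            return 0
        fi
    done

    echo "no rule for $user"
    return 1
}

# Example: sudoers_grant_reason myuser
```

With the file above, `myuser` would now report a direct rule. Before the GUI change, the same check would show whether the `%admins` line alone was expected to cover the user.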

                • darcey
                  last edited by Sep 28, 2023, 9:37 AM

                  I experienced this when I briefly turned off the DSL modem connected to the pfSense WAN. When the gateway came back up, /usr/local/sbin/check_reload_status was consuming 100% CPU. I killed that PID, but I believe it respawned and stayed at 100%. I resorted to rebooting.
                  I am on CE 2.7 and don't recall seeing this happen before on the rare occasions I've powered off the modem.

                  • copacetic
                    last edited by Oct 15, 2023, 6:45 AM

                    I have the same issue on CE 2.7.0-RELEASE. It started after I installed the WireGuard and FRR packages a couple of days ago (running two WireGuard VPNs with OSPF). Possibly check_reload_status goes to 100% after an interface flap, but I am not 100% sure: shutting an interface down and bringing it back up did not reproduce the problem. Manually restarting services in the UI did not help, and neither did restarting the webConfigurator. Using kill -9, I managed to end the process without rebooting. I am aware WireGuard is still under development, but maybe a fix can be found.

                    • stephenw10 Netgate Administrator
                      last edited by Oct 15, 2023, 2:14 PM

                      So you were not using OpenVPN when this started? And after killing it it didn't return?

                      • copacetic @stephenw10
                        last edited by Oct 16, 2023, 5:57 AM

                        @stephenw10 No, I restarted the OpenVPN service and nothing changed. Next time this happens I will stop the OpenVPN service for a couple of minutes to see if the behavior changes.

                        • DAVe3283
                          last edited by DAVe3283 Nov 5, 2023, 5:31 PM (posted Nov 5, 2023, 5:28 PM)

                          This just happened to me on CE 2.7.0, on the backup CARP node. I have both OpenVPN and WireGuard. OpenVPN was stopped (because this is the backup node). I stopped WireGuard: no change.

                          I ran kill -9 on the process and it came back, but at 0% usage. I rebooted anyway to make sure the node isn't in a weird state in case I need it.

                          Edit: it looks like CARP was flapping between master and backup for a minute right before this happened. I will dig into that.

                          • stephenw10 Netgate Administrator
                            last edited by Nov 5, 2023, 10:41 PM

                            Mmm, check the logs from when that happened for anything unusual. Flapping interfaces could cause it to queue a number of events. Perhaps it hit some race condition...
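One quick way to look for flapping is to count state-change messages in the logs around the incident. The sketch below is a filter built on assumptions about message wording. It counts lines that look like FreeBSD CARP transitions (`carp: 1@igb0: MASTER -> BACKUP ...`) and kernel `link state changed to` messages. It also counts mpd `PPPoE: connection closed` lines, like those in the PPP log quoted in this thread. Exact wording can differ between releases. Log locations such as /var/log/system.log and /var/log/ppp.log are also assumptions worth checking on your version.

```shell
#!/bin/sh
# count_flaps: read syslog-style lines on stdin and count messages that
# usually accompany interface or CARP flapping. The patterns are assumptions
# based on typical FreeBSD/mpd wording; adjust them to match your logs.
count_flaps() {
    awk '
        /carp: .*(INIT|BACKUP|MASTER) -> / { carp++ }
        /link state changed to/            { lnk++ }
        /PPPoE: connection closed/         { ppp++ }
        END {
            printf "carp_transitions=%d link_state_changes=%d pppoe_closed=%d\n",
                   carp, lnk, ppp
        }
    '
}

# Example (paths assumed): cat /var/log/system.log /var/log/ppp.log | count_flaps
```

Running it on a narrow time window (for example, after filtering with `grep` on the date) gives a rough idea of how many events check_reload_status may have had to queue.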

                            • adamw
                              last edited by adamw Dec 11, 2023, 3:50 PM (posted Dec 10, 2023, 1:17 PM)

                              Netgate 3100,
                              23.05-RELEASE (arm)
                              built on Mon May 22 15:04:22 UTC 2023
                              FreeBSD 14.0-CURRENT

                              Things took a turn for the worse for me today.
                              It started as usual, with a very brief connectivity loss on the secondary gateway (ADSL):

                              Dec  7 08:58:46 netgate ppp[5225]: [opt1_link0] PPPoE: connection closed
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1_link0] Link: DOWN event
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1_link0] LCP: Down event
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1_link0] LCP: state change Opened --> Starting
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1_link0] Link: Leave bundle "opt1"
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1] Bundle: Status update: up 0 links, total bandwidth 9600 bps
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1] IPCP: Close event
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1] IPCP: state change Opened --> Closing
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1] IPCP: SendTerminateReq #12
                              Dec  7 08:58:46 netgate ppp[5225]: [opt1] IPCP: LayerDown
                              Dec  7 08:58:47 netgate check_reload_status[24910]: Rewriting resolv.conf
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] IFACE: Removing IPv4 address from pppoe0 failed(IGNORING for now. This should be only for PPPoE friendly!): Can't assign requested address
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] IFACE: Down event
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] IFACE: Rename interface pppoe0 to pppoe0
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] IFACE: Set description "WAN2"
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] IPCP: Down event
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] IPCP: LayerFinish
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] Bundle: No NCPs left. Closing links...
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] IPCP: state change Closing --> Initial
                              Dec  7 08:58:47 netgate ppp[5225]: [opt1] Bundle: Last link has gone, no links for bw-manage defined
                              

                              From that point, CPU usage went up from about 15% to 60% and stayed there.
                              The warning threshold in our monitoring is set to 90%. It was never reached, so nobody noticed.
                              Three days later this happened:

                              Dec 10 10:10:05 netgate kernel: [zone: mbuf_cluster] kern.ipc.nmbclusters limit reached   ---> MONITORING STARTED PICKING UP ISSUES
                              Dec 10 10:14:16 netgate kernel: sonewconn: pcb 0xe2f8a000 (192.168.8.1:3128 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 62, jail 0
                              Dec 10 10:15:05 netgate kernel: [zone: mbuf_cluster] kern.ipc.nmbclusters limit reached
                              Dec 10 10:15:16 netgate kernel: sonewconn: pcb 0xe2f8a000 (192.168.8.1:3128 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (250 occurrences), euid 0, rgid 62, jail 0
                              (...)
                              Dec 10 11:00:21 netgate kernel: sonewconn: pcb 0xe2f8a000 (192.168.8.1:3128 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (18 occurrences), euid 0, rgid 62, jail 0
                              Dec 10 11:00:25 netgate kernel: sonewconn: pcb 0xe4a4f800 (127.0.0.1:3128 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (4 occurrences), euid 0, rgid 62, jail 0
                              Dec 10 11:05:06 netgate kernel: [zone: mbuf_cluster] kern.ipc.nmbclusters limit reached
                              Dec 10 11:10:06 netgate kernel: [zone: mbuf_cluster] kern.ipc.nmbclusters limit reached
                              Dec 10 11:15:07 netgate kernel: [zone: mbuf_cluster] kern.ipc.nmbclusters limit reached
                              Dec 10 11:20:07 netgate kernel: [zone: mbuf_cluster] kern.ipc.nmbclusters limit reached   ---> FIREWALL BECAME COMPLETELY UNRESPONSIVE AND REQUIRED POWER CYCLING
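Since the CPU threshold never fired, a separate check on mbuf cluster usage might be worth monitoring alongside it. The sketch below parses the `mbuf clusters in use (current/cache/total/max)` line printed by FreeBSD's `netstat -m`. It reports total clusters as a percentage of the maximum, which corresponds to the kern.ipc.nmbclusters limit in the log above. The function name and the 80% threshold in the example are hypothetical. This thread does not establish whether cluster usage climbs gradually before the limit is hit in this failure.

```shell
#!/bin/sh
# mbuf_cluster_usage: read `netstat -m` output on stdin and print
# "TOTAL MAX PERCENT" for mbuf clusters. Exits 1 if the line is not found.
mbuf_cluster_usage() {
    awk '
        /mbuf clusters in use/ {
            split($1, n, "/")    # current/cache/total/max
            if (n[4] + 0 > 0) {
                print n[3], n[4], int(n[3] * 100 / n[4])
                found = 1
            }
        }
        END { if (!found) exit 1 }
    '
}

# Example (80% is an arbitrary threshold):
#   usage=$(netstat -m | mbuf_cluster_usage) || exit 1
#   set -- $usage    # $1 = total, $2 = max, $3 = percent
#   [ "$3" -gt 80 ] && echo "mbuf clusters: $1 of $2 ($3%)"
```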
                              

                              Has this issue been properly addressed in the latest release, 23.09.1?
                              If so, can I still install it on the Netgate 3100?

                              • stephenw10 Netgate Administrator
                                last edited by Dec 10, 2023, 7:27 PM

                                Nothing has been specifically added to address that AFAIK. But you should upgrade to 23.09.1 to make sure the behavior exists there. Or doesn't.

                                • adamw @stephenw10
                                  last edited by Dec 11, 2023, 11:45 AM

                                  @stephenw10

                                  I will upgrade to 23.09.1 some time between Xmas and New Year.

                                  I'm assuming that all future releases will be available for the Netgate 3100 as long as the FreeBSD version is still 14.0, correct?

                                  • stephenw10 Netgate Administrator
                                    last edited by Dec 11, 2023, 1:26 PM

                                    As long as it builds we will try to build it. At some point it's going to become nonviable though.

                                    • serbus @adamw
                                      last edited by Dec 11, 2023, 2:42 PM

                                      @adamw

                                      Hello!

                                      I have several boxes on 23.05.1 with this problem, and it can run undetected until something goes really wrong.

                                      I run a simple command at Diagnostics -> Command Prompt as a quick check for the issue:

                                      ps -Ao comm,pcpu | grep "check_reload_status" | awk '$2 > 10'
                                      

                                      No output means check_reload_status is not using more than 10% cpu.

                                      I set up the mailreport and cron packages to run this command every minute and send me an email if check_reload_status is "overloaded".

                                      Make sure the report is skippable ("Skip If No Content") in the mailreport config.

                                      After setting up or changing the mailreport, use the cron package to schedule it to run more often than once per day.
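A slightly stricter variant of the `ps` check above matches the process name exactly rather than as a substring. It also compares %CPU numerically against a configurable threshold. This is a sketch: the function name is hypothetical. The default threshold of 10 is taken from the original command. It expects the same `ps -Ao comm,pcpu` output on stdin.

```shell
#!/bin/sh
# reload_status_overloaded [THRESHOLD]
# Reads `ps -Ao comm,pcpu` output on stdin and prints any check_reload_status
# line whose %CPU exceeds THRESHOLD (default 10). Prints nothing otherwise,
# so empty output still means "OK" for a "Skip If No Content" report.
reload_status_overloaded() {
    threshold=${1:-10}
    awk -v t="$threshold" '$1 == "check_reload_status" && $2 + 0 > t + 0 { print $1, $2 }'
}

# Example: ps -Ao comm,pcpu | reload_status_overloaded 10
```

The `ps` header line (`COMMAND %CPU`) never matches the exact-name test, so no separate `grep` step is needed.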

                                      John

                                      Lex parsimoniae

                                      • stephenw10 Netgate Administrator
                                        last edited by Dec 11, 2023, 3:07 PM

                                        Have you tested that in 23.09.1?

                                        • perka.home @serbus
                                          last edited by Dec 11, 2023, 3:24 PM

                                          @serbus
                                          Great suggestion!
                                          I just set this up on my 23.09.1.
                                          We'll see how it behaves.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.