Netgate Discussion Forum
100% /usr/local/sbin/check_reload_status after gateway down

Official Netgate® Hardware
  • M
    mrsunfire
    last edited by mrsunfire Aug 29, 2023, 7:15 PM Aug 29, 2023, 7:12 PM

    I'm running pfSense Plus 23.05.1 without any issues. For the past 3 days I've been testing a 5G failover connection. I'm tunneling it via VLAN and set up a new VLAN interface for that WAN. It runs flawlessly until the connection goes down and comes back up. After that, the process "/usr/local/sbin/check_reload_status" consumes 100% CPU until the next reboot.

    Don't know how to sort this issue out.


    ps uxawww

    USER      PID  %CPU %MEM    VSZ    RSS TT  STAT STARTED        TIME COMMAND
    root       11 305.1  0.0      0     64  -  RNL  Sun18   11658:57.43 [idle]
    root      441 100.0  0.0  13244   2936  -  RNs  Sun18     234:45.30 /usr/local/sbin/check_reload_status
    root    63714   0.3  1.1 166708  86988  -  S    17:18       0:19.50 php-fpm: pool nginx (php-fpm)
    root    22606   0.1  8.2 825440 680352  -  Ss   Sun18      42:57.87 /usr/local/bin/suricata -i igc1.40 -D -c /usr/local/etc/suricata/suricata_24707_igc1.40/suricata.yaml --pidfile /var/run/suricata_igc1.4024707.pid
    root        0   0.0  0.0      0   1776  -  DLs  Sun18      45:15.15 [kernel]
    root        1   0.0  0.0  11352   1220  -  ILs  Sun18       0:00.24 /sbin/init
    root        2   0.0  0.0      0     64  -  WL   Sun18       0:47.49 [clock]
    root        3   0.0  0.0      0     80  -  DL   Sun18       0:00.00 [crypto]
    root        4   0.0  0.0      0     48  -  DL   Sun18       0:00.00 [cam]
    root        5   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [busdma]
    root        6   0.0  0.0      0    928  -  DL   Sun18       1:47.75 [zfskern]
    root        7   0.0  0.0      0     16  -  DL   Sun18       1:14.27 [pf purge]
    root        8   0.0  0.0      0     16  -  DL   Sun18       0:34.54 [rand_harvestq]
    root        9   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [mmcsd0: mmc/sd card]
    root       10   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [audit]
    root       12   0.0  0.0      0    560  -  WL   Sun18      11:50.22 [intr]
    root       13   0.0  0.0      0     64  -  DL   Sun18       0:06.86 [ng_queue]
    root       14   0.0  0.0      0     48  -  DL   Sun18       0:00.46 [geom]
    root       15   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [sequencer 00]
    root       16   0.0  0.0      0     80  -  DL   Sun18       0:05.43 [usb]
    root       17   0.0  0.0      0     16  -  DL   Sun18       0:02.01 [acpi_thermal]
    root       18   0.0  0.0      0     16  -  DL   Sun18       0:00.84 [acpi_cooling0]
    root       19   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [mmcsd0boot0: mmc/sd]
    root       20   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [mmcsd0boot1: mmc/sd]
    root       21   0.0  0.0      0     48  -  DL   Sun18       0:22.20 [pagedaemon]
    root       22   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [vmdaemon]
    root       23   0.0  0.0      0     80  -  DL   Sun18       0:04.66 [bufdaemon]
    root       24   0.0  0.0      0     16  -  DL   Sun18       0:01.09 [vnlru]
    root       25   0.0  0.0      0     16  -  DL   Sun18       0:01.81 [syncer]
    root       26   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [ALQ Daemon]
    root      402   0.0  0.4 114368  31144  -  Ss   Sun18       0:05.71 php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
    root      404   0.0  0.8 170932  63060  -  I    Sun18       3:27.80 php-fpm: pool nginx (php-fpm)
    root      443   0.0  0.0  13244   2636  -  IN   Sun18       0:00.00 check_reload_status: Monitoring daemon of check_reload_status (check_reload_status)
    root      907   0.0  0.0  14364   4016  -  Ss   Sun18       0:00.27 /sbin/devd -q -f /etc/pfSense-devd.conf
    root      917   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [iimb0]
    root      918   0.0  0.0      0     16  -  DL   Sun18       0:00.00 [iimb1]
    

    Netgate 6100 MAX

    • S
      stephenw10 Netgate Administrator
      last edited by Aug 29, 2023, 7:20 PM

      You can just kill the process as a one time fix:

      kill 441
      

      If it happens again though it requires more investigation.

      Steve
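The PID 441 above comes from the posted `ps` output and will differ on every boot. A minimal sketch, assuming the `ps uxawww` column layout shown in the first post (USER, PID, %CPU, ..., COMMAND), for picking out the spinning process by command name and CPU share rather than by a hardcoded PID. The function name `busy_pids` and the 90% default threshold are illustrative choices, not part of pfSense:

```sh
# Print the PID of every process whose %CPU is at or above a threshold
# and whose command line contains a given string.
# Input: lines in "ps uxawww" layout (USER PID %CPU ... COMMAND) on stdin.
# busy_pids is a hypothetical helper name, not a pfSense tool.
busy_pids() {
    pattern=$1
    threshold=${2:-90}
    awk -v pat="$pattern" -v min="$threshold" '
        NR == 1 && $2 == "PID" { next }            # skip the header row
        ($3 + 0) >= min && index($0, pat) { print $2 }
    '
}

# Example on the firewall (review the PIDs before killing anything):
#   ps uxawww | busy_pids /usr/local/sbin/check_reload_status
#   kill <pid printed above>
```

Matching on the full path skips the idle "Monitoring daemon of check_reload_status" helper process, which shows a different command string in the listing. Note that a later reply in this thread reports boxes where the process could not be killed and a reboot was needed.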

      • M
        mrsunfire @stephenw10
        last edited by Aug 29, 2023, 7:32 PM

        @stephenw10 Restarting php-fpm also "fixes" this. But after the 5G goes down again the issue reappears.

        • S
          stephenw10 Netgate Administrator
          last edited by Aug 29, 2023, 8:40 PM

          Every time?

          How are the interfaces configured/connected?

          • S
            serbus
            last edited by Aug 30, 2023, 3:39 AM

            Hello!

            Around 30% of my routers running 23.05.1 had check_reload_status stuck at 100%. This was not a huge problem for the faster boxes, but the slower ones (SG-3100) were inaccessible from the GUI (gateway timeout). I could not kill the check_reload_status process; they needed a reboot from the shell.

            I have no idea how they got in this state. Some are multi-wan. DHCP and static IP WANs. All of them have the wan occasionally go down and come back. Pretty basic setups. Nothing odd.

            check_reload_status seems to have a history...

            https://forum.netgate.com/topic/112573/what-is-check_reload_status

            https://redmine.pfsense.org/issues/2555

            John

            Lex parsimoniae

            • M
              mrsunfire @stephenw10
              last edited by mrsunfire Aug 30, 2023, 4:22 AM Aug 30, 2023, 4:08 AM

              @stephenw10 Not every time, but every second or third time. WAN1 and WAN2 are connected physically via PPPoE, and WAN3 (the 5G test WAN) is connected via VLAN and DHCP. I will try another physical port for that interface to see if the issue still exists. But I think the gateway handling in 23.05.1 is the problem. Since that release I also have the issue that the gateway for my WireGuard connection is disabled after bootup.

              • S
                stephenw10 Netgate Administrator
                last edited by Aug 30, 2023, 12:14 PM

                A regression since 23.05?

                There is history with check_reload_status, yes, but no recent bugs for it.

                Steve

                • M
                  mcury @stephenw10
                  last edited by Aug 30, 2023, 1:04 PM

                  @stephenw10 said in 100% /usr/local/sbin/check_reload_status after gateway down:

                  A regression since 23.05?

                  I think it is, but unfortunately I can't diagnose it further.
                  I sold a SG-3100 to a friend that lives in another state, and he is facing the same problem.
                  PPPoE also..

                  dead on arrival, nowhere to be found.

                  • S
                    stephenw10 Netgate Administrator
                    last edited by Aug 30, 2023, 1:08 PM

                    And he definitely wasn't seeing it in 23.05?

                    • M
                      mcury @stephenw10
                      last edited by mcury Aug 30, 2023, 1:11 PM Aug 30, 2023, 1:11 PM

                      @stephenw10 said in 100% /usr/local/sbin/check_reload_status after gateway down:

                      And he definitely wasn't seeing it in 23.05?

                      Unfortunately I'm not sure. He is not the type of guy who monitors CPU usage and other metrics.
                      He called me to help him with something, and that is when I found the problem; he was already running 23.05.1.

                      All I can say is that he is using 23.05.1, multi WAN configuration, one link is PPPoE and the other is not.

                      • S
                        stephenw10 Netgate Administrator
                        last edited by Aug 30, 2023, 1:29 PM

                        Mmm, hard to see what might have caused that between 23.05 and 23.05.1. Far more likely to have been introduced since 23.01. Though still nothing obvious that might have caused it. 🤔

                        • M
                          mcury @stephenw10
                          last edited by mcury Aug 30, 2023, 3:05 PM Aug 30, 2023, 1:32 PM

                          @stephenw10 said in 100% /usr/local/sbin/check_reload_status after gateway down:

                          Though still nothing obvious that might have caused it. 🤔

                          It happened three times already, sorry I can't help further..
                          I asked him to buy a modem and let the modem handle the PPPoE for the time being, he will use the DMZ in the modem..
                          He is not using UPnP, IPv6 track interface, VoIP or anything like that, so I think the DMZ option will work fine for him.

                          If there is something that I can check during the next event, just tell me and I'll check with him next time.

                          • S
                            serbus @stephenw10
                            last edited by Aug 30, 2023, 2:15 PM

                            Hello!

                            I traced the point of failure on two routers back to 8/3 using Status -> Monitoring.

                            They are both in the same Comcast service area, which was having persistent outages for about 2 hours, with WANs going up and down.

                            One router is single wan dhcp. The other is dual wan comcast static ip and dsl pppoe.

                            The single wan is a sg-3100 and the dual wan is a fw4b. Both on 23.05.1

                            Both routers seemed to be running fine for almost 30 days with check_reload_status stuck at 100%. I only noticed it on the 3100 when I tried to log in.

                            Even though check_reload_status was at 100%, the system load was significantly lower. System/nice util at 13% on the fw4b and system 35% / nice 15% on the 3100.

                            I probably never would have noticed it if I didn't need to log in to the 3100.

                            What (other) problems can check_reload_status cause when it is stuck?

                            John

                            • S
                              stephenw10 Netgate Administrator
                              last edited by Aug 30, 2023, 2:58 PM

                              Well it uses CPU cycles so you would see reduced throughput etc if you were already close to the hardware limits.

                              Let me see what I can find here.

                              • M
                                mrsunfire @stephenw10
                                last edited by mrsunfire Aug 31, 2023, 3:17 PM Aug 31, 2023, 3:13 PM

                                @stephenw10 It just happened again. I figured out that after reconnecting the 5G modem, the WebGUI gets very unresponsive for a couple of minutes. Another gateway (CYBERGHOST) also loses its connection (it seems to be restarted). It took 5 minutes until the 5G got its public IPv4 via passthrough mode. After that, check_reload_status is at 100% again.

                                This is the log while that happened:

                                Aug 31 17:01:26	vnstatd	28204	Error: pidfile "/var/run/vnstat/vnstat.pid" lock failed (Resource temporarily unavailable), exiting.
                                Aug 31 17:01:26	kernel		igc1.300: promiscuous mode disabled
                                Aug 31 17:01:26	php-fpm	80144	/rc.dyndns.update: phpDynDNS (xxx.xxx.net): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                                Aug 31 17:01:25	check_reload_status	63049	rc.newwanip starting ovpnc4
                                Aug 31 17:01:25	kernel		ovpnc4: link state changed to UP
                                Aug 31 17:01:25	php-fpm	41840	/rc.newwanip: rc.newwanip: on (IP address: 10.112.1.1) (interface: OPENVPN_SRV[opt11]) (real interface: ovpns3).
                                Aug 31 17:01:25	php-fpm	41840	/rc.newwanip: rc.newwanip: Info: starting on ovpns3.
                                Aug 31 17:01:25	php-fpm	41840	OpenVPN PID written: 59390
                                Aug 31 17:01:24	check_reload_status	63049	Reloading filter
                                Aug 31 17:01:24	kernel		ovpnc4: link state changed to DOWN
                                Aug 31 17:01:24	php-fpm	41840	OpenVPN terminate old pid: 7849
                                Aug 31 17:01:23	php-fpm	41840	/rc.openvpn: OpenVPN: Resync client4 CYBERGHOST
                                Aug 31 17:01:23	check_reload_status	63049	rc.newwanip starting ovpns3
                                Aug 31 17:01:23	kernel		ovpns3: link state changed to UP
                                Aug 31 17:01:23	php-fpm	41840	OpenVPN PID written: 49415
                                Aug 31 17:01:23	php-fpm	80144	/rc.dyndns.update: phpDynDNS (xxx.xxx.net): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                                Aug 31 17:01:23	check_reload_status	63049	Reloading filter
                                Aug 31 17:01:23	kernel		ovpns3: link state changed to DOWN
                                Aug 31 17:01:23	php-fpm	41840	OpenVPN terminate old pid: 53852
                                Aug 31 17:01:23	php-fpm	81311	/rc.dyndns.update: phpDynDNS (xxx.xxx.net): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                                Aug 31 17:01:22	php-fpm	41840	/rc.openvpn: OpenVPN: Resync server3 OpenVPN Server
                                Aug 31 17:01:21	php-fpm	41840	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_5G.
                                Aug 31 17:01:21	check_reload_status	63049	rc.newwanip starting ovpnc4
                                Aug 31 17:01:21	kernel		ovpnc4: link state changed to UP
                                Aug 31 17:01:21	snmpd	68386	disk_OS_get_disks: adding device 'nvd0' to device list
                                Aug 31 17:01:21	snmpd	68386	disk_OS_get_disks: adding device 'mmcsd0boot0' to device list
                                Aug 31 17:01:21	snmpd	68386	disk_OS_get_disks: adding device 'mmcsd0boot1' to device list
                                Aug 31 17:01:21	check_reload_status	63049	Updating static routes based on hostnames
                                Aug 31 17:01:21	check_reload_status	63049	Reloading filter
                                Aug 31 17:01:21	php-fpm	80144	OpenVPN PID written: 7849
                                Aug 31 17:01:20	php-fpm	60232	/rc.start_packages: Restarting/Starting all packages.
                                Aug 31 17:01:20	php-fpm	41840	/rc.newroutedns: Static Routes: One or more aliases used for routing has changed its IP. Refreshing.
                                Aug 31 17:01:20	check_reload_status	63049	Reloading filter
                                Aug 31 17:01:20	check_reload_status	63049	Starting packages
                                Aug 31 17:01:20	php-fpm	60232	/rc.newwanip: Netgate pfSense Plus package system has detected an IP change or dynamic WAN reconnection - 10.2.5.70 -> 10.4.5.5 - Restarting packages.
                                Aug 31 17:01:20	kernel		ovpnc4: link state changed to DOWN
                                Aug 31 17:01:20	php-fpm	80144	OpenVPN terminate old pid: 5686
                                Aug 31 17:01:19	php-fpm	80144	/rc.openvpn: OpenVPN: Resync client4 CYBERGHOST
                                Aug 31 17:01:19	check_reload_status	63049	rc.newwanip starting ovpns3
                                Aug 31 17:01:19	php-fpm	80144	OpenVPN PID written: 53852
                                Aug 31 17:01:19	kernel		ovpns3: link state changed to UP
                                Aug 31 17:01:19	php-fpm	81311	/rc.dyndns.update: phpDynDNS (rv1125g.homeip.net): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                                Aug 31 17:01:18	kernel		ovpns3: link state changed to DOWN
                                Aug 31 17:01:18	devd	907	notify_clients: send() failed; dropping unresponsive client
                                Aug 31 17:01:18	php-fpm	80144	OpenVPN terminate old pid: 78480
                                Aug 31 17:01:18	php-fpm	60232	/rc.newwanip: Creating rrd update script
                                Aug 31 17:01:18	php-fpm	60232	/rc.newwanip: Ignoring IPsec reload since there are no tunnels on interface opt10
                                Aug 31 17:01:17	php-fpm	80144	/rc.openvpn: OpenVPN: Resync server3 OpenVPN Server
                                Aug 31 17:01:17	php-fpm	60232	/rc.newwanip: IP Address has changed, killing states on former IP Address 10.2.5.70.
                                Aug 31 17:01:17	php-fpm	80144	/rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed IP addresses. Reloading endpoints that may use WAN_5G.
                                Aug 31 17:01:17	check_reload_status	63049	Restarting OpenVPN tunnels/interfaces
                                Aug 31 17:01:17	check_reload_status	63049	Restarting IPsec tunnels
                                Aug 31 17:01:17	check_reload_status	63049	updating dyndns WAN_5G
                                Aug 31 17:01:17	rc.gateway_alarm	74674	>>> Gateway alarm: WAN_5G (Addr:4.2.2.2 Alarm:1 RTT:0ms RTTsd:0ms Loss:100%)
                                Aug 31 17:01:16	php_pfb	86922	[pfBlockerNG] filterlog daemon started
                                Aug 31 17:01:15	tail_pfb	86542	[pfBlockerNG] Firewall Filter Service started
                                Aug 31 17:01:15	php_pfb	82278	[pfBlockerNG] filterlog daemon stopped
                                Aug 31 17:01:15	tail_pfb	82104	[pfBlockerNG] Firewall Filter Service stopped
                                Aug 31 17:01:15	vnstatd	76188	Error: pidfile "/var/run/vnstat/vnstat.pid" lock failed (Resource temporarily unavailable), exiting.
                                Aug 31 17:01:15	snmpd	56099	disk_OS_get_disks: adding device 'nvd0' to device list
                                Aug 31 17:01:15	snmpd	56099	disk_OS_get_disks: adding device 'mmcsd0boot0' to device list
                                Aug 31 17:01:15	snmpd	56099	disk_OS_get_disks: adding device 'mmcsd0boot1' to device list
                                Aug 31 17:01:15	php-fpm	81311	/rc.newroutedns: Static Routes: One or more aliases used for routing has changed its IP. Refreshing.
                                Aug 31 17:01:15	php-fpm	81311	/rc.start_packages: Skipping STARTing packages process because previous/another instance is already running
                                Aug 31 17:01:15	check_reload_status	63049	Reloading filter
                                Aug 31 17:01:15	check_reload_status	63049	Starting packages
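The log above shows check_reload_status queuing the same actions repeatedly within a few seconds. A rough sketch for tallying that, assuming the tab-separated layout of the pasted lines (timestamp, process, PID, message). That layout is how the lines appear in this post and may not match the on-disk log format; `count_messages` is a hypothetical helper name:

```sh
# Count how often each distinct message was logged by one process.
# Input: tab-separated lines on stdin, in the layout pasted above:
#   timestamp <TAB> process <TAB> pid <TAB> message
# The pid field may be empty (as on the kernel lines); -F '\t' keeps it.
count_messages() {
    proc=$1
    awk -F '\t' -v proc="$proc" '
        $2 == proc { n[$4]++ }
        END { for (m in n) print n[m], m }
    ' | sort -k1,1nr -k2
}

# Example, with the pasted log saved to a file (the file name is illustrative):
#   count_messages check_reload_status < pasted_log.tsv
```

Sorting by count puts the most frequently repeated action first, which makes a burst of duplicate reload requests easy to spot.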
                                

                                • S
                                  stephenw10 Netgate Administrator
                                  last edited by Aug 31, 2023, 3:45 PM

                                  Hmm, it actually took 5mins? Because the 5G router didn't have an IP for that long?

                                  That is at least unusual. Obviously that shouldn't make any difference but....

                                  • M
                                    mrsunfire @stephenw10
                                    last edited by Aug 31, 2023, 3:58 PM

                                    @stephenw10 The 5G modem took around 1 minute to boot up and get an IP address.

                                    • S
                                      stephenw10 Netgate Administrator
                                      last edited by Aug 31, 2023, 5:02 PM

                                      And pfSense took a further 4 mins to pull a dhcp lease?

                                      What was logged in that time? In the dhcp and/or system log?
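One way to pull out what was logged during those minutes is to filter a log by time of day. This is only a sketch: it assumes the classic syslog line layout ("Mon DD HH:MM:SS host process[pid]: message"), so the time is the third whitespace-separated field, and it assumes the logs are plain text. The paths `/var/log/dhcpd.log` and `/var/log/system.log` and the time window in the usage comment are assumptions to verify on the box; `log_window` is a hypothetical helper name:

```sh
# Print log lines whose time of day falls within [start, end] (HH:MM:SS).
# Assumes "Mon DD HH:MM:SS host proc[pid]: msg", so the time is field 3.
# Plain string comparison works because the time is zero-padded and
# fixed-width. The date is ignored, so feed it a log covering one day.
log_window() {
    awk -v s="$1" -v e="$2" '$3 >= s && $3 <= e'
}

# Example (paths and times are assumptions, adjust to the actual event):
#   log_window 16:56:00 17:01:30 < /var/log/dhcpd.log
#   log_window 16:56:00 17:01:30 < /var/log/system.log
```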

                                      • M
                                        mrsunfire @stephenw10
                                        last edited by Sep 1, 2023, 4:26 AM

                                        @stephenw10 Unfortunately I did not check that. I have now switched that connection to a physical port and will see if the error still appears. If it does not, it seems to be an issue with VLAN WAN interfaces.

                                        • S
                                          stephenw10 Netgate Administrator
                                          last edited by Sep 1, 2023, 1:09 PM

                                          I've yet to see this here on any system. If anyone has a way to replicate this please let us know so we can dig into it.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.