Netgate Discussion Forum

    Number of running processes increasing

    Official Netgate® Hardware
    • jonybat

      Hi

      We have 4 SG-3100s, all on 21.02.2-RELEASE, with 66 hours of uptime. Over the last month and a half, the number of processes has been steadily increasing in 3 of them, now reaching over 300 in one of them. They do not have any packages installed other than zabbix agent 5.2. They are configured in a 2x2 HA sync mode.

      Results from the one with the highest number of processes:

      ps axl | wc -l
           312
      

      And the repeated processes

      ps axl | grep "/bin/sh /etc/rc.update_pkg_metadata" | wc -l
           115
      

      ...all in wait state

      ps axl | grep "/usr/sbin/gpioctl -f /dev/gpioc2 3 duty 150" | wc -l
            57
      

      ...all in iircreq state

      ps axl | grep "/bin/sh /usr/local/sbin/pfSense-led.sh update 1" | wc -l
            57
      

      ...all in wait state

      It seems like the processes can be killed, but they are increasing on a daily basis.
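
To see at a glance which command lines are piling up, rather than grepping for each one in turn, the process list can be tallied by command. This is a minimal sketch, assuming a POSIX shell with `awk` and `sort`; the helper name `count_by_command` is hypothetical and not part of pfSense.

```shell
# count_by_command: read one command line per input line (for example the
# output of `ps -ax -o command`) and print "count<TAB>command", highest first.
count_by_command() {
    awk '{ n[$0]++ } END { for (c in n) printf "%d\t%s\n", n[c], c }' | sort -rn
}

# Example (not run here): ps -ax -o command | count_by_command | head
```

On an affected unit the stuck commands should rise to the top of that list as they accumulate.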

      Any ideas are appreciated.

      Thanks

      • jimp Rebel Alliance Developer Netgate

        I have not seen it happen that often, but on very rare occasions I've seen a 3100 get stuck trying to set the LED state, which is what appears to be happening in your case.

        It happens so rarely, though, I can't recall if a reboot fixed it or if I had to shut it down and unplug/replug power.

        Either way, it's unlikely to resolve itself once the hardware is in that state, so reboot one but have someone ready in case it needs to be unplugged locally.

        That said, you're also on an outdated release, so updating to 21.05 is advisable.

        • stephenw10 Netgate Administrator

          Hmm, two of those are triggered by the 1st one because there is an update available and it's setting the LED to flash to tell you. I wouldn't expect them to stall out like that.

          You should upgrade though when you can. If not then you could disable the update check in System > Updates > Settings.
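
One way to check that chain on an affected box is to walk a stuck process back through its parents. This is a sketch only, assuming `ps -ax -o pid,ppid,command` output on stdin; `ancestry` is a hypothetical helper name, not a pfSense tool.

```shell
# ancestry PID: read `ps -ax -o pid,ppid,command` output on stdin and print
# the given process followed by each of its ancestors, as "pid<TAB>command".
ancestry() {
    TARGET="$1" awk '
        NR > 1 {                      # skip the header line
            pid = $1
            ppid[pid] = $2
            $1 = ""; $2 = ""          # drop the two numeric columns
            sub(/^ +/, "")            # trim the gap they leave behind
            cmd[pid] = $0
        }
        END {
            p = ENVIRON["TARGET"]
            while (p in cmd) {
                print p "\t" cmd[p]
                if (ppid[p] == p) break   # a process listed as its own parent
                p = ppid[p]
            }
        }'
}

# Example (not run here): ps -ax -o pid,ppid,command | ancestry 12345
```

If a stuck `gpioctl` traces back through `pfSense-led.sh` to `rc.update_pkg_metadata`, that would match the explanation above; the PID 12345 is a placeholder.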

          Steve

          • jonybat

            Will reboot, update and disable dashboard update check. I'll report back if this happens again.

            Thanks

            • jonybat @jonybat

              This started happening again in one of the gateways. All of them were rebooted and updated 2 weeks ago. Dashboard update check has also been disabled.

              ps axl | grep "/bin/sh /etc/rc.update_pkg_metadata" | wc -l
                    20
              
              ps axl | grep "/usr/sbin/gpioctl -f /dev/gpioc2 3 duty 150" | wc -l
                     9
              
              ps axl | grep "/bin/sh /usr/local/sbin/pfSense-led.sh update 1" | wc -l
                    10
              
              uptime
              11:50AM  up 14 days, 21:06, 3 users, load averages: 0.76, 0.79, 0.75
              

              Any ideas?
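
A side note on the counts: with `ps axl | grep "..." | wc -l`, the `grep` process carries the pattern on its own command line, so it can show up in the listing and inflate each figure by one. The sketch below avoids that by passing the pattern through the environment rather than as an argument; `count_matching` is a hypothetical helper name.

```shell
# count_matching PATTERN: count input lines containing PATTERN as a fixed
# string. The pattern travels via the environment, so it does not appear in
# the argument list that `ps -o command` would normally display.
count_matching() {
    PATTERN="$1" awk 'index($0, ENVIRON["PATTERN"]) { n++ } END { print n + 0 }'
}

# Example (not run here):
#   ps -ax -o command | count_matching "/bin/sh /etc/rc.update_pkg_metadata"
```

`pgrep -f` is another way to match against full command lines without counting the matcher itself.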

              • stephenw10 Netgate Administrator

                Are you running the 3rd party led_gateways script? That does seem to be inducing issues in 21.05 for some reason. The author just updated it though.

                Steve

                • jonybat @stephenw10

                  No, like I said before, these are fresh gateways, only a couple of months old, and the only package that we installed was zabbix-agent 5.2.

                  • jonybat @jonybat

                    Just came to mind that this could be triggered by the zabbix monitoring scripts. We are using https://github.com/rbicelli/pfsense-zabbix-template

                    The default monitoring template calls the functions get_system_pkg_version(), get_system_pkg_version()['version'] and get_system_pkg_version()['installed_version'] from pkg-utils.inc once per day, which I think could explain why the update_pkg_metadata script is being executed.

                    What it doesn't explain is why it gets stuck...

                    Disabled those monitoring items in the meantime.

                    • cneep @jonybat

                      @jonybat Ran across this thread, which described my own situation very closely and thought I'd chime in with perhaps another datapoint.

                      This is a single, non-HA SG-3100 21.02.2-RELEASE with 181 days continuous uptime since last reboot. Three Packages: Cron, zabbix-agent52, zabbix-proxy52

                      Like you, I'm seeing a steady increase in processes. Mine started on ~8/16/2021 and has steadily increased for the past ~70 days up to ~400 processes as of today. Prior to 8/16/2021, the number of processes was consistent at ~100 for the entire monitored history (just over 1 year).

                      In my case:

                      ps axl | wc -l
                           406
                      
                      ps axl | grep "/bin/sh /etc/rc.update_pkg_metadata" | wc -l
                           144
                      

                      ...all in wait state, like yours

                      ps axl | grep "/usr/sbin/gpioctl -f /dev/gpioc2 3 duty 150" | wc -l
                           70
                      

                      ...all in iircreq state, like yours

                      ps axl | grep "/bin/sh /usr/local/sbin/pfSense-led.sh update 1" | wc -l
                           72
                      

                      ...all in wait state, like yours

                      I have not yet rebooted or upgraded but those were going to be my initial actions after a quick bit of research on the issue (which led me here). Your comments don't give me a lot of hope that either will actually resolve the issue, though.

                      No particularly useful information from me so far other than to simply indicate that you don't appear to be a unique case.

                      • stephenw10 Netgate Administrator

                        Hmm, what changed on that date to cause them to start rising? Something locally to you?

                        Steve

                        • cneep @stephenw10

                          @stephenw10 Nothing at all that I can determine. The last intended change would have been around the last reboot, which is confirmed by the Config History. That occurred ~4 months before Zabbix shows the processes started increasing. The firewall would have been left alone to fend for itself for ~4 months prior and another 1.5-2 months after the number of processes suddenly started increasing.

                          Based on my calculations, the OP's problem started ~June 15, 2021. My seemingly identical problem started ~August 16, 2021. I had originally thought that perhaps the process that checks for updates triggered the problem perhaps at the discovery of an update. It would be an external influence on an otherwise "static" firewall (config-wise). But I don't think the release dates match up exactly with when the problem started. Close...a couple of weeks, maybe, but not exact, I don't think. Just a thought that didn't seem to pan out...

                          • stephenw10 Netgate Administrator

                            Mmm, that seems likely but there was nothing released on either of those dates that might have caused it. Unless the system was tracking dev snapshots.

                            If you reboot do the processes immediately start stacking up again?

                            I assume if you disable Zabbix it stops?

                            Steve

                            • wblanton

                              Coming here to mention that this is currently occurring on my SG-3100 running 21.05.1. It started on 10/29/2021 on the 00:01 cron event to run "rc.update_pkg_metadata". That event is still showing in my processes with a state of "IN". Later that day, the same three processes listed above appeared, all started at the same time and all showing a state of "IN".

                              This pattern has repeated every day since: there is an "update_pkg_metadata" at 00:01, and then the set of three processes at some random point that day. According to the status monitoring page, I was steady at ~164 processes before 10/29. As of right now, I have 250 processes.

                              Also, I do not have zabbix installed, but I do have NRPE (which is what alerted me to the issue).

                              Edited to show SW version.
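
The daily pattern described in this post implies four stuck processes per day: one `rc.update_pkg_metadata` from the 00:01 cron run plus the later set of three. Assuming that rate holds, the process counts give a rough estimate of how long the leak has been running. The earlier reports in this thread show roughly the same 2:1:1 ratio (115/57/57 and 144/70/72). The helper name `leak_days` is hypothetical, and the result is integer division, so treat it as an approximation.

```shell
# leak_days BASELINE CURRENT PER_DAY: estimate the number of days a process
# leak has been running, assuming PER_DAY processes get stuck each day.
leak_days() {
    echo $(( ($2 - $1) / $3 ))
}

leak_days 164 250 4   # prints 21
```

With the figures from this post (164 before, 250 now, 4 per day) that works out to about three weeks since 10/29.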

                              • wblanton

                                I'm not able to physically access this device, and no one else will be until next week. In the meantime, is there any reason I can't run the following commands to just kill these processes?

                                pkill -f "sh /etc/rc.update_pkg_metadata"
                                pkill -f "sh /usr/local/sbin/pfSense-led.sh update 1"
                                pkill gpioctl
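
Before running commands like these on a remote firewall, it can help to preview exactly which processes a pattern would hit. This is a sketch, assuming `ps -ax -o pid,command` output on stdin; `preview_matches` is a hypothetical helper and does plain substring matching, which is narrower than the regular expressions `pkill -f` accepts.

```shell
# preview_matches PATTERN: read `ps -ax -o pid,command` output on stdin and
# print "pid<TAB>command" for every process whose command contains PATTERN.
preview_matches() {
    PATTERN="$1" awk 'NR > 1 {        # skip the header line
        pid = $1
        cmd = $0
        sub(/^ *[0-9]+ +/, "", cmd)   # strip the PID column
        if (index(cmd, ENVIRON["PATTERN"])) print pid "\t" cmd
    }'
}

# Example (not run here):
#   ps -ax -o pid,command | preview_matches "sh /etc/rc.update_pkg_metadata"
```

`pgrep -fl` with the same pattern is the closer built-in equivalent, since `pgrep` and `pkill` share their matching rules.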
                                
                                • stephenw10 Netgate Administrator

                                  Yeah, you should be able to do that. Though those processes are not actually doing anything.

                                  Power cycling the appliance when you can is what you should be looking to do.

                                  Steve

                                  • wblanton @stephenw10

                                    @stephenw10 Thanks. We will definitely be trying to do that next week when someone is on site. Right now, I'm just trying to at least get Nagios happy. lol

                                    I know that @jimp said that this only happens on rare occasions, but I'm really curious about how often this happens without the users ever noticing. I'm sure most users aren't monitoring things like process counts, so this may really be happening more than one would think.

                                    In any event, please let me know if there are any logs or anything like that I can get over to y'all for diagnostics.

                                    • stephenw10 Netgate Administrator

                                      Thanks. If we think of anything that might be available to check I'll ask.
                                      I've managed to hit it a few times myself and never found anything unusual beyond the gpio driver failing to return. I agree, I think I found it only because I was looking for it.
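
For anyone wanting to gather a data point, the processes blocked in the driver can be picked out by wait channel, which is what the `iircreq` state reported earlier in this thread appears to be. This is a sketch, assuming `ps -ax -o pid,wchan,command` output on stdin; `pids_in_wchan` is a hypothetical helper name.

```shell
# pids_in_wchan NAME: read `ps -ax -o pid,wchan,command` output on stdin and
# print the PID of every process whose wait channel equals NAME.
pids_in_wchan() {
    WCHAN="$1" awk 'NR > 1 && $2 == ENVIRON["WCHAN"] { print $1 }'
}

# Example (not run here): ps -ax -o pid,wchan,command | pids_in_wchan iircreq
```

On FreeBSD, feeding one of those PIDs to `procstat -kk` should show the kernel stack where the process is sleeping, which may be useful to attach to a report.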

                                      Steve

                                      • jonybat

                                        Just as an update for whoever might run into this, we haven't experienced it since we disabled the "available version" zabbix items and rebooted the gateways afterwards. That was over 2 months ago.

                                        You can check more info in my previous comment.

                                        There is still the open question of why the processes get stuck during this check, but since this isn't that important, I'm going to leave it like this.

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.