Netgate Discussion Forum

    Number of running processes increasing

    Official Netgate® Hardware
    18 Posts 5 Posters 3.3k Views 6 Watching
    • jimp Rebel Alliance Developer Netgate

      I have not seen it happen that often, but on very rare occasions I've seen a 3100 get stuck trying to set the LED state, which is what appears to be happening in your case.

      It happens so rarely, though, I can't recall if a reboot fixed it or if I had to shut it down and unplug/replug power.

      Either way, it's unlikely to resolve itself once the hardware is in that state, so reboot one, but have someone on site ready in case it needs to be unplugged.

      That said, you're also on an outdated release, so updating to 21.05 is advisable.

      • stephenw10 Netgate Administrator

        Hmm, two of those are triggered by the 1st one because there is an update available and it's setting the LED to flash to tell you. I wouldn't expect them to stall out like that.

        You should upgrade though when you can. If not then you could disable the update check in System > Updates > Settings.

        Steve

        • jonybat

          Will reboot, update and disable dashboard update check. I'll report back if this happens again.

          Thanks

          • jonybat @jonybat

            This started happening again in one of the gateways. All of them were rebooted and updated 2 weeks ago. Dashboard update check has also been disabled.

            ps axl | grep "/bin/sh /etc/rc.update_pkg_metadata" | wc -l
                  20
            
            ps axl | grep "/usr/sbin/gpioctl -f /dev/gpioc2 3 duty 150" | wc -l
                   9
            
            ps axl | grep "/bin/sh /usr/local/sbin/pfSense-led.sh update 1" | wc -l
                  10
            
            uptime
            11:50AM  up 14 days, 21:06, 3 users, load averages: 0.76, 0.79, 0.75
            

            Any ideas?

            • stephenw10 Netgate Administrator

              Are you running the 3rd party led_gateways script? That does seem to be inducing issues in 21.05 for some reason. The author just updated it though.

              Steve

              • jonybat @stephenw10

                No. As I said before, these are fresh gateways, only a couple of months old, and the only package we installed was zabbix-agent 5.2.

                • jonybat @jonybat

                  It just came to mind that this could be triggered by the Zabbix monitoring scripts. We are using https://github.com/rbicelli/pfsense-zabbix-template

                  The default monitoring template calls the functions get_system_pkg_version(), get_system_pkg_version()['version'] and get_system_pkg_version()['installed_version'] from pkg-utils.inc once per day, which I think could explain why the update_pkg_metadata script is being executed.

                  What it doesn't explain is why it gets stuck...

                  I have disabled those monitoring items in the meantime.

                  • cneep @jonybat

                    @jonybat Ran across this thread, which described my own situation very closely and thought I'd chime in with perhaps another datapoint.

                    This is a single, non-HA SG-3100 21.02.2-RELEASE with 181 days continuous uptime since last reboot. Three Packages: Cron, zabbix-agent52, zabbix-proxy52

                    Like you, I'm seeing a steady increase in processes. Mine started on ~8/16/2021 and has steadily increased for the past ~70 days up to ~400 processes as of today. Prior to 8/16/2021, the number of processes was consistent at ~100 for the entire monitored history (just over 1 year).

                    In my case:

                    ps axl | wc -l
                         406
                    
                    ps axl | grep "/bin/sh /etc/rc.update_pkg_metadata" | wc -l
                         144
                    

                    ...all in wait state, like yours

                    ps axl | grep "/usr/sbin/gpioctl -f /dev/gpioc2 3 duty 150" | wc -l
                         70
                    

                    ...all in iircreq state, like yours

                    ps axl | grep "/bin/sh /usr/local/sbin/pfSense-led.sh update 1" | wc -l
                         72
                    

                    ...all in wait state, like yours
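One caveat on the counts above: with `ps axl | grep "..." | wc -l`, the grep process itself can appear in the ps output and be counted, so each number may be one too high. Below is a minimal sketch of an alternative that tallies matching processes per state. `tally_states` is a hypothetical helper name, and the live-usage line assumes `ps` accepts the `state` and `command` output keywords.

```sh
#!/bin/sh
# tally_states PATTERN
# Reads "STATE COMMAND..." lines on stdin and prints one "STATE COUNT" line
# for every state seen among commands that contain PATTERN as a fixed string.
# The pattern is passed through the environment rather than on awk's command
# line, so awk's own entry in a live process list does not contain it.
tally_states() {
    PAT="$1" awk '
        index($0, ENVIRON["PAT"]) > 0 { n[$1]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Live usage (assumption: ps supports these output keywords):
#   ps ax -o state= -o command= | tally_states "/etc/rc.update_pkg_metadata"
```

Running it once per command string (rc.update_pkg_metadata, gpioctl, pfSense-led.sh) gives the count and the state breakdown in one step.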

                    I have not yet rebooted or upgraded but those were going to be my initial actions after a quick bit of research on the issue (which led me here). Your comments don't give me a lot of hope that either will actually resolve the issue, though.

                    No particularly useful information from me so far other than to simply indicate that you don't appear to be a unique case.

                    • stephenw10 Netgate Administrator

                      Hmm, what changed on that date to cause them to start rising? Something locally to you?

                      Steve

                      • cneep @stephenw10

                        @stephenw10 Nothing at all that I can determine. The last intended change would have been around the last reboot, which is confirmed by the Config History. That occurred ~4 months before Zabbix shows the processes started increasing. The firewall was left alone to fend for itself for ~4 months before, and 1.5-2 months after, the number of processes suddenly started increasing.

                        Based on my calculations, the OP's problem started ~June 15, 2021. My seemingly identical problem started ~August 16, 2021. I had originally thought that the process that checks for updates might have triggered the problem, perhaps at the discovery of an update. It would be an external influence on an otherwise "static" firewall (config-wise). But I don't think the release dates match up exactly with when the problem started. Close...a couple of weeks, maybe, but not exact, I don't think. Just a thought that didn't seem to pan out...

                        • stephenw10 Netgate Administrator

                          Mmm, that seems likely but there was nothing released on either of those dates that might have caused it. Unless the system was tracking dev snapshots.

                          If you reboot do the processes immediately start stacking up again?

                          I assume if you disable Zabbix it stops?

                          Steve

                          • wblanton

                            Coming here to mention that this is currently occurring on my SG-3100 running 21.05.1. It started on 10/29/2021 with the 00:01 cron event that runs "rc.update_pkg_metadata". That process is still in my process list with a state of "IN". Later that day, the same three processes listed above appeared, all started at the same time and all showing a state of "IN".

                            This pattern has repeated every day since: there is an "update_pkg_metadata" at 00:01, and then the set of three processes at some random point that day. According to the status monitoring page, I was steady at ~164 processes before 10/29. As of right now, I have 250 processes.

                            Also, I do not have zabbix installed, but I do have NRPE (which is what alerted me to the issue).
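For anyone who wants a similar alert without Zabbix or a full NRPE setup, here is a rough sketch of a process-count check that follows the usual Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL). `check_proc_count` is a hypothetical function name, and the 200/300 thresholds are illustrative values picked around the ~164 baseline mentioned above, not recommendations.

```sh
#!/bin/sh
# check_proc_count COUNT WARN CRIT
# Prints a one-line status and returns a Nagios-style exit code.
check_proc_count() {
    count=$1
    warn=$2
    crit=$3
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - $count processes"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - $count processes"
        return 1
    fi
    echo "OK - $count processes"
    return 0
}

# Live usage; the arithmetic expansion strips wc's padding and drops the
# ps header line from the count:
#   check_proc_count "$(( $(ps ax | wc -l) - 1 ))" 200 300
```

With these example thresholds, a steady climb from 164 toward 250 would trip the warning level well before the count reached the 400 range reported earlier in the thread.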

                            Edited to show SW version.

                            • wblanton

                              I'm not able to physically access this device, and no one else will be until next week. In the meantime, is there any reason I can't run the following commands to just kill these processes?

                              pkill -f "sh /etc/rc.update_pkg_metadata"
                              pkill -f "sh /usr/local/sbin/pfSense-led.sh update 1"
                              pkill gpioctl
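Since `pkill -f` matches against the full command line, it can be worth listing what a pattern matches before sending any signal. The following is only a sketch: `preview_then_kill` and the `CONFIRM` variable are hypothetical names, and it assumes `pgrep` and `pkill` accept the same `-f` pattern, as they do when they come from the same package.

```sh
#!/bin/sh
# preview_then_kill PATTERN
# Lists processes whose full command line matches PATTERN.
# Sends SIGTERM (pkill's default signal) only when CONFIRM=yes is set.
preview_then_kill() {
    pattern=$1
    if ! pgrep -f -- "$pattern" >/dev/null; then
        echo "no processes match: $pattern"
        return 0
    fi
    # Show what would be signalled
    pgrep -fl -- "$pattern"
    if [ "${CONFIRM:-no}" = "yes" ]; then
        pkill -f -- "$pattern"
    fi
}

# Dry run first, then the real thing:
#   preview_then_kill "sh /etc/rc.update_pkg_metadata"
#   CONFIRM=yes preview_then_kill "sh /etc/rc.update_pkg_metadata"
```

The dry run makes it easy to confirm that the pattern only hits the stuck helper processes and nothing else on the firewall.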
                              
                              • stephenw10 Netgate Administrator

                                Yeah, you should be able to do that. Though those processes are not actually doing anything.

                                Power cycling the appliance when you can is what you should be looking to do.

                                Steve

                                • wblanton @stephenw10

                                  @stephenw10 Thanks. We will definitely be trying to do that next week when someone is on site. Right now, I'm just trying to at least get Nagios happy. lol

                                  I know that @jimp said that this only happens on rare occasions, but I'm really curious about how often this happens without the users ever noticing. I'm sure most users aren't monitoring things like process counts, so this may really be happening more than one would think.

                                  In any event, please let me know if there are any logs or anything like that I can get over to y'all for diagnostics.

                                  • stephenw10 Netgate Administrator

                                    Thanks. If we think of anything that might be available to check I'll ask.
                                    I've managed to hit it a few times myself and never found anything unusual beyond the gpio driver failing to return. I agree, I think I found it only because I was looking for it.

                                    Steve

                                    • jonybat

                                      Just as an update for whoever might run into this: we haven't experienced it since we disabled the "available version" Zabbix items and then rebooted the gateways. That was over 2 months ago.

                                      You can find more info in my previous comment.

                                      There is still the open question of why the processes get stuck during this check, but since this isn't that important, I'm going to leave it like this.

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.