Number of running processes increasing
-
Are you running the 3rd party led_gateways script? That does seem to be inducing issues in 21.05 for some reason. The author just updated it, though.
Steve
-
No, as I said before, these are fresh gateways, only a couple of months old, and the only package we installed was zabbix-agent 5.2.
-
It just came to mind that this could be triggered by the Zabbix monitoring scripts. We are using https://github.com/rbicelli/pfsense-zabbix-template
The default monitoring template calls the functions get_system_pkg_version(), get_system_pkg_version()['version'] and get_system_pkg_version()['installed_version'] from pkg-utils.inc once per day, which I think could explain why the update_pkg_metadata script is being executed. What it doesn't explain is why it gets stuck...
I have disabled those monitoring items in the meantime.
-
@jonybat I ran across this thread, which describes my own situation very closely, and thought I'd chime in with another data point.
This is a single, non-HA SG-3100 running 21.02.2-RELEASE with 181 days of continuous uptime since the last reboot. Three packages: Cron, zabbix-agent52, zabbix-proxy52.
Like you, I'm seeing a steady increase in processes. Mine started around 8/16/2021 and has increased steadily over the past ~70 days, up to ~400 processes as of today. Before 8/16/2021, the number of processes was consistently ~100 for the entire monitored history (just over 1 year).
In my case:
ps axl | wc -l
406
ps axl | grep "/bin/sh /etc/rc.update_pkg_metadata" | wc -l
144
...all in wait state, like yours
ps axl | grep "/usr/sbin/gpioctl -f /dev/gpioc2 3 duty 150" | wc -l
70
...all in iicreq state, like yours
ps axl | grep "/bin/sh /usr/local/sbin/pfSense-led.sh update 1" | wc -l
72
...all in wait state, like yours
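A minimal sketch for taking the per-pattern counts above in one go, assuming pgrep is available (it ships with FreeBSD/pfSense and with procps on Linux); the helper name count_procs is hypothetical. pgrep -f matches against the full command line and never reports itself, whereas `ps axl | grep PATTERN | wc -l` can also count the grep process, so each figure above may be one too high.

```shell
#!/bin/sh
# Sketch: count processes whose full command line matches a pattern.
# pgrep never reports itself, so unlike `ps axl | grep PATTERN | wc -l`
# the grep process is not included in the total.
count_procs() {
    # -f matches against the full argument list, not just the process name;
    # tr strips the leading padding that BSD wc adds to its output.
    pgrep -f "$1" | wc -l | tr -d ' '
}

# Patterns copied from the commands above.
for pattern in \
    "/bin/sh /etc/rc.update_pkg_metadata" \
    "/usr/sbin/gpioctl -f /dev/gpioc2 3 duty 150" \
    "/bin/sh /usr/local/sbin/pfSense-led.sh update 1"
do
    printf '%s\t%s\n' "$(count_procs "$pattern")" "$pattern"
done
```

Each output line is the count followed by the pattern, which makes it easy to feed into a monitoring check.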
I have not yet rebooted or upgraded, but those were going to be my initial actions after a quick bit of research on the issue (which led me here). Your comments don't give me much hope that either will actually resolve the issue, though.
No particularly useful information from me so far other than to simply indicate that you don't appear to be a unique case.
-
Hmm, what changed on that date to cause them to start rising? Something local to you?
Steve
-
@stephenw10 Nothing at all that I can determine. The last intended change would have been around the last reboot, which is confirmed by the Config History. That occurred ~4 months before Zabbix shows the process count starting to increase. The firewall was left alone to fend for itself for ~4 months before, and 1.5-2 months after, the number of processes suddenly started increasing.
Based on my calculations, the OP's problem started around June 15, 2021. My seemingly identical problem started around August 16, 2021. I originally thought that the process that checks for updates might have triggered the problem when it discovered a new update. That would be an external influence on an otherwise "static" firewall (config-wise). But I don't think the release dates match up exactly with when the problem started. Close, within a couple of weeks maybe, but not exact. Just a thought that didn't seem to pan out...
-
Mmm, that seems likely, but nothing was released on either of those dates that might have caused it, unless the system was tracking dev snapshots.
If you reboot do the processes immediately start stacking up again?
I assume if you disable Zabbix it stops?
Steve
-
Coming here to mention that this is currently occurring on my SG-3100 running 21.05.1. It started on 10/29/2021 with the 00:01 cron job that runs "rc.update_pkg_metadata". That process is still in my process list with a state of "IN". Later that day, the same three processes listed above appeared, all started at the same time and all with a state of "IN".
This pattern has repeated every day since: an "update_pkg_metadata" process at 00:01, and then the set of three processes at some random point during the day. According to the status monitoring page, I was steady at around 164 processes before 10/29. As of right now, I have 250 processes.
Also, I do not have zabbix installed, but I do have NRPE (which is what alerted me to the issue).
Edited to show SW version.
-
I'm not able to physically access this device, and no one else will be able to until next week. In the meantime, is there any reason I can't run the following commands to just kill these processes?
pkill -f "sh /etc/rc.update_pkg_metadata"
pkill -f "sh /usr/local/sbin/pfSense-led.sh update 1"
pkill gpioctl
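A slightly more cautious variant of those commands, as a sketch: the hypothetical kill_matching helper prints the PIDs a pattern matches before sending them SIGTERM, which makes an over-broad pattern easier to spot. Note that plain `pkill gpioctl` (without -f) matches on the process name only, so it would also hit any unrelated gpioctl invocation running at that moment.

```shell
#!/bin/sh
# Hypothetical helper (sketch): list the PIDs a pattern matches, then send
# them SIGTERM. Printing first makes an over-broad pattern easier to spot.
kill_matching() {
    pattern=$1
    # -f: match the full command line, as the pkill -f commands above do
    pids=$(pgrep -f "$pattern")
    if [ -z "$pids" ]; then
        echo "no match: $pattern"
        return 0
    fi
    # $pids is left unquoted on purpose so each PID becomes its own argument
    echo "killing:" $pids "($pattern)"
    kill $pids
}

# Example calls, kept as comments so loading this file kills nothing:
#   kill_matching "/bin/sh /etc/rc.update_pkg_metadata"
#   kill_matching "/bin/sh /usr/local/sbin/pfSense-led.sh update 1"
#   kill_matching "/usr/sbin/gpioctl -f /dev/gpioc2"
```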
-
Yeah, you should be able to do that, though those processes are not actually doing anything.
Power cycling the appliance when you can is what you should be looking to do.
Steve
-
@stephenw10 Thanks. We will definitely be trying to do that next week when someone is on site. Right now, I'm just trying to at least get Nagios happy. lol
I know that @jimp said that this only happens on rare occasions, but I'm really curious about how often this happens without users ever noticing. I'm sure most users aren't monitoring things like process counts, so this may be happening more often than one would think.
In any event, please let me know if there are any logs or anything like that I can get over to y'all for diagnostics.
-
Thanks. If we think of anything that might be available to check I'll ask.
I've managed to hit it a few times myself and never found anything unusual beyond the gpio driver failing to return. I agree, I think I found it only because I was looking for it.
Steve
-
Just as an update for whoever might run into this: we haven't experienced it since disabling the "available version" Zabbix items and rebooting the gateways afterwards. That was over 2 months ago.
You can find more info in my previous comment.
There is still the open question of why the processes get stuck during this check, but since it isn't that important, I'm going to leave it like this.