24.03 causes sustained rise in processes count and memory usage.
-
@kprovost said in 24.03 causes sustained rise in processes count and memory usage.:
The only thing I don't quite understand is why I can't reproduce it on my 2100. It should be happening there too.
That seems to suggest it might also be related to what data my Zabbix is requesting, doesn't it?
I collect quite a few data points, as the Zabbix template is fairly comprehensive (insofar as you can be comprehensive with only SNMP compared to a Zabbix Agent).
-
@keyser I've created https://redmine.pfsense.org/issues/15481 to track this.
It may be related to the data you request, although I'd expect snmpwalk to trigger it as well, and I've not managed to reproduce it that way.
Sadly there won't be a patch, because the problem is in a system library, not in a port or in php, which are easier to update.
-
Nice!
-
@keyser bsnmpd. Didn't see that one coming.
FWIW, I do a lot of SNMP with pfSense, but I use net-snmp. Have you tried net-snmp?
-
@dennypage No, I kinda don't need it, so as part of KISS I decided not to install that package. I probably could do it now, and it might be a workaround, but I'd really rather just stick to the built-in SNMP daemon.
-
@kprovost said in 24.03 causes sustained rise in processes count and memory usage.:
It may be related to the data you request, although I'd expect snmpwalk to trigger it as well, and I've not managed to reproduce it that way.
Sadly there won't be a patch, because the problem is in a system library, not in a port or in php, which are easier to update.
There seems to be a relationship between what data I request and the pace at which processes are stranded. I tried dialing down the number of requested data points, and now the growth rate of stranded processes is much slower.
So it seems my Zabbix template requests data that triggers this issue, whereas your LibreNMS does not.
Since it is a bug in a library that needs fixing regardless, I guess it makes no further sense to track down which data requests actually trigger the problem?
Just out of curiosity: did this error sneak in with the update to FreeBSD 15 as base? My Zabbix/SNMP setup didn't change, so the error wasn't in 23.09.1's bsnmpd and dependent library modules.
-
@keyser net-snmp doesn't support pf data, which of course means that you could not trigger the bug.
FWIW, even though net-snmp doesn't handle pf data it does have some advantages. In particular SNMPv3 and extensions. Extensions allow other things like dpinger, ntp, nut, unbound and temperatures to be monitored via SNMP.
-
@dennypage Yeah, I know there are more options with net-snmp, but all I'm really looking for is availability, performance and performance trends over time. That is pretty well covered with bsnmpd.
I do latency statistics, link availability and UPS/NUT monitoring from a Raspberry Pi 2 Zabbix client behind pfSense. I need that Pi for other purposes as well, so this gives me a client's perspective instead of first-hand data out of the firewall.
It also saves me from needing net-snmp or a Zabbix client installed on pfSense.
-
@keyser said in 24.03 causes sustained rise in processes count and memory usage.:
Since it is a bug in a library that needs fixing regardless, I guess it makes no further sense to track down what data requests actually triggers the problem?
Yeah, I'm not going to dig further. The fix has been merged to the relevant branches and will be in the next release.
Just out of curiosity: did this error sneak in with the update to FreeBSD 15 as base? My Zabbix/SNMP setup didn't change, so the error wasn't in 23.09.1's bsnmpd and dependent library modules.
We've been on FreeBSD main since 23.01. Talking about 14 vs. 15 is not meaningful.
The bug was introduced as part of ongoing maintenance work. I'm in the process of migrating the configuration interface over to netlink, and that work produced some fallout.
-
Hmm, on my SG-1100 this is actually a rather critical problem: it only has 1GB of RAM and runs out within 2-4 days. Every time, bsnmpd gets killed:
pid 84330 (bsnmpd), jid 0, uid 0, was killed: failed to reclaim memory
That puts a stop to my monitoring until I start it again, which kind of defeats the purpose.
My SG-4860 still eats it up like a champ lol.
Gotta live with this one until the next release, I think.
-
@randommen Yeah, I agree. It’s pretty hard on my SG-2100, which will get to the chokepoint in about 14 days. So I’m starting to contemplate a scripted weekly service restart to prevent this from happening.
So I’m hoping there will be a 24.03.1 release which includes the fix for this issue along with other fixes/patches for discovered issues.
-
Mmm, the fix looks like it would need a new build. Not possible to use a patch or pkg.
A cronjob to restart it should be easy enough though.
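A minimal sketch of such a restart job, written in Python. It assumes that pfSense's PHP shell can restart the SNMP service with `pfSsh.php playback svc restart bsnmpd` and that a Python 3 interpreter is installed on the firewall; both are assumptions to verify on your own box. The script and its `--run` switch are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical cron helper: restart pfSense's SNMP daemon (bsnmpd)."""
import shlex
import subprocess
import sys

# Assumption: pfSense's PHP shell can restart a service by name via the
# "svc" playback script; verify the path and service name on your install.
PFSSH = "/usr/local/sbin/pfSsh.php"
SERVICE = "bsnmpd"


def restart_command(service: str = SERVICE) -> list[str]:
    """Build the argument list that restarts the given service."""
    return [PFSSH, "playback", "svc", "restart", service]


def restart(service: str = SERVICE, dry_run: bool = False) -> int:
    """Restart the service and return the exit status.

    With dry_run=True the command is only printed, never executed.
    """
    cmd = restart_command(service)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=False).returncode


# Hypothetical safety switch: only act when invoked with --run, so importing
# or test-running this file never restarts anything.
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    sys.exit(restart())
```

A weekly schedule could then be a crontab entry along the lines of `0 4 * * 0 /usr/local/bin/python3 /root/restart_bsnmpd.py --run` (the interpreter and script paths are placeholders; on pfSense the Cron package is the usual way to add such an entry). Whatever interval you pick has to be shorter than the time your box takes to run out of memory, so a weekly restart would not be enough for an SG-1100 that exhausts its RAM in 2-4 days.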
-
@stephenw10 Is it possible to download/install the fixed binary manually, or are there dependencies that make it a very difficult undertaking?
-
It's not just the bsnmpd binary. There are also changes to pfctl. I'm not sure if that would be possible.
-
@randommen 26 days - then my SG-2100 goes belly up and becomes borderline unresponsive because memory consumption kills it.
-
Restarting bsnmpd before that doesn't free it?
-
@stephenw10 It does, so it’s only an annoying inconvenience. I was just trying to gauge how it would react if I didn’t, and how long it would take.
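To put a number on "how long it would take" without waiting for the box to fall over, a rough sketch: sample bsnmpd's resident memory (or the process count) periodically, for example with `ps -o rss= -p <pid>`, fit a straight line to the samples, and project when it crosses a limit. The linear-growth model, the function names and the sample values below are illustrative assumptions, not measurements from this thread.

```python
"""Project when a steadily growing metric (e.g. bsnmpd memory) hits a limit."""
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(days: list[float], values: list[float],
                     limit: float) -> Optional[float]:
    """Days after the last sample until the fitted trend reaches `limit`.

    Assumes linear growth; returns None if the trend is flat or falling.
    """
    slope, intercept = fit_line(days, values)
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(0.0, crossing_day - days[-1])
```

With made-up samples of 100, 140, 180 and 220 MB on days 0 to 3 and a 1100 MB limit, `days_until_limit([0, 1, 2, 3], [100, 140, 180, 220], 1100)` returns `22.0`. Real leaks are not always linear, so treat the result as a rough estimate for choosing a restart interval rather than a prediction.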
-