SG-3100 stops responding every 2 days on 24.03
-
@stephenw10
I see that since the upgrade to 24.03, the system log is full of these messages every 10-30 seconds:
ugen1.2: <CPS ST Series> at usbus1
ugen1.2: <CPS ST Series> at usbus1 (disconnected)
From what I can tell, this is a Cyberpower UPS connected to the USB port. I've found some reports that some USB devices, including some Cyberpower units, will reset if they don't establish a connection within a certain amount of time.
Could this repeated USB connection & disconnection cause a memory leak of some kind?
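To put a number on how often the device is flapping, the attach and disconnect lines can be counted per device. This is a minimal sketch, assuming the log lines look like the ones quoted above and that the system log is plain text at /var/log/system.log. The function name count_usb_flaps is made up for illustration.

```shell
#!/bin/sh
# Count USB attach and disconnect events per device in a system log.
# Reads log text on stdin; prints "<device> attach=<n> disconnect=<n>".
count_usb_flaps() {
    awk '
        # Match lines such as: ugen1.2: <CPS ST Series> at usbus1 (disconnected)
        match($0, /ugen[0-9]+\.[0-9]+: <[^>]*> at usbus[0-9]+/) {
            dev = substr($0, RSTART, RLENGTH)
            sub(/:.*/, "", dev)                  # keep only "ugenX.Y"
            if (index($0, "(disconnected)"))
                gone[dev]++                      # disconnect event
            else
                seen[dev]++                      # attach event
            devs[dev] = 1
        }
        END {
            for (d in devs)
                printf "%s attach=%d disconnect=%d\n", d, seen[d], gone[d]
        }
    '
}

# Example (log path is an assumption):
# count_usb_flaps < /var/log/system.log
```

A count in the thousands per day would match the "every 10-30 seconds" pattern in the log.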
-
Potentially it could. Try unplugging it and see if the leak stops.
I assume you are running NUT? Do you have the current package version installed?
-
@stephenw10 The UPS is at a remote site, so I can't unplug the UPS.
I was able to disable the USB port using usbconfig and that has stopped the log entries. I tried to set up NUT but it wouldn't connect to the UPS. I don't know if it couldn't connect because the USB connection kept resetting so frequently or if that is unrelated. I tried changing the polling settings to values suggested in posts I found, but that didn't help. NUT is currently uninstalled.
-
Hmm, OK, interesting. Well I guess you'll know in a few hours.
-
@andrew_cb I vaguely recall posts about Zabbix being a problem of some kind, but I don't use it and couldn't find it in a quick search. This may just be a red herring, and if so I apologize, but you could try disabling that for a while and just looking at pfSense's graphs.
-
Could be this if Zabbix is using SNMP: https://redmine.pfsense.org/issues/15481
-
We have Zabbix on 40 other Netgates without issue, including two SG-3100 running 24.03.
I don't think we're doing any SNMP monitoring, just Zabbix Agent (active) and Zabbix Proxy both running on the firewalls.
Memory usage is holding flat (it's actually decreased slightly), so it might be that disabling USB to work around the UPS issues is the fix?
It will be interesting to see how it looks tomorrow morning.
-
Mmm, interesting indeed. It's not something we'd ever normally see so it could have a leak that's simply never been hit.
-
So disabling the "flapping" USB seems to have resolved the escalating memory usage. I don't know if the non-responsive issue is resolved though, as we replaced the affected unit earlier today with a 4100 because we couldn't "see how it goes" and risk any further interruptions and downtime at this customer.
Past 90 days. The memory usage on all 10 of our SG-3100 units was flat and nearly identical.
Past 11 days. The affected unit was upgraded on 05/28 and the USB ports were disabled on 06/04, and the memory usage remained flat afterward.
I should be able to play with the 3100 next week and will try to reproduce the issue on the test bench. Hopefully, that will shed light on what's happening and lead to identifying the root cause.
-
Interesting. How did you disable the USB exactly in that case?
-
I don't recall the exact command but it was something like
usbconfig -i ugen0.2 detach_kernel_driver
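For anyone trying this later: in usbconfig(8), -d selects the device and -i takes an interface index, so the working form was probably closer to the sketch below. It is untested against this UPS. The device name ugen1.2 comes from the log lines earlier in the thread, interface index 0 is an assumption, and the wrapper function usb_disable is made up for illustration. It only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print (or run) a usbconfig command that takes a USB device out of service.
# With DRY_RUN=1 (the default here) the command is only printed.
usb_disable() {
    dev="$1"                 # e.g. ugen1.2, as shown by "usbconfig list"
    mode="${2:-detach}"      # "detach" or "off"
    run=""
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run="echo"
    fi
    case "$mode" in
        # Detach the kernel driver from interface 0 (index is an assumption).
        detach) $run usbconfig -d "$dev" -i 0 detach_kernel_driver ;;
        # Alternative: power the device off.
        off)    $run usbconfig -d "$dev" power_off ;;
        *)      echo "usage: usb_disable ugenX.Y [detach|off]" >&2
                return 1 ;;
    esac
}

# Example: show what would be run for the UPS seen in the log.
# usb_disable ugen1.2
# usb_disable ugen1.2 off
```

Neither command should be expected to persist across a reboot, so it would need to be reapplied at boot if it turns out to help.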