Mysterious malfunction of SG-1100 running 21.05.2
-
Since updating my dad’s SG-1100 to 21.05.2 I have seen at least two incidents of this kind of behavior:
- his internet connection was severed,
- I was able to connect to the SG-1100 through OpenVPN from the Internet,
- attempting to log into the Web GUI through OpenVPN never got me past the log-in page, and
- a power-cycle fixed the problem.
In the second instance, with the VPN up, I was able to make an ssh connection to the SG-1100 and initiate the reboot procedure. The reboot did not appear to get beyond this point:
Netgate pfSense Plus will reboot. This may take a few minutes, depending on your hardware.
Do you want to proceed?
Y/y: Reboot normally
R/r: Reroot (Stop processes, remount disks, re-run startup sequence)
Enter: Abort
Enter an option: Y
Netgate pfSense Plus is rebooting now.
Stopping package AWS VPC Wizard...done.
Stopping package IPsec Profile Wizard...done.
Stopping package OpenVPN Client Export Utility...done.
Stopping package Avahi...done.
Stopping package nmap...done.
Is there a good way to check what might have caused the malfunction?
I would like to apply any fix, if there is one, as my dad is more or less technically illiterate. Furthermore, I have to provide tech support to him across the Atlantic to Germany (I am in the U.S.).
-
Based only on the above info I'd guess ntopng exhausted some resource, probably RAM, and hung php.
Was nothing logged after it was rebooted? No crash report shown?
What packages are you running? Only those shown?
Check the Status > Monitoring graphs for historical memory usage.
Steve
-
Thanks for the tip, @stephenw10! Sounds like a reasonable possibility to me. If that is what the problem was, there is a memory leak somewhere. While the SG-1100 is not particularly beefy, my dad has a single laptop on his LAN, and the appliance should be more than well-matched for that.
Here is what I see:
About 15:00 CET is when I had my dad pull the power plug and plug it back in. I do not know how long his SG-1100 had been misbehaving. It may have started before the 24-hour interval shown.
Is there a way to look beyond the previous 24 h?
-
Yes, hit the 'wrench' icon and you can set the time period and select the graph type. Check the System > Memory graph.
You can also unselect various data to make the graph more readable by clicking on the dot above the graph. For example, the number of processes there is swamping the other data and might be hiding other changes.
Steve
-
@stephenw10: With a one-week window the graph tells a different story:
I have to ask my dad whether he was aware that he had no internet connection on all of St. Nicholas Day (December 6).
Regardless, there is no sudden up-tick in processor or memory use. Do you have a different interpretation?
-
Try viewing over a longer period. You can see the free memory is being consumed over time.
You may be hitting this: https://redmine.pfsense.org/issues/12095
Which can be resolved by this: https://redmine.pfsense.org/issues/11933#note-7
It may have still been providing internet access during that missing period but just not logging. Or logging the ram that was lost at reboot.
Steve
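To make "free memory is being consumed over time" more concrete, here is a rough Python sketch that fits a straight line to a few free-memory readings and estimates how long until free memory would reach zero. The function name and the sample readings are hypothetical, made up purely for illustration; they are not taken from the graphs in this thread, and real memory use is rarely this linear.

```python
def hours_until_exhausted(samples):
    """Estimate hours until free memory hits zero.

    samples: list of (hours_since_start, free_mb) readings, e.g. read
    off the monitoring graph by hand. Uses an ordinary least-squares
    line; returns None when free memory is not trending downward.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((t - mean_t) * (f - mean_f) for t, f in samples) / var_t
    if slope >= 0:
        return None  # no downward trend, nothing to extrapolate
    intercept = mean_f - slope * mean_t
    zero_at = -intercept / slope          # time at which the line crosses 0 MB
    last_t = max(t for t, _ in samples)
    return max(0.0, zero_at - last_t)     # hours remaining after the last reading


# Hypothetical readings: free memory dropping about 2 MB per hour.
readings = [(0, 700), (24, 652), (48, 604)]
remaining = hours_until_exhausted(readings)
```

If the estimate comes out at weeks or months, a slow leak alone is unlikely to explain a hang after only a few days of uptime.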
-
@stephenw10: I may have pcscd running. Excerpt from Diagnostics→System Activity:
last pid: 45177; load averages: 0.39, 0.15, 0.12 up 0+04:31:51 19:22:07
130 threads: 3 running, 108 sleeping, 19 waiting
CPU: 0.3% user, 0.2% nice, 1.0% system, 0.2% interrupt, 98.2% idle
Mem: 22M Active, 48M Inact, 186M Wired, 74M Buf, 705M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11285 root 20 0 19M 7620K nanslp 1 0:11 0.00% /usr/local/sbin/pcscd{pcscd}
11285 root 20 0 19M 7620K select 0 0:01 0.00% /usr/local/sbin/pcscd{pcscd}
11285 root 52 0 19M 7620K piperd 1 0:00 0.00% /usr/local/sbin/pcscd{pcscd}
I don’t have an SD card in there. How do I make sure the process doesn’t run?
By the way, here is a 1-month graph:
-
Yeah, it will always be running in 21.05.2. You can apply the linked patch above to stop it running by default. However, your free RAM never gets close to 0, so that's probably not the cause here.
Steve
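
For anyone who wants to put a number on "free RAM never gets close to 0" from a single snapshot, the `Mem:` line in the System Activity excerpt earlier in the thread can be parsed with a short Python sketch like the one below. The function names are hypothetical. The sketch also assumes that the `Buf` figure overlaps with the other categories and should not be added to the total; treat that as an assumption, not a statement about how pfSense accounts for memory.

```python
import re

# Multipliers to convert top's size suffixes to MiB.
_UNIT_TO_MIB = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_mem_line(line):
    """Turn a top-style 'Mem:' line into a dict of category -> MiB."""
    fields = {}
    for amount, unit, name in re.findall(r"(\d+(?:\.\d+)?)([KMG])\s+(\w+)", line):
        fields[name] = float(amount) * _UNIT_TO_MIB[unit]
    return fields


def free_fraction(fields):
    """Fraction of accounted-for memory that is free.

    Assumption: 'Buf' is excluded from the total because it is taken
    to overlap with the other categories.
    """
    total = sum(size for name, size in fields.items() if name != "Buf")
    return fields["Free"] / total


# The Mem: line quoted in the thread.
mem = parse_mem_line("Mem: 22M Active, 48M Inact, 186M Wired, 74M Buf, 705M Free")
share = free_fraction(mem)  # roughly 0.73 for this snapshot
```

A snapshot taken a few hours after a reboot only shows the starting point; the longer-period monitoring graphs remain the better evidence for or against a leak.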