pfSense NIC freeze
I have pfSense running on a fitlet2 mini PC (Celeron J3455, 4GB mem, 64GB SSD, dual Intel NIC). Usual packages installed: pfBlockerNG, Snort and ntopng. It works fine, but in the last 2 weeks the device froze 3 times. In every case the power was still on, but both NICs (LAN and WAN) were frozen. The status lights were off and no traffic was passing. The GUI was not reachable. The only way to recover was to reboot the device. There is nothing in the system logs to go on. I searched the internet but haven't found a similar case so far.
Any idea what might be going on here?
The Intel i211 looks good; it's a solid choice for pfSense.
The fitlet2 also looks good.
pfSense with the Intel igb(4) driver is very stable, so I doubt this is a NIC freeze.
There is probably something more serious going on in the background, which you can investigate from the shell after a freeze.
You don't see anything in the logs from the GUI, because the GUI is frozen as well.
The shell usually still works.
I say this because the hardware is good enough, at least for basic SOHO use.
I wouldn't assume the NIC itself froze (as you wrote in the title); that wording is rather imprecise.
If the LEDs are off there is certainly no network traffic (although even that is not entirely true).
Some process (kernel, application, or hardware driver) can cause an error that you can't see from the GUI, because the GUI failure is only a secondary effect.
In this respect the shell is your guide: through the console you can see things that indicate the system's status even when the GUI does not start.
So the next time it freezes, connect to the console and look for a crash report (crash dump).
When it froze, I tried to connect to the shell and GUI but was not able to. There was also no crash dump mentioned in the GUI after the restart. The only thing I can try next time, is hook up a keyboard and display and see if I get into the thing.
That's a good idea, but in the meantime, check this as well:
SSH into the box and look for the information, for example with WinSCP, PuTTY, etc.
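A minimal sketch of that check, assuming the default crash directory /var/crash and the usual FreeBSD dump file names (info.N, vmcore.N, textdump.tar.N). The function name list_crash_dumps is made up for illustration:

```shell
#!/bin/sh
# List kernel crash dump files in a directory (default: /var/crash).
# Assumption: dumps use the usual FreeBSD names (info.N, vmcore.N,
# textdump.tar.N); other files such as "minfree" are ignored.
list_crash_dumps() {
    dir="${1:-/var/crash}"
    [ -d "$dir" ] || return 0
    find "$dir" -maxdepth 1 -type f \
        \( -name 'info.*' -o -name 'vmcore.*' -o -name 'textdump.tar.*' \) | sort
}

# Usage on the firewall after a reboot:
#   list_crash_dumps        # prints nothing if no dump was written
```

An empty result only tells you that no kernel panic got as far as writing a dump; a hard hang would leave nothing behind either.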
Checked, but there is no crash dump in /var/crash. So it really seems the NICs just froze, as there was no way to communicate with pfSense anymore.
If there were a problem with the NIC, you would also see it from the shell.
The NIC is not a separate animal in your pfSense box that lives a life of its own.
So all that remains is to observe it (via the shell) when the freeze occurs.
Disable all those packages and try it for a period of time.
It is probably using all the memory when Snort or pfBlockerNG updates.
Memory has never gone above 50% in the last 2 weeks.
Believe me, from here on it's just guesswork.
One thing you can do is put Linux on it and run a hard hardware stress test,
or you wait until the next freeze.
Just ran a stress test for 10 minutes.
stress-ng --metrics --cpu 4 --vm 4 --vm-bytes 2G --io 2
CPU 100%, mem 99%. Rock stable, not a glitch. I was still able to download at 250Mbps.
Yes, these kinds of faults are the worst,
like looking for a needle in a haystack.
I would next focus the test on network throughput.
The CPU and memory test is fine in the short term, but consider that it may be a longer-term effect:
for example, a thermal fault that only occurs when one of the active components on the motherboard stays hot for a long time.
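To check the thermal theory over a longer run, one option is to log the CPU temperature while the box is under load. A rough sketch, assuming pfSense's FreeBSD base with the coretemp module loaded (kldload coretemp) so that sysctl exposes dev.cpu.N.temperature. The function name max_cpu_temp and the log path are made up for illustration:

```shell
#!/bin/sh
# Print the highest core temperature found in `sysctl dev.cpu` style output.
# Assumed input lines look like: dev.cpu.0.temperature: 45.0C
max_cpu_temp() {
    awk -F': ' '/\.temperature:/ {
        t = $2
        sub(/C$/, "", t)
        if (max == "" || t + 0 > max) max = t + 0
    } END { if (max != "") print max }'
}

# Usage on the box (one reading per minute until interrupted):
#   while true; do
#       echo "$(date '+%Y-%m-%d %H:%M:%S') $(sysctl dev.cpu | max_cpu_temp)"
#       sleep 60
#   done >> /root/cputemp.log
```

If the last readings before a freeze are no higher than normal, the thermal theory becomes less likely, though this only covers the CPU cores and not other components on the board.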
@DaddyGo Good point. But the load on the device is usually very low. It's just a home router/firewall. CPU is normally <5% and memory around 40%. With my usage, I don't expect high temperatures any time soon :)
What you are describing is a relatively idle state.
Routing does not require a powerful machine.
I understand all this, and I know pfSense's resource needs.
I have seen huge pfBlockerNG lists loading with 98% CPU and 4GB RAM usage (on an APU 4d4 board).
That was of course an overloaded pfSense box and an unreasonably large list.
This can happen when you are not watching the box, so you don't see it,
but it eventually kills the operating system.
It occurs randomly, like your problem.
I understand. However, pfBlockerNG updates every hour, as do other things. The monitoring log does show a small increase in memory usage at that moment, nothing else. Will keep an eye on it.
All that really remains is to catch it in the act; you can only investigate when the incident happens.
For now, since there is nothing to be found anywhere (crash report, suspicious logs), you can only wait.
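While waiting, it may help to leave a small heartbeat log running, so that after the next freeze you at least know when the box stopped and how loaded it was. A minimal sketch; the function name log_heartbeat, the log path and the cron schedule are all made up for illustration:

```shell
#!/bin/sh
# Append one timestamped line with the load averages to a log file, then
# ask the OS to flush it to disk so the last entry has a better chance of
# surviving a hard freeze.
log_heartbeat() {
    logfile="${1:-/root/heartbeat.log}"
    load=$(uptime 2>/dev/null || echo 'uptime unavailable')
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$load" >> "$logfile"
    sync
}

# Hypothetical crontab entry (once a minute), assuming this file is saved
# as /root/heartbeat.sh and ends with a call to log_heartbeat:
#   * * * * * /bin/sh /root/heartbeat.sh
```

The timestamp of the last line gives an approximate time of the freeze, which can then be compared against the hourly pfBlockerNG and Snort update times.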
BTW: I don't think this is a pfSense problem...
One more idea:
are you on the latest BIOS?
ACPI firmware code can cause exactly this kind of silly situation.
The device is on the latest BIOS from 2018. No newer BIOS has been released since then.
That can be a problem (it looks like an old BIOS).
What output do you get from these?
dmesg | grep 'error'
dmesg | grep 'Firmware*'
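Both greps above are case-sensitive, so they would miss lines such as "WARNING" or "firmware". A hedged one-pass alternative is sketched below; the function name scan_kernel_log is made up, and the keyword list is only a guess at what is worth looking for:

```shell
#!/bin/sh
# Filter kernel messages (read from stdin) for lines that often matter when
# chasing a hang. Matching is case-insensitive; the keyword list is an
# assumption, not an exhaustive set.
scan_kernel_log() {
    grep -Ei 'error|warning|firmware|acpi|watchdog|timeout'
}

# Usage on the box:
#   dmesg | scan_kernel_log
```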
module_register_init: MOD_LOAD (vesa, 0xffffffff812d9960, 0) error 19
WARNING: /: mount pending error: blocks 48 files 2
dmesg | grep 'Firmware*'
Are you on UFS?
Why not ZFS?
This can be caused by repeated unclean shutdowns (crashes).
The VESA error is harmless (it just looks dramatic, hihihi); anyone with a VGA controller on the motherboard gets it,
but this one is not harmless: "WARNING: /: mount pending error: blocks 48 files 2"
read it and then:
Yes, that's okay, I wrote that as well, but
fix the file system before you investigate the box any further.
Everything must be ruled out when hunting for an error like this.
- poor disk fragmentation, a typical cause of random crashes
I know you suspect the NIC because the LEDs are off, but like I said, that could just be a side effect of some other process failing.