pfSense nic freeze



  • I have pfSense running on a fitlet2 mini PC (Celeron J3455, 4GB memory, 64GB SSD, dual Intel NIC). The usual packages are installed: pfBlockerNG, Snort and ntopng. It works fine, but in the last 2 weeks the device froze 3 times. In every case the power was still on, but both NICs (LAN and WAN) were frozen. The status lights were off and no traffic was passing. The GUI was not reachable. The only way to recover was to reboot the device. There is nothing in the system logs to go on. I searched the internet but haven't found a similar case so far.
    Any idea what might be going on here?



  • @microkid

    Hi,

    The Intel i211 looks good, I mean it's perfect for pfSense
    The fitlet2 also looks good....

    pfSense with the Intel igb(4) driver is very stable, so I doubt it is the NIC that freezes

    there is something more serious going on in the background, which you can investigate from the shell after a "freeze"

    during a "freeze" you won't see anything in the logs from the GUI, because the GUI doesn't work either
    the shell usually still does



  • @DaddyGo said in pfSense nic freeze:

    there is something more serious going on in the background, which you can investigate from the shell after a "freeze"
    during a "freeze" you won't see anything in the logs from the GUI, because the GUI doesn't work either
    the shell usually still does

    What do you mean with that?



  • @microkid

    I mean, the hardware is good enough, at least for basic SOHO use...

    I wouldn't think the NIC itself freezes (you wrote this in the title), that is rather unprofessional wording 😉
    it is certain that there is no network traffic if the LEDs are off (although even this is not entirely true)

    some process (kernel, application, hardware driver) can cause an error that you can't see from the GUI, because it comes from a background process

    the shell is the better guide here, because through the console you can see things that indicate the status even if the GUI does not start

    so the next time it freezes, connect to the console and look for a crash report (crash dump)




  • When it froze, I tried to connect to the shell and GUI but was not able to. There was also no crash dump mentioned in the GUI after the restart. The only thing I can try next time is to hook up a keyboard and display and see if I can get into it.



  • @microkid

    it’s a good idea, but in the meantime, have a look at this as well:

    https://docs.netgate.com/pfsense/en/latest/development/obtaining-panic-information-for-developers.html


    SSH into the box and look for the information, for example with WinSCP, PuTTY, etc.
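
    A minimal sketch of that check from an SSH session. The helper name list_crash_dumps is made up for illustration, and the file name patterns (textdump.tar.N, vmcore.N, info.N) are assumptions about what a saved dump looks like in /var/crash:

```shell
#!/bin/sh
# List crash artifacts in a dump directory (default: /var/crash).
# Assumption: saved dumps are named textdump.tar.N, vmcore.N or info.N.
list_crash_dumps() {
    dir="${1:-/var/crash}"
    found=0
    for f in "$dir"/textdump.tar.* "$dir"/vmcore.* "$dir"/info.*; do
        # An unmatched glob stays literal, so check that the file exists.
        [ -e "$f" ] || continue
        ls -l "$f"
        found=1
    done
    [ "$found" -eq 1 ] || echo "no crash dumps found in $dir"
}
```

    Calling list_crash_dumps without an argument checks /var/crash. An empty result only means no dump was saved; it does not prove there was no panic.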



  • Checked, but there is no crash dump in /var/crash. So it really seems the NICs just froze, as there was no way to communicate with pfSense anymore.



  • @microkid

    if there were a problem with the NIC, you would also see it from the shell
    the NIC is not a separate animal in your pfSense box that lives a separate life 😉

    so what remains is to observe it (via the shell) when the next freeze occurs....



  • Disable all those packages and try it for a period of time.

    It is probably using all the memory when Snort or pfBlockerNG updates.



  • Memory usage has never gone above 50% in the last 2 weeks.



  • @microkid

    believe me, it's just guessing from now on
    one thing you can do is put a Linux on it and run a hard hardware stress test

    or you wait until the next freeze



  • @microkid

    +++++
    by the way, @Impatient is trying to tell you that updating those packages causes high memory usage ...
    and you don't even see these spikes when they happen and the system crashes (randomly)



  • Just ran a stress test for 10 minutes.
    stress-ng --metrics --cpu 4 --vm 4 --vm-bytes 2G --io 2

    CPU 100%, mem 99%. Rock stable, not a glitch. I was still able to download at 250Mbps.



  • @microkid

    yes, these faults are the worst
    like looking for a needle in a haystack

    I would also extend the test to network traffic....

    the CPU and memory test is fine in the short term, but think about whether it could be a longer-term thing
    for example, a thermal fault that only occurs when one of the active components on the motherboard stays hot for a long time
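
    A longer soak run could reuse the same stress-ng options with a timeout instead of stopping after 10 minutes. This is only a sketch: the helper name soak_cmd and the 6h default are assumptions, and --tz (thermal zone reporting) only works on Linux. The function prints the command instead of running it, so it can be reviewed first:

```shell
#!/bin/sh
# Print (not run) a stress-ng command line for a long soak test.
# Worker counts are copied from the 10 minute run; the duration is an assumption.
soak_cmd() {
    duration="${1:-6h}"   # stress-ng accepts s, m, h and d suffixes
    echo "stress-ng --metrics --tz --cpu 4 --vm 4 --vm-bytes 2G --io 2 --timeout $duration"
}

soak_cmd 6h
```

    Running the printed command for several hours, ideally with network traffic flowing at the same time, is closer to the "longer term" case than a short burst.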



  • @DaddyGo Good point. But the load on the device is usually very low. It's just a home router/firewall. CPU is normally <5% and memory around 40%. With my usage, I don't expect high temperatures any time soon :)



  • @microkid

    what you are describing is a relative state of rest
    routing does not require powerful machines 😉

    I understand all this and I know the pfSense resource needs..
    I have seen huge pfBlockerNG lists loading with 98% CPU and 4GB RAM usage (on an APU 4d4 board)
    it was of course an overloaded pfSense box and an unreasonably large list

    this can happen when you are not watching the box, so you don't see it

    but it eventually kills the system
    it occurs randomly, like your problem



  • I understand. However, pfBlockerNG updates every hour, as do several other things. The monitoring graph does show a small increase in memory usage at that moment, nothing else. I will keep an eye on it.
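
    One way to keep an eye on it between freezes is to append a timestamped sample to a file on disk every minute, so the last lines before a freeze survive the reboot. A rough sketch: the helper names free_mib and log_sample and the log path are made up, and the sysctl names in the commented loop are the FreeBSD ones (dev.cpu.0.temperature additionally assumes the coretemp module is loaded):

```shell
#!/bin/sh
# Convert a free-page count and a page size in bytes to whole MiB.
free_mib() {
    echo $(( $1 * $2 / 1048576 ))
}

# Append one timestamped sample line to a log file.
log_sample() {
    logfile="$1"; shift
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$logfile"
}

# Example loop for the firewall itself, left commented out on purpose:
# while :; do
#     pages=$(sysctl -n vm.stats.vm.v_free_count)
#     psize=$(sysctl -n hw.pagesize)
#     temp=$(sysctl -n dev.cpu.0.temperature 2>/dev/null)
#     log_sample /root/health.log "free_mib=$(free_mib "$pages" "$psize") temp=$temp"
#     sleep 60
# done
```

    The free page count understates the memory that is really available, so it is only a rough indicator, but a sudden drop right before the log stops would point at the hourly updates.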



  • @microkid

    hard, hard...

    all that really remains is to seize the moment and investigate when the incident happens.
    for now, because there is nothing to find anywhere (crash report, suspicious logs), you can only wait.

    BTW: I don't think this is a pfSense problem...



  • @microkid

    one more idea:
    what about the latest BIOS?
    since the ACPI firmware code can cause this kind of silly situation



  • The device is on the latest BIOS, from 2018. No newer BIOS has been released since then.



  • @microkid

    this can be a problem (it looks like an old BIOS)
    what do you see when you run these(?):

    dmesg | grep 'error'
    dmesg | grep 'Firmware*'



  • dmesg|grep 'error'
    module_register_init: MOD_LOAD (vesa, 0xffffffff812d9960, 0) error 19
    WARNING: /: mount pending error: blocks 48 files 2

    dmesg | grep 'Firmware*'
    <no output>
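
    Worth noting: grep patterns are regular expressions and case-sensitive by default, so 'Firmware*' means "Firmwar" followed by any number of "e" and would miss lowercase "firmware" lines. A broader, case-insensitive filter might look like this sketch (the function name scan_kernel_log and the keyword list are just illustrative assumptions):

```shell
#!/bin/sh
# Filter kernel messages read from stdin for likely trouble indicators,
# ignoring case. The keyword list is an illustrative assumption.
scan_kernel_log() {
    grep -iE 'error|fail|firmware|acpi|watchdog|timeout' || echo "no matching lines"
}

# On the firewall: dmesg | scan_kernel_log
```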



  • @microkid

    áááhhhááá,

    Are you on UFS?
    why not ZFS?

    this can be caused by repeated hard shutdowns (crashes)
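
    A quick way to answer the UFS/ZFS question from the shell is to look at the root entry in the mount output. A small sketch, assuming the usual FreeBSD line format "DEVICE on / (TYPE, OPTIONS)"; the helper name root_fs_type is made up:

```shell
#!/bin/sh
# Print the file system type of / from `mount` output read on stdin.
# Assumes FreeBSD-style lines: "DEVICE on / (TYPE, OPTIONS...)".
root_fs_type() {
    awk '$2 == "on" && $3 == "/" { t = $4; gsub(/[(),]/, "", t); print t; exit }'
}

# On the firewall: mount | root_fs_type
```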

    the VESA error is fine (it only relates to the VGA controller on the motherboard, hihihi)
    but this one matters: "WARNING: /: mount pending error: blocks 48 files 2"

    https://forums.freebsd.org/threads/mount-pending-error.67573/

    read it and then:

    https://docs.netgate.com/pfsense/en/latest/hardware/troubleshooting-disk-check-errors-fsck.html



  • @DaddyGo said in pfSense nic freeze:

    WARNING: /: mount pending error: blocks 48 files 2

    AFAIK, this is expected, as I had to pull the power to reboot the device, so the disk was not shut down cleanly.



  • @microkid

    yes, that's okay, I wrote that too, but
    fix the file system before you examine the box any further

    everything must be ruled out when searching for an error like this..

    • poor disk fragmentation, a typical cause of random crashes

    I know you suspect the NIC because the LEDs are off, but like I said, that could be a side effect of some other process

