Every couple of weeks pfSense completely stops responding?



  • Hello,

    This happened about 2 weeks ago: in the morning I did not have internet on any of the devices on my network. Initially I thought it was my modem, so I rebooted it, but no joy. I tried accessing the pfSense GUI but got no response. As soon as I rebooted pfSense (pressing the power button once) I got internet back and everything was back to normal.
    This evening, out of nowhere, the same issue. Again, only rebooting pfSense fixed it, and then I was able to get online and access the GUI.
    How can I troubleshoot this and figure out what is causing it?
    The logs only show what happened after the reboot, nothing before.

    TIA!



  • Hi,

    If the problem wasn't a software issue, chances are good that you'll be having "dirty disk" issues very soon: hitting the reset or power button isn't a clean shutdown.
    Run an fsck after rebooting every time you have to pull the plug.

    Some thoughts:
    Use a UPS.
    Set up an external log collector.
    Open at least an SSH connection, or better, console access, and leave it open. See what it captures.
    Use a tool like Munin (many others exist) so you can follow memory usage, disk usage, processor usage, file descriptors, etc. in close to real time.
    During testing, disable/remove all packages and keep a clean pfSense install.
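    Since the box itself keeps no logs from before the lockup, it can also help to record exactly when it stops answering, from another always-on machine (such as the Ubuntu syslog server). The sketch below is a minimal, hypothetical Python watcher, not a pfSense feature: it assumes Linux iputils `ping` (where `-W` is a timeout in seconds) and prints a timestamped line on every UP/DOWN change. `outage_windows` is a helper for post-processing if you store every probe rather than only the state changes.

```python
import subprocess
import time
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def is_reachable(host: str, timeout_s: int = 1) -> bool:
    """Send a single ICMP echo request; True if a reply came back.

    Assumes Linux iputils ping: -c is the count, -W the reply timeout in
    seconds (BSD/macOS ping interpret -W differently).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_windows(
    samples: Iterable[Tuple[float, bool]]
) -> List[Tuple[float, Optional[float]]]:
    """Collapse (timestamp, reachable) samples into (went_down, came_back) pairs.

    came_back is None if the host was still down at the last sample.
    """
    windows: List[Tuple[float, Optional[float]]] = []
    down_since: Optional[float] = None
    for ts, up in samples:
        if not up and down_since is None:
            down_since = ts  # first failed probe of a new outage
        elif up and down_since is not None:
            windows.append((down_since, ts))  # outage ended
            down_since = None
    if down_since is not None:
        windows.append((down_since, None))  # still down at the last sample
    return windows


def monitor(host: str, interval_s: float = 10.0) -> None:
    """Probe forever, printing a timestamped line whenever the state changes."""
    was_up: Optional[bool] = None
    while True:
        up = is_reachable(host)
        if up != was_up:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} {'UP' if up else 'DOWN'}", flush=True)
            was_up = up
        time.sleep(interval_s)
```

    Calling `monitor("192.168.1.1")` (a hypothetical LAN address for pfSense) leaves a record of the exact minute it went dark, which you can then line up against the last entries your remote syslog received.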

    Keep in mind that the software part (pfSense) can run for months, if not years (if you don't upgrade, which is a bad choice). Mine runs on an old (10 years?), stripped-down former Dell desktop PC with a classic hard disk. Never saw a crash...



  • Thanks for the reply!
    pfSense is already on a UPS, and I just set up syslog to my other Ubuntu server, but I'm not sure what in these logs would tell me why it stops responding?

    root@ts-ubuntu:~# ls -la /var/log/pfsense.mydomain.net/
    total 568
    drwxr-xr-x  2 syslog syslog   4096 Feb 16 13:57 .
    drwxrwxr-x 20 root   syslog   4096 Feb 16 13:48 ..
    -rw-r-----  1 syslog adm      2724 Feb 16 14:13 dhcpd.log
    -rw-r-----  1 syslog adm       440 Feb 16 14:13 dhcpleases.log
    -rw-r-----  1 syslog adm     46517 Feb 16 14:14 filterlog.log
    -rw-r-----  1 syslog adm       917 Feb 16 14:10 .log
    -rw-r-----  1 syslog adm    496753 Feb 16 14:14 nginx.log
    -rw-r-----  1 syslog adm       261 Feb 16 13:48 php-fpm.log
    root@ts-ubuntu:~#
    

    Can you please explain this: "Open at least an SSH connection, or better, console access, and leave it open. See what it captures." I'm in the shell on Ubuntu, but I'm not seeing changes as they happen?
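    Seeing changes as they happen means following a log file rather than listing it; on the Ubuntu side that is normally `tail -f` on one of the files shown above. To illustrate what "following" does, here is a minimal Python sketch (the `follow` helper is hypothetical, not part of any tool mentioned in this thread) that polls a file and yields each new complete line as it is appended. It does not handle log rotation.

```python
import os
import time
from typing import Iterator


def follow(path: str, poll_interval: float = 0.5, from_start: bool = False) -> Iterator[str]:
    """Yield complete lines as they are appended to path (similar to `tail -f`).

    By default it starts at the current end of the file, so only new lines
    are returned. It does not notice the file being rotated or replaced.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        if not from_start:
            f.seek(0, os.SEEK_END)  # skip what is already in the file
        pending = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll_interval)  # nothing new yet; poll again
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit whole lines
                yield pending.rstrip("\n")
                pending = ""
```

    For example, `for line in follow("/var/log/pfsense.mydomain.net/filterlog.log"): print(line)` prints firewall log entries as pfSense sends them. Whatever the last lines are before it goes silent is the closest thing to a pre-crash log you will get.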



  • Are you using RealTek NICs?


  • Netgate Administrator

    When it stops responding on the GUI, is it completely unresponsive? I.e., you cannot ping it, cannot SSH into it, and the direct console is dead?

    What hardware is this? What pfSense version?

    Steve



  • dmesg shows "<RTL8251 1000BASE-T media interface>", so it is RealTek, and it's been working without this issue for a few years now in a Shuttle DS437.

    pfSense 2.4.4-RELEASE-p2 (amd64)



  • @johnnybegood said in Every couple of weeks pfSense completely stops responding?:

    dmesg shows "<RTL8251 1000BASE-T media interface>", so it is RealTek, and it's been working without this issue for a few years now in a Shuttle DS437.

    pfSense 2.4.4-RELEASE-p2 (amd64)

    I am incredibly biased against RealTek NICs. I've had them take down countless pfSense/ESXi boxes over the years, so I tend to stay away from them.

    I had to ask because that was usually the root cause (RC) of my random lockups.



  • OK, if it were a regular desktop I would put in Intel NICs, but this is apparently an all-in-one (AIO) with no option to swap out anything 🙄


  • Netgate Administrator

    You might try the alternative Realtek driver:
    https://forum.netgate.com/topic/135850/official-realtek-driver-binary-1-95-for-2-4-4-release

    That has resolved similar issues for some.

    Steve


  • LAYER 8 Global Moderator

    You happen to be running arpwatch?

    Not saying that is it, but I had a problem on my SG-4860 after installing arpwatch. All of a sudden I could not access it: no GUI, no SSH, and even the console showed just nothing. Had to power cycle.

    This happened a few times. I uninstalled arpwatch, and it never happened again. So if you're running arpwatch, try removing it for a few weeks and see if your problem goes away.



  • @stephenw10 I'm hesitant to mess with the drivers since it has been working well for a long time.



  • @johnpoz said in Every couple of weeks pfSense completely stops responding?:

    You happen to be running arpwatch?

    No, I never used arpwatch ☹


  • Netgate Administrator

    So how many times has it done this?

    Did it start after you upgraded perhaps? Or made some other change?

    Just how dead is it when it stops? Does the console still respond?

    I would still try that driver myself. It has helped a lot of people who were seeing issues with Realtek NICs, if you can't swap out the Realtek for a real NIC, of course! But with a locked-up NIC you usually still see traffic on the other NICs, and the console remains responsive.

    Steve



  • This is the 2nd time with the same issue. No access to the GUI, it does not respond to ping, and every device on the network can't see each other.
    The last upgrade was to v2.4.4, and I did not notice any issues afterwards. This started out of nowhere; I did not make any changes, nor did I install any new packages, so I'm confused. The remote log files don't have any useful info: http://prntscr.com/mmupuy

    I do have a really old HD in it, though. Could that be an issue? Should I upgrade to a small SSD?
    If I remember correctly, a while back I read that once pfSense is running, it loads everything into memory and does not use the HD?


  • LAYER 8 Global Moderator

    @johnnybegood said in Every couple of weeks pfSense completely stops responding?:

    every device on the network can't see each other.

    That has nothing to do with pfSense. Your router/gateway has ZERO to do with clients talking to each other. Your router is how you get off the network, to a different L3. How a client talks to another client on the same L2 has zero to do with the L3 router.

    So unless your box failed in such a fashion that it was flooding the network with so much traffic that it prevents others from talking, you can turn pfSense off, unplug it from the network, and box A can still talk to B. It would only matter once their DHCP leases expire and the DHCP server (pfSense) is not there, or if they were trying to resolve a name and pfSense is not there to resolve it, etc.
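    The on-link versus off-link decision described here is made by each client, not by the router: a host compares the destination with its own interface subnet and only hands the packet to the gateway when the destination is on another network. A minimal Python sketch of that decision, using the standard `ipaddress` module; the `next_hop` helper and the addresses are hypothetical, and it ignores static routes and anything beyond a single default gateway:

```python
import ipaddress


def next_hop(src_interface: str, dst: str, gateway: str) -> str:
    """Return where a host sends a packet for dst: dst itself if on-link, else the gateway.

    src_interface is the host's own address with prefix, e.g. "192.168.1.10/24".
    """
    iface = ipaddress.ip_interface(src_interface)
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip in iface.network:
        # Same L2 subnet: the host ARPs for dst directly; the router is not involved.
        return str(dst_ip)
    # Different L3 network: the packet goes to the default gateway (pfSense).
    return gateway


# Hypothetical LAN: pfSense at 192.168.1.1, box A at 192.168.1.10, box B at 192.168.1.20.
print(next_hop("192.168.1.10/24", "192.168.1.20", "192.168.1.1"))  # 192.168.1.20
print(next_hop("192.168.1.10/24", "8.8.8.8", "192.168.1.1"))       # 192.168.1.1
```

    So if A cannot ping B while pfSense is down, the fault is on the shared L2 (switch, cabling, or something flooding it), not in the missing router.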



  • @johnpoz said in Every couple of weeks pfSense completely stops responding?:

    So unless your box failed in such a fashion that it was flooding the network with so much traffic that it prevents others from talking, you can turn pfSense off, unplug it from the network, and box A can still talk to B. It would only matter once their DHCP leases expire and the DHCP server (pfSense) is not there, or if they were trying to resolve a name and pfSense is not there to resolve it, etc.

    Thanks for the explanation, makes perfect sense.

    I remember that pfSense was not responding to pings, nor could I get the GUI to respond.


  • LAYER 8 Global Moderator

    @johnnybegood said in Every couple of weeks pfSense completely stops responding?:

    I remember that pfSense was not responding to pings, nor could I get the GUI to respond.

    OK, that has nothing to do with box A pinging box B on your network. If that is not working, then you've got something more wrong than just pfSense locking up.

    So either it failed in spectacular fashion and is flooding your network with crap that prevents anyone else from talking (that would be RARE!), or yeah, it failed hard. If you cannot get to the console, then yeah, something major is wrong.



  • I've tried pinging pfSense from box A and from box B. No response.
    After I rebooted it (power button once) 3 days ago, it's been working normally like before. I think I should get an SSD to rule out a problem with my old HD.


  • LAYER 8 Global Moderator

    And did you ping from A to B? ;)



  • Now I question myself whether I did :)


  • Netgate Administrator

    @johnnybegood said in Every couple of weeks pfSense completely stops responding?:

    I remember that pfSense was not responding to pings, nor could I get the GUI to respond.

    Both those things would happen if the LAN NIC locked up as Realtek NICs sometimes do.

    But that would not stop the console responding, and that's an important test. If the console is still responsive, then you know you have a NIC issue. If it isn't, you probably have some other hardware issue: bad RAM, overheating, etc.

    Steve


  • LAYER 8 Netgate

    It could also be something else coming online with the same IP address as pfSense. But that still has nothing to do with hosts on the same L2 communicating with each other.
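    One way to check for a duplicate address is to watch which MAC answers for the pfSense IP over time; this is roughly what arpwatch reports as a "flip flop". A single ARP table snapshot only holds one MAC per IP, so the minimal Python sketch below assumes you collect several `arp -a` snapshots from a LAN host (BSD or Linux net-tools style lines such as `? (192.168.1.1) at 00:11:22:33:44:55 on em0 ...`) and flags any IP seen with more than one MAC. The function names and the sample format handling are assumptions for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Matches "(<IPv4>) at <MAC>" as printed by BSD and Linux net-tools `arp -a`.
# Octets may lack a leading zero, as some BSD-derived tools print them.
ARP_ENTRY = re.compile(
    r"\((?P<ip>\d{1,3}(?:\.\d{1,3}){3})\) at "
    r"(?P<mac>[0-9A-Fa-f]{1,2}(?::[0-9A-Fa-f]{1,2}){5})"
)


def normalize_mac(mac: str) -> str:
    """Lower-case and zero-pad each octet so "0:1A:..." equals "00:1a:..."."""
    return ":".join(octet.zfill(2) for octet in mac.lower().split(":"))


def find_ip_conflicts(arp_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {ip: [macs]} for every IP that was answered for by more than one MAC."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for line in arp_lines:
        m = ARP_ENTRY.search(line)
        if m is None:
            continue  # e.g. "(incomplete)" entries or unrelated output
        seen[m.group("ip")].add(normalize_mac(m.group("mac")))
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

    If the pfSense LAN IP ever shows up with a second MAC, another device is claiming its address; otherwise this possibility can be set aside.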



  • @stephenw10 said in Every couple of weeks pfSense completely stops responding?:

    console

    You guys are referring to being physically at the pfSense box, connected with a keyboard and monitor? I don't have either one connected, but I guess next time it happens I can connect a monitor and keyboard to see if it responds.


  • Netgate Administrator

    Yes exactly. Or the serial console if your firewall has that. But something out of band of the Realtek NIC.

    Steve



  • Thanks, I will try that next time.



  • So this happened again today, and this time I had a chance to connect a monitor and keyboard. It was not showing anything on the monitor, nor did it respond to any keystrokes.
    It only shut down after I pressed the power button once on the Shuttle DS437.
    Then I first updated to 2.4.4-p3 and then installed the "Official Realtek Driver Binary 1.95 For 2.4.4 Release", as stephenw10 suggested.

    Can a driver issue affect the console to the point that it does not display any video?



  • Device driver issues are very capable of taking the system down, or parts of it.
    So "no more screen" is possible: the system locked up.



  • Interesting. Keeping fingers crossed ☺


  • Netgate Administrator

    Yup, that could be it. Though that's not one of the symptoms usually seen with Realtek NICs, I would not rule it out.

    Steve

