Every couple of weeks pfSense completely stops responding?
-
Hello,
This happened about 2 weeks ago: in the morning I did not have internet on any of the devices on my network. Initially I thought it was my modem, so I rebooted it, but no joy. I tried accessing the pfSense GUI but got no response. As soon as I rebooted pfSense (pressing the power button once) I got internet back and everything was back to normal.
This evening, out of nowhere, the same issue. Again, only rebooting pfSense fixed it, and I was able to get online and access the GUI.
How can I troubleshoot and figure out what is causing this?
Logs only show what happened after the reboot but nothing before.
TIA!
-
Hi,
If the problem wasn't a software issue, chances are good that you'll be having "dirty disk" issues very soon: hitting the reset or power button isn't a clean shutdown.
Do an fsck after reboot every time you have to pull the plug.
Some thoughts:
Use a UPS.
Set up an external log collector.
Open at least an SSH connection, or better, console access, and leave it open. See what it captures.
Use a tool like Munin (many others exist) so you can follow memory usage, disk usage, processor usage, file descriptors, etc. in close to real time.
During testing: disable/remove all packages and just keep a clean pfSense.
Keep in mind that the software part (pfSense) can run for months if not years (if you don't upgrade, which is a bad choice). Mine runs on an old (10 years?) former, stripped-down Dell desktop PC. Classic hard disk. Never saw a crash ...
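For the monitoring part, even something much simpler than Munin helps: a heartbeat log kept on another machine tells you, to the minute, when the box stopped answering. A rough sketch only, assuming a Linux host on the LAN with iputils `ping`; the address 192.168.1.1, the log path, and the function names are placeholders I made up, not anything pfSense provides:

```shell
#!/bin/sh
# Heartbeat sketch: record when the firewall stops answering pings.
# Assumptions: run on a LAN host (not on pfSense itself); the address
# and log path in the usage line are placeholders.

# Print "up" if the host answers one ping within 2 seconds, else "down".
# (-W is in seconds with Linux iputils ping.)
probe() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Format one log line: timestamp, host, state.
log_line() {
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2"
}

# Loop forever, appending one line per minute to the log file.
heartbeat() {
    while :; do
        log_line "$1" "$(probe "$1")" >> "$2"
        sleep 60
    done
}

# Usage: heartbeat 192.168.1.1 /var/tmp/pfsense-heartbeat.log
```

After the next lockup, the first "down" line gives a time to compare against the remote syslog files.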
-
Thanks for the reply!
pfSense is already on a UPS and I just set up syslog to my other Ubuntu server, but I'm not sure what in these logs would tell me the cause of the lockup?
root@ts-ubuntu:~# ls -la /var/log/pfsense.mydomain.net/
total 568
drwxr-xr-x  2 syslog syslog   4096 Feb 16 13:57 .
drwxrwxr-x 20 root   syslog   4096 Feb 16 13:48 ..
-rw-r-----  1 syslog adm      2724 Feb 16 14:13 dhcpd.log
-rw-r-----  1 syslog adm       440 Feb 16 14:13 dhcpleases.log
-rw-r-----  1 syslog adm     46517 Feb 16 14:14 filterlog.log
-rw-r-----  1 syslog adm       917 Feb 16 14:10 .log
-rw-r-----  1 syslog adm    496753 Feb 16 14:14 nginx.log
-rw-r-----  1 syslog adm       261 Feb 16 13:48 php-fpm.log
root@ts-ubuntu:~#
Can you please explain this: "Open at least a SSH connection - or better : console access and leave it open. See what it captures." I'm in the shell on Ubuntu but I'm not seeing changes as they happen?
-
Are you using RealTek NICs?
-
When it stops responding on the GUI is it completely unresponsive? I.e. cannot ping it, cannot SSH into, the direct console is dead?
What hardware is this? What pfSense version?
Steve
-
dmesg shows "<RTL8251 1000BASE-T media interface>" so it is RealTek, and it's been working without this issue for a few years now in a Shuttle DS437
pfSense 2.4.4-RELEASE-p2 (amd64)
-
@johnnybegood said in Every couple of weeks pfSense completely stops responding?:
dmesg shows "<RTL8251 1000BASE-T media interface>" so it is RealTek, and it's been working without this issue for a few years now in a Shuttle DS437
pfSense 2.4.4-RELEASE-p2 (amd64)
I am incredibly biased against RealTek NICs. I've had them take down countless pfSense/ESXi boxes over the years. So I tend to stay away from them.
I had to ask because that was usually the root cause in my random lockups.
-
OK, if it were a regular desktop I would put Intel NICs in it, but this looks like an all-in-one with no option to swap anything out.
-
You might try the alternative Realtek driver:
https://forum.netgate.com/topic/135850/official-realtek-driver-binary-1-95-for-2-4-4-release
That has resolved similar issues for some.
Steve
-
You happen to be running arpwatch?
Not saying that is it, but I had a problem on my sg4860.. After installing arpwatch.. All of a sudden I could not access it, no gui, no ssh and even the console was just nothing.. Had to power cycle..
This happened a few times... I uninstalled arpwatch, and it never happened again.. So if you're running arpwatch - try removing it for a few weeks and see if your problem goes away.
-
@stephenw10 I'm hesitant to mess with the drivers since it was working well for a long time.
-
@johnpoz said in Every couple of weeks pfSense completely stops responding?:
You happen to be running arpwatch?
No, I never used arpwatch
-
So how many times has it done this?
Did it start after you upgraded perhaps? Or made some other change?
Just how dead is it when it stops? Does the console still respond?
I would still try that driver myself. It has helped a lot of people who were seeing issues with Realtek NICs. If you can't swap out the Realtek for a real NIC of course! But with a locked NIC you usually still see traffic on the other NICs and the console remains responsive.
Steve
-
This is the second time with the same issue. No access to the GUI, it does not respond to ping, and every device on the network can't see each other.
The last upgrade was to v.2.4.4 and I did not notice any issues afterwards. This started out of nowhere; I did not make any changes, nor did I install any new packages, so I'm confused. Remote log files don't have any useful info: http://prntscr.com/mmupuy
I do have a really old HD in it though. Can that be an issue? Should I upgrade to a small SSD?
If I remember correctly, a while back I read that once pfSense is running it loads everything into memory and does not use the HD?
-
@johnnybegood said in Every couple of weeks pfSense completely stops responding?:
every device on the network can't see each other.
That has nothing to do with pfsense.. Your router/gateway has ZERO to do with clients talking to each other.. Your router is how you get off the network, to a different L3.. How a client talks to another client on the same L2 has zero to do with the L3 router.
So unless your box failed in such a fashion that it was flooding the network with so much traffic that it prevents others from talking.. You can turn pfsense off, unplug it from the network and box A can still talk to B.. Only at such time that their dhcp lease expires and the dhcp server is not there (pfsense) would it matter, or if they were trying to resolve a name and pfsense is not there to resolve it, etc.
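To make the L2 vs L3 point concrete, here is a rough sketch you could run from box A during the next outage. It assumes a Linux host with iputils `ping`, and that the peer actually answers ICMP (some hosts firewall it); both addresses and all function names are placeholders, and the diagnosis strings are only hints:

```shell
#!/bin/sh
# Sketch: tell a dead router apart from a dead LAN segment.
# Assumptions: Linux host with iputils ping; addresses are placeholders.

# Return 0 if the host answers one ping within 2 seconds.
reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Map the two results (yes/no for gateway, yes/no for peer) to a hint.
diagnose() {
    case "$1/$2" in
        yes/yes) echo "LAN and gateway both fine" ;;
        no/yes)  echo "gateway down, LAN fine: look at pfSense itself" ;;
        yes/no)  echo "gateway up, peer unreachable: look at the peer or the switch" ;;
        no/no)   echo "nothing answers: suspect the switch, cabling, or a flooded segment" ;;
        *)       echo "usage: diagnose yes|no yes|no" >&2; return 2 ;;
    esac
}

# Ping the gateway and one LAN peer, then print the hint.
check() {
    g=no
    p=no
    reachable "$1" && g=yes
    reachable "$2" && p=yes
    diagnose "$g" "$p"
}

# Usage: check 192.168.1.1 192.168.1.50
```

If the peer answers while the gateway does not, the LAN itself is fine and only pfSense is the problem.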
-
@johnpoz said in Every couple of weeks pfSense completely stops responding?:
So unless your box failed in such a fashion that it was flooding the network with so much traffic that it prevents others from talking.. You can turn pfsense off, unplug it from the network and box A can still talk to B.. Only at such time that their dhcp lease expires and the dhcp server is not there (pfsense) would it matter, or if they were trying to resolve a name and pfsense is not there to resolve it, etc.
Thanks for the explanation, makes perfect sense.
I remember that pfSense was not responding to pings, nor could I get the GUI to respond.
-
@johnnybegood said in Every couple of weeks pfSense completely stops responding?:
I remember that pfSense was not responding to pings, nor could I get the GUI to respond.
Ok that has nothing to do with box A pinging box B on your network - if that is not working then you got something more wrong than just pfsense locking up..
So either it failed in spectacular fashion and is flooding your network with crap which prevents anyone else from talking.. That would be RARE!!! Or yeah it failed hard, if you can not get to console then yeah something major wrong..
-
I've tried pings from box A to pfSense and box B to pfsense. No response.
After I rebooted (power button once) 3 days ago it's working normally, like before. I think I should get an SSD to rule out a problem with my old HD.
-
And did you ping from A to B? ;)
-
Now I'm asking myself whether I did :)