pfSense locks up when PPPoE connection is lost. No logs, no crash dump
-
Hi,
Over the last several weeks, my pfSense firewall has been locking up at random. There is no crash dump and no error shown on the screen when a monitor is connected. In the logs, the only thing I notice is that the PPPoE connection drops and pfSense attempts to re-establish the session. Judging by the PPP logs, this is most likely due to an IP address change.
The connection is FTTP (UK) using PPPoE, with an Ethernet cable from the ONT to the pfSense firewall. When pfSense locked up, the ONT's Ethernet link light was solid green rather than flashing with activity as it normally does, suggesting that no traffic was passing between pfSense and the ONT. I lost access to the entire network: no SSH, routing, or DNS. I also have a WireGuard interface for VPN.
pfSense version 2.7.2, with all recommended patches applied and all packages up to date.
Firewall specs:
- HP T730
- 32GB SSD
- 8GB RAM
- Intel I350-T2 (igb)

What I have done so far:
- Put an unmanaged switch between the ONT and pfSense
- Followed the pfSense Guide on Hardware Troubleshooting and Tuning
- Set a restart interval on the PPPoE interface
- Disabled gateway actions, and have now disabled gateway monitoring entirely
- Ran a SMART test on the SSD and Memtest86 on the RAM for 2+ hours
- Tried different Ethernet cables
- Replaced the I350-T2 with another I350-T2, which is genuine (it has the YottaMark sticker and "Delta" embossed on the Ethernet chip)
- Disabled flow control via system tunables
- Confirmed there is no crash dump in /var/crash
- Did a fresh install and restored the config file
Packages installed:
ACME - SSL certificate management for the pfSense GUI (Let's Encrypt)
Avahi - mDNS, including mDNS across VLANs
Cron - viewing and managing cron jobs
iperf - testing network throughput, loss, and jitter
pfBlockerNG-devel - DNS and IP blocking (ads etc.)
System Patches
WireGuard

I am desperate and am even considering paying for pfSense Plus to test the if_pppoe backend.
Edit: The pfSense 2.8 beta is now available and includes the if_pppoe backend, so I may test the beta instead.
-
If it completely stops responding, even directly at the console, then it's probably a hardware issue.
Do the keyboard Caps Lock/Num Lock LEDs stop working?
Does it still respond to Ctrl+T? That can respond when nothing else does.
-
@stephenw10 I haven't tested that, to be honest, but I will the next time it happens. If the console is indeed responsive, what steps would you recommend to investigate? It only happens once every couple of days, at random times and under varying network load (throughput etc.), so when it does happen I want to maximise my chances of getting useful info (beyond anything printed on screen).
The T730 thin client is not exactly an out-and-out firewall device, nor was it ever meant to be, so I am looking to replace it soon.
-
If the console is still responsive, I'd run
top -HaSP
at the command line and see whether some process is using all the CPU cycles. I'd also check
ps -auxwwd
for the ppp processes to see if they're stuck.
-
It has been four days. Looking at the logs more closely, I found that the UFS filesystem had been flagged as not clean. I did a full reboot with a filesystem check, and it has now been up for more than 5 days, whereas before it would crash within a day or two.
I will keep an eye on it and follow your steps if the issue recurs.
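Because the lock-ups are rare, it may help to script the `top -HaSP` and `ps -auxwwd` checks suggested above so their output is written to disk rather than only shown on screen. This is a minimal sketch, not a pfSense tool: `capture_diag` and the `/root/diag` directory are hypothetical names, it assumes FreeBSD's `top(1)` (where `-b` is batch mode and `-d 1` prints a single display and exits), and it assumes the PPPoE daemon appears as `mpd5`, the legacy PPPoE backend in 2.7.x.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): snapshot CPU usage and the
# PPP-related processes into a timestamped file that survives a later hang.
# Assumes FreeBSD top(1)/ps(1); the "mpd" match assumes the mpd5 PPPoE backend.

capture_diag() {
    outdir="${1:-/root/diag}"           # hypothetical default output directory
    mkdir -p "$outdir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    out="$outdir/diag-$stamp.txt"

    {
        echo "=== $stamp ==="
        echo "--- top -HaSP (single batch-mode display) ---"
        # -b: batch mode; -d 1: print one display, then exit (FreeBSD top)
        top -b -HaSP -d 1 2>&1
        echo "--- ps -auxwwd (ppp/mpd lines only) ---"
        # Keep only lines mentioning ppp or mpd, and drop the grep itself
        ps -auxwwd 2>&1 | grep -Ei 'ppp|mpd' | grep -v grep
        echo "--- end ---"
    } > "$out"

    sync    # flush to disk in case the box locks up shortly afterwards
    echo "$out"
}
```

After sourcing the file (for example `. /root/capture_diag.sh`, a hypothetical path), run `capture_diag /root/diag` at the console when the problem appears. It could also be scheduled every few minutes with the Cron package, so that the most recent snapshot from before a hard lock-up is already on disk.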