pfSense locks up when PPPOE connection is lost. No Logs, No crashdump
-
Hi,
Over the last several weeks, my pfSense firewall has been locking up at random. There is no crash dump and no error on the screen when a monitor is connected. Reviewing the logs, the only thing I notice is that the PPPOE connection is lost and pfSense attempts to reconnect the PPPOE session. Looking at the PPP logs, it is most likely due to an IP Address change.
The Internet is FTTP (UK-based) using PPPOE to connect, with an ethernet cable from the ONT to the pfSense firewall. The light on the ONT for the ethernet interface was solid green when pfSense crashed (it should be flashing to show link activity), indicating that when pfSense crashes, no link is established between pfSense and the ONT. I lost access to the entire network: no SSH, no routing, no DNS. I also have a WireGuard interface for VPN.
pfSense version 2.7.2 - All recommended patches applied, and all packages up to date.
Specs of firewall:
HP T730
32GB SSD
8GB RAM
Intel I350-T2 (igb)
What I have done thus far:
- Put an unmanaged switch between the ONT and pfSense
- Followed the pfSense Guide on Hardware Troubleshooting and Tuning
- Set a restart interval in the PPPOE interface.
- Disabled gateway actions and have now disabled gateway monitoring
- SMART test on SSD. Memtest86 on RAM for 2+ hours
- Tried different ethernet cables
- Replaced I350-T2 with another I350-T2, which is genuine (has the Yottamark sticker and "Delta" is embossed into the ethernet chip)
- Disabled flow control via system tunables
- No crash dump in /var/crash
- Fresh install with the config file restored.
Packages installed:
Acme - management of SSL cert for pfsense GUI (LetsEncrypt)
Avahi - mDNS and mDNS across VLANS
Cron - Cron Job viewing and managing.
iperf - testing network throughput, loss, and jitter.
pfBlockerNG-devel - DNS and IP blocking (ads etc)
System Patches
Wireguard
I am desperate and am even thinking of forking out some cash to get pfSense Plus to test the if_pppoe backend.
Edit: pfSense 2.8 beta is now available, so I may decide to test the beta as it has the if_pppoe backend.
-
If it completely stops responding, even at the console directly, then it's probably a hardware issue.
Does the keyboard caps lock/num lock LED stop working?
Does it still respond to ctl+t? That can respond when nothing else does.
-
@stephenw10 I haven't tested that, to be honest, but I will test it the next time it happens. If the console is indeed responsive, would you recommend any steps to investigate? It only occurs once every couple of days, at random times and under varying network activity (throughput etc.), so when it does happen I want to maximise my chances of getting useful info (other than any dumps on screen).
The T730 thin client is not exactly an out-and-out firewall device, nor was it ever meant to be. So I am looking to replace it soon.
-
If the console is still responsive I'd run
top -HaSP
at the command line and see if some process is using all the CPU cycles.
I'd also check
ps -auxwwd
for the ppp processes to see if that's stuck.
-
It has been four days. I looked at the logs more closely, and the UFS filesystem was not marked clean. I did a full reboot with a filesystem check, and it has now been up for more than 5 days, whereas before it would crash within a day or two.
Will keep an eye out and follow your steps if the issue recurs.
-
Hi,
After 13 days of solid performance, the same issue has happened again. I have no serial console, so I connected a monitor, but there was no output. I then connected a keyboard, and the num lock, caps lock, and scroll lock LEDs light up when the keys are pressed.
Looking at the logs at the time of the incident, pfSense logged the keyboard being connected. Logs available here
Apr 23 19:27:29 kernel uhid0: <USB Keyboard> on usbus0
Apr 23 19:27:29 kernel uhid0 on uhub1
Apr 23 19:27:29 kernel kbd2 at ukbd0
Apr 23 19:27:29 kernel ukbd0: <USB Keyboard> on usbus0
Apr 23 19:27:29 kernel ukbd0 on uhub1
Apr 23 19:27:29 kernel ugen0.2: <Logitech USB Keyboard> at usbus0
The light on the back of the i350-T2 card is just a solid orange light. No activity between the ONT and the PFSENSE firewall.
The i350-t2 in question is genuine (has the Yottamark sticker and "Delta" is embossed into the ethernet chip), so it can't be that (unless it is a very good fake).
My gut is telling me it is the SSD. Can someone make sense of the SMART Data below linked here?
-
Hmm, nothing much is shown there. The PPPoE connection stops responding to LCP echos:
Apr 23 19:09:16 ppp 58143 [wan_link0] LCP: peer not responding to echo requests
Apr 23 19:09:16 ppp 58143 [wan_link0] LCP: no reply to 5 echo request(s)
Apr 23 19:09:06 ppp 58143 [wan_link0] LCP: no reply to 4 echo request(s)
Apr 23 19:08:56 ppp 58143 [wan_link0] LCP: no reply to 3 echo request(s)
Apr 23 19:08:46 ppp 58143 [wan_link0] LCP: no reply to 2 echo request(s)
Apr 23 19:08:36 ppp 58143 [wan_link0] LCP: no reply to 1 echo request(s)
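For anyone who wants to pull the timing out of those LCP lines, here is a minimal Python sketch. It is only an illustration, not something pfSense ships: the function names missed_echoes and summarize are made up for this example, and the year is an assumption because syslog timestamps do not include one. It reads the lines quoted above and reports how far apart the missed echoes were and how many were missed before the link was declared dead.

```python
import re
from datetime import datetime

# Log lines copied from the PPP log quoted above (newest first, as posted).
LOG = """\
Apr 23 19:09:16 ppp 58143 [wan_link0] LCP: peer not responding to echo requests
Apr 23 19:09:16 ppp 58143 [wan_link0] LCP: no reply to 5 echo request(s)
Apr 23 19:09:06 ppp 58143 [wan_link0] LCP: no reply to 4 echo request(s)
Apr 23 19:08:56 ppp 58143 [wan_link0] LCP: no reply to 3 echo request(s)
Apr 23 19:08:46 ppp 58143 [wan_link0] LCP: no reply to 2 echo request(s)
Apr 23 19:08:36 ppp 58143 [wan_link0] LCP: no reply to 1 echo request(s)
"""

# Matches "<Mon DD HH:MM:SS> ... LCP: no reply to <N> echo request(s)".
PATTERN = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*LCP: no reply to (\d+) echo request"
)


def missed_echoes(log_text, year=2025):
    """Return (timestamp, missed_count) pairs, oldest first.

    The year is an assumption: syslog lines carry no year.
    """
    events = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            stamp = datetime.strptime(
                f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S"
            )
            events.append((stamp, int(match.group(2))))
    return sorted(events)


def summarize(events):
    """Report the gaps between missed echoes and the highest miss count."""
    gaps = [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]
    span = int((events[-1][0] - events[0][0]).total_seconds())
    return {"max_missed": events[-1][1], "gaps_s": gaps, "span_s": span}


if __name__ == "__main__":
    print(summarize(missed_echoes(LOG)))
```

On the lines above this shows the missed echoes 10 seconds apart, with the peer declared unresponsive at the fifth miss, 40 seconds after the first one was logged.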
Then pfSense tries to reconnect and keeps trying until it starts booting again.
It doesn't show a reboot request though. How did you reboot it?
It doesn't look like a drive issue. It's still logging to it and no errors are shown.
I would remove or disable any hardware components you're not using there so:
Apr 23 19:31:09 kernel hdaa0: <ATI R6xx Audio Function Group> at nid 1 on hdacc0
Apr 23 19:31:09 kernel hdacc0: <ATI R6xx HDA CODEC> at cad 0 on hdac0
Apr 23 19:31:09 kernel iwm0: <Intel(R) Dual Band Wireless AC 3160> mem 0xfea00000-0xfea01fff irq 40 at device 0.0 on pci2
Apr 23 19:31:09 kernel hdac0: <ATI (0x1308) HDA Controller> mem 0xfeb60000-0xfeb63fff irq 27 at device 1.1 on pci0
Unless you're actually using that wifi card?
Is that device using HDMI only for the monitor? Those sometimes only show output if they are connected at boot. If you connect a monitor during normal operation do you see any output?
-
@stephenw10 I had to hold down the power button to turn it off, then turn it on again.
I will remove the wireless card.
The HP T730 only has DisplayPort connectors. When I connect DisplayPort during normal operation, I get an output, but during yesterday's crash there was nothing.
I'm looking into getting some better kit to run pfSense on.