pfSense hangs randomly with watchdog timeout messages
-
I tested pfSense BETA2 sometime around the end of last year, I think. There were some issues with transparent proxies and squid, but in general the firewall worked fine. I reverted to pfSense 1.2.3 at the time because it was more stable.
Recently (4th of April) I tried pfSense BETA2 again, using the snapshots from the 4th and 5th. First I performed an in-place upgrade, which mostly worked; however, transparent proxy still did not work and, more importantly, the firewall now crashed at random. It can run anywhere from 1 to 8 hours, but within that timespan I start seeing "re0: Watchdog timeout" messages on my console (with almost no load on the system). Existing connections keep working, but no new connections can be made. The console on the firewall hangs (I cannot ssh in to it, and on the console itself I cannot type anything). The web configurator is dead as well. The only recourse is a hard reset, and if I do this twice the filesystem gets so badly corrupted that it loses all my configuration and refuses to work, so I have to reinstall.
To make sure the problem really is in pfSense BETA2 and not a leftover from the upgrade, I also tried a clean install. I did not restore anything, just reconfigured by hand. After about 7 hours the machine died just as before. I rebooted and tried again, and after about 5 hours it died once more.
So I reverted to pfSense 1.2.3 and it has been solid as a rock. My gut says something in FreeBSD 8 is breaking it, but I cannot be certain. I see no kernel panics, so I cannot post anything other than the details of my system:
re0: <RealTek 8101E/8102E/8102EL PCIe 10/100baseTX> port 0xd800-0xd8ff mem 0xfeaff000-0xfeafffff irq 19 at device 0.0 on pci2
vr0: <VIA VT6105 Rhine III 10/100BaseTX> port 0xe800-0xe8ff mem 0xfebffc00-0xfebffcff irq 18 at device 1.0 on pci3
vr1: <VIA VT6105 Rhine III 10/100BaseTX> port 0xe400-0xe4ff mem 0xfebff800-0xfebff8ff irq 19 at device 2.0 on pci3
I used the uniprocessor kernel.
Any others having the same problems? Any suggestions?
-
I haven't seen or heard of anything like this, but stranger things have happened.
Did you try the SMP kernel? Most people, even with single CPUs, are better off on the SMP kernel these days.
-
I haven't seen or heard of anything like this, but stranger things have happened.
Did you try the SMP kernel? Most people, even with single CPUs, are better off on the SMP kernel these days.
Nope I have not. I will do that, and if that fails I will try to disable the re0 network adapter in the BIOS. Will post back soon.
-
I've seen some lockups with 2.x versions, and the common factor seems to be VIA Rhine based machines. My VIA NAB7500 locks up when I access the web configurator, and another machine does too when it has Rhine adapters in it. Let me load FreeBSD 8 on the NAB7500 and see if it's related to 8.0.
-
Well I've got some bad news. I upgraded to BETA2 again, this time choosing the SMP kernel. After about 9 hours it died the same way - lots of re0: Watchdog timeout errors on the console and nothing working - keyboard on console unresponsive, existing connections still fine but cannot open up any new TCP connections to the firewall:
# uname -a
FreeBSD gw.pwnconsulting.za.net 8.0-STABLE FreeBSD 8.0-STABLE #1: Mon Apr 5 12:40:40 EDT 2010 ermal@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.8 i386
#
-
For my second test I disabled the onboard re0 adapter (Realtek). So far it has been about 12 hours and the firewall is still working, by far the longest I have had pfSense BETA2 running on this hardware. I will post again in a day or two, but at this point my gut tells me the Realtek on FreeBSD 8 is not happy.
-
It has been more than 35 hours that my pfSense box is working - so I have to conclude that there is a bug in FreeBSD 8 that was not present in FreeBSD 7 with my Realtek card. Please post here if anyone has similar experiences. This Realtek is an onboard NIC controller
-
It has been more than 35 hours that my pfSense box is working - so I have to conclude that there is a bug in FreeBSD 8 that was not present in FreeBSD 7 with my Realtek card. Please post here if anyone has similar experiences. This Realtek is an onboard NIC controller
You may want to either alter the title of this thread or start a new thread with a more specific one so people can find it easier. If you do start a new thread, it might help to put a link to that thread at the end of this one.
-
It has been more than 35 hours that my pfSense box is working - so I have to conclude that there is a bug in FreeBSD 8 that was not present in FreeBSD 7 with my Realtek card. Please post here if anyone has similar experiences. This Realtek is an onboard NIC controller
You should post to stable@freebsd.org with the information, so the people who can fix it will see it. Include what you've described here, and that it works in 7.2, but not RELENG_8. Include the full dmesg from 2.0, and the output of 'pciconf -lv'. They may need additional info but that's a good starting point.
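A rough sketch of gathering that information into one file to attach to the mail. It assumes a POSIX shell on the firewall; the function name collect_report and the output path are made up for illustration, and the list may need extending if more details are requested.

```shell
#!/bin/sh
# collect_report OUTFILE
# Hypothetical helper: writes the output of uname, dmesg and pciconf
# into OUTFILE, one labelled section per command.
collect_report() {
    out="$1"
    : > "$out"    # create or truncate the report file
    for cmd in "uname -a" "dmesg" "pciconf -lv"; do
        printf '===== %s =====\n' "$cmd" >> "$out"
        # $cmd is deliberately unquoted so it splits into command + arguments
        $cmd >> "$out" 2>&1 || printf '(command failed or not available)\n' >> "$out"
    done
}
```

Usage would be something like `collect_report /tmp/re0-report.txt`, run on the 2.0 snapshot shortly after boot so the dmesg buffer still holds the device probe lines.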
-
Thanks. I posted to the FreeBSD project all the details you mentioned. Hope they can get to the root of this issue.
-
It has been more than 35 hours that my pfSense box is working - so I have to conclude that there is a bug in FreeBSD 8 that was not present in FreeBSD 7 with my Realtek card. Please post here if anyone has similar experiences. This Realtek is an onboard NIC controller
I don't think it is the Realtek NIC that causes the problem. I have had these hangs on several different sets of hardware, and I don't think any of them had re0 NICs.
Otherwise my experience is unfortunately similar to yours. The crashes occur randomly, about every eight hours.
Is there a version somewhere that really works (for pfSense 2.0)?
-
I don't think it is RealTek nic that causes the problem.
It is in his case. If you're getting a bunch of watchdog timeout messages with Realtek NICs that's probably the same. If you aren't getting watchdog timeouts that's something different. You'll need to start a new thread with a backtrace assuming you're getting a kernel panic.
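To tell the two cases apart, one option is to count the watchdog messages in a saved copy of the dmesg output or console log. This is only a sketch: count_watchdog and the file name are hypothetical, and the match is case-insensitive because the exact wording of the message may differ between driver versions.

```shell
#!/bin/sh
# count_watchdog LOGFILE IFACE
# Hypothetical helper: prints how many lines in LOGFILE contain
# "IFACE: watchdog timeout" (case-insensitive, fixed-string match).
count_watchdog() {
    # grep -c prints the count but exits non-zero when it is 0,
    # so "|| true" keeps the helper usable under "set -e".
    grep -ciF -- "$2: watchdog timeout" "$1" || true
}
```

For example, `dmesg > /tmp/dmesg.txt` followed by `count_watchdog /tmp/dmesg.txt re0`. A non-zero count on a Realtek interface suggests the same problem as in this thread; a hang with a count of zero is likely something else.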