Watchdog timeout
-
Hello, I have just installed pfSense 1.0-RC3 on a Dell OptiPlex 240 here. I am running 4 NICs: 3 are 3Com 3C905-TX cards and the fourth is the onboard NIC. They all use the xl driver provided by FreeBSD (from reading your site it is 6.1).
I am using them for load balancing with failover (as documented in your Tutorial section). I keep running into this "watchdog timeout" error every few hours on the box.
When the errors start occurring, the Internet connection begins to slow until all traffic finally stops. The three modems I am using (all provided by Comcast) are still up and running (they have multiple ports, and when I plug a laptop into a different one it can still get out), so I know it is not a modem issue. When this occurs I am also unable to access the web interface. I have to go to the console and reboot using the provided menu option (or the shell, depending on where I am in my troubleshooting when I finally decide to reboot).
I read the posts on this site and they all say to try different NICs; unfortunately I don't have that option right now. I have also tried the same setup on an OptiPlex 270 and the same errors occur. The errors occur on random interfaces and aren't limited to just one (so I know it's not one specific card causing my problem).
I Googled this error and found that FreeBSD 5 had issues with NICs sharing IRQs. It does seem that the NICs that share IRQs are the ones that have the trouble (IRQs are assigned arbitrarily by the Dell BIOS at bootup). The only solution I found on Google was to turn off PnP. Unfortunately the Dell BIOS on both the 240 and the 270 doesn't have this option, and neither has an option to manually assign IRQs either!
I read about a "check_reload_status" command by Mr. Ullrich, but unfortunately this file is no longer in his directory here at pfSense, so I was unable to try that solution. However, I did notice that the solution was tried using a "Snapshot" of pfSense, which I don't have; I can only find 1.0-RC3, so I would assume that the "check_reload_status" he had loaded is now built into the current release.
This is my first time running a BSD system, so I am quite unfamiliar with some of the terminology that is used. However, I am an avid Linux user and my preference is Slackware, which to my surprise is strikingly similar to FreeBSD, so I am not a completely lost soul :).
Any help would be greatly appreciated. Thanks in advance.
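For anyone wanting to check the IRQ-sharing theory from this post, here is a minimal sketch. It assumes the boot messages are saved in /var/run/dmesg.boot and that PCI probe lines have the usual FreeBSD form "xl0: <...> ... irq 11 at device 9.0 on pci2". The function name `shared_irqs` is made up for illustration.

```shell
#!/bin/sh
# Sketch: report IRQs that more than one PCI device attached to at boot.
# ASSUMPTION: probe lines look like
#   xl0: <3Com 3c905-TX Fast Etherlink XL> port ... irq 11 at device 9.0 on pci2
# shared_irqs is a hypothetical helper, not part of pfSense or FreeBSD.

# Reads dmesg-style text on stdin, prints one line per shared IRQ.
shared_irqs() {
    awk '/^[a-z]+[0-9]+: .* irq [0-9]+ at device/ {
        # find the number that follows the word "irq"
        for (i = 1; i < NF; i++) if ($i == "irq") { irq = $(i + 1); break }
        dev = $1; sub(/:$/, "", dev)
        devs[irq] = (irq in devs) ? (devs[irq] " " dev) : dev
        count[irq]++
    }
    END {
        for (irq in count)
            if (count[irq] > 1) print "irq " irq ": " devs[irq]
    }'
}

# Only run against the real boot log when it exists (i.e. on the firewall).
if [ -r /var/run/dmesg.boot ]; then
    shared_irqs < /var/run/dmesg.boot
fi
```

If this prints nothing, no two PCI devices were assigned the same IRQ at boot. Running `vmstat -i` on the box shows the live interrupt counters per IRQ for comparison.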
-
A side note. I read in another forum (www.linuxquestions.org) that perhaps I should disable ACPI. The post again references FreeBSD 5 (this time 5.2.1). I went to the FreeBSD Handbook and it said to add the line
hint.apic.0.disabled="1"
to loader.conf. Unfortunately, on pfSense I find two separate loader.conf files. One is located in /boot/defaults/ and one directly under /boot. The one in /boot/defaults looks to be the wrong place for the above line (the syntax doesn't match what is actually in the file), so I am hesitant to place it there. I tried the one directly under /boot. But when I run a
ps ax | grep acpi
-
Disabled ACPI. Got the wrong page in the manual. The correct way of disabling ACPI is:
hint.acpi.0.disabled="1"
and place that in the file /boot/device.hints. Now I must wait…
-
http://wiki.pfsense.com/wikka.php?wakka=BootOptions
-
Oh ok. I guess I had it in the wrong place. Unfortunately, I have to leave work in a few minutes. I will post tomorrow the findings and see if disabling ACPI helped with my errors.
Thanks again!
-
Disabling ACPI was a no go. The connection still went down. All issues reported in the first post still exist. Is there anything I can post that might help diagnose my issue?
Thanks again.
-
Noticed a quirk. It seems that I was overrunning the default state table size. I've noticed it for a while but didn't think anything of it until I read this post: http://forum.pfsense.org/index.php/topic,71.0.html. I have over 400 users here, so based on what was posted there I would assume 10k states just isn't enough. This also correlates with the times that the connection "goes down", as it normally goes down during high-traffic periods (noon/after dinner). I have reset the max states to 1 million. As the post above stated, the setting is under System -> Advanced.
Thanks.
Back to waiting…
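A quick way to confirm that the state table is what is filling up is to compare the current entry count with the hard limit. This is a minimal sketch, assuming shell access on the firewall and that `pfctl -si` and `pfctl -sm` print their usual "current entries" and "states hard limit" lines. The helper function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: how full is pf's state table?
# pfctl -si and pfctl -sm are real commands; the helper names are hypothetical.

# state_usage CURRENT LIMIT -> integer percent of the table in use
state_usage() {
    echo $(( $1 * 100 / $2 ))
}

# Extract the "current entries" count from `pfctl -si` output on stdin.
current_states() {
    awk '/current entries/ { print $3 }'
}

# Extract the states hard limit from `pfctl -sm` output on stdin.
state_limit() {
    awk '$1 == "states" { print $4 }'
}

# Only query pf when pfctl is available (i.e. on the firewall itself).
if command -v pfctl >/dev/null 2>&1; then
    cur=$(pfctl -si 2>/dev/null | current_states)
    max=$(pfctl -sm 2>/dev/null | state_limit)
    if [ -n "$cur" ] && [ -n "$max" ] && [ "$max" -gt 0 ]; then
        echo "states: $cur of $max ($(state_usage "$cur" "$max")% used)"
    fi
fi
```

Running this during the busy periods (noon/after dinner) should show whether the count is sitting at or near the limit when the connection degrades.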
-
Well, it's been up, zippy and stable for about 3.5 hours. It seems the state table limit was my problem after all. I am still getting the "watchdog timeout" errors, but that might have to do with a switch that the server is plugged into.
Thanks again.
-
10K -> 1 million is a helluva jump. You might want to scale that back some…say to 100K or 200K? :) This thread might be useful to you http://marc.theaimsgroup.com/?l=pfsense-support&m=114771925530407&w=2
–Bill
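
To put rough numbers on the 10K versus 1 million question, the sketch below estimates how much RAM a completely full state table would need. It assumes about 1 KB of RAM per state, which is a commonly quoted rule of thumb and not a figure measured on this box; `state_ram_mb` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough RAM estimate for a full pf state table.
# ASSUMPTION: about 1 KB per state (rule of thumb, not measured on this system).

# state_ram_mb STATES [KB_PER_STATE] -> whole megabytes, rounded down
state_ram_mb() {
    echo $(( $1 * ${2:-1} / 1024 ))
}

# Compare the default, the suggested middle ground, and the 1 million setting.
for n in 10000 100000 200000 1000000; do
    echo "$n states: about $(state_ram_mb "$n") MB"
done
```

Under that assumption a full table of 1 million states would need a little under 1 GB, while 100K to 200K states stays in the range of roughly 100 to 200 MB, which is why the smaller limits are the safer choice on a desktop-class machine.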