Watchguard firebox watchdog errors
-
Has anyone ever found out whether or not the tunables or the shell command fixed their watchdog timeouts once and for all?
Since I upgraded to 2.0 b4 a while back, my firebox appeared to be cured of the watchdog timeout affliction, until a week or two ago, when it started popping up again all over the log.
It seems to be most prevalent when a device on the wireless network tries to resolve DNS, but I've seen it happen from wired workstations as well.
Also, right when the first timeout occurs after an update or reboot, calcru errors occur in just about every process.
These look like the following:
kernel: calcru: runtime went backwards from 3257 usec to 3065 usec for pid 16027 (unlinkd)
kernel: calcru: runtime went backwards from 10622 usec to 10111 usec for pid 0 (kernel)
etcetera.
These only show up once, for what seems to be every process on the machine, and don't come back until after a reboot or upgrade (which obviously also reboots the machine).
So while it seemed that the watchdog timeout errors had been fixed, they are back, at least for me.
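For anyone who wants to check the "once per process" impression against their own log, here is a minimal Python sketch. It assumes the calcru lines look exactly like the two quoted above; the function names `parse_calcru` and `count_per_process` are made up for illustration, and the pattern may need adjusting for other log formats.

```python
import re
from collections import Counter

# Pattern modelled on the two calcru lines quoted in the post (an assumption:
# other builds may format the message differently).
CALCRU_RE = re.compile(
    r"calcru: runtime went backwards from (\d+) usec to (\d+) usec "
    r"for pid (\d+) \(([^)]+)\)"
)


def parse_calcru(lines):
    """Return a list of (pid, process, before_usec, after_usec) tuples."""
    events = []
    for line in lines:
        match = CALCRU_RE.search(line)
        if match:
            before, after, pid, name = match.groups()
            events.append((int(pid), name, int(before), int(after)))
    return events


def count_per_process(events):
    """Count how many calcru events each (pid, process) pair produced."""
    return Counter((pid, name) for pid, name, _, _ in events)


# The two sample lines from the post.
sample = [
    "kernel: calcru: runtime went backwards from 3257 usec to 3065 usec for pid 16027 (unlinkd)",
    "kernel: calcru: runtime went backwards from 10622 usec to 10111 usec for pid 0 (kernel)",
]

for key, count in count_per_process(parse_calcru(sample)).items():
    print(key, count)
```

If every count stays at 1 between reboots, that matches what I described; a process showing up repeatedly would point to something else.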
The questions are: why have the errors returned, what has changed, and what can we do to fix them this time around?
Edit:
After a few days I completely reinstalled the machine with 2.0b4 and packages because I figured a clean install might help.
Unfortunately, the watchdog errors were still popping up, but only when a machine accessed the internet through the wireless (wifi) connection.
That prompted me to go back to as barebones an install as possible, uninstalling all installed packages one at a time.
After having uninstalled squid (the third package I uninstalled), the timeouts and calcru errors disappeared and I have not seen errors since (forty days ago).
To be sure, I uninstalled all other packages as well.
The strange thing about this is that all packages and pfsense 2.0b4 worked great until around September the 4th.
Regardless, I'm quite glad that I found a way to fix the watchdog errors.
If anyone else has been having these problems as well, try uninstalling packages from a fully configured system.
-
I am getting the watchdog timeouts on my firebox, but only occasionally, on re3 (connected to a wireless AP).
Touch wood, I haven't had any on the other interfaces, even under massive load.
I'm on a recent 2.0 build and it seems a big step forward for us Watchguard users.
I have no packages installed; they don't seem to install properly on 2.0 embedded. I also have net.inet.tcp.tso set to 0, and I have selected Disable hardware checksum offload, Disable hardware TCP segmentation offload and Disable hardware large receive offload in the settings.
I haven't noticed the watchdog timeouts causing any connectivity issues on my wireless devices. It seems to recover from the errors OK, so I'm not too worried at the moment.
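In case it helps anyone compare settings, here is a rough Python sketch for checking a tunable against the value you expect. It assumes the "name: value" output format that FreeBSD's `sysctl` prints; the helper names and the sample output string are invented for illustration, and it only covers the sysctl tunable, not the three offload checkboxes in the GUI.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by `sysctl <name> ...`."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            values[name.strip()] = value.strip()
    return values


def mismatches(actual, expected):
    """Return {name: (wanted, found)} for every tunable that differs."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# The tunable mentioned in the post, with the value it is set to there.
EXPECTED = {"net.inet.tcp.tso": "0"}

# Invented sample output for illustration, not taken from a real box.
sample_output = "net.inet.tcp.tso: 1\n"

print(mismatches(parse_sysctl(sample_output), EXPECTED))
```

On a real box you would feed it the output of `sysctl net.inet.tcp.tso` instead of the sample string.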
-
When I was trying to decide what hardware to use for my pfsense box I considered the Watchguard X Core, but eventually went for the Peak instead as it has all Intel NICs. Anyway, while thinking it over I spent ages Googling the watchdog timeout problem and found an interesting thread (which I now can't find ::)) in which the problem was described as being caused by the driver's inability to handle fragmented packets correctly. The result of this was that the problem only really shows up if you are connected directly to the NIC. If you are connected via a switch, in which packets are received and resent, then the problem may never happen.
I'd be interested in others thoughts on this.
How are your networks arranged?
Are you connected via a switch?
Steve
-
Yes, I saw something similar. They were saying that if you have a decent switch in between, it will rebuild the packets?
My interfaces are set up like this:
RE0 - to ADSL modem via powerline plug set
RE1 - to cable modem
RE2 - to cheap gig switch
RE3 - to wireless ap
RE4 - to wireless ap
RE5 - unused
I have so far seen the errors on RE1, RE2 and RE3; they are mainly on RE2 though.
As I said though, with pfsense 2.0 they seem to be few and far between and they don't seem to take the interface down. They just get logged as an "out" error.
It is a shame, as this hardware is ideal. I have upgraded my RAM to 512MB and the processor to a P3 1.4GHz and it flies :)
You would struggle to get something of a similar spec/format for £60!
My cable ISP announced a 100meg service yesterday so I think I am going to have to look to upgrade to a Peak/E series at some point.
The only thing is I want to keep power consumption under 50w if possible ???
-
Hmm, I wish I could find that post. >:(
Anyway it doesn't seem to be solving the problem for you.
I agree the Watchguard boxes are ideal for pfsense, timeouts aside. I've ended up doing almost the opposite to you. I have an X peak box but have swapped out the 2.8G P4 for a P4-M which is underclocked to 1.2GHz. The whole box runs about 40W. <£50. 8)
Do you think 100M cable is going to push your box?
Do you have load sharing between ADSL and cable?
Steve
-
Yeah, I'm going to keep an eye out for a Peak, but they don't come up on fleabay as much as the X Cores. It would be good to ditch the Realtek NICs!
The main reason I upgraded the processor was heat. The P3 runs a lot cooler than the Celeron, with the bonus that it is more powerful :)
That means I can run fewer/slower fans, as it is bloody noisy stock!
I'm not sure about the 100meg pushing the box; I think it could well push the Realteks with all the optimisations disabled.
Yes, I am running load balancing between the 2 WANs. I have seen download speeds of 7.7Mb a second, which is pretty quick. The load balancing and failover is fantastic, much better than on my old Draytek Vigor 2930.
The firebox seems to handle these speeds fine, even with Snort using 75% of the rules on both interfaces, and that was just with the Celeron and pfsense 1.2.3 (as I can't get Snort running on 2.0 embedded).