WatchGuard Firebox watchdog errors
-
Go to System -> Advanced, scroll down to Hardware Options, check the box "Disable Hardware Checksum Offloading", then click Save.
-
Are you sure this will disable TSO? I thought checksum offloading only applied to TXCSUM and RXCSUM.
PS: still no watchdog errors with TSO disabled via ifconfig re1 -tso
-
Sorry, my mistake. TSO is distinct from hardware checksumming.
I had a quick look at the re driver source and it appears TSO is disabled by default, which you have observed. In the absence of a reproducible way of producing the timeout, I'd be careful about assuming that setting TSO to its default value is going to prevent the timeout report.
-
Well, the assumption came from reading the FreeBSD developer forums, where it was mentioned that TSO, when enabled, was causing watchdog errors. The solution there was to disable TSO in the driver by default. So I thought TSO was already disabled, especially since it is not listed among the driver's enabled features, and I was surprised that issuing ifconfig re1 -tso would do anything at all. To test, I ran cat /dev/random, which causes high CPU utilization. Before disabling TSO it would give watchdog errors to no end; it has now been running for more than 24 hours without a single error.
It would be interesting if others can try and share their experience.
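For anyone repeating this test, a rough way to compare before and after is to count the timeout messages per interface. This is only a minimal sketch: it assumes the driver logs lines containing text of the form "re1: watchdog timeout", and the function name count_watchdog is made up for illustration.

```shell
# count_watchdog: reads kernel log lines on stdin and prints one
# "<count> <interface>" line for each interface that logged a
# watchdog timeout. Assumes messages look like "re1: watchdog timeout".
count_watchdog() {
  grep -Eo '[a-z]+[0-9]+: watchdog timeout' |  # keep only "<iface>: watchdog timeout"
    cut -d: -f1 |                              # keep the interface name
    sort | uniq -c |                           # count occurrences per interface
    awk '{ print $1, $2 }'                     # normalize spacing
}
```

On the box it could be fed from the kernel message buffer, e.g. dmesg | count_watchdog, once before and once after disabling TSO.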
I am turning off TSO through SSH. Where can I set the interface parameters in a configuration file so that they survive a reboot?
-
I'm not at my box right now, but isn't there a simple checkbox in the advanced settings?
I've seen something like that, I just can't recall where it was exactly…
(There was even a comment under the checkbox saying that some Realtek NICs had problems that could be fixed with it.)
-
You are right, I found it at the bottom of the page under System -> Advanced -> System Tunables.
There are two TSO-related tunables there: net.inet.tcp.tso and hw.bce.tso_enable. I disabled both and the Firebox has worked without watchdog errors. The hw.bce one should only pertain to the bce driver. Is there an equivalent tunable for the re driver?
-
I meant the one (in v2.0 beta2) under System -> Advanced -> Networking:
Disable hardware checksum offload
Checking this option will disable hardware checksum offloading. Checksum offloading is broken in some hardware, particularly some Realtek cards. Rarely, drivers may have problems with checksum offloading and some specific NICs.
-
I believe the one you mentioned does something different from TSO.
The TSO tunables are under System->Advanced->System Tunables:
net.inet.tcp.tso
hw.bce.tso_enable
I set both of them to 0 and had no more watchdog errors.
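The same two values can also be read from a shell to double-check what the GUI applied. A small sketch, assuming sysctl prints its usual "name: value" lines; the helper name tso_tunable_state is hypothetical, and hw.bce.tso_enable may simply not exist on a box without the bce driver.

```shell
# tso_tunable_state: reads `sysctl` output ("name: value" lines) on stdin
# and prints each TSO-related tunable followed by "off" (value 0) or "on".
tso_tunable_state() {
  awk -F': ' '/tso/ { print $1, ($2 == 0 ? "off" : "on") }'
}
```

On the box: sysctl net.inet.tcp.tso hw.bce.tso_enable | tso_tunable_state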
-
Wow, if this proves repeatable across several machines it's great news. ;D I remember reading up on this before I bought my Firebox; it's really the only reason I went for the more powerful X-Peak box. It definitely seemed to be a packet fragmentation issue. People who were connected through some active device (e.g. a managed switch or router) did not seem to suffer any problems, as the switch rebuilds the packets.
Speculation really! :P
Steve
-
I have been running for a week without a single watchdog error. I have a site-to-site OpenVPN tunnel, the occasional OpenVPN road warrior, email, and a web server. It is not a lot of traffic, so it would be nice if people with higher traffic loads and more than just 2 NICs in use would test and report.
Also, running a process with high CPU demand such as "cat /dev/random" would cause watchdog errors within 30 seconds before I disabled TSO; now I have left it going for 24h without a single error.
It is important to disable the right thing. What needs to be disabled is TCP Segmentation Offloading (TSO), not checksum offloading.
To disable TSO you can issue "ifconfig re(x) -tso", where re(x) is the name of your interface, or you can go to Advanced->System Tunables and disable net.inet.tcp.tso. I also disabled hw.bce.tso_enable. This last one should be pertinent to the bce driver and do nothing on the re driver, so I do not know if it is really needed.
I would love to see more testing and reports on the issue, and in particular whether disabling TSO has any effect on the box's performance.
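To confirm which state an interface is actually in, the options= line printed by ifconfig can be checked. This is a sketch only: it assumes ifconfig lists the capability as TSO4 (or TSO6) inside options=<...>, and tso_state is a made-up helper name.

```shell
# tso_state: reads `ifconfig <iface>` output on stdin and prints
# "enabled" if the options=<...> list contains a TSO capability
# (TSO, TSO4 or TSO6), otherwise "disabled".
tso_state() {
  awk '/options=/ && /[<,]TSO[46]?[,>]/ { found = 1 }
       END { print (found ? "enabled" : "disabled") }'
}
```

On the box: ifconfig re1 | tso_state, run before and after ifconfig re1 -tso.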
-
No Realtek chips in my firebox so I can't help you there. ;) But I agree testing, testing and more testing.
-
I tested this, but I still get the watchdog timeouts. Using pfSense 1.2.3 RELEASE embedded, with the command ifconfig re1 -tso that you suggested. Tried on re0 as well, the issue is still there. Also tried "Disable Hardware Checksum Offloading" for kicks, no difference.
-
I am using one of the recent 2.0 betas and, again, after disabling TSO I have not had a watchdog error since.
-
I suppose we need checkboxes to disable TSO and LRO, since certain drivers still choke on both of those.
EDIT: I opened a ticket to make sure these get added: http://redmine.pfsense.org/issues/703
-
jimp,
I see that TCP segmentation offloading and large receive offloading are disabled (checked) in the later 2.0 builds. Do you have a list of NICs it works with? I have Intel gigabit.
-
No, and they don't help in a routing scenario anyhow. If you want to try, feel free, but for most people they degrade performance either because of (a) driver bugs, or (b) the fact that they are really more helpful for workstations than routers.
-
Why enable the option?
-
We already had the option, and it used to default to on. It made more sense to flip the default action and leave the choice in case someone decides they want to try.
It may be conceivable (especially if pfSense is used as an appliance platform) that there could be a workload where it might help on certain hardware in the future.
-
thanks.
-
Has anyone ever found out whether or not the tunables or the shell command fixed their watchdog timeouts once and for all?
Since I upgraded to 2.0 b4 a while back, my Firebox appeared to be cured of the watchdog timeout affliction, until a week or two ago, when it started popping up again all over the log.
It seems to be most prevalent when a device on the wireless network tries to resolve DNS, but I've seen it happen from wired workstations as well.
Also, just when the first timeout occurs after an update or reboot, calcru errors occur in just about every process.
These look like the following:
kernel: calcru: runtime went backwards from 3257 usec to 3065 usec for pid 16027 (unlinkd)
kernel: calcru: runtime went backwards from 10622 usec to 10111 usec for pid 0 (kernel)
etcetera.
These only show up once, seemingly for every process on the machine, and don't come back until after a reboot or upgrade (which obviously also reboots the machine).
So while it seemed that the watchdog timeout errors had been fixed, they are back, at least for me.
The questions are: why, what has changed that the errors have returned, and what can we do to fix them this time around?
Edit:
After a few days I completely reinstalled the machine with 2.0b4 and packages because I figured a clean install might help.
Unfortunately, the watchdog errors were still popping up, but only when a machine accessed the internet through the wireless (wifi) connection.
That prompted me to go back to as barebones an install as possible, uninstalling all installed packages one at a time.
After having uninstalled squid (the third package I uninstalled), the timeouts and calcru errors disappeared and I have not seen errors since (forty days ago).
To be sure, I uninstalled all other packages as well.
The strange thing about this is that all packages and pfSense 2.0 b4 worked great until around September 4th.
Regardless, I'm quite glad that I found a way to fix the watchdog errors.
If anyone else has been having these problems as well, try uninstalling packages from a fully configured system.