WatchGuard Firebox watchdog errors
-
I have pfSense 2.0 beta 2 installed on a Firebox X500 and, as expected, I am getting the watchdog error on the re1 interface while accessing the web configurator.
According to the FreeBSD developer forum, there has been a problem with the re() driver and watchdog errors when using TCP Segmentation Offload (TSO).
When I check the re1 interface with ifconfig re1, TSO does not show among the enabled options. However, issuing ifconfig with -tso (I did it on both re0, my WAN, and re1, my LAN) seems to have solved the watchdog error so far. I'll do more testing and report.
Now my question is how do I set this permanently without having to ssh to the box after every reboot?
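For anyone who wants to repeat the check described above, here is a minimal sketch of a helper that reads ifconfig output and reports whether a TSO capability is listed on the options line. The function name tso_enabled is made up for illustration, and the sketch assumes FreeBSD's usual capability names (TSO4, TSO6); verify against what your own box prints.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the "options=" line read from stdin
# lists a TSO capability as its own token (TSO, TSO4 or TSO6).
# Matching on the surrounding "<", "," and ">" avoids false hits on
# other capability names that merely contain the letters "TSO".
tso_enabled() {
    grep 'options=' | grep -Eq '[<,]TSO[46]?[,>]'
}

# Usage on the firewall (re1 is the LAN interface from the post above):
#   ifconfig re1 | tso_enabled && echo "TSO is on" || echo "TSO is off"
#   ifconfig re1 -tso    # turns TSO off until the next reboot
```

The ifconfig calls are left as comments so the file can be sourced without touching any interface.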
-
On System -> Advanced scroll down to Hardware Options and check the box Disable Hardware Checksum Offloading then click on Save.
-
Are you sure this will disable TSO? I thought checksum offloading only applies to TXCSUM and RXCSUM.
PS: still no watchdog errors with TSO disabled via ifconfig re1 -tso.
-
Sorry, my mistake. TSO is distinct from hardware checksumming.
I had a quick look at the re driver source and it appears TSO is disabled by default, which you have observed. In the absence of a reproducible way of producing the timeout, I'd be careful about assuming that setting TSO to its default value is going to prevent the timeout report.
-
Well, the assumption came from reading the FreeBSD developer forums, where it was mentioned that TSO, when enabled, was giving watchdog errors. The solution was to disable TSO in the driver by default. So I thought TSO was disabled, especially since it is not among the driver's enabled features, and I was surprised that issuing ifconfig re1 -tso would do anything at all. To test, I ran cat /dev/random, which causes high CPU utilization. Before disabling TSO it would give watchdog errors to no end; it has now been running for more than 24 hours without a single error.
It would be interesting if others could try this and share their experience.
I am turning off TSO through ssh. Where can I set the interface parameters in a configuration file so that the setting will survive a reboot?
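To put a number on a test like this rather than eyeballing the console, one could count the timeout messages in the kernel log before and after the change. This is only a sketch: the function name count_watchdog is hypothetical, and it assumes the driver logs lines of the form "re1: watchdog timeout", so check the exact wording on your own system first.

```shell
#!/bin/sh
# Hypothetical helper: count watchdog timeout lines for one interface.
# $1 is the interface name; the log text is read from stdin.
# "|| true" keeps the exit status at 0 when grep finds no matches,
# since grep -c still prints 0 in that case.
count_watchdog() {
    grep -cF -- "$1: watchdog timeout" || true
}

# Usage on the firewall:
#   dmesg | count_watchdog re1
```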
-
I'm not at my box right now, but isn't there a simple checkbox in the advanced settings?
I've seen something like that, I just can't recall where it was exactly…
(There was even a comment under the checkbox saying that some Realtek NICs had problems that could be fixed with that.)
-
You are right, I found it at the bottom of the page under System - Advanced - System Tunables.
There are 2 TCP offload tunables there: one is net.inet.tcp.tso, the other is hw.bce.tso_enable. I disabled both and the Firebox has worked without watchdog errors. The hw.bce one should only pertain to the bce driver. Is there an equivalent tunable for the re() driver?
-
I meant the one (in v2.0 beta2) under System - Advanced - Networking:
Disable hardware checksum offload
Checking this option will disable hardware checksum offloading. Checksum offloading is broken in some hardware, particularly some Realtek cards. Rarely, drivers may have problems with checksum offloading and some specific NICs.
-
I believe the one you mentioned does something different from TSO.
The TSO tunables are under System->Advanced->System Tunables:
net.inet.tcp.tso
hw.bce.tso_enable
I set both of them to 0 and had no more watchdog errors.
-
Wow, if this proves repeatable across several machines it's great news. ;D I remember reading up on this before I bought my Firebox; it's really the only reason I went for the more powerful X-peak box. It definitely seemed to be a packet fragmentation issue. People who were connected through some active device (e.g. a managed switch or router) did not seem to suffer any problems, as the switch rebuilds the packets.
Speculation really! :P
Steve
-
I have been running for a week without a single watchdog error. I have a site-to-site OpenVPN tunnel, an occasional OpenVPN road warrior, email, and a web server. It is not a lot of traffic, so it would be nice if people with higher traffic loads and more than just 2 NICs in use would test and report.
Also, before disabling TSO, running a process with high CPU demand such as "cat /dev/random" would cause watchdog errors within 30 seconds; now I have left it going for 24h without a single error.
It is important to disable the right thing. What needs to be disabled is TCP Segmentation Offloading (TSO), not checksum offloading.
To disable TSO you can issue "ifconfig re(x) -tso", where re(x) is the name of your interface, or you can go to Advanced->System Tunables and disable net.inet.tcp.tso. I also disabled hw.bce.tso_enable. This last one should be pertinent to the bce driver and do nothing on the re driver, so I do not know if it is really needed.
I would love to see more testing and reports on the issue, and in particular on whether disabling TSO has any effect on the box's performance.
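For those working from the shell, the two approaches above can be combined in a small script. This is a rough sketch, not a tested fix: the function names run and disable_tso and the DRY_RUN variable are made up for illustration, and how you get it to run at boot on your pfSense build (some shell-command hook, for example) is an assumption you would need to check yourself.

```shell
#!/bin/sh
# Hypothetical wrapper: with DRY_RUN=1 the command is printed instead of executed,
# which makes it easy to review what would be changed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Turn TSO off globally via the sysctl, then on each interface named as an argument.
disable_tso() {
    run sysctl net.inet.tcp.tso=0
    for ifc in "$@"; do
        run ifconfig "$ifc" -tso
    done
}

# Usage (re0 and re1 are the WAN and LAN interfaces from this thread):
#   DRY_RUN=1 disable_tso re0 re1    # show the commands only
#   disable_tso re0 re1              # apply them until the next reboot
```

Setting the tunable in the GUI is still the simpler route if it works for you; the script is mainly useful for testing from ssh.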
-
No Realtek chips in my firebox so I can't help you there. ;) But I agree testing, testing and more testing.
-
I tested this, but I still get the watchdog timeouts. Using pfSense 1.2.3 RELEASE embedded, with the command ifconfig re1 -tso that you suggested. Tried on re0 as well, the issue is still there. Also tried "Disable Hardware Checksum Offloading" for kicks, no difference.
-
I am using one of the recent 2.0 betas and, again, after disabling TSO I have not had a watchdog error since.
-
I suppose we need checkboxes to disable TSO and LRO, since certain drivers still choke on both of those.
EDIT: I opened a ticket to make sure these get added: http://redmine.pfsense.org/issues/703
-
jimp,
I see that TCP Segmentation offloading and LR Offloading are disabled (checked) in the later 2.0 builds. Do you have a list of NICs it works with? I have Intel gigabit.
-
No, and they don't help in a routing scenario anyhow. If you want to try, feel free, but for most people they degrade performance either because of (a) driver bugs, or (b) the fact that they are really more helpful for workstations than routers.
-
Why enable the option?
-
We already had the option, and it used to default to on. It made more sense to flip the default action and leave the choice in case someone decides they want to try.
It may be conceivable (especially if pfSense is used as an appliance platform) that there could be a workload where it might help on certain hardware in the future.
-
thanks.