Netgate Discussion Forum

    Watchguard firebox watchdog errors

    Hardware
    26 Posts 9 Posters 15.8k Views
    • W
      wallabybob
      last edited by

      On System -> Advanced scroll down to Hardware Options and check the box Disable Hardware Checksum Offloading then click on Save.

      • J
        jjstecchino
        last edited by

        Are you sure this will disable TSO? I thought checksum offloading only applies to TXCSUM and RXCSUM.

        PS: still no watchdog errors with TSO disabled via ifconfig re1 -tso
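
For anyone who wants to confirm whether TSO is actually active before and after running the command, here is a minimal sketch. It assumes FreeBSD's ifconfig lists the active capabilities on its "options=" line (for example TSO4), and the helper name tso_enabled is made up for illustration.

```shell
# Sketch only: tso_enabled is a hypothetical helper, not a pfSense or FreeBSD command.
# It reads `ifconfig <iface>` output on stdin and succeeds (exit 0) when
# TSO4 or TSO6 appears in the list of active options.
tso_enabled() {
    grep 'options=' | grep -q 'TSO[46]'
}

# Example use on the firewall itself (assumes an interface named re1):
#   if ifconfig re1 | tso_enabled; then echo "TSO is on"; else echo "TSO is off"; fi
```

Checking this before and after "ifconfig re1 -tso" shows whether the command changed anything on a given box.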

        • W
          wallabybob
          last edited by

          Sorry, my mistake. TSO is distinct from hardware checksumming.

          I had a quick look at the re driver source and it appears TSO is disabled by default, which you have observed. In the absence of a reproducible way of producing the timeout, I'd be careful about assuming that setting TSO to its default value is going to prevent the timeout report.

          • J
            jjstecchino
            last edited by

            Well, the assumption came from reading the FreeBSD developer forums, where it was mentioned that TSO, when enabled, was causing watchdog errors. The solution there was to disable TSO in the driver by default. So I thought TSO was already disabled, especially since it is not listed among the driver's enabled features, and I was surprised that issuing ifconfig re1 -tso would do anything at all. To test, I ran cat /dev/random, which causes high CPU utilization. Before disabling TSO it would give watchdog errors to no end; it has now been running for more than 24 hours without a single error.

            It would be interesting if others could try this and share their experience.

            I am turning off TSO through SSH. Where can I set the interface parameters in a configuration file so that the change will survive a reboot?

            • J
              jimmy
              last edited by

              I'm not at my box right now, but isn't there a simple checkbox in the advanced settings?
              I've seen something like that, I just can't recall where it was exactly…
              (There was even a comment under the checkbox saying that some Realtek NICs had problems that could be fixed with it.)

              • J
                jjstecchino
                last edited by

                You are right, I found it at the bottom of the page under System - Advanced - System Tunables.

                There are two TCP offload tunables there: one is net.inet.tcp.tso, the other is hw.bce.tso_enable. I disabled both and the firebox has worked without watchdog errors. The hw.bce one should only pertain to the bce driver. Is there an equivalent tunable for the re driver?
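
One way to see which TSO-related tunables a particular build exposes is to filter the output of "sysctl -a". The sketch below assumes the usual "name: value" output format; the helper name filter_tso_tunables is hypothetical, and which names show up depends on the drivers present in the kernel.

```shell
# Sketch only: filter_tso_tunables is a hypothetical helper name.
# It reads "name: value" lines (the format printed by `sysctl -a`) on stdin
# and prints the entries whose name contains "tso" as name=value.
filter_tso_tunables() {
    awk -F': ' 'tolower($1) ~ /tso/ { print $1 "=" $2 }'
}

# Example use on the firewall:
#   sysctl -a | filter_tso_tunables
```

If nothing under hw.re appears in the list, that suggests the re driver on that build has no TSO tunable of its own and only the global net.inet.tcp.tso setting applies.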

                • J
                  jimmy
                  last edited by

                  I meant the one (in v2.0 beta2) under: System - Advanced - Networking:

                  Disable hardware checksum offload
                  Checking this option will disable hardware checksum offloading. Checksum offloading is broken in some hardware, particularly some Realtek cards. Rarely, drivers may have problems with checksum offloading and some specific NICs.

                  • J
                    jjstecchino
                    last edited by

                    I believe the one you mentioned does something different from TSO.
                    The TSO tunables are under System->Advanced->System Tunables:
                    net.inet.tcp.tso
                    hw.bce.tso_enable

                    I set both of them to 0 and had no more watchdog errors.

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Wow, if this proves repeatable across several machines it's great news.  ;D I remember reading up on this before I bought my firebox; it's really the only reason I went for the more powerful X-peak box. It definitely seemed to be a packet fragmentation issue. People who were connected through some active device (e.g. a managed switch or router) did not seem to suffer any problems, as the switch rebuilds the packets.
                      Speculation really!  :P

                      Steve

                      • J
                        jjstecchino
                        last edited by

                        I have been running for a week without a single watchdog error. I have a site-to-site OpenVPN tunnel, the occasional OpenVPN road warrior, email and a web server. It is not a lot of traffic, so it would be nice if people with higher traffic loads and more than just 2 NICs in use would test and report.

                        Also, running a process with high CPU demand such as "cat /dev/random" would cause watchdog errors within 30 seconds before I disabled TSO; now I have left it going for 24 hours without a single error.

                        It is important to disable the right thing. What needs to be disabled is TCP Segmentation Offloading (TSO), not the checksum offloading.
                        To disable TSO you can issue "ifconfig re(x) -tso", where re(x) is the name of your interface, or you can go to Advanced->System Tunables and disable net.inet.tcp.tso. I also disabled hw.bce.tso_enable. This last one should pertain to the bce driver and do nothing on the re driver, so I do not know if it is really needed.

                        I would love to see more testing and reports on the issue, in particular on whether disabling TSO has any effect on the box's performance.
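
To make test reports comparable, it may help to count the timeouts per interface rather than eyeball the log. This is a minimal sketch that assumes the kernel message has the form "re1: watchdog timeout"; the helper name count_watchdog is made up for illustration.

```shell
# Sketch only: count_watchdog is a hypothetical helper name.
# $1 is the interface name (for example re1); log text arrives on stdin.
# Prints the number of lines containing "<iface>: watchdog timeout".
count_watchdog() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c "$1: watchdog timeout" || true
}

# Example use on the firewall:
#   dmesg | count_watchdog re1
```

Running it once before a stress test and once after gives a simple number to post alongside the pfSense version and interface name.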

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          No Realtek chips in my firebox so I can't help you there.  ;) But I agree: testing, testing and more testing.

                          • Spy AleloS
                            Spy Alelo
                            last edited by

                            I tested this, but I still get the watchdog timeouts. Using pfSense 1.2.3 RELEASE embedded, with the command ifconfig re1 -tso that you suggested. Tried on re0 as well, the issue is still there. Also tried "Disable Hardware Checksum Offloading" for kicks, no difference.

                            • J
                              jjstecchino
                              last edited by

                              I am using one of the recent 2.0 betas, and again, after disabling TSO I have not had a watchdog error since.

                              • jimpJ
                                jimp Rebel Alliance Developer Netgate
                                last edited by

                                I suppose we need checkboxes to disable TSO and LRO, since certain drivers still choke on both of those.

                                EDIT: I opened a ticket to make sure these get added: http://redmine.pfsense.org/issues/703
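
Until such checkboxes exist, both features can be switched off by hand with ifconfig. The sketch below only prints the commands so they can be reviewed first. The helper name print_offload_off is hypothetical, and it assumes the driver accepts both the -tso and -lro flags; a driver without LRO support may ignore or reject the second one. As noted earlier in the thread, settings made this way do not survive a reboot.

```shell
# Sketch only: print_offload_off is a hypothetical helper name.
# For each interface name given as an argument, print (do not run) the
# ifconfig command that turns off TSO and LRO on that interface.
print_offload_off() {
    for iface in "$@"; do
        printf 'ifconfig %s -tso -lro\n' "$iface"
    done
}

# Example use on the firewall, after reviewing the printed commands:
#   print_offload_off re0 re1 | sh
```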

                                Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                                Need help fast? Netgate Global Support!

                                Do not Chat/PM for help!

                                • valnarV
                                  valnar
                                  last edited by

                                  jimp,
                                  I see that TCP segmentation offloading (TSO) and large receive offloading (LRO) are disabled (checked) in the later 2.0 builds.  Do you have a list of NICs they work with?  I have Intel gigabit.

                                  • jimpJ
                                    jimp Rebel Alliance Developer Netgate
                                    last edited by

                                    No, and they don't help in a routing scenario anyhow. If you want to try, feel free, but for most people they degrade performance either because of (a) driver bugs, or (b) the fact that they are really more helpful for workstations than routers.

                                    • valnarV
                                      valnar
                                      last edited by

                                      Why enable the option?

                                      • jimpJ
                                        jimp Rebel Alliance Developer Netgate
                                        last edited by

                                        We already had the option, and it used to default to on. It made more sense to flip the default action and leave the choice in case someone decides they want to try.

                                        It may be conceivable (especially if pfSense is used as an appliance platform) that there could be a workload where it might help on certain hardware in the future.

                                        • valnarV
                                          valnar
                                          last edited by

                                          thanks.

                                          • I
                                            iFloris
                                            last edited by

                                            Has anyone ever found out whether or not the tunables or the shell command fixed their watchdog timeouts once and for all?

                                            Since I upgraded to 2.0 b4 a while back, my firebox appeared to be cured of the watchdog timeout affliction, until a week or two ago, when it started popping up again all over the log.
                                            It seems to be most prevalent when a device on the wireless network tries to resolve DNS, but I've seen it happen from wired workstations as well.

                                            Also, just when the first timeout occurs after an update or reboot, calcru errors in just about every process occur.
                                            These look like the following:
                                            kernel: calcru: runtime went backwards from 3257 usec to 3065 usec for pid 16027 (unlinkd)
                                            kernel: calcru: runtime went backwards from 10622 usec to 10111 usec for pid 0 (kernel)
                                            etcetera.
                                            These show up only once, for what seems to be every process on the machine, and don't come back until after a reboot or upgrade (which obviously also reboots the machine).
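
To check the impression that every process is hit exactly once, the calcru lines can be reduced to a list of unique pid and process pairs. This is a minimal sketch that assumes the message format shown in the two log lines above; the helper name calcru_processes is made up for illustration.

```shell
# Sketch only: calcru_processes is a hypothetical helper name.
# Reads kernel log lines on stdin and prints one "pid name" pair for each
# distinct process that appears in a "calcru: runtime went backwards" message.
calcru_processes() {
    sed -n 's/.*calcru: runtime went backwards.* for pid \([0-9][0-9]*\) (\(.*\)).*/\1 \2/p' | sort -u
}

# Example use on the firewall:
#   dmesg | calcru_processes
```

Comparing the number of lines it prints with the number of running processes shows whether the burst really touched everything.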

                                            So while it seemed that the watchdog timeout errors had been fixed, they are back, at least for me.
                                            The questions are: why, what has changed to make the errors return, and what can we do to fix them this time around?

                                            Edit:

                                            After a few days I completely reinstalled the machine with 2.0b4 and packages because I figured a clean install might help.
                                            Unfortunately, the watchdog errors were still popping up, but only when a machine accessed the internet through the wireless (wifi) connection.

                                            That prompted me to return to as barebones an install as possible, uninstalling all installed packages one at a time.
                                            After having uninstalled squid (the third package I uninstalled), the timeouts and calcru errors disappeared and I have not seen errors since (forty days ago).
                                            To be sure I uninstalled all other packages as well.

                                            The strange thing about this is that all packages and pfSense 2.0b4 worked great until around September 4th.

                                            Regardless, I'm quite glad that I found a way to fix the watchdog errors.

                                            If anyone else has been having these problems as well, try uninstalling packages from a fully configured system.

                                            one layer of information
                                            removed

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.