Netgate Discussion Forum
    Watchguard firebox watchdog errors

    Scheduled Pinned Locked Moved Hardware
    26 Posts 9 Posters 15.8k Views
    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by

      Wow, if this proves repeatable across several machines it's great news.  ;D I remember reading up on this before I bought my firebox; it's really the only reason I went for the more powerful X-peak box. It definitely seemed to be a packet fragmentation issue. People who were connected through some active device (e.g. a managed switch or router) did not seem to suffer any problems, as the switch rebuilds the packets.
      Speculation really!  :P

      Steve

      • J
        jjstecchino
        last edited by

        I have been running for a week without a single watchdog error. I have a site-to-site OpenVPN tunnel, an occasional OpenVPN road warrior, email, and a web server. It is not a lot of traffic, so it would be nice if people with higher traffic loads, and using more than just 2 NICs, would test and report.

        Also, before disabling TSO, running a process with high CPU demand such as "cat /dev/random" would cause watchdog errors within 30 seconds. Now I have left it going for 24h without a single error.

        It is important to disable the right thing. What needs to be disabled is TCP Segmentation Offloading (TSO), not the checksum offloading.
        To disable TSO you can issue "ifconfig re(x) -tso", where re(x) is the name of your interface, or you can go to Advanced -> System Tunables and disable net.inet.tcp.tso. I also disabled hw.bce.tso_enable. This last one should be pertinent to the BCE driver and do nothing on the RE driver, so I do not know if it is really needed.
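
        For anyone wanting to try this, here is a minimal sketch of the two steps described above. The helper names (run, disable_tso), the DRY_RUN switch and the interface names re0/re1 are assumptions made for illustration, not part of pfSense. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Sketch: print (or run) the commands that turn off TSO on a set of interfaces.
# Interface names are passed as arguments; re0/re1 below are only examples.
# DRY_RUN=1 (the default here, chosen for safety) only prints the commands.

DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_tso() {
    for ifc in "$@"; do
        # Per-interface: clear the TSO capability on the NIC.
        run ifconfig "$ifc" -tso
    done
    # Global: tell the TCP stack not to use TSO at all.
    run sysctl net.inet.tcp.tso=0
}

disable_tso re0 re1
```

        Running it with DRY_RUN=0 as root would execute the commands instead of printing them. Settings changed this way from the shell take effect immediately but are not kept across a reboot, which is presumably why the System Tunables route is worth using for anything permanent.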

        I would love to see more testing and reports on the issue, and in particular on whether disabling TSO has any effect on the box's performance.

        • stephenw10S
          stephenw10 Netgate Administrator
          last edited by

          No Realtek chips in my firebox so I can't help you there.  ;) But I agree testing, testing and more testing.

          • Spy AleloS
            Spy Alelo
            last edited by

            I tested this, but I still get the watchdog timeouts. Using pfSense 1.2.3 RELEASE embedded, with the command ifconfig re1 -tso that you suggested. Tried on re0 as well, the issue is still there. Also tried "Disable Hardware Checksum Offloading" for kicks, no difference.

            • J
              jjstecchino
              last edited by

              I am using one of the recent 2.0 betas, and again, after disabling TSO I have not had a WD error since.

              • jimpJ
                jimp Rebel Alliance Developer Netgate
                last edited by

                I suppose we need checkboxes to disable TSO and LRO, since certain drivers still choke on both of those.

                EDIT: I opened a ticket to make sure these get added: http://redmine.pfsense.org/issues/703

                Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                Need help fast? Netgate Global Support!

                Do not Chat/PM for help!

                • valnarV
                  valnar
                  last edited by

                  jimp,
                  I see that TCP Segmentation offloading and LR Offloading are disabled (checked) in the later 2.0 builds.  Do you have a list of NICs it works with?  I have Intel gigabit.

                  • jimpJ
                    jimp Rebel Alliance Developer Netgate
                    last edited by

                    No, and they don't help in a routing scenario anyhow. If you want to try, feel free, but for most people they degrade performance either because of (a) driver bugs, or (b) the fact that they are really more helpful for workstations than routers.

                    • valnarV
                      valnar
                      last edited by

                      Why enable the option?

                      • jimpJ
                        jimp Rebel Alliance Developer Netgate
                        last edited by

                        We already had the option, and it used to default to on. It made more sense to flip the default action and leave the choice in case someone decides they want to try.

                        It may be conceivable (especially if pfSense is used as an appliance platform) that there could be a workload where it might help on certain hardware in the future.

                        • valnarV
                          valnar
                          last edited by

                          thanks.

                          • I
                            iFloris
                            last edited by

                            Has anyone ever found out whether or not the tunables or the shell command fixed their watchdog timeouts once and for all?

                            Since I upgraded to 2.0 b4 a while back, my firebox appeared to be cured of the watchdog timeout affliction, until a week or two ago, when it started popping up again all over the log.
                            It seems to be most prevalent when a device coming from the wireless network tries to resolve dns, but I've seen it happen from wired workstations as well.

                            Also, just when the first timeout occurs after an update or reboot, calcru errors in just about every process occur.
                            These look like the following:
                            kernel: calcru: runtime went backwards from 3257 usec to 3065 usec for pid 16027 (unlinkd)
                            kernel: calcru: runtime went backwards from 10622 usec to 10111 usec for pid 0 (kernel)
                            etcetera.
                            These only show up once, apparently for every process on the machine, and don't come back until after a reboot or upgrade (which obviously also reboots the machine).

                            So while it seemed that the watchdog timeout errors had been fixed, they are back, at least for me.
                            The questions are: why, what has changed to make the errors return, and what can we do to fix them this time around?

                            Edit:

                            After a few days I completely reinstalled the machine with 2.0b4 and packages because I figured a clean install might help.
                            Unfortunately, the watchdog errors were still popping up, but only when a machine accessed the internet through the wireless (wifi) connection.

                            That prompted me to return to as barebones an install as possible, uninstalling all installed packages one at a time.
                            After having uninstalled squid (the third package I uninstalled), the timeouts and calcru errors disappeared and I have not seen errors since (forty days ago).
                            To be sure I uninstalled all other packages as well.

                            The strange thing about this is that all packages and pfSense 2.0b4 worked great until around September 4th.

                            Regardless, I'm quite glad that I found a way to fix the watchdog errors.

                            If anyone else has been having these problems as well, try uninstalling packages from a fully configured system.

                            one layer of information
                            removed

                            • J
                              jp141
                              last edited by

                              I am getting the watchdog timeouts on my firebox but only occasionally on re3 (connected to a wireless AP).

                              Touch wood, I haven't had any on the other interfaces even under massive load.

                              I'm on a recent 2.0 build and it seems a big step forward for us Watchguard users.

                              I have no packages installed, they don't seem to install properly on 2.0 embedded. I also have net.inet.tcp.tso set to 0 as well as Disable hardware checksum offload, Disable hardware TCP segmentation offload and Disable hardware large receive offload selected in the settings.
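
                              To double-check that settings like these have actually taken effect on a given NIC, one option is to look at the capability flags that ifconfig prints on its options= line. The following is a minimal sketch under that assumption; the function name offloads_enabled and the sample text (including the hex value) are made up for illustration.

```shell
#!/bin/sh
# Sketch: list which offload capabilities are still enabled on an interface,
# reading `ifconfig <ifc>` output from stdin.

offloads_enabled() {
    # Keep only the options=...<FLAG,FLAG,...> line, split the flag list,
    # and print the offload-related flags that are present.
    # An empty result is not treated as an error.
    grep '^[[:space:]]*options=' \
        | sed -e 's/.*<//' -e 's/>.*//' \
        | tr ',' '\n' \
        | grep -E '^(TSO4|TSO6|LRO|RXCSUM|TXCSUM)$' || true
}

# Sample text in the general shape FreeBSD's ifconfig prints (values are made up).
sample='re3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=389b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4>'

printf '%s\n' "$sample" | offloads_enabled
```

                              On a live box the equivalent would be something like "ifconfig re3 | offloads_enabled" together with "sysctl net.inet.tcp.tso" to read back the tunable. If TSO4 still shows up in the list, the per-interface setting has not been applied.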

                              I haven't noticed the watchdog timeouts causing any connectivity issues on my wireless devices. It seems to recover from the errors OK, so I'm not too worried at the moment.

                              • stephenw10S
                                stephenw10 Netgate Administrator
                                last edited by

                                When I was trying to decide what hardware to use for my pfSense box I considered the Watchguard X Core but eventually went for the Peak instead, as it has all Intel NICs. Anyway, while thinking it over I spent ages Googling the watchdog timeout problem and found an interesting thread (which I now can't find  ::)) in which the problem was described as being caused by the driver's inability to handle fragmented packets correctly. The result of this was that the problem only really shows up if you are connected directly to the NIC. If you are connected via a switch, in which packets are received and resent, then the problem may never happen.
                                I'd be interested in others' thoughts on this.
                                How are your networks arranged?
                                Are you connected via a switch?

                                Steve

                                • J
                                  jp141
                                  last edited by

                                  Yes I saw something similar, they were saying if you have a decent switch in between it will rebuild the packets?

                                  My interfaces are set up like this:

                                  RE0 - to ADSL modem via powerline plug set
                                  RE1 - to cable modem
                                  RE2 - to cheap gig switch
                                  RE3 - to wireless ap
                                  RE4 - to wireless ap
                                  RE5 - unused

                                  I have so far seen the errors on RE1, RE2 and RE3, though they are mainly on RE2.
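
                                  A per-interface tally like this can be pulled from the system log rather than counted by eye. The sketch below assumes the kernel logs each event as "<interface>: watchdog timeout"; the function name count_watchdog and the sample log lines are made up for illustration.

```shell
#!/bin/sh
# Sketch: tally "watchdog timeout" kernel messages per interface
# from log text supplied on stdin.

count_watchdog() {
    # Pull out "<ifname>: watchdog timeout", keep the interface name,
    # then count occurrences per name and print "name count".
    grep -Eo '[a-z]+[0-9]+: watchdog timeout' \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Made-up sample lines, for illustration only.
sample='Oct 1 10:00:01 kernel: re2: watchdog timeout
Oct 1 10:05:12 kernel: re2: watchdog timeout
Oct 1 11:20:45 kernel: re3: watchdog timeout
Oct 1 11:21:00 kernel: calcru: runtime went backwards from 3257 usec to 3065 usec for pid 16027 (unlinkd)'

printf '%s\n' "$sample" | count_watchdog
# prints:
# re2 2
# re3 1
```

                                  How you feed the real log in depends on the build. If the system log is stored as a circular log, it would need to be dumped to text first (for example with clog) before piping it into the function; treat that as something to check on your own install.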

                                  As I said though, with pfSense 2.0 they seem to be few and far between and they don't seem to take the interface down. They just get logged as an "out" error.

                                  It is a shame, as this hardware is ideal. I have upgraded my RAM to 512MB and the processor to a P3 1.4GHz and it flies :)

                                  You would struggle to get something of a similar spec/format for £60!!!!

                                  My cable ISP announced a 100meg service yesterday so I think I am going to have to look to upgrade to a Peak/E series at some point.

                                  The only thing is I want to keep power consumption under 50w if possible  ???

                                  • stephenw10S
                                    stephenw10 Netgate Administrator
                                    last edited by

                                    Hmm, I wish I could find that post.  >:(
                                    Anyway it doesn't seem to be solving the problem for you.
                                    I agree the Watchguard boxes are ideal for pfsense, timeouts aside. I've ended up doing almost the opposite to you. I have an X peak box but have swapped out the 2.8G P4 for a P4-M which is underclocked to 1.2GHz. The whole box runs about 40W. <£50.  8)
                                    Do you think 100M cable is going to  push your box?
                                    Do you have load sharing between adsl and cable?

                                    Steve

                                    • J
                                      jp141
                                      last edited by

                                      Yeah, I'm going to keep an eye out for a Peak, but they don't come up on fleabay as much as the X Cores. It would be good to ditch the Realtek NICs!

                                      The main reason I upgraded the processor was heat. The P3 runs a lot cooler than the Celeron, with the bonus that it is more powerful :)

                                      That means I can run fewer/slower fans, as it is bloody noisy stock!

                                      I'm not sure about the 100meg pushing the box; I think it could well push the Realteks with all the optimisations disabled.

                                      Yes, I am running load balancing between the 2 WANs. I have seen download speeds of 7.7Mb a second, which is pretty quick. The load balancing and failover is fantastic, much better than on my old Draytek Vigor 2930.

                                      The firebox seems to handle these speeds fine even with Snort using 75% of the rules on both interfaces, and that was just with the Celeron and pfSense 1.2.3 (as I can't get Snort running on 2.0 embedded).

                                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.