    Netgate Discussion Forum
    Watchdog timeout -- resetting

    General pfSense Questions
    • R
      r43K9o last edited by r43K9o

      I have a pfSense box built on an HP ProLiant MicroServer Gen10 (AMD Opteron X3418, 1.8 GHz) with an Intel Gigabit ET Dual Port adapter (Intel® 82576).

      The onboard NICs are used for LAN and the Intel card is used for WAN.

      I'm running OpenVPN and Snort on the box.
      My problem is that roughly once a week one of the two ports on the Intel NIC suddenly goes down, and I cannot get it back up until I restart the whole machine.

      System Log:

      Feb 3 13:57:04	dhcpleases		bad name in /var/dhcpd/var/db/dhcpd.leases
      Feb 3 13:57:04	check_reload_status		Reloading filter
      Feb 3 13:57:04	dhcpleases		/etc/hosts changed size from original!
      Feb 3 13:57:04	php-fpm	10544	/rc.newwanip: rc.newwanip: on (IP address: <REDACTED>) (interface: WAN1[wan]) (real interface: igb0).
      Feb 3 13:57:04	php-fpm	10544	/rc.newwanip: rc.newwanip: Info: starting on igb0.
      Feb 3 13:57:03	check_reload_status		Reloading filter
      Feb 3 13:57:03	check_reload_status		rc.newwanip starting igb0
      Feb 3 13:57:03	php-fpm	10544	/rc.linkup: Hotplug event detected for WAN1(wan) static IP (<REDACTED> )
      Feb 3 13:57:02	kernel		igb0: link state changed to UP
      Feb 3 13:57:02	check_reload_status		Linkup starting igb0
      Feb 3 13:57:02	check_reload_status		Reloading filter
      Feb 3 13:57:02	php-fpm		/rc.linkup: Hotplug event detected for WAN1(wan) static IP (<REDACTED> )
      Feb 3 13:57:01	check_reload_status		Linkup starting igb0
      Feb 3 13:57:01	kernel		igb0: link state changed to DOWN
      Feb 3 13:57:01	kernel		igb0: TX(0) desc avail = 0,Next TX to Clean = 0
      Feb 3 13:57:01	kernel		igb0: Queue(0) tdh = 0, hw tdt = 984
      Feb 3 13:57:01	kernel		igb0: Watchdog timeout -- resetting
      

      I have two WAN connections, and the other one is totally fine and has never gone down. I tried changing cables, using both copper and optics with transceivers, but the link goes down every time after a while.

      I have all of the hardware offloading disabled except checksum offloading, but even with that disabled the behavior was the same. I also tried setting

      hw.igb.num_queues=1
      

      But again, nothing changed as far as I can tell.

      Does anyone have an idea what the culprit could be, or is there maybe some incompatibility that I cannot find any mention of?

      I did check with my ISP; he did some testing on his equipment but couldn't find anything.

      • stephenw10
        stephenw10 Netgate Administrator last edited by

        You might try booting verbose to see if you get any additional logs. You can also check the macstats in the sysctls for igb.

        Generally though, igb NICs are pretty good, so if you see that, it's probably something exceptional. If you're lucky, it might be some odd traffic rather than a hardware issue.

        Steve
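One way to act on the MAC statistics suggestion is to save the output of `sysctl dev.igb.0` twice (once while the link is healthy, once after it drops) and compare the two. The sketch below only does the comparison; the counter names and values in the sample are illustrative assumptions, not output from this box.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines into {name: int}, skipping non-numeric values."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def changed(before, after):
    """Return {name: delta} for counters whose value differs between snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


# Hypothetical snapshots: names and numbers are made up for illustration.
BEFORE = """dev.igb.0.mac_stats.missed_packets: 0
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.link_irq: 3"""
AFTER = """dev.igb.0.mac_stats.missed_packets: 40
dev.igb.0.mac_stats.crc_errs: 0
dev.igb.0.link_irq: 3"""

for name, delta in sorted(changed(parse_sysctl(BEFORE), parse_sysctl(AFTER)).items()):
    print(f"{name}: +{delta}")
```

Counters that move only around the time of the drop are the ones worth posting back to the thread.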

        • R
          r43K9o last edited by

          Thank you, I will check. One thing that I know for sure is that both NICs, the one on my box and the one on the ISP router, are sending data, but neither of them receives anything...

          • stephenw10
            stephenw10 Netgate Administrator last edited by

            Hmm, well that seems like something low level. Is there a switch in between? Just a bad cable?

            Steve

            • R
              r43K9o @stephenw10 last edited by

              @stephenw10 There is no switch; it is a direct connection between the pfSense box and, I believe, a Mikrotik SXT. I tested both a copper connection and an optical one (Cat5 -> converter -> optic -> converter -> Cat5), but I had the same problem with both, so I do not believe the problem is with the medium.

              • R
                r43K9o last edited by r43K9o

                One additional note: until I set a static (permanent) MAC address in the ARP table for my gateway, I had these problems (lost link) almost every day. Since then it only occurs about once per week...

                • R
                  r43K9o last edited by

                  I have some new findings. Originally the problem did not occur periodically; it was seemingly random. But now, for 21 days in a row (three occurrences in that time), the connection has dropped on a Monday within a 3-hour window. So either there is something external going on that my ISP cannot find, or there is something in the system that I have not found yet, or it may be just a coincidence...
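To check a weekday pattern like this against the system log rather than from memory, the watchdog lines can be tallied by weekday and hour. This is a minimal sketch: the log line format is taken from the excerpt in the first post, while the year is an assumption the caller must supply, because syslog timestamps do not carry one.

```python
import re
from collections import Counter
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Matches lines such as "Feb 3 13:57:01 kernel igb0: Watchdog timeout -- resetting".
WATCHDOG = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+kernel\s+"
    r"(?P<nic>\w+): Watchdog timeout -- resetting"
)


def watchdog_events(lines, year):
    """Yield (datetime, nic) for every watchdog reset found in the log lines."""
    for line in lines:
        m = WATCHDOG.match(line.strip())
        if m:
            stamp = " ".join(m.group("stamp").split())  # collapse tabs and double spaces
            when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
            yield when, m.group("nic")


def tally(events):
    """Count resets per (weekday name, hour of day)."""
    return Counter((DAYS[when.weekday()], when.hour) for when, _ in events)


# Two lines copied from the log excerpt; year=2020 is an assumed value.
sample = [
    "Feb 3 13:57:01\tkernel\t\tigb0: Watchdog timeout -- resetting",
    "Feb 3 13:57:01\tkernel\t\tigb0: link state changed to DOWN",
]
print(tally(watchdog_events(sample, year=2020)))
```

Feeding it the full log from several weeks would show whether the resets really cluster in one weekday and hour bucket.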

                  • stephenw10
                    stephenw10 Netgate Administrator last edited by

                    Check the crontab, install the Cron package for easy access to it. Anything scheduled for Mondays like that should appear there.

                    Steve

                    • R
                      r43K9o @stephenw10 last edited by

                      @stephenw10 Yep I already checked cron, sadly there doesn't seem to be any related items...

                      minute	hour	mday	month	wday	who	command	
                      1,31	0-5	*	*	*	root	/usr/bin/nice -n20 adjkerntz -a	 
                      1	3	1	*	*	root	/usr/bin/nice -n20 /etc/rc.update_bogons.sh	 
                      1	1	*	*	*	root	/usr/bin/nice -n20 /etc/rc.dyndns.update	 
                      */60	*	*	*	*	root	/usr/bin/nice -n20 /usr/local/sbin/expiretable -v -t 3600 virusprot	 
                      30	12	*	*	*	root	/usr/bin/nice -n20 /etc/rc.update_urltables	 
                      1	0	*	*	*	root	/usr/bin/nice -n20 /etc/rc.update_pkg_metadata	 
                      0	*	*	*	*	root	/usr/local/bin/php /usr/local/www/pfblockerng/pfblockerng.php cron >> /var/log/pfblockerng/pfblockerng.log 2>&1	 
                      0	12	4-10	*	*	root	/usr/local/bin/php /usr/local/www/pfblockerng/pfblockerng.php dcc >> /var/log/pfblockerng/extras.log 2>&1	 
                      */5	*	*	*	*	root	/usr/bin/nice -n20 /usr/local/bin/php -f /usr/local/pkg/snort/snort_check_cron_misc.inc	 
                      15	3,15	*	*	*	root	/usr/bin/nice -n20 /usr/local/bin/php -f /usr/local/pkg/snort/snort_check_for_rule_updates.php	 
                      */2	*	*	*	*	root	/usr/bin/nice -n20 /sbin/pfctl -q -t snort2c -T expire 900
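The table can also be checked mechanically for weekday-specific jobs. The sketch below expands numeric cron fields and flags entries whose day-of-week field is restricted to a given day; it handles only numeric fields (no names such as `mon`), and the helper names are hypothetical. The three entries are copied from the table above.

```python
def expand(field, lo, hi):
    """Expand one numeric cron field ('*', '*/5', '1,31', '0-5') into a set of ints."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return values


def runs_only_on(entry, weekday):
    """True if the day-of-week field is restricted and includes weekday (0 = Sunday)."""
    wday = entry.split()[4]
    if wday == "*":
        return False  # runs regardless of the weekday
    # cron accepts both 0 and 7 for Sunday, so fold 7 onto 0
    return weekday in {d % 7 for d in expand(wday, 0, 7)}


ENTRIES = [
    "1,31 0-5 * * * root /usr/bin/nice -n20 adjkerntz -a",
    "1 3 1 * * root /usr/bin/nice -n20 /etc/rc.update_bogons.sh",
    "0 12 4-10 * * root /usr/local/bin/php /usr/local/www/pfblockerng/pfblockerng.php dcc",
]

MONDAY = 1
print([e for e in ENTRIES if runs_only_on(e, MONDAY)])
```

Every entry in the table has `*` in the wday column, so nothing here is tied to Mondays specifically; jobs restricted by day of month (such as the `4-10` entry) land on a Monday only in some weeks.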
                      
                      • R
                        r43K9o last edited by

                        During the latest connection drop I also grabbed netstat -m, just because...:

                        12457/4508/16965 mbufs in use (current/cache/total)
                        10428/2988/13416/1000000 mbuf clusters in use (current/cache/total/max)
                        10428/2981 mbuf+clusters out of packet secondary zone in use (current/cache)
                        1/174/175/524288 4k (page size) jumbo clusters in use (current/cache/total/max)
                        0/0/0/524288 9k jumbo clusters in use (current/cache/total/max)
                        0/0/0/39393 16k jumbo clusters in use (current/cache/total/max)
                        23979K/7799K/31778K bytes allocated to network (current/cache/total)
                        0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
                        0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
                        0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
                        0/0/0 requests for jumbo clusters denied (4k/9k/16k)
                        0 sendfile syscalls
                        0 sendfile syscalls completed without I/O request
                        0 requests for I/O initiated by sendfile
                        0 pages read by sendfile as part of a request
                        0 pages were valid at time of a sendfile request
                        0 pages were requested for read ahead by applications
                        0 pages were read ahead by sendfile
                        0 times sendfile encountered an already busy page
                        0 requests for sfbufs denied
                        0 requests for sfbufs delayed
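For reading output like this quickly, the two things that usually matter are how close mbuf cluster usage is to its limit and whether any requests were denied or delayed. A minimal sketch that pulls those figures out, using three lines copied from the output above (the function name and dictionary keys are hypothetical):

```python
import re


def parse_netstat_m(text):
    """Extract cluster usage and denied/delayed mbuf request totals."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"(\d+)/(\d+)/(\d+)/(\d+) mbuf clusters in use", line)
        if m:
            _current, _cache, total, limit = map(int, m.groups())
            stats["cluster_total"] = total
            stats["cluster_max"] = limit
        m = re.match(r"(\d+)/(\d+)/(\d+) requests for mbufs (denied|delayed)", line)
        if m:
            # sum the mbufs / clusters / mbuf+clusters columns
            stats[f"mbufs_{m.group(4)}"] = sum(int(g) for g in m.groups()[:3])
    return stats


SAMPLE = """10428/2988/13416/1000000 mbuf clusters in use (current/cache/total/max)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)"""

stats = parse_netstat_m(SAMPLE)
usage = 100 * stats["cluster_total"] / stats["cluster_max"]
print(f"cluster usage: {usage:.2f}% of max, "
      f"denied={stats['mbufs_denied']}, delayed={stats['mbufs_delayed']}")
```

With 13416 of 1000000 clusters in use and no denied or delayed requests, these numbers do not point to mbuf exhaustion as the cause of the drop.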
                        

                        I also tried to force-restart the interface without restarting the whole machine. Using service netif restart igb0 did nothing. Is there a way to do a hard reset of the whole card from within the OS, so I can quickly fix the problem until I find a temporary solution?

                        • R
                          r43K9o last edited by

                          Today I also found this: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239240 It might be relevant to my problem.

                          • stephenw10
                            stephenw10 Netgate Administrator last edited by

                            That appears to be a FreeBSD 12 issue, are you running a pfSense 2.5 snapshot? If not it's probably unrelated.

                            Steve

                            • R
                              r43K9o @stephenw10 last edited by

                              @stephenw10 Oh, I had the impression that it also affects older versions. Thank you.
