WAN Interface keeps dropping - "Watchdog Timeout – Restarting"
-
Hi all,
Three times in the last day or so, a newly deployed pfSense box has shut down its WAN interface (bge0 in this case) and then brought it back up about a minute later. I've pulled the following from the logs; these lines are common to every occurrence.
Jun 5 19:08:24 pfSense kernel: bge0: watchdog timeout -- resetting
Jun 5 19:08:24 pfSense kernel: bge0: link state changed to DOWN
Jun 5 19:08:24 pfSense check_reload_status: Linkup starting bge0
Jun 5 19:08:25 pfSense php-fpm[19434]: /rc.linkup: Hotplug event detected for PUBLICWAN(wan) but ignoring since interface is configured with static IP (185.43.111.4 )
Jun 5 19:08:26 pfSense kernel: bge0: link state changed to UP
Jun 5 19:08:26 pfSense check_reload_status: Linkup starting bge0
Jun 5 19:08:27 pfSense php-fpm[19434]: /rc.linkup: Hotplug event detected for PUBLICWAN(wan) but ignoring since interface is configured with static IP (185.43.111.4 )
Jun 5 19:08:27 pfSense check_reload_status: rc.newwanip starting bge0
Jun 5 19:08:28 pfSense php-fpm[19434]: /rc.newwanip: rc.newwanip: Info: starting on bge0.
Jun 5 19:08:28 pfSense php-fpm[19434]: /rc.newwanip: rc.newwanip: on (IP address: 185.43.111.4) (interface: PUBLICWAN[wan]) (real interface: bge0).
Jun 5 19:08:28 pfSense check_reload_status: Reloading filter
Jun 5 19:09:07 pfSense check_reload_status: updating dyndns PublicWANGW
Jun 5 19:09:07 pfSense check_reload_status: Restarting ipsec tunnels
Jun 5 19:09:07 pfSense check_reload_status: Restarting OpenVPN tunnels/interfaces
Jun 5 19:09:07 pfSense check_reload_status: Reloading filter
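For anyone wanting to count how often this happens, here is a rough Python sketch that scans exported log text for the watchdog reset and measures how long the link took to come back. It assumes the log has been saved with one entry per line in the syslog format shown in this post; the function names (`parse_ts`, `find_flaps`) and the placeholder year are my own inventions, not anything pfSense provides.

```python
import re
from datetime import datetime

# Matches "<Mon> <day> <HH:MM:SS> <host> kernel: <ifname>: <message>".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<ifname>\w+):\s+(?P<msg>.*)$"
)


def parse_ts(ts, year=2000):
    # Syslog timestamps carry no year; 2000 is an arbitrary placeholder.
    ts = " ".join(ts.split())  # collapse padding such as "Jun  5"
    return datetime.strptime(f"{year} {ts}", "%Y %b %d %H:%M:%S")


def find_flaps(lines, ifname="bge0"):
    """Return (reset_time, seconds_until_link_up) for each watchdog reset."""
    flaps = []
    reset_at = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("ifname") != ifname:
            continue  # not a kernel message for the interface of interest
        when = parse_ts(m.group("ts"))
        msg = m.group("msg")
        if "watchdog timeout" in msg:
            reset_at = when
        elif msg.endswith("link state changed to UP") and reset_at is not None:
            flaps.append((reset_at, (when - reset_at).total_seconds()))
            reset_at = None
    return flaps


if __name__ == "__main__":
    sample = [
        "Jun 5 19:08:24 pfSense kernel: bge0: watchdog timeout -- resetting",
        "Jun 5 19:08:24 pfSense kernel: bge0: link state changed to DOWN",
        "Jun 5 19:08:26 pfSense kernel: bge0: link state changed to UP",
    ]
    for reset_at, seconds in find_flaps(sample):
        print(reset_at.strftime("%b %d %H:%M:%S"), "link back after", seconds, "s")
```

This only looks at the kernel lines, so it measures the link-level outage and not the time the reload scripts take afterwards.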
Any ideas as to what's going on with this? The environment is fairly normal: the pfSense box has two interfaces, public and private, each on a separate VLAN. Two hosts sit behind the firewall; those hosts are in a different rack, so we use trunk ports across two Cisco 2960Gs and a Cisco 6500 router. None of the Cisco hardware is reporting any errors or events, only that the pfSense box flaps its interface and then recovers.
Any help appreciated as the customer has noticed the outages, and I'm at a loss as to what would be causing this. I've not seen "watchdog timeout" errors before.
Thanks,
Dave.
-
Could be a bad NIC or cable, or a driver issue of some sort. Try the suggestion from here.
https://forums.freebsd.org/threads/fixing-dell-broadcom-bge0-watchdog-timeout-errors.44667/
except put those two lines in /boot/loader.conf.local instead of loader.conf so they won't get touched by upgrades. Then reboot.
-
Hi cmb, thanks for that. The server is an HP, but I imagine that if this issue is hardware-based, it will be because of the Broadcom chipset, which is common to the post you linked to. I'll get those lines added to the .local file and see how we go. If it continues to be an issue, we'll fit a more trusted NIC to the machine.
Thanks for your help.
Dave
-
Yeah, I'm sure that issue isn't specific to Dell but rather to that Broadcom chipset under certain circumstances, which would likely apply across any number of vendors. It doesn't affect every Dell machine with that chipset either: the onboard Broadcom NICs in a range of Dell servers are widely used, with few problems reported.
That sounds like a good plan.
-
Hi cmb,
I've added those lines to loader.conf.local. The file did not exist, so I had to create it. Is this expected, or should the config be placed in loader.conf as well? Are there any steps needed to force the new .local file to be included, or will it be picked up automatically on boot?
Thanks,
Dave.
-
I get the same on VMX0 (vmxnet3) driver and the console is flooded with timeouts.
Changing the VM to E1000 NIC instead makes it go away instantly.
-
I get the same on VMX0 (vmxnet3) driver and the console is flooded with timeouts.
Changing the VM to E1000 NIC instead makes it go away instantly.
That has no relation to OP's issue. Please start your own thread if you'd like to pursue it (and if you're still running some ancient ESX version, upgrade it first).