WAN Interface keeps dropping - "Watchdog Timeout – Restarting"

dave_vooservers

Hi all,

Three times in the last day or so, a newly deployed pfSense box has taken its WAN interface (bge0 in this case) down and then brought it back up about a minute later. I've pulled the following from the logs; the same lines appear every time this happens.

Jun 5 19:08:24 	pfSense kernel: bge0: watchdog timeout -- resetting
Jun 5 19:08:24 	pfSense kernel: bge0: link state changed to DOWN
Jun 5 19:08:24 	pfSense check_reload_status: Linkup starting bge0
Jun 5 19:08:25 	pfSense php-fpm[19434]: /rc.linkup: Hotplug event detected for PUBLICWAN(wan) but ignoring since interface is configured with static IP (185.43.111.4 )
Jun 5 19:08:26 	pfSense kernel: bge0: link state changed to UP
Jun 5 19:08:26 	pfSense check_reload_status: Linkup starting bge0
Jun 5 19:08:27 	pfSense php-fpm[19434]: /rc.linkup: Hotplug event detected for PUBLICWAN(wan) but ignoring since interface is configured with static IP (185.43.111.4 )
Jun 5 19:08:27 	pfSense check_reload_status: rc.newwanip starting bge0
Jun 5 19:08:28 	pfSense php-fpm[19434]: /rc.newwanip: rc.newwanip: Info: starting on bge0.
Jun 5 19:08:28 	pfSense php-fpm[19434]: /rc.newwanip: rc.newwanip: on (IP address: 185.43.111.4) (interface: PUBLICWAN[wan]) (real interface: bge0).
Jun 5 19:08:28 	pfSense check_reload_status: Reloading filter
Jun 5 19:09:07 	pfSense check_reload_status: updating dyndns PublicWANGW
Jun 5 19:09:07 	pfSense check_reload_status: Restarting ipsec tunnels
Jun 5 19:09:07 	pfSense check_reload_status: Restarting OpenVPN tunnels/interfaces
Jun 5 19:09:07 	pfSense check_reload_status: Reloading filter
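
For anyone wanting to count these events across a longer log, here is a rough Python sketch that pulls the kernel lines for one interface out of a log in the format above and pairs each watchdog timeout with the link DOWN/UP lines that follow it. The function names (`parse_ts`, `summarize_flaps`) and the placeholder year are assumptions made for illustration, not part of pfSense, and the regex is only matched against the line layout shown in this post.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
#   "Jun 5 19:08:24  pfSense kernel: bge0: watchdog timeout -- resetting"
# Assumption: the log uses the same layout as the excerpt in this post.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<iface>\w+):\s+(?P<msg>.*)$"
)


def parse_ts(ts, year=2000):
    """Parse a syslog timestamp. Syslog lines carry no year, so the
    year is a placeholder (hypothetical default, only used for deltas)."""
    normalized = " ".join(ts.split())
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S")


def summarize_flaps(lines, iface="bge0"):
    """Return one dict per watchdog timeout on `iface`, with the times of
    the first link DOWN and link UP lines that follow it (None if absent)."""
    events = []
    current = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("iface") != iface:
            continue  # not a kernel line for the interface of interest
        when = parse_ts(m.group("ts"))
        msg = m.group("msg")
        if "watchdog timeout" in msg:
            current = {"watchdog": when, "down": None, "up": None}
            events.append(current)
        elif current is not None and msg.endswith("link state changed to DOWN"):
            if current["down"] is None:
                current["down"] = when
        elif current is not None and msg.endswith("link state changed to UP"):
            if current["down"] is not None and current["up"] is None:
                current["up"] = when
    return events


if __name__ == "__main__":
    sample = [
        "Jun 5 19:08:24 \tpfSense kernel: bge0: watchdog timeout -- resetting",
        "Jun 5 19:08:24 \tpfSense kernel: bge0: link state changed to DOWN",
        "Jun 5 19:08:24 \tpfSense check_reload_status: Linkup starting bge0",
        "Jun 5 19:08:26 \tpfSense kernel: bge0: link state changed to UP",
    ]
    for event in summarize_flaps(sample):
        print(event["watchdog"].strftime("%b %d %H:%M:%S"), event["down"], event["up"])
```

Running `summarize_flaps` over the full system log would show how often the watchdog fires and how long the link stays down each time, which may help when comparing before and after any driver or hardware change.
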

Any ideas as to what's going on here? The environment is fairly normal: the pfSense box has 2 interfaces, public and private, each on a separate VLAN. 2 hosts sit behind the firewall; those hosts are in a different rack, so we use trunk ports across 2 Cisco 2960Gs and a Cisco 6500 router. None of the Cisco hardware is reporting any errors or events, only that the pfSense box flaps its interface and then recovers.

Any help appreciated as the customer has noticed the outages, and I'm at a loss as to what would be causing this. I've not seen "watchdog timeout" errors before.

Thanks,
Dave.

cmb

Could be a bad NIC or cable, or a driver issue of some sort. Try the suggestion from here:
https://forums.freebsd.org/threads/fixing-dell-broadcom-bge0-watchdog-timeout-errors.44667/

except put those two lines in /boot/loader.conf.local instead of loader.conf so they won't get touched by upgrades. Then reboot.

dave_vooservers

Hi cmb, thanks for that. The server is an HP, but I imagine that if this issue is hardware-based, it will be down to the same Broadcom chipset discussed in the post you linked to. I'll get those lines added to the .local file and see how we go. If it continues to be an issue, we'll fit a more trusted NIC to the machine.

Thanks for your help.

Dave

cmb

Yeah, I'm sure that issue isn't specific to Dell, but rather to that Broadcom chipset under certain circumstances, which would likely apply across any number of vendors. It doesn't affect every Dell machine either: the onboard Broadcom NICs in a range of Dell servers are widely used with few problems.

That sounds like a good plan.

dave_vooservers

Hi cmb,

I've added those lines to loader.conf.local. The file did not exist, so I had to create it. Is this expected? Or should the config be placed in loader.conf as well? Are there any steps needed to force the new .local file to be included, or will it be picked up automatically on boot?

Thanks,
Dave.

Supermule

I get the same on VMX0 (vmxnet3) driver and the console is flooded with timeouts.

Changing the VM to E1000 NIC instead makes it go away instantly.

cmb

@Supermule:

I get the same on VMX0 (vmxnet3) driver and the console is flooded with timeouts.

Changing the VM to E1000 NIC instead makes it go away instantly.

That has no relation to OP's issue. Please post your own thread if you'd like to pursue it (and if you're still running some ancient ESX version, upgrade it first).