bce watchdog timeout occurred after 2.3 upgrade
-
I just upgraded my final box from 2.2.6 to 2.3 last night. It had been running fine on 2.2.6 for what seems like forever (it's been months since I've touched the box, and I only ever touch it to do upgrades). However, within 6 hours of the upgrade, one of my two NICs died. The following errors kept showing up in the logs for 2 hours until I could reboot it. Despite what the log says about the link coming back up, bce1 never worked again after it died at 3:26am. Luckily bce0 was still working, so I was able to get in and reboot.
May 2 03:26:26 shgw kernel: bce1: /builder/pfsense-230/tmp/FreeBSD-src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
May 2 03:26:26 shgw kernel: bce1: link state changed to DOWN
May 2 03:26:30 shgw kernel: bce1:
May 2 03:26:30 shgw kernel: bce1: link state changed to UP
May 2 03:26:30 shgw kernel: Gigabit link up!
May 2 03:26:35 shgw kernel: bce1: /builder/pfsense-230/tmp/FreeBSD-src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
May 2 03:26:35 shgw kernel: bce1: link state changed to DOWN
May 2 03:26:39 shgw kernel: bce1:
May 2 03:26:39 shgw kernel: bce1: link state changed to UP
May 2 03:26:39 shgw kernel: Gigabit link up!
May 2 03:27:51 shgw kernel: bce1: /builder/pfsense-230/tmp/FreeBSD-src/sys/dev/bce/if_bce.c(7869): Watchdog timeout occurred, resetting!
May 2 03:27:51 shgw kernel: bce1: link state changed to DOWN
May 2 03:27:55 shgw kernel: bce1:
May 2 03:27:55 shgw kernel: bce1: link state changed to UP
May 2 03:27:55 shgw kernel: Gigabit link up!
I double checked and loader.conf.local is still the same. It contains the following:
kern.ipc.nmbclusters="1048576"
hw.bce.tso_enable=0
hw.pci.enable_msix=0
So I'm not sure what to do here. Searching around didn't turn up much, short of a year-old reference to making a kernel change and recompiling on FreeBSD 10.1:
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=145859+0+archive/2015/freebsd-stable/20150419.freebsd-stable
Looking at the source, that code has since been changed, so it didn't solve it for me. :(
-
It might be related to the SMP issue described in the links below. If disabling all CPUs except one (including disabling hyperthreading) stops the problems on your system, it is likely the same issue.
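As a rough sketch of that test, not something confirmed in this thread: on FreeBSD-based systems the stock loader tunable kern.smp.disabled should limit the kernel to a single CPU. Putting it in loader.conf.local next to the existing entries is an assumption here, and hyperthreading can additionally be turned off in the BIOS to be thorough.

```
# /boot/loader.conf.local (assumed location, same file as the tunables above)
# Boot with only one CPU active; remove this line to re-enable SMP.
kern.smp.disabled=1
```

After a reboot, `sysctl kern.smp.cpus` should report 1 if the setting took effect.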
You can read about it here:
https://forum.pfsense.org/index.php?topic=110710.msg618388#msg618388
Bug report here:
https://redmine.pfsense.org/issues/6296
-
I can try that. I already have SMP disabled on 2 of my other systems because it was causing networking to hang on them, but those have igb NICs. I'll try it on this one as well. What do I have to lose?