Hotplug Events and WAN interface down
-
Hi Folks,
We have a pfSense device (Sense A10 Quad Core rack) running 2.2.4-RELEASE in our main office. Over the last few weeks I have noticed that from time to time the WAN interface loses its connection and we fail over to our backup link. The main link is with UPC via a Cisco cable modem. They provide a static IP, which is configured on the WAN interface, and normally everything works as it should. Then, suddenly and for no apparent reason, the WAN link drops.
It happened again this morning, and I contacted the ISP thinking it might be an issue on their end. They checked the modem and confirmed that it was up and working. They said they could see the pfSense device, because the link on one of the modem ports was up, but for some reason the pfSense device's MAC address was not present in the Cisco modem's MAC address table. I connected a laptop, configured with a static IP in the range supplied by the ISP, to the modem, and everything worked as expected. When this has happened before, rebooting the pfSense device brought everything back. We have a failover link, so most services continue as normal, but some IPsec tunnels run over the main link and can only come from one IP (a condition set by the people on the other end), so ideally we would like to get back onto the main link without a reboot.
I have tried disabling and re-enabling the WAN interface to see if that would force an update on the Cisco modem, without success. I have also tried using the Arping package to update it, also without success. We swapped out the cable to the modem as well, just in case, but this made no difference. Looking through the system log, the most relevant entries appear to be:
Oct 28 09:33:22 rimdub-fw-01 kernel: em0: Watchdog timeout -- resetting
Oct 28 09:33:22 rimdub-fw-01 kernel: em0: Queue(0) tdh = 574, hw tdt = 543
Oct 28 09:33:22 rimdub-fw-01 kernel: em0: TX(0) desc avail = 31,Next TX to Clean = 574
Oct 28 09:33:22 rimdub-fw-01 kernel: em0: link state changed to DOWN
These are followed by:
Oct 28 09:33:37 rimdub-fw-01 check_reload_status: Linkup starting em0
Oct 28 09:33:37 rimdub-fw-01 kernel: em0: Watchdog timeout -- resetting
Oct 28 09:33:37 rimdub-fw-01 kernel: em0: Queue(0) tdh = 0, hw tdt = 1022
Oct 28 09:33:37 rimdub-fw-01 kernel: em0: TX(0) desc avail = 2,Next TX to Clean = 0
Oct 28 09:33:37 rimdub-fw-01 kernel: em0: link state changed to DOWN
Oct 28 09:33:38 rimdub-fw-01 php-fpm[14857]: /rc.linkup: Hotplug event detected for WAN(wan) but ignoring since interface is configured with static IP (XX.XX.XX.XX )
The pciconf output for the network interfaces is as follows:
em0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xf0800000, size 131072, enabled
    bar [18] = type I/O Port, range 32, base 0x1000, size 32, disabled
    bar [1c] = type Memory, range 32, base 0xf0820000, size 16384, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x1(x1)
                 speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[a0] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
em1@pci0:2:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xf0900000, size 131072, enabled
    bar [18] = type I/O Port, range 32, base 0x2000, size 32, enabled
    bar [1c] = type Memory, range 32, base 0xf0920000, size 16384, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x1(x1)
                 speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[a0] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
em2@pci0:3:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xf0a00000, size 131072, enabled
    bar [18] = type I/O Port, range 32, base 0x3000, size 32, enabled
    bar [1c] = type Memory, range 32, base 0xf0a20000, size 16384, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x1(x1)
                 speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[a0] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
em3@pci0:4:0:0: class=0x020000 card=0x00008086 chip=0x10d38086 rev=0x00 hdr=0x00
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xf0b00000, size 131072, enabled
    bar [18] = type I/O Port, range 32, base 0x4000, size 32, enabled
    bar [1c] = type Memory, range 32, base 0xf0b20000, size 16384, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 256(256) link x1(x1)
                 speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[a0] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
Has anyone seen anything like this before?
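To line these drops up against other events (ISP maintenance, time of day, traffic peaks), it can help to pull the em watchdog resets and link-down messages out of a saved copy of the system log. This is a minimal Python sketch, not something pfSense ships: the `find_nic_events` name and the regex are my own assumptions, based only on the log line format quoted above, and other syslog formats may need a different pattern.

```python
import re
from typing import List, Tuple

# Assumed pattern, derived only from the kernel lines quoted above, e.g.
# "Oct 28 09:33:22 rimdub-fw-01 kernel: em0: Watchdog timeout -- resetting".
# Non-kernel lines (php-fpm, check_reload_status) are deliberately not matched.
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<ifname>em\d+):\s+(?P<event>Watchdog timeout|link state changed to DOWN)"
)


def find_nic_events(lines: List[str], ifname: str = "em0") -> List[Tuple[str, str]]:
    """Return (timestamp, event) pairs for watchdog resets and link-down events on ifname."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if m and m.group("ifname") == ifname:
            events.append((m.group("ts"), m.group("event")))
    return events


if __name__ == "__main__":
    sample = [
        "Oct 28 09:33:22 rimdub-fw-01 kernel: em0: Watchdog timeout -- resetting",
        "Oct 28 09:33:22 rimdub-fw-01 kernel: em0: Queue(0) tdh = 574, hw tdt = 543",
        "Oct 28 09:33:22 rimdub-fw-01 kernel: em0: link state changed to DOWN",
        "Oct 28 09:33:37 rimdub-fw-01 check_reload_status: Linkup starting em0",
        "Oct 28 09:33:37 rimdub-fw-01 kernel: em0: Watchdog timeout -- resetting",
    ]
    for ts, event in find_nic_events(sample):
        print(ts, event)
```

Feeding it the full log (for example, the lines of a saved log file) gives one entry per reset or link-down, which is usually enough to see whether the drops cluster at particular times.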
-
Looks a lot like this: https://forum.pfsense.org/index.php?topic=96325.0
Try going to System > Advanced > Networking and disabling TSO. There seems to be a FreeBSD patch that fixes this problem with TSO and the em driver, but it hasn't made it into a released version of FreeBSD yet.
https://reviews.freebsd.org/D3192
-
Hi Engineer, thanks for the reply. It seems that TSO is already disabled by default in the version we are running.
-
Can you post the output of:
ifconfig em0
Also, one of the pfSense NIC troubleshooting guides suggests checking this (under System Tunables, or by adding it to /boot/loader.conf.local):
net.inet.tcp.tso=0
Even though TSO was checked as "disabled" on mine, net.inet.tcp.tso was still set to "1" in the tunables.
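Since the GUI setting and the value the kernel actually reports can disagree, a quick way to confirm is to compare the values you intended against captured `sysctl` output, which FreeBSD prints as `name: value` lines (for example `sysctl net.inet.tcp.tso`). A minimal Python sketch, assuming you have pasted that output into a string; `parse_sysctl_output` and `check_tunables` are hypothetical helpers, not part of pfSense.

```python
from typing import Dict


def parse_sysctl_output(text: str) -> Dict[str, str]:
    """Parse FreeBSD-style 'name: value' lines, as printed by sysctl, into a dict."""
    values = {}
    for line in text.splitlines():
        # Split on the first colon only; the value part is kept whole.
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def check_tunables(desired: Dict[str, str], sysctl_text: str) -> Dict[str, str]:
    """Return {name: actual} for every tunable whose reported value differs from desired.

    A name that does not appear in the output is reported as "<missing>".
    """
    actual = parse_sysctl_output(sysctl_text)
    return {
        name: actual.get(name, "<missing>")
        for name, want in desired.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    # The situation described above: TSO shown as disabled, but the sysctl still reports 1.
    captured = "net.inet.tcp.tso: 1\n"
    print(check_tunables({"net.inet.tcp.tso": "0"}, captured))
```

An empty result means every listed tunable reports the intended value; anything else shows the value actually in effect.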
-
Hi Engineer,
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=4209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
    ether f4:90:ea:10:01:dc
    inet6 XXXX::XXXX:XXXX:XXXX:1dc%em0 prefixlen 64 scopeid 0x1
    inet XX.XXX.XXX.XXX netmask 0xfffffff8 broadcast XX.XXX.XXX.XXX
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
You are right with regard to the system tunables:
net.inet.tcp.tso Enable TCP Segmentation Offload 1
I will amend this this evening once I can get an outage window.
Thank you again for your replies…
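The options=4209b value in that ifconfig output is a bitmask of interface capabilities, and the names ifconfig prints next to it show that TSO4 is not among them, although VLAN_HWTSO is. A minimal Python sketch that decodes such a value; the bit table is a subset of the IFCAP_* constants as I understand them from FreeBSD's net/if.h on 10.x, so treat it as an assumption and check it against the header for your release. `decode_options` is a hypothetical helper.

```python
from typing import List

# Assumed subset of FreeBSD IFCAP_* capability bits (net/if.h, 10.x era);
# verify against your own headers. Names follow how ifconfig prints them.
IFCAP_BITS = {
    0x00001: "RXCSUM",
    0x00002: "TXCSUM",
    0x00004: "NETCONS",
    0x00008: "VLAN_MTU",
    0x00010: "VLAN_HWTAGGING",
    0x00020: "JUMBO_MTU",
    0x00040: "POLLING",
    0x00080: "VLAN_HWCSUM",
    0x00100: "TSO4",
    0x00200: "TSO6",
    0x00400: "LRO",
    0x00800: "WOL_UCAST",
    0x01000: "WOL_MCAST",
    0x02000: "WOL_MAGIC",
    0x10000: "VLAN_HWFILTER",
    0x40000: "VLAN_HWTSO",
}


def decode_options(hex_value: str) -> List[str]:
    """Decode an ifconfig 'options=' hex value into capability names, lowest bit first."""
    value = int(hex_value, 16)
    names = [name for bit, name in sorted(IFCAP_BITS.items()) if value & bit]
    # Any bits not covered by the table above are reported rather than silently dropped.
    unknown = value & ~sum(IFCAP_BITS)
    if unknown:
        names.append(f"UNKNOWN(0x{unknown:x})")
    return names


if __name__ == "__main__":
    caps = decode_options("4209b")
    print(caps)
    print("TSO4 enabled:", "TSO4" in caps)
```

Decoding 4209b this way yields exactly the seven names ifconfig printed, which is a useful sanity check that the table matches this system.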
-
Hi all,
We are running into exactly the same problem, with the same hardware (even the same UPC modem) as the first poster.
The following tunables are set:
net.inet.tcp.tso = 0
hw.pci.enable_msix = 0
hw.pci.enable_msi = 0
but we still get the dead WAN connection. Is there anything more we can try to get rid of this?
-
D3192 and subsequent fixes should appear in FreeBSD 10.3 and 11.0 when they are released. pfSense 2.3 snapshots are currently based on 10.3-BETA, so they should incorporate this fix.
-
Was this fixed in 2.3? I seem to be running into something similar when running 2.3.
-
There are a lot of changes in the Intel Gigabit Ethernet drivers em(4) and igb(4) in 10.3-RELEASE. I believe all of them appear in pfSense 2.3-RELEASE (which is based on 10.3-RELEASE). It would be unwise to single out this one change from the many.