LAN hotplug errors after upgrade to 2.5.0
-
Upgraded my Supermicro build from 2.4.5-p1 to 2.5.0, and I'm now seeing hotplug errors in the logs. LAN clients briefly disconnect, like the link has been broken, SSH times out, file transfers fail, etc. If I leave for work, and everything in my house sits idle, I don't really see the problem here. however, when I'm running a file transfer, or otherwise hitting the LAN with a heavier load, I see this problem occur.
From what I can tell, I'm seeing also "newwwanip" messages, which seems to suggest some gateway is triggering the reload of interfaces, but I can't for the life of me find the culprit. Here's the setup:
- My ISP doesn't offer IPv6. I used to use a tunnel broker, but had issues getting IPv6 working even with that, so I have since disabled it. There is no GIF configured, and no interface assigned to it. IPv6 is globally disabled in pfSense, so only IPv4 is in play here.
- Build is a Supermicro X9SCM-F with an E3-1220v2, and 2x4GB of DDR3, if that matters.
- System only has two interfaces, so I'm rocking router on a stick here. Meaning when there's hotplug evens on LAN, every client on my network loses access to pfSense briefly.
System log messages, or at least the relevant block of them from the interface reload, is as follows:
Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1 Mar 2 19:33:17 hydrogen kernel: em1: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.100: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.70: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.101: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.99: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.80: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.20: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.999: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.10: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.199: link state changed to DOWN Mar 2 19:33:17 hydrogen kernel: em1.30: link state changed to DOWN Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.100 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.70 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.101 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.99 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.80 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.20 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.999 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.10 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.199 Mar 2 19:33:17 hydrogen check_reload_status[392]: Linkup starting em1.30 Mar 2 19:33:18 hydrogen php-fpm[56091]: /rc.linkup: Hotplug event detected for LAN(lan) static IP ( ) Mar 2 19:33:18 hydrogen php-fpm[50435]: /rc.linkup: Hotplug event detected for END_DEVICES(opt7) static IP (10.0.100.1 ) Mar 2 19:33:18 hydrogen php-fpm[26512]: /rc.linkup: Hotplug event detected for MANAGEMENT(opt6) static IP (10.99.0.1 ) Mar 2 19:33:18 hydrogen php-fpm[353]: /rc.linkup: Hotplug event detected for SECURITY(opt4) static IP (10.0.70.1 ) Mar 2 19:33:18 hydrogen php-fpm[354]: /rc.linkup: Hotplug event detected for IOT(opt8) static IP (10.0.101.1 ) Mar 2 19:33:18 hydrogen php-fpm[87653]: /rc.linkup: Hotplug event detected for DMZ(opt5) static IP (10.0.80.1 ) Mar 2 19:33:18 hydrogen check_reload_status[392]: Reloading filter Mar 2 19:33:18 hydrogen php-fpm[56091]: /rc.linkup: Hotplug event detected for STORAGE(opt2) static IP (10.0.20.1 ) Mar 2 19:33:18 hydrogen check_reload_status[392]: Reloading filter Mar 2 19:33:18 hydrogen php-fpm[26512]: /rc.linkup: Hotplug event detected for TESTNET(opt11) static IP (10.100.0.1 ) Mar 2 19:33:18 hydrogen php-fpm[50435]: /rc.linkup: Hotplug event detected for SERVERS(opt1) static IP (10.0.10.1 ) Mar 2 19:33:18 hydrogen php-fpm[354]: /rc.linkup: Hotplug event detected for GUEST(opt9) static IP (10.0.199.1 ) Mar 2 19:33:18 hydrogen php-fpm[353]: /rc.linkup: Hotplug event detected for MEDIA(opt3) static IP (10.0.30.1 ) Mar 2 19:33:20 hydrogen avahi-daemon[45996]: Leaving mDNS multicast group on interface em1.100.IPv6 with address fe80::ec4:7aff:fe01:3ba0. Mar 2 19:33:20 hydrogen kernel: em1: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.100: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.70: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.101: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.99: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.80: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.20: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.999: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.10: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.199: link state changed to UP Mar 2 19:33:20 hydrogen kernel: em1.30: link state changed to UP Mar 2 19:33:20 hydrogen avahi-daemon[45996]: Leaving mDNS multicast group on interface em1.100.IPv4 with address 10.0.100.1. Mar 2 19:33:20 hydrogen avahi-daemon[45996]: Leaving mDNS multicast group on interface em1.101.IPv6 with address fe80::ec4:7aff:fe01:3ba0. Mar 2 19:33:20 hydrogen avahi-daemon[45996]: Leaving mDNS multicast group on interface em1.101.IPv4 with address 10.0.101.1. Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.100 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.70 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.101 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.99 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.80 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.20 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.999 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.10 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.199 Mar 2 19:33:20 hydrogen check_reload_status[392]: Linkup starting em1.30 Mar 2 19:33:21 hydrogen php-fpm[26512]: /rc.linkup: Hotplug event detected for LAN(lan) static IP ( ) Mar 2 19:33:21 hydrogen php-fpm[50435]: /rc.linkup: Hotplug event detected for END_DEVICES(opt7) static IP (10.0.100.1 ) Mar 2 19:33:21 hydrogen php-fpm[354]: /rc.linkup: Hotplug event detected for SECURITY(opt4) static IP (10.0.70.1 ) Mar 2 19:33:21 hydrogen php-fpm[353]: /rc.linkup: Hotplug event detected for IOT(opt8) static IP (10.0.101.1 ) Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1 Mar 2 19:33:21 hydrogen check_reload_status[392]: Reloading filter Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.100 Mar 2 19:33:21 hydrogen check_reload_status[392]: Reloading filter Mar 2 19:33:21 hydrogen php-fpm[26512]: /rc.linkup: Hotplug event detected for MANAGEMENT(opt6) static IP (10.99.0.1 ) Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.70 Mar 2 19:33:21 hydrogen php-fpm[50435]: /rc.linkup: Hotplug event detected for STORAGE(opt2) static IP (10.0.20.1 ) Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.101 Mar 2 19:33:21 hydrogen php-fpm[354]: /rc.linkup: Hotplug event detected for TESTNET(opt11) static IP (10.100.0.1 ) Mar 2 19:33:21 hydrogen php-fpm[87653]: /rc.linkup: Hotplug event detected for DMZ(opt5) static IP (10.0.80.1 ) Mar 2 19:33:21 hydrogen php-fpm[353]: /rc.linkup: Hotplug event detected for SERVERS(opt1) static IP (10.0.10.1 ) Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.99 Mar 2 19:33:21 hydrogen php-fpm[26512]: /rc.linkup: Hotplug event detected for MEDIA(opt3) static IP (10.0.30.1 ) Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.20 Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.999 Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.10 Mar 2 19:33:21 hydrogen php-fpm[56091]: /rc.linkup: Hotplug event detected for GUEST(opt9) static IP (10.0.199.1 ) Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.80 Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.30 Mar 2 19:33:21 hydrogen check_reload_status[392]: rc.newwanip starting em1.199 Mar 2 19:33:22 hydrogen php-fpm[50435]: /rc.newwanip: rc.newwanip: Info: starting on em1. Mar 2 19:33:22 hydrogen php-fpm[353]: /rc.newwanip: rc.newwanip: Info: starting on em1.100. Mar 2 19:33:22 hydrogen php-fpm[353]: /rc.newwanip: rc.newwanip: on (IP address: 10.0.100.1) (interface: END_DEVICES[opt7]) (real interface: em1.100). Mar 2 19:33:22 hydrogen php-fpm[50435]: /rc.newwanip: rc.newwanip: on (IP address: ) (interface: LAN[lan]) (real interface: em1). Mar 2 19:33:22 hydrogen php-fpm[87653]: /rc.newwanip: rc.newwanip: Info: starting on em1.70. Mar 2 19:33:22 hydrogen php-fpm[56091]: /rc.newwanip: rc.newwanip: Info: starting on em1.101. Mar 2 19:33:22 hydrogen php-fpm[87653]: /rc.newwanip: rc.newwanip: on (IP address: 10.0.70.1) (interface: SECURITY[opt4]) (real interface: em1.70). Mar 2 19:33:22 hydrogen php-fpm[56091]: /rc.newwanip: rc.newwanip: on (IP address: 10.0.101.1) (interface: IOT[opt8]) (real interface: em1.101). Mar 2 19:33:22 hydrogen check_reload_status[392]: Reloading filter Mar 2 19:33:22 hydrogen php-fpm[353]: /rc.newwanip: rc.newwanip: Info: starting on em1.99. Mar 2 19:33:22 hydrogen php-fpm[353]: /rc.newwanip: rc.newwanip: on (IP address: 10.99.0.1) (interface: MANAGEMENT[opt6]) (real interface: em1.99). Mar 2 19:33:22 hydrogen php-fpm[87653]: /rc.newwanip: rc.newwanip: Info: starting on em1.20. Mar 2 19:33:22 hydrogen php-fpm[87653]: /rc.newwanip: rc.newwanip: on (IP address: 10.0.20.1) (interface: STORAGE[opt2]) (real interface: em1.20). Mar 2 19:33:22 hydrogen php-fpm[56091]: /rc.newwanip: rc.newwanip: Info: starting on em1.999. Mar 2 19:33:22 hydrogen php-fpm[56091]: /rc.newwanip: rc.newwanip: on (IP address: 10.100.0.1) (interface: TESTNET[opt11]) (real interface: em1.999). Mar 2 19:33:22 hydrogen php-fpm[353]: /rc.newwanip: rc.newwanip: Info: starting on em1.10. Mar 2 19:33:22 hydrogen php-fpm[353]: /rc.newwanip: rc.newwanip: on (IP address: 10.0.10.1) (interface: SERVERS[opt1]) (real interface: em1.10). Mar 2 19:33:22 hydrogen php-fpm[87653]: /rc.newwanip: rc.newwanip: Info: starting on em1.80. Mar 2 19:33:22 hydrogen php-fpm[87653]: /rc.newwanip: rc.newwanip: on (IP address: 10.0.80.1) (interface: DMZ[opt5]) (real interface: em1.80). Mar 2 19:33:22 hydrogen php-fpm[56091]: /rc.newwanip: rc.newwanip: Info: starting on em1.30. Mar 2 19:33:22 hydrogen php-fpm[56091]: /rc.newwanip: rc.newwanip: on (IP address: 10.0.30.1) (interface: MEDIA[opt3]) (real interface: em1.30). Mar 2 19:33:22 hydrogen php-fpm[78576]: /rc.newwanip: rc.newwanip: Info: starting on em1.199. Mar 2 19:33:22 hydrogen php-fpm[78576]: /rc.newwanip: rc.newwanip: on (IP address: 10.0.199.1) (interface: GUEST[opt9]) (real interface: em1.199). Mar 2 19:33:24 hydrogen php-fpm[50435]: /rc.newwanip: Removing static route for monitor 1.1.1.1 and adding a new route through [REDACTED] Mar 2 19:33:26 hydrogen rc.gateway_alarm[18331]: >>> Gateway alarm: Testnet (Addr:10.100.0.2 Alarm:1 RTT:0.000ms RTTsd:0.000ms Loss:100%) Mar 2 19:33:26 hydrogen check_reload_status[392]: updating dyndns Testnet Mar 2 19:33:26 hydrogen check_reload_status[392]: Restarting ipsec tunnels Mar 2 19:33:26 hydrogen check_reload_status[392]: Restarting OpenVPN tunnels/interfaces Mar 2 19:33:26 hydrogen check_reload_status[392]: Reloading filter Mar 2 19:33:26 hydrogen php-fpm[50435]: /rc.newwanip: Gateway, NONE AVAILABLE Mar 2 19:33:26 hydrogen php-fpm[50435]: /rc.newwanip: IP Address has changed, killing states on former IP Address 0.0.0.0. Mar 2 19:33:27 hydrogen php-fpm[87653]: /rc.openvpn: Gateway, NONE AVAILABLE Mar 2 19:33:27 hydrogen php-fpm[87653]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use Testnet. Mar 2 19:33:29 hydrogen php-fpm[50435]: /rc.newwanip: Resyncing OpenVPN instances for interface LAN. Mar 2 19:33:29 hydrogen php-fpm[50435]: /rc.newwanip: Creating rrd update script Mar 2 19:33:31 hydrogen php-fpm[50435]: /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 0.0.0.0 -> - Restarting packages. Mar 2 19:33:31 hydrogen check_reload_status[392]: Starting packages Mar 2 19:33:31 hydrogen snmpd[40712]: disk_OS_get_disks: adding device 'ada0' to device list Mar 2 19:33:32 hydrogen php-fpm[78576]: /rc.start_packages: Restarting/Starting all packages. Mar 2 19:33:32 hydrogen php-fpm[78576]: /rc.start_packages: Stopping service avahi Mar 2 19:33:32 hydrogen avahi-daemon[45996]: Got SIGTERM, quitting. Mar 2 19:33:32 hydrogen avahi-daemon[45996]: avahi-daemon 0.8 exiting. Mar 2 19:33:32 hydrogen php-fpm[78576]: /rc.start_packages: Starting service avahi Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Found user 'avahi' (UID 558) and group 'avahi' (GID 558). Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Successfully dropped root privileges. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: avahi-daemon 0.8 starting up. Mar 2 19:33:33 hydrogen avahi-daemon[52729]: Found user 'avahi' (UID 558) and group 'avahi' (GID 558). Mar 2 19:33:33 hydrogen avahi-daemon[52729]: Successfully dropped root privileges. Mar 2 19:33:33 hydrogen avahi-daemon[52729]: open(/var/run/avahi-daemon//pid): File exists Mar 2 19:33:33 hydrogen avahi-daemon[52729]: Failed to create PID file: File exists Mar 2 19:33:33 hydrogen avahi-daemon[52933]: WARNING: No NSS support for mDNS detected, consider installing nss-mdns! Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Loading service file /usr/local/etc/avahi/services/sftp-ssh.service. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Loading service file /usr/local/etc/avahi/services/ssh.service. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Joining mDNS multicast group on interface em1.101.IPv6 with address fe80::ec4:7aff:fe01:3ba0. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: New relevant interface em1.101.IPv6 for mDNS. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Joining mDNS multicast group on interface em1.101.IPv4 with address 10.0.101.1. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: New relevant interface em1.101.IPv4 for mDNS. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Joining mDNS multicast group on interface em1.100.IPv6 with address fe80::ec4:7aff:fe01:3ba0. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: New relevant interface em1.100.IPv6 for mDNS. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Joining mDNS multicast group on interface em1.100.IPv4 with address 10.0.100.1. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: New relevant interface em1.100.IPv4 for mDNS. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Network interface enumeration completed. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Server startup complete. Host name is hydrogen.local. Local service cookie is 3251914772. Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Failed to add service 'hydrogen' of type '_ssh._tcp', ignoring service group (/usr/local/etc/avahi/services/ssh.service): Not permitted Mar 2 19:33:33 hydrogen avahi-daemon[52933]: Failed to add service 'hydrogen' of type '_sftp-ssh._tcp', ignoring service group (/usr/local/etc/avahi/services/sftp-ssh.service): Not permitted Mar 2 19:33:36 hydrogen kernel: em1.10: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.999: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.20: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.30: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.70: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.80: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.99: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.100: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.101: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1: promiscuous mode disabled Mar 2 19:33:36 hydrogen kernel: em1.199: promiscuous mode disabled Mar 2 19:33:41 hydrogen kernel: em1: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.10: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.999: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.20: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.30: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.70: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.80: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.99: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.100: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.101: promiscuous mode enabled Mar 2 19:33:41 hydrogen kernel: em1.199: promiscuous mode enabled Mar 2 19:34:01 hydrogen lighttpd_pfb[88517]: [pfBlockerNG] DNSBL Webserver started Mar 2 19:34:01 hydrogen php[89697]: [pfBlockerNG] filterlog daemon started Mar 2 19:34:01 hydrogen ntopng[83291]: [HTTPserver.cpp:1229] ERROR: [HTTP] set_ports_option: cannot bind to 3000s: Address already in use Mar 2 19:34:01 hydrogen ntopng[83291]: [mongoose.c:4642] ERROR: set_ports_option: cannot bind to 3000s: No error: 0 Mar 2 19:34:01 hydrogen ntopng[83291]: [HTTPserver.cpp:1519] ERROR: Unable to start HTTP server (IPv4) on ports 3000s Mar 2 19:34:01 hydrogen ntopng[83291]: [HTTPserver.cpp:1525] ERROR: Either port in use or another ntopng instance is running (using the same port) Mar 2 19:34:47 hydrogen rc.gateway_alarm[46448]: >>> Gateway alarm: Testnet (Addr:10.100.0.2 Alarm:0 RTT:.982ms RTTsd:3.926ms Loss:5%) Mar 2 19:34:47 hydrogen check_reload_status[392]: updating dyndns Testnet Mar 2 19:34:47 hydrogen check_reload_status[392]: Restarting ipsec tunnels Mar 2 19:34:47 hydrogen check_reload_status[392]: Restarting OpenVPN tunnels/interfaces Mar 2 19:34:47 hydrogen check_reload_status[392]: Reloading filter Mar 2 19:34:49 hydrogen php-fpm[354]: /rc.openvpn: Gateway, NONE AVAILABLE Mar 2 19:34:49 hydrogen php-fpm[354]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use Testnet.
I'm seeing a gateway alarm for "Testnet" which is a VLAN interface that's connected to a pfSense test VM, but it seems that this is in the middle of the log messages. I don't believe this is the problem, and I think it's just a result of that physical interface being reloaded with the hotplug errors.
Anyway, if anyone could make any sense of this, and possibly point me in the direction of solving this, that would be great!
-
A "hotplug" event is typically physical -- it lost link with whatever it was connected to.
We have also seen issues with this in the past in some rare cases where a driver doesn't like some parameter being set on it, such as a spoofed MAC address, which makes it cycle link endlessly. But if that were the case your connectivity would be completely broken and it wouldn't come and go.
I don't think I've ever seen something like that correlate with load unless there was a hardware issue compounding it (e.g. something overheating).
The easiest and cheapest things to try are swapping the ethernet cable out completely and maybe switching to another switchport.
If it is connected to a managed switch you could check the logs there as well and see what it says.
-
@jimp Oh yeah, I've tried swapping the cable several times already. It never did this in 2.4.5. The only change that would have caused that to start is the upgrade. Hardware was left as is. Swapping the cable does nothing. I know for a fact, it's not the cable, switch, or any of the hardware or physical connection.
My guess is it's something then to do with a change in 2.5 that it doesn't like something, or some timings on an alarm or something are more sensitive now, but I'm not seeing anything jumping out in the logs.
-
Are you able to check whatever it's connected to to see if it is actually dropping the link for some reason?
Steve
-
@stephenw10 The switch it's connected to does indeed show disconnect and reconnect log messages (Dell 5548). That's what initially led me to believe it's a hardware thing, but I've tried swapping with a known good cable, and same result.
Still only happens really under load though. If I leave it idle while I'm at work, I don't really see the disconnect messages or anything.
-
You might check the stats from
sysctl dev.em.1
and the settings insysctl hw.em
.Steve
-
sysctl dev.em.1
dev.em.1.wake: 0 dev.em.1.interrupts.rx_overrun: 0 dev.em.1.interrupts.rx_desc_min_thresh: 0 dev.em.1.interrupts.tx_queue_min_thresh: 0 dev.em.1.interrupts.tx_queue_empty: 0 dev.em.1.interrupts.tx_abs_timer: 0 dev.em.1.interrupts.tx_pkt_timer: 0 dev.em.1.interrupts.rx_abs_timer: 0 dev.em.1.interrupts.rx_pkt_timer: 0 dev.em.1.interrupts.asserts: 216 dev.em.1.mac_stats.tso_ctx_fail: 0 dev.em.1.mac_stats.tso_txd: 0 dev.em.1.mac_stats.tx_frames_1024_1522: 1495503167 dev.em.1.mac_stats.tx_frames_512_1023: 16495480 dev.em.1.mac_stats.tx_frames_256_511: 14855170 dev.em.1.mac_stats.tx_frames_128_255: 25114525 dev.em.1.mac_stats.tx_frames_65_127: 206999058 dev.em.1.mac_stats.tx_frames_64: 85971859 dev.em.1.mac_stats.mcast_pkts_txd: 441585 dev.em.1.mac_stats.bcast_pkts_txd: 13798 dev.em.1.mac_stats.good_pkts_txd: 1844939137 dev.em.1.mac_stats.total_pkts_txd: 1844939231 dev.em.1.mac_stats.good_octets_txd: 2312840063914 dev.em.1.mac_stats.good_octets_recvd: 2129400200140 dev.em.1.mac_stats.rx_frames_1024_1522: 1367756134 dev.em.1.mac_stats.rx_frames_512_1023: 16606429 dev.em.1.mac_stats.rx_frames_256_511: 5846534 dev.em.1.mac_stats.rx_frames_128_255: 32959675 dev.em.1.mac_stats.rx_frames_65_127: 346945240 dev.em.1.mac_stats.rx_frames_64: 1566882 dev.em.1.mac_stats.mcast_pkts_recvd: 1942808 dev.em.1.mac_stats.bcast_pkts_recvd: 478650 dev.em.1.mac_stats.good_pkts_recvd: 1771680901 dev.em.1.mac_stats.total_pkts_recvd: 1771681360 dev.em.1.mac_stats.xoff_txd: 1 dev.em.1.mac_stats.xoff_recvd: 0 dev.em.1.mac_stats.xon_txd: 1 dev.em.1.mac_stats.xon_recvd: 0 dev.em.1.mac_stats.coll_ext_errs: 0 dev.em.1.mac_stats.alignment_errs: 0 dev.em.1.mac_stats.crc_errs: 35 dev.em.1.mac_stats.recv_errs: 22 dev.em.1.mac_stats.recv_jabber: 22 dev.em.1.mac_stats.recv_oversize: 295 dev.em.1.mac_stats.recv_fragmented: 0 dev.em.1.mac_stats.recv_undersize: 0 dev.em.1.mac_stats.recv_no_buff: 91985175 dev.em.1.mac_stats.missed_packets: 0 dev.em.1.mac_stats.defer_count: 0 dev.em.1.mac_stats.sequence_errors: 0 dev.em.1.mac_stats.symbol_errors: 0 dev.em.1.mac_stats.collision_count: 0 dev.em.1.mac_stats.late_coll: 0 dev.em.1.mac_stats.multiple_coll: 0 dev.em.1.mac_stats.single_coll: 0 dev.em.1.mac_stats.excess_coll: 0 dev.em.1.queue_rx_1.rx_irq: 0 dev.em.1.queue_rx_1.rxd_tail: 452 dev.em.1.queue_rx_1.rxd_head: 455 dev.em.1.queue_rx_0.rx_irq: 0 dev.em.1.queue_rx_0.rxd_tail: 155 dev.em.1.queue_rx_0.rxd_head: 158 dev.em.1.queue_tx_1.tx_irq: 0 dev.em.1.queue_tx_1.txd_tail: 609 dev.em.1.queue_tx_1.txd_head: 605 dev.em.1.queue_tx_0.tx_irq: 0 dev.em.1.queue_tx_0.txd_tail: 533 dev.em.1.queue_tx_0.txd_head: 532 dev.em.1.fc_low_water: 16932 dev.em.1.fc_high_water: 18432 dev.em.1.rx_control: 67403806 dev.em.1.device_control: 1477444168 dev.em.1.watchdog_timeouts: 0 dev.em.1.rx_overruns: 0 dev.em.1.link_irq: 74 dev.em.1.dropped: 72 dev.em.1.eee_control: 1 dev.em.1.itr: 488 dev.em.1.tx_abs_int_delay: 66 dev.em.1.rx_abs_int_delay: 66 dev.em.1.tx_int_delay: 66 dev.em.1.rx_int_delay: 0 dev.em.1.rs_dump: 0 dev.em.1.reg_dump: General Registers CTRL 58100248 STATUS 00080783 CTRL_EXIT 80580000 Interrupt Registers ICR 80a00083 RX Registers RCTL 0404801e RDLEN 00004000 RDH 000000a3 RDT 000000a2 RXDCTL 01050420 RDBAL 04414000 RDBAH 00000000 TX Registers TCTL 3103f0fa TDBAL 057cc000 TDBAH 00000000 TDLEN 00004000 TDH 0000022d TDT 0000022d TXDCTL 0341011f TDFH 00000b97 TDFT 0000137c TDFHS 00000bea TDFPC 0000000a dev.em.1.fc: 3 dev.em.1.debug: -1 dev.em.1.nvm: -1 dev.em.1.iflib.rxq1.rxq_fl0.buf_size: 2048 dev.em.1.iflib.rxq1.rxq_fl0.credits: 1023 dev.em.1.iflib.rxq1.rxq_fl0.cidx: 520 dev.em.1.iflib.rxq1.rxq_fl0.pidx: 519 dev.em.1.iflib.rxq0.rxq_fl0.buf_size: 2048 dev.em.1.iflib.rxq0.rxq_fl0.credits: 1023 dev.em.1.iflib.rxq0.rxq_fl0.cidx: 166 dev.em.1.iflib.rxq0.rxq_fl0.pidx: 165 dev.em.1.iflib.txq1.r_abdications: 0 dev.em.1.iflib.txq1.r_restarts: 0 dev.em.1.iflib.txq1.r_stalls: 0 dev.em.1.iflib.txq1.r_starts: 13052377 dev.em.1.iflib.txq1.r_drops: 0 dev.em.1.iflib.txq1.r_enqueues: 13058272 dev.em.1.iflib.txq1.ring_state: pidx_head: 0699 pidx_tail: 0699 cidx: 0699 state: IDLE dev.em.1.iflib.txq1.txq_cleaned: 13726331 dev.em.1.iflib.txq1.txq_processed: 13726379 dev.em.1.iflib.txq1.txq_in_use: 50 dev.em.1.iflib.txq1.txq_cidx_processed: 685 dev.em.1.iflib.txq1.txq_cidx: 645 dev.em.1.iflib.txq1.txq_pidx: 695 dev.em.1.iflib.txq1.no_tx_dma_setup: 0 dev.em.1.iflib.txq1.txd_encap_efbig: 0 dev.em.1.iflib.txq1.tx_map_failed: 0 dev.em.1.iflib.txq1.no_desc_avail: 0 dev.em.1.iflib.txq1.mbuf_defrag_failed: 0 dev.em.1.iflib.txq1.m_pullups: 1844 dev.em.1.iflib.txq1.mbuf_defrag: 0 dev.em.1.iflib.txq0.r_abdications: 0 dev.em.1.iflib.txq0.r_restarts: 0 dev.em.1.iflib.txq0.r_stalls: 0 dev.em.1.iflib.txq0.r_starts: 679332 dev.em.1.iflib.txq0.r_drops: 0 dev.em.1.iflib.txq0.r_enqueues: 679664 dev.em.1.iflib.txq0.ring_state: pidx_head: 1840 pidx_tail: 1840 cidx: 1840 state: IDLE dev.em.1.iflib.txq0.txq_cleaned: 1355334 dev.em.1.iflib.txq0.txq_processed: 1355378 dev.em.1.iflib.txq0.txq_in_use: 54 dev.em.1.iflib.txq0.txq_cidx_processed: 626 dev.em.1.iflib.txq0.txq_cidx: 586 dev.em.1.iflib.txq0.txq_pidx: 640 dev.em.1.iflib.txq0.no_tx_dma_setup: 0 dev.em.1.iflib.txq0.txd_encap_efbig: 0 dev.em.1.iflib.txq0.tx_map_failed: 0 dev.em.1.iflib.txq0.no_desc_avail: 0 dev.em.1.iflib.txq0.mbuf_defrag_failed: 0 dev.em.1.iflib.txq0.m_pullups: 1912 dev.em.1.iflib.txq0.mbuf_defrag: 0 dev.em.1.iflib.override_nrxds: 0 dev.em.1.iflib.override_ntxds: 0 dev.em.1.iflib.separate_txrx: 0 dev.em.1.iflib.core_offset: 1 dev.em.1.iflib.tx_abdicate: 0 dev.em.1.iflib.rx_budget: 0 dev.em.1.iflib.disable_msix: 0 dev.em.1.iflib.override_qs_enable: 0 dev.em.1.iflib.override_nrxqs: 0 dev.em.1.iflib.override_ntxqs: 0 dev.em.1.iflib.driver_version: 7.6.1-k dev.em.1.%parent: pci2 dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x0000 class=0x020000 dev.em.1.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PCI0.RP05.PXSX dev.em.1.%driver: em dev.em.1.%desc: Intel(R) PRO/1000 Network Connection
sysctl hw.em
hw.em.max_interrupt_rate: 8000 hw.em.eee_setting: 1 hw.em.rx_process_limit: 100 hw.em.sbp: 1 hw.em.smart_pwr_down: 0 hw.em.rx_abs_int_delay: 66 hw.em.tx_abs_int_delay: 66 hw.em.rx_int_delay: 0 hw.em.tx_int_delay: 66 hw.em.disable_crc_stripping: 0
-
Hmm, nothing jumps out there. Pretty much what I see on a test box here.
At this point I would probably be swapping em0 and em1 to be sure it's not a hardware issue.
Steve
-
@stephenw10 I will give that a shot when I get home.
Out of curiosity, if I'm understanding this correctly, it seems like the log messages are indicating a new WAN IP is triggering an interface reload. Am I correct there? Meaning swapping them would go from triggering a reload to the LAN just periodically dropping connectivity?
Also, the motherboard is a Supermicro X9SCM, so both NICs are on board, not that it makes much of a difference.
-
@stephenw10 Okay, so swapping the two, first thing I noticed is the clip on the WAN cable isn't broken off, but it doesn't click. I was planning on re-crimping that end, but out of curiosity, I let everything come back up after a config change (backup config, swap em0 and em1, and import modified config), and I unplugged, left, and plugged back in the WAN on em1.
I see the log is much shorter for hotplug on WAN compared to LAN, though I suspect that's just cause of all the LAN subinterfaces. I did noticed that instead of a "hotplug" message, this time the singular message is "HOTPLUG," and there's only one mention of it. I don't quite know if that's a different thing or not.
I'm currently hammering everything with a 240GB file transfer to see if I can get the error to pop up again on its own with the NIC swap. In the mean time, any idea why a physical unplug generates different log messages than what I was seeing before? Logs are as follows:
Mar 7 19:22:24 hydrogen check_reload_status[309]: Linkup starting em1 Mar 7 19:22:24 hydrogen kernel: em1: link state changed to DOWN Mar 7 19:22:25 hydrogen php-fpm[270]: /rc.linkup: DEVD Ethernet detached event for wan Mar 7 19:22:26 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:22:43 hydrogen rc.gateway_alarm[90687]: >>> Gateway alarm: WAN_DHCP (Addr:1.1.1.1 Alarm:1 RTT:696.418ms RTTsd:130.285ms Loss:26%) Mar 7 19:22:43 hydrogen check_reload_status[309]: updating dyndns WAN_DHCP Mar 7 19:22:43 hydrogen check_reload_status[309]: Restarting ipsec tunnels Mar 7 19:22:43 hydrogen check_reload_status[309]: Restarting OpenVPN tunnels/interfaces Mar 7 19:22:43 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:22:44 hydrogen php-fpm[83648]: /rc.openvpn: Gateway, NONE AVAILABLE Mar 7 19:22:44 hydrogen php-fpm[83648]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP. Mar 7 19:22:44 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (522789) There was an error trying to determine the public IP for interface - wan (em1 ). Mar 7 19:22:45 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (home.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ). Mar 7 19:22:46 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (mail.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ). Mar 7 19:22:47 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (bitwarden.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ). Mar 7 19:23:27 hydrogen kernel: ovpns2: link state changed to DOWN Mar 7 19:23:27 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:23:53 hydrogen check_reload_status[309]: Linkup starting em1 Mar 7 19:23:53 hydrogen kernel: em1: link state changed to UP Mar 7 19:23:54 hydrogen php-fpm[270]: /rc.linkup: DEVD Ethernet attached event for wan Mar 7 19:23:54 hydrogen php-fpm[270]: /rc.linkup: HOTPLUG: Configuring interface wan Mar 7 19:23:56 hydrogen check_reload_status[309]: rc.newwanip starting em1 Mar 7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: calling interface_dhcpv6_configure. Mar 7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: Accept router advertisements on interface em1 Mar 7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: Starting rtsold process Mar 7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: rc.newwanip: Info: starting on em1. Mar 7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: rc.newwanip: on (IP address: 99.198.53.111) (interface: WAN[wan]) (real interface: em1). Mar 7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: Accept router advertisements on interface em1 Mar 7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: Starting rtsold process Mar 7 19:23:58 hydrogen php-fpm[270]: /rc.linkup: Gateway, NONE AVAILABLE Mar 7 19:23:58 hydrogen check_reload_status[309]: Restarting ipsec tunnels Mar 7 19:23:58 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied Mar 7 19:23:59 hydrogen php-fpm[10424]: /rc.newwanip: The command '/usr/sbin/rtsold -1 -p /var/run/rtsold_em1.pid -M /var/etc/rtsold_em1_script.sh -O /var/etc/rtsold_em1_script.sh em1' returned exit code '1', the output was 'rtsold: failed to open pidfile: File exists' Mar 7 19:23:59 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:24:01 hydrogen check_reload_status[309]: updating dyndns wan Mar 7 19:24:01 hydrogen php-fpm[270]: /rc.linkup: Removing static route for monitor 1.1.1.1 and adding a new route through [REDACTED] Mar 7 19:24:01 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:24:02 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied Mar 7 19:24:03 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (522789): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. Mar 7 19:24:04 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (home.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. Mar 7 19:24:05 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (mail.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. Mar 7 19:24:06 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied Mar 7 19:24:06 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (bitwarden.REDACTED![hr-H-T-123456-unload-standup-base.png](/assets/uploads/files/1615167452915-hr-h-t-123456-unload-standup-base.png) ): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
-
@stephenw10 Just an update, I swapped the NICs, and re-crimped the end on the WAN cable, and have seen no issues at all since then.
Out of curiosity, to see if it was the cable or not, I swapped them back now that the cable was crimped, and when I hammered it with a file transfer, I saw the same issues pop up again.
So now the question is, these NICs I believe use the same controller. So if the controller had an issue, they'd both be weird. If it was physical, I would normally expect to barely get anything working, if at all, not for it to disconnect under load.
I know heat is a thing, but we're connecting to a switch here, so heat output of the clients wouldn't cause a link on the pfSense box to disconnect. That being said, is there some other thing you can think of that would cause this? Or does it sound correct in assuming that that NIC is disconnecting under load, almost like it's a software issue?
-
If it was a software issue though you would expect the NICs to behave the same. Swapping them would make no difference.
Hard to say what might be causing that.
Steve
-
I agree, if it was software the behavior would be identical on all ports.
There are many subtle ways electronics can fail under load, it's difficult to speculate about why, but usually boils down to heat or some kind of electrical interference which can't be compensated for at high load vs low load.