LAN hotplug errors after upgrade to 2.5.0
-
Are you able to check whatever it's connected to to see if it is actually dropping the link for some reason?
Steve
-
@stephenw10 The switch it's connected to does indeed show disconnect and reconnect log messages (Dell 5548). That's what initially led me to believe it's a hardware thing, but I've tried swapping with a known good cable, and same result.
Still only happens really under load though. If I leave it idle while I'm at work, I don't really see the disconnect messages or anything.
-
You might check the stats from
sysctl dev.em.1
and the settings insysctl hw.em
.Steve
-
sysctl dev.em.1
dev.em.1.wake: 0 dev.em.1.interrupts.rx_overrun: 0 dev.em.1.interrupts.rx_desc_min_thresh: 0 dev.em.1.interrupts.tx_queue_min_thresh: 0 dev.em.1.interrupts.tx_queue_empty: 0 dev.em.1.interrupts.tx_abs_timer: 0 dev.em.1.interrupts.tx_pkt_timer: 0 dev.em.1.interrupts.rx_abs_timer: 0 dev.em.1.interrupts.rx_pkt_timer: 0 dev.em.1.interrupts.asserts: 216 dev.em.1.mac_stats.tso_ctx_fail: 0 dev.em.1.mac_stats.tso_txd: 0 dev.em.1.mac_stats.tx_frames_1024_1522: 1495503167 dev.em.1.mac_stats.tx_frames_512_1023: 16495480 dev.em.1.mac_stats.tx_frames_256_511: 14855170 dev.em.1.mac_stats.tx_frames_128_255: 25114525 dev.em.1.mac_stats.tx_frames_65_127: 206999058 dev.em.1.mac_stats.tx_frames_64: 85971859 dev.em.1.mac_stats.mcast_pkts_txd: 441585 dev.em.1.mac_stats.bcast_pkts_txd: 13798 dev.em.1.mac_stats.good_pkts_txd: 1844939137 dev.em.1.mac_stats.total_pkts_txd: 1844939231 dev.em.1.mac_stats.good_octets_txd: 2312840063914 dev.em.1.mac_stats.good_octets_recvd: 2129400200140 dev.em.1.mac_stats.rx_frames_1024_1522: 1367756134 dev.em.1.mac_stats.rx_frames_512_1023: 16606429 dev.em.1.mac_stats.rx_frames_256_511: 5846534 dev.em.1.mac_stats.rx_frames_128_255: 32959675 dev.em.1.mac_stats.rx_frames_65_127: 346945240 dev.em.1.mac_stats.rx_frames_64: 1566882 dev.em.1.mac_stats.mcast_pkts_recvd: 1942808 dev.em.1.mac_stats.bcast_pkts_recvd: 478650 dev.em.1.mac_stats.good_pkts_recvd: 1771680901 dev.em.1.mac_stats.total_pkts_recvd: 1771681360 dev.em.1.mac_stats.xoff_txd: 1 dev.em.1.mac_stats.xoff_recvd: 0 dev.em.1.mac_stats.xon_txd: 1 dev.em.1.mac_stats.xon_recvd: 0 dev.em.1.mac_stats.coll_ext_errs: 0 dev.em.1.mac_stats.alignment_errs: 0 dev.em.1.mac_stats.crc_errs: 35 dev.em.1.mac_stats.recv_errs: 22 dev.em.1.mac_stats.recv_jabber: 22 dev.em.1.mac_stats.recv_oversize: 295 dev.em.1.mac_stats.recv_fragmented: 0 dev.em.1.mac_stats.recv_undersize: 0 dev.em.1.mac_stats.recv_no_buff: 91985175 dev.em.1.mac_stats.missed_packets: 0 dev.em.1.mac_stats.defer_count: 0 dev.em.1.mac_stats.sequence_errors: 0 dev.em.1.mac_stats.symbol_errors: 0 dev.em.1.mac_stats.collision_count: 0 dev.em.1.mac_stats.late_coll: 0 dev.em.1.mac_stats.multiple_coll: 0 dev.em.1.mac_stats.single_coll: 0 dev.em.1.mac_stats.excess_coll: 0 dev.em.1.queue_rx_1.rx_irq: 0 dev.em.1.queue_rx_1.rxd_tail: 452 dev.em.1.queue_rx_1.rxd_head: 455 dev.em.1.queue_rx_0.rx_irq: 0 dev.em.1.queue_rx_0.rxd_tail: 155 dev.em.1.queue_rx_0.rxd_head: 158 dev.em.1.queue_tx_1.tx_irq: 0 dev.em.1.queue_tx_1.txd_tail: 609 dev.em.1.queue_tx_1.txd_head: 605 dev.em.1.queue_tx_0.tx_irq: 0 dev.em.1.queue_tx_0.txd_tail: 533 dev.em.1.queue_tx_0.txd_head: 532 dev.em.1.fc_low_water: 16932 dev.em.1.fc_high_water: 18432 dev.em.1.rx_control: 67403806 dev.em.1.device_control: 1477444168 dev.em.1.watchdog_timeouts: 0 dev.em.1.rx_overruns: 0 dev.em.1.link_irq: 74 dev.em.1.dropped: 72 dev.em.1.eee_control: 1 dev.em.1.itr: 488 dev.em.1.tx_abs_int_delay: 66 dev.em.1.rx_abs_int_delay: 66 dev.em.1.tx_int_delay: 66 dev.em.1.rx_int_delay: 0 dev.em.1.rs_dump: 0 dev.em.1.reg_dump: General Registers CTRL 58100248 STATUS 00080783 CTRL_EXIT 80580000 Interrupt Registers ICR 80a00083 RX Registers RCTL 0404801e RDLEN 00004000 RDH 000000a3 RDT 000000a2 RXDCTL 01050420 RDBAL 04414000 RDBAH 00000000 TX Registers TCTL 3103f0fa TDBAL 057cc000 TDBAH 00000000 TDLEN 00004000 TDH 0000022d TDT 0000022d TXDCTL 0341011f TDFH 00000b97 TDFT 0000137c TDFHS 00000bea TDFPC 0000000a dev.em.1.fc: 3 dev.em.1.debug: -1 dev.em.1.nvm: -1 dev.em.1.iflib.rxq1.rxq_fl0.buf_size: 2048 dev.em.1.iflib.rxq1.rxq_fl0.credits: 1023 dev.em.1.iflib.rxq1.rxq_fl0.cidx: 520 dev.em.1.iflib.rxq1.rxq_fl0.pidx: 519 dev.em.1.iflib.rxq0.rxq_fl0.buf_size: 2048 dev.em.1.iflib.rxq0.rxq_fl0.credits: 1023 dev.em.1.iflib.rxq0.rxq_fl0.cidx: 166 dev.em.1.iflib.rxq0.rxq_fl0.pidx: 165 dev.em.1.iflib.txq1.r_abdications: 0 dev.em.1.iflib.txq1.r_restarts: 0 dev.em.1.iflib.txq1.r_stalls: 0 dev.em.1.iflib.txq1.r_starts: 13052377 dev.em.1.iflib.txq1.r_drops: 0 dev.em.1.iflib.txq1.r_enqueues: 13058272 dev.em.1.iflib.txq1.ring_state: pidx_head: 0699 pidx_tail: 0699 cidx: 0699 state: IDLE dev.em.1.iflib.txq1.txq_cleaned: 13726331 dev.em.1.iflib.txq1.txq_processed: 13726379 dev.em.1.iflib.txq1.txq_in_use: 50 dev.em.1.iflib.txq1.txq_cidx_processed: 685 dev.em.1.iflib.txq1.txq_cidx: 645 dev.em.1.iflib.txq1.txq_pidx: 695 dev.em.1.iflib.txq1.no_tx_dma_setup: 0 dev.em.1.iflib.txq1.txd_encap_efbig: 0 dev.em.1.iflib.txq1.tx_map_failed: 0 dev.em.1.iflib.txq1.no_desc_avail: 0 dev.em.1.iflib.txq1.mbuf_defrag_failed: 0 dev.em.1.iflib.txq1.m_pullups: 1844 dev.em.1.iflib.txq1.mbuf_defrag: 0 dev.em.1.iflib.txq0.r_abdications: 0 dev.em.1.iflib.txq0.r_restarts: 0 dev.em.1.iflib.txq0.r_stalls: 0 dev.em.1.iflib.txq0.r_starts: 679332 dev.em.1.iflib.txq0.r_drops: 0 dev.em.1.iflib.txq0.r_enqueues: 679664 dev.em.1.iflib.txq0.ring_state: pidx_head: 1840 pidx_tail: 1840 cidx: 1840 state: IDLE dev.em.1.iflib.txq0.txq_cleaned: 1355334 dev.em.1.iflib.txq0.txq_processed: 1355378 dev.em.1.iflib.txq0.txq_in_use: 54 dev.em.1.iflib.txq0.txq_cidx_processed: 626 dev.em.1.iflib.txq0.txq_cidx: 586 dev.em.1.iflib.txq0.txq_pidx: 640 dev.em.1.iflib.txq0.no_tx_dma_setup: 0 dev.em.1.iflib.txq0.txd_encap_efbig: 0 dev.em.1.iflib.txq0.tx_map_failed: 0 dev.em.1.iflib.txq0.no_desc_avail: 0 dev.em.1.iflib.txq0.mbuf_defrag_failed: 0 dev.em.1.iflib.txq0.m_pullups: 1912 dev.em.1.iflib.txq0.mbuf_defrag: 0 dev.em.1.iflib.override_nrxds: 0 dev.em.1.iflib.override_ntxds: 0 dev.em.1.iflib.separate_txrx: 0 dev.em.1.iflib.core_offset: 1 dev.em.1.iflib.tx_abdicate: 0 dev.em.1.iflib.rx_budget: 0 dev.em.1.iflib.disable_msix: 0 dev.em.1.iflib.override_qs_enable: 0 dev.em.1.iflib.override_nrxqs: 0 dev.em.1.iflib.override_ntxqs: 0 dev.em.1.iflib.driver_version: 7.6.1-k dev.em.1.%parent: pci2 dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x0000 class=0x020000 dev.em.1.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PCI0.RP05.PXSX dev.em.1.%driver: em dev.em.1.%desc: Intel(R) PRO/1000 Network Connection
sysctl hw.em
hw.em.max_interrupt_rate: 8000 hw.em.eee_setting: 1 hw.em.rx_process_limit: 100 hw.em.sbp: 1 hw.em.smart_pwr_down: 0 hw.em.rx_abs_int_delay: 66 hw.em.tx_abs_int_delay: 66 hw.em.rx_int_delay: 0 hw.em.tx_int_delay: 66 hw.em.disable_crc_stripping: 0
-
Hmm, nothing jumps out there. Pretty much what I see on a test box here.
At this point I would probably be swapping em0 and em1 to be sure it's not a hardware issue.
Steve
-
@stephenw10 I will give that a shot when I get home.
Out of curiosity, if I'm understanding this correctly, it seems like the log messages are indicating a new WAN IP is triggering an interface reload. Am I correct there? Meaning swapping them would go from triggering a reload to the LAN just periodically dropping connectivity?
Also, the motherboard is a Supermicro X9SCM, so both NICs are on board, not that it makes much of a difference.
-
@stephenw10 Okay, so swapping the two, first thing I noticed is the clip on the WAN cable isn't broken off, but it doesn't click. I was planning on re-crimping that end, but out of curiosity, I let everything come back up after a config change (backup config, swap em0 and em1, and import modified config), and I unplugged, left, and plugged back in the WAN on em1.
I see the log is much shorter for hotplug on WAN compared to LAN, though I suspect that's just cause of all the LAN subinterfaces. I did noticed that instead of a "hotplug" message, this time the singular message is "HOTPLUG," and there's only one mention of it. I don't quite know if that's a different thing or not.
I'm currently hammering everything with a 240GB file transfer to see if I can get the error to pop up again on its own with the NIC swap. In the mean time, any idea why a physical unplug generates different log messages than what I was seeing before? Logs are as follows:
Mar 7 19:22:24 hydrogen check_reload_status[309]: Linkup starting em1 Mar 7 19:22:24 hydrogen kernel: em1: link state changed to DOWN Mar 7 19:22:25 hydrogen php-fpm[270]: /rc.linkup: DEVD Ethernet detached event for wan Mar 7 19:22:26 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:22:43 hydrogen rc.gateway_alarm[90687]: >>> Gateway alarm: WAN_DHCP (Addr:1.1.1.1 Alarm:1 RTT:696.418ms RTTsd:130.285ms Loss:26%) Mar 7 19:22:43 hydrogen check_reload_status[309]: updating dyndns WAN_DHCP Mar 7 19:22:43 hydrogen check_reload_status[309]: Restarting ipsec tunnels Mar 7 19:22:43 hydrogen check_reload_status[309]: Restarting OpenVPN tunnels/interfaces Mar 7 19:22:43 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:22:44 hydrogen php-fpm[83648]: /rc.openvpn: Gateway, NONE AVAILABLE Mar 7 19:22:44 hydrogen php-fpm[83648]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP. Mar 7 19:22:44 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (522789) There was an error trying to determine the public IP for interface - wan (em1 ). Mar 7 19:22:45 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (home.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ). Mar 7 19:22:46 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (mail.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ). Mar 7 19:22:47 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (bitwarden.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ). Mar 7 19:23:27 hydrogen kernel: ovpns2: link state changed to DOWN Mar 7 19:23:27 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:23:53 hydrogen check_reload_status[309]: Linkup starting em1 Mar 7 19:23:53 hydrogen kernel: em1: link state changed to UP Mar 7 19:23:54 hydrogen php-fpm[270]: /rc.linkup: DEVD Ethernet attached event for wan Mar 7 19:23:54 hydrogen php-fpm[270]: /rc.linkup: HOTPLUG: Configuring interface wan Mar 7 19:23:56 hydrogen check_reload_status[309]: rc.newwanip starting em1 Mar 7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: calling interface_dhcpv6_configure. Mar 7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: Accept router advertisements on interface em1 Mar 7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: Starting rtsold process Mar 7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: rc.newwanip: Info: starting on em1. Mar 7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: rc.newwanip: on (IP address: 99.198.53.111) (interface: WAN[wan]) (real interface: em1). Mar 7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: Accept router advertisements on interface em1 Mar 7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: Starting rtsold process Mar 7 19:23:58 hydrogen php-fpm[270]: /rc.linkup: Gateway, NONE AVAILABLE Mar 7 19:23:58 hydrogen check_reload_status[309]: Restarting ipsec tunnels Mar 7 19:23:58 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied Mar 7 19:23:59 hydrogen php-fpm[10424]: /rc.newwanip: The command '/usr/sbin/rtsold -1 -p /var/run/rtsold_em1.pid -M /var/etc/rtsold_em1_script.sh -O /var/etc/rtsold_em1_script.sh em1' returned exit code '1', the output was 'rtsold: failed to open pidfile: File exists' Mar 7 19:23:59 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:24:01 hydrogen check_reload_status[309]: updating dyndns wan Mar 7 19:24:01 hydrogen php-fpm[270]: /rc.linkup: Removing static route for monitor 1.1.1.1 and adding a new route through [REDACTED] Mar 7 19:24:01 hydrogen check_reload_status[309]: Reloading filter Mar 7 19:24:02 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied Mar 7 19:24:03 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (522789): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. Mar 7 19:24:04 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (home.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. Mar 7 19:24:05 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (mail.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. Mar 7 19:24:06 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied Mar 7 19:24:06 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (bitwarden.REDACTED![hr-H-T-123456-unload-standup-base.png](/assets/uploads/files/1615167452915-hr-h-t-123456-unload-standup-base.png) ): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
-
@stephenw10 Just an update, I swapped the NICs, and re-crimped the end on the WAN cable, and have seen no issues at all since then.
Out of curiosity, to see if it was the cable or not, I swapped them back now that the cable was crimped, and when I hammered it with a file transfer, I saw the same issues pop up again.
So now the question is, these NICs I believe use the same controller. So if the controller had an issue, they'd both be weird. If it was physical, I would normally expect to barely get anything working, if at all, not for it to disconnect under load.
I know heat is a thing, but we're connecting to a switch here, so heat output of the clients wouldn't cause a link on the pfSense box to disconnect. That being said, is there some other thing you can think of that would cause this? Or does it sound correct in assuming that that NIC is disconnecting under load, almost like it's a software issue?
-
If it was a software issue though you would expect the NICs to behave the same. Swapping them would make no difference.
Hard to say what might be causing that.
Steve
-
I agree, if it was software the behavior would be identical on all ports.
There are many subtle ways electronics can fail under load, it's difficult to speculate about why, but usually boils down to heat or some kind of electrical interference which can't be compensated for at high load vs low load.