Netgate Discussion Forum

    LAN hotplug errors after upgrade to 2.5.0

    13 Posts 3 Posters 893 Views
    • jimp
      jimp Rebel Alliance Developer Netgate

      A "hotplug" event is typically physical -- it lost link with whatever it was connected to.

      We have also seen issues with this in the past in some rare cases where a driver doesn't like some parameter being set on it, such as a spoofed MAC address, which makes it cycle link endlessly. But if that were the case your connectivity would be completely broken and it wouldn't come and go.

      I don't think I've ever seen something like that correlate with load unless there was a hardware issue compounding it (e.g. something overheating).

      The easiest and cheapest things to try are swapping the ethernet cable out completely and maybe switching to another switchport.

      If it is connected to a managed switch you could check the logs there as well and see what it says.

      Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

      Need help fast? Netgate Global Support!

      Do not Chat/PM for help!

      • T
        TechGeek01 @jimp

        @jimp Oh yeah, I've tried swapping the cable several times already. It never did this in 2.4.5. The only change that could have caused this to start is the upgrade. The hardware was left as is, and swapping the cable does nothing. I know for a fact it's not the cable, the switch, or any of the hardware or physical connections.

        My guess, then, is that something changed in 2.5 that it doesn't like, or that some timings on an alarm are more sensitive now, but I'm not seeing anything jumping out in the logs.

        • stephenw10
          stephenw10 Netgate Administrator

          Are you able to check whatever it's connected to to see if it is actually dropping the link for some reason?

          Steve

          • T
            TechGeek01 @stephenw10

            @stephenw10 The switch it's connected to does indeed show disconnect and reconnect log messages (Dell 5548). That's what initially led me to believe it's a hardware thing, but I've tried swapping with a known good cable, and same result.

            Still only happens really under load though. If I leave it idle while I'm at work, I don't really see the disconnect messages or anything.

            • stephenw10
              stephenw10 Netgate Administrator

              You might check the stats from sysctl dev.em.1 and the settings in sysctl hw.em.

              Steve

              • T
                TechGeek01 @stephenw10

                @stephenw10

                sysctl dev.em.1

                dev.em.1.wake: 0
                dev.em.1.interrupts.rx_overrun: 0
                dev.em.1.interrupts.rx_desc_min_thresh: 0
                dev.em.1.interrupts.tx_queue_min_thresh: 0
                dev.em.1.interrupts.tx_queue_empty: 0
                dev.em.1.interrupts.tx_abs_timer: 0
                dev.em.1.interrupts.tx_pkt_timer: 0
                dev.em.1.interrupts.rx_abs_timer: 0
                dev.em.1.interrupts.rx_pkt_timer: 0
                dev.em.1.interrupts.asserts: 216
                dev.em.1.mac_stats.tso_ctx_fail: 0
                dev.em.1.mac_stats.tso_txd: 0
                dev.em.1.mac_stats.tx_frames_1024_1522: 1495503167
                dev.em.1.mac_stats.tx_frames_512_1023: 16495480
                dev.em.1.mac_stats.tx_frames_256_511: 14855170
                dev.em.1.mac_stats.tx_frames_128_255: 25114525
                dev.em.1.mac_stats.tx_frames_65_127: 206999058
                dev.em.1.mac_stats.tx_frames_64: 85971859
                dev.em.1.mac_stats.mcast_pkts_txd: 441585
                dev.em.1.mac_stats.bcast_pkts_txd: 13798
                dev.em.1.mac_stats.good_pkts_txd: 1844939137
                dev.em.1.mac_stats.total_pkts_txd: 1844939231
                dev.em.1.mac_stats.good_octets_txd: 2312840063914
                dev.em.1.mac_stats.good_octets_recvd: 2129400200140
                dev.em.1.mac_stats.rx_frames_1024_1522: 1367756134
                dev.em.1.mac_stats.rx_frames_512_1023: 16606429
                dev.em.1.mac_stats.rx_frames_256_511: 5846534
                dev.em.1.mac_stats.rx_frames_128_255: 32959675
                dev.em.1.mac_stats.rx_frames_65_127: 346945240
                dev.em.1.mac_stats.rx_frames_64: 1566882
                dev.em.1.mac_stats.mcast_pkts_recvd: 1942808
                dev.em.1.mac_stats.bcast_pkts_recvd: 478650
                dev.em.1.mac_stats.good_pkts_recvd: 1771680901
                dev.em.1.mac_stats.total_pkts_recvd: 1771681360
                dev.em.1.mac_stats.xoff_txd: 1
                dev.em.1.mac_stats.xoff_recvd: 0
                dev.em.1.mac_stats.xon_txd: 1
                dev.em.1.mac_stats.xon_recvd: 0
                dev.em.1.mac_stats.coll_ext_errs: 0
                dev.em.1.mac_stats.alignment_errs: 0
                dev.em.1.mac_stats.crc_errs: 35
                dev.em.1.mac_stats.recv_errs: 22
                dev.em.1.mac_stats.recv_jabber: 22
                dev.em.1.mac_stats.recv_oversize: 295
                dev.em.1.mac_stats.recv_fragmented: 0
                dev.em.1.mac_stats.recv_undersize: 0
                dev.em.1.mac_stats.recv_no_buff: 91985175
                dev.em.1.mac_stats.missed_packets: 0
                dev.em.1.mac_stats.defer_count: 0
                dev.em.1.mac_stats.sequence_errors: 0
                dev.em.1.mac_stats.symbol_errors: 0
                dev.em.1.mac_stats.collision_count: 0
                dev.em.1.mac_stats.late_coll: 0
                dev.em.1.mac_stats.multiple_coll: 0
                dev.em.1.mac_stats.single_coll: 0
                dev.em.1.mac_stats.excess_coll: 0
                dev.em.1.queue_rx_1.rx_irq: 0
                dev.em.1.queue_rx_1.rxd_tail: 452
                dev.em.1.queue_rx_1.rxd_head: 455
                dev.em.1.queue_rx_0.rx_irq: 0
                dev.em.1.queue_rx_0.rxd_tail: 155
                dev.em.1.queue_rx_0.rxd_head: 158
                dev.em.1.queue_tx_1.tx_irq: 0
                dev.em.1.queue_tx_1.txd_tail: 609
                dev.em.1.queue_tx_1.txd_head: 605
                dev.em.1.queue_tx_0.tx_irq: 0
                dev.em.1.queue_tx_0.txd_tail: 533
                dev.em.1.queue_tx_0.txd_head: 532
                dev.em.1.fc_low_water: 16932
                dev.em.1.fc_high_water: 18432
                dev.em.1.rx_control: 67403806
                dev.em.1.device_control: 1477444168
                dev.em.1.watchdog_timeouts: 0
                dev.em.1.rx_overruns: 0
                dev.em.1.link_irq: 74
                dev.em.1.dropped: 72
                dev.em.1.eee_control: 1
                dev.em.1.itr: 488
                dev.em.1.tx_abs_int_delay: 66
                dev.em.1.rx_abs_int_delay: 66
                dev.em.1.tx_int_delay: 66
                dev.em.1.rx_int_delay: 0
                dev.em.1.rs_dump: 0
                dev.em.1.reg_dump: General Registers
                        CTRL     58100248
                        STATUS   00080783
                        CTRL_EXIT        80580000
                
                Interrupt Registers
                        ICR      80a00083
                
                RX Registers
                        RCTL     0404801e
                        RDLEN    00004000
                        RDH      000000a3
                        RDT      000000a2
                        RXDCTL   01050420
                        RDBAL    04414000
                        RDBAH    00000000
                
                TX Registers
                        TCTL     3103f0fa
                        TDBAL    057cc000
                        TDBAH    00000000
                        TDLEN    00004000
                        TDH      0000022d
                        TDT      0000022d
                        TXDCTL   0341011f
                        TDFH     00000b97
                        TDFT     0000137c
                        TDFHS    00000bea
                        TDFPC    0000000a
                
                
                dev.em.1.fc: 3
                dev.em.1.debug: -1
                dev.em.1.nvm: -1
                dev.em.1.iflib.rxq1.rxq_fl0.buf_size: 2048
                dev.em.1.iflib.rxq1.rxq_fl0.credits: 1023
                dev.em.1.iflib.rxq1.rxq_fl0.cidx: 520
                dev.em.1.iflib.rxq1.rxq_fl0.pidx: 519
                dev.em.1.iflib.rxq0.rxq_fl0.buf_size: 2048
                dev.em.1.iflib.rxq0.rxq_fl0.credits: 1023
                dev.em.1.iflib.rxq0.rxq_fl0.cidx: 166
                dev.em.1.iflib.rxq0.rxq_fl0.pidx: 165
                dev.em.1.iflib.txq1.r_abdications: 0
                dev.em.1.iflib.txq1.r_restarts: 0
                dev.em.1.iflib.txq1.r_stalls: 0
                dev.em.1.iflib.txq1.r_starts: 13052377
                dev.em.1.iflib.txq1.r_drops: 0
                dev.em.1.iflib.txq1.r_enqueues: 13058272
                dev.em.1.iflib.txq1.ring_state: pidx_head: 0699 pidx_tail: 0699 cidx: 0699 state: IDLE
                dev.em.1.iflib.txq1.txq_cleaned: 13726331
                dev.em.1.iflib.txq1.txq_processed: 13726379
                dev.em.1.iflib.txq1.txq_in_use: 50
                dev.em.1.iflib.txq1.txq_cidx_processed: 685
                dev.em.1.iflib.txq1.txq_cidx: 645
                dev.em.1.iflib.txq1.txq_pidx: 695
                dev.em.1.iflib.txq1.no_tx_dma_setup: 0
                dev.em.1.iflib.txq1.txd_encap_efbig: 0
                dev.em.1.iflib.txq1.tx_map_failed: 0
                dev.em.1.iflib.txq1.no_desc_avail: 0
                dev.em.1.iflib.txq1.mbuf_defrag_failed: 0
                dev.em.1.iflib.txq1.m_pullups: 1844
                dev.em.1.iflib.txq1.mbuf_defrag: 0
                dev.em.1.iflib.txq0.r_abdications: 0
                dev.em.1.iflib.txq0.r_restarts: 0
                dev.em.1.iflib.txq0.r_stalls: 0
                dev.em.1.iflib.txq0.r_starts: 679332
                dev.em.1.iflib.txq0.r_drops: 0
                dev.em.1.iflib.txq0.r_enqueues: 679664
                dev.em.1.iflib.txq0.ring_state: pidx_head: 1840 pidx_tail: 1840 cidx: 1840 state: IDLE
                dev.em.1.iflib.txq0.txq_cleaned: 1355334
                dev.em.1.iflib.txq0.txq_processed: 1355378
                dev.em.1.iflib.txq0.txq_in_use: 54
                dev.em.1.iflib.txq0.txq_cidx_processed: 626
                dev.em.1.iflib.txq0.txq_cidx: 586
                dev.em.1.iflib.txq0.txq_pidx: 640
                dev.em.1.iflib.txq0.no_tx_dma_setup: 0
                dev.em.1.iflib.txq0.txd_encap_efbig: 0
                dev.em.1.iflib.txq0.tx_map_failed: 0
                dev.em.1.iflib.txq0.no_desc_avail: 0
                dev.em.1.iflib.txq0.mbuf_defrag_failed: 0
                dev.em.1.iflib.txq0.m_pullups: 1912
                dev.em.1.iflib.txq0.mbuf_defrag: 0
                dev.em.1.iflib.override_nrxds: 0
                dev.em.1.iflib.override_ntxds: 0
                dev.em.1.iflib.separate_txrx: 0
                dev.em.1.iflib.core_offset: 1
                dev.em.1.iflib.tx_abdicate: 0
                dev.em.1.iflib.rx_budget: 0
                dev.em.1.iflib.disable_msix: 0
                dev.em.1.iflib.override_qs_enable: 0
                dev.em.1.iflib.override_nrxqs: 0
                dev.em.1.iflib.override_ntxqs: 0
                dev.em.1.iflib.driver_version: 7.6.1-k
                dev.em.1.%parent: pci2
                dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x0000 class=0x020000
                dev.em.1.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PCI0.RP05.PXSX
                dev.em.1.%driver: em
                dev.em.1.%desc: Intel(R) PRO/1000 Network Connection
                

                sysctl hw.em

                hw.em.max_interrupt_rate: 8000
                hw.em.eee_setting: 1
                hw.em.rx_process_limit: 100
                hw.em.sbp: 1
                hw.em.smart_pwr_down: 0
                hw.em.rx_abs_int_delay: 66
                hw.em.tx_abs_int_delay: 66
                hw.em.rx_int_delay: 0
                hw.em.tx_int_delay: 66
                hw.em.disable_crc_stripping: 0
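A dump this long is easier to scan if the zero-valued counters are filtered out. The following is a minimal Python sketch, not a Netgate tool: it takes the text printed by `sysctl dev.em.1` and lists the non-zero counters whose names look error-related. The keyword list `SUSPECT` and the function name `nonzero_suspects` are assumptions made for illustration. The sample values are copied from the output above, and a non-zero value is only a prompt for a closer look, not a diagnosis.

```python
import re

# Name fragments that suggest an error or drop counter. This list is an
# assumption of the sketch, not something defined by the em(4) driver.
SUSPECT = ("err", "no_buff", "missed", "drop", "overrun", "timeout",
           "coll", "jabber", "oversize", "undersize", "fragmented", "fail")

# Matches lines such as "dev.em.1.mac_stats.crc_errs: 35", indented or not.
LINE_RE = re.compile(r"^\s*(dev\.em\.\d+\.[\w.%]+):\s*(-?\d+)\s*$")


def nonzero_suspects(sysctl_text):
    """Return {oid: value} for suspect counters with a non-zero value."""
    found = {}
    for line in sysctl_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # register dumps, string values, blank lines
        oid, value = m.group(1), int(m.group(2))
        leaf = oid.rsplit(".", 1)[-1]  # judge only the last name component
        if value != 0 and any(word in leaf for word in SUSPECT):
            found[oid] = value
    return found


# A few lines taken from the dump in this post.
SAMPLE = """\
dev.em.1.interrupts.asserts: 216
dev.em.1.mac_stats.crc_errs: 35
dev.em.1.mac_stats.recv_errs: 22
dev.em.1.mac_stats.recv_no_buff: 91985175
dev.em.1.mac_stats.missed_packets: 0
dev.em.1.watchdog_timeouts: 0
dev.em.1.link_irq: 74
dev.em.1.dropped: 72
"""

if __name__ == "__main__":
    for oid, value in nonzero_suspects(SAMPLE).items():
        print(f"{oid}: {value}")
```

Saving the output before and after a heavy transfer and comparing the two results would show which counters actually move under load.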
                
                • stephenw10
                  stephenw10 Netgate Administrator

                  Hmm, nothing jumps out there. Pretty much what I see on a test box here.

                  At this point I would probably be swapping em0 and em1 to be sure it's not a hardware issue.

                  Steve

                  • T
                    TechGeek01 @stephenw10

                    @stephenw10 I will give that a shot when I get home.

                    Out of curiosity, if I'm understanding this correctly, it seems like the log messages are indicating a new WAN IP is triggering an interface reload. Am I correct there? Meaning swapping them would go from triggering a reload to the LAN just periodically dropping connectivity?

                    Also, the motherboard is a Supermicro X9SCM, so both NICs are on board, not that it makes much of a difference.

                    • T
                      TechGeek01 @stephenw10

                      @stephenw10 Okay, so after swapping the two, the first thing I noticed is that the clip on the WAN cable isn't broken off, but it doesn't click. I was planning on re-crimping that end, but out of curiosity, I let everything come back up after the config change (back up the config, swap em0 and em1, and import the modified config), then unplugged the WAN cable on em1, left it for a bit, and plugged it back in.
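The "swap em0 and em1" step on a backed-up config can be sketched as a text substitution. The Python below is only an illustration under stated assumptions: the helper name `swap_tokens` is hypothetical, and the XML fragment is a made-up stand-in for a config backup, not a real pfSense file. It swaps both names in a single pass so that the first replacement is not undone by the second, and it matches whole tokens only, so `em1.10` follows `em1` while a name like `em10` is left alone.

```python
import re


def swap_tokens(text, a, b):
    """Swap whole-word occurrences of a and b in text, in one pass."""
    pattern = re.compile(r"\b(%s|%s)\b" % (re.escape(a), re.escape(b)))
    # Each match is replaced by the other name; nothing is visited twice.
    return pattern.sub(lambda m: b if m.group(1) == a else a, text)


# Hypothetical fragment standing in for an exported config backup.
SAMPLE = """\
<wan><if>em0</if></wan>
<lan><if>em1</if></lan>
<vlan><if>em1</if><tag>10</tag><vlanif>em1.10</vlanif></vlan>
"""

if __name__ == "__main__":
    print(swap_tokens(SAMPLE, "em0", "em1"))
```

Reviewing the edited file with a diff before importing it is a sensible precaution, since a blind substitution can also touch descriptions or other free-text fields.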

                      I see the log is much shorter for hotplug on WAN compared to LAN, though I suspect that's just because of all the LAN subinterfaces. I did notice that instead of a "hotplug" message, this time the singular message is "HOTPLUG," and there's only one mention of it. I don't quite know if that's a different thing or not.

                      I'm currently hammering everything with a 240GB file transfer to see if I can get the error to pop up again on its own with the NIC swap. In the meantime, any idea why a physical unplug generates different log messages than what I was seeing before? Logs are as follows:

                      Mar  7 19:22:24 hydrogen check_reload_status[309]: Linkup starting em1
                      Mar  7 19:22:24 hydrogen kernel: em1: link state changed to DOWN
                      Mar  7 19:22:25 hydrogen php-fpm[270]: /rc.linkup: DEVD Ethernet detached event for wan
                      Mar  7 19:22:26 hydrogen check_reload_status[309]: Reloading filter
                      Mar  7 19:22:43 hydrogen rc.gateway_alarm[90687]: >>> Gateway alarm: WAN_DHCP (Addr:1.1.1.1 Alarm:1 RTT:696.418ms RTTsd:130.285ms Loss:26%)
                      Mar  7 19:22:43 hydrogen check_reload_status[309]: updating dyndns WAN_DHCP
                      Mar  7 19:22:43 hydrogen check_reload_status[309]: Restarting ipsec tunnels
                      Mar  7 19:22:43 hydrogen check_reload_status[309]: Restarting OpenVPN tunnels/interfaces
                      Mar  7 19:22:43 hydrogen check_reload_status[309]: Reloading filter
                      Mar  7 19:22:44 hydrogen php-fpm[83648]: /rc.openvpn: Gateway, NONE AVAILABLE
                      Mar  7 19:22:44 hydrogen php-fpm[83648]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
                      Mar  7 19:22:44 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (522789) There was an error trying to determine the public IP for interface - wan (em1 ).
                      Mar  7 19:22:45 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (home.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ).
                      Mar  7 19:22:46 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (mail.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ).
                      Mar  7 19:22:47 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (bitwarden.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ).
                      Mar  7 19:23:27 hydrogen kernel: ovpns2: link state changed to DOWN
                      Mar  7 19:23:27 hydrogen check_reload_status[309]: Reloading filter
                      Mar  7 19:23:53 hydrogen check_reload_status[309]: Linkup starting em1
                      Mar  7 19:23:53 hydrogen kernel: em1: link state changed to UP
                      Mar  7 19:23:54 hydrogen php-fpm[270]: /rc.linkup: DEVD Ethernet attached event for wan
                      Mar  7 19:23:54 hydrogen php-fpm[270]: /rc.linkup: HOTPLUG: Configuring interface wan
                      Mar  7 19:23:56 hydrogen check_reload_status[309]: rc.newwanip starting em1
                      Mar  7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: calling interface_dhcpv6_configure.
                      Mar  7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: Accept router advertisements on interface em1 
                      Mar  7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: Starting rtsold process
                      Mar  7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: rc.newwanip: Info: starting on em1.
                      Mar  7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: rc.newwanip: on (IP address: 99.198.53.111) (interface: WAN[wan]) (real interface: em1).
                      Mar  7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: Accept router advertisements on interface em1 
                      Mar  7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: Starting rtsold process
                      Mar  7 19:23:58 hydrogen php-fpm[270]: /rc.linkup: Gateway, NONE AVAILABLE
                      Mar  7 19:23:58 hydrogen check_reload_status[309]: Restarting ipsec tunnels
                      Mar  7 19:23:58 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied
                      Mar  7 19:23:59 hydrogen php-fpm[10424]: /rc.newwanip: The command '/usr/sbin/rtsold -1 -p /var/run/rtsold_em1.pid -M /var/etc/rtsold_em1_script.sh -O /var/etc/rtsold_em1_script.sh em1' returned exit code '1', the output was 'rtsold: failed to open pidfile: File exists' 
                      Mar  7 19:23:59 hydrogen check_reload_status[309]: Reloading filter
                      Mar  7 19:24:01 hydrogen check_reload_status[309]: updating dyndns wan
                      Mar  7 19:24:01 hydrogen php-fpm[270]: /rc.linkup: Removing static route for monitor 1.1.1.1 and adding a new route through [REDACTED]
                      Mar  7 19:24:01 hydrogen check_reload_status[309]: Reloading filter
                      Mar  7 19:24:02 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied
                      Mar  7 19:24:03 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (522789): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                      Mar  7 19:24:04 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (home.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                      Mar  7 19:24:05 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (mail.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                      Mar  7 19:24:06 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied
                      Mar  7 19:24:06 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (bitwarden.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
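To compare link flaps under load against a deliberate unplug like this one, it can help to pull only the kernel "link state changed" lines out of the system log and measure how long each interface stayed down. The following is a minimal Python sketch under a few assumptions: the function name `link_outages` is hypothetical, the line format is the one shown in the log above, and the `year` argument is an arbitrary placeholder because these syslog timestamps carry no year.

```python
import re
from datetime import datetime

# Matches e.g. "Mar  7 19:22:24 hydrogen kernel: em1: link state changed to DOWN"
LINK_RE = re.compile(
    r"^\s*(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(\w+): link state changed to (UP|DOWN)"
)


def link_outages(log_text, year=2021):
    """Return [(iface, seconds_down)] for each DOWN -> UP pair, in log order.

    year is a placeholder (an assumption): the log lines do not include one.
    """
    down_since = {}
    outages = []
    for line in log_text.splitlines():
        m = LINK_RE.match(line)
        if not m:
            continue
        stamp, iface, state = m.groups()
        stamp = " ".join(stamp.split())  # collapse the padding in "Mar  7"
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state == "DOWN":
            down_since[iface] = when
        elif iface in down_since:
            delta = when - down_since.pop(iface)
            outages.append((iface, delta.total_seconds()))
    return outages


# Three kernel lines copied from the log in this post.
SAMPLE = """\
Mar  7 19:22:24 hydrogen kernel: em1: link state changed to DOWN
Mar  7 19:23:27 hydrogen kernel: ovpns2: link state changed to DOWN
Mar  7 19:23:53 hydrogen kernel: em1: link state changed to UP
"""

if __name__ == "__main__":
    for iface, seconds in link_outages(SAMPLE):
        print(f"{iface} was down for {seconds:.0f} s")
```

On the three sample lines this reports a single em1 outage of 89 seconds; ovpns2 never comes back up in the excerpt, so it produces no pair.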
                      
                      • T
                        TechGeek01 @stephenw10

                        @stephenw10 Just an update, I swapped the NICs, and re-crimped the end on the WAN cable, and have seen no issues at all since then.

                        Out of curiosity, to see if it was the cable or not, I swapped them back now that the cable was crimped, and when I hammered it with a file transfer, I saw the same issues pop up again.

                        So now the question is this: I believe these NICs use the same controller, so if the controller had an issue, they'd both be weird. If it was physical, I would normally expect to barely get anything working, if at all, not for it to disconnect under load.

                        I know heat is a thing, but we're connecting to a switch here, so heat output of the clients wouldn't cause a link on the pfSense box to disconnect. That being said, is there some other thing you can think of that would cause this? Or does it sound correct to assume that that NIC is disconnecting under load, almost like it's a software issue?

                        • stephenw10
                          stephenw10 Netgate Administrator

                          If it was a software issue though you would expect the NICs to behave the same. Swapping them would make no difference.

                          Hard to say what might be causing that.

                          Steve

                          • jimp
                            jimp Rebel Alliance Developer Netgate

                            I agree, if it was software the behavior would be identical on all ports.

                            There are many subtle ways electronics can fail under load. It's difficult to speculate about why, but it usually boils down to heat or some kind of electrical interference which can't be compensated for at high load vs low load.

                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.