Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    LAN hotplug errors after upgrade to 2.5.0

    Scheduled Pinned Locked Moved Hardware
    13 Posts 3 Posters 893 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • T
      TechGeek01 @jimp
      last edited by

      @jimp Oh yeah, I've tried swapping the cable several times already. It never did this in 2.4.5. The only change that would have caused that to start is the upgrade. Hardware was left as is. Swapping the cable does nothing. I know for a fact, it's not the cable, switch, or any of the hardware or physical connection.

      My guess is it's something then to do with a change in 2.5 that it doesn't like something, or some timings on an alarm or something are more sensitive now, but I'm not seeing anything jumping out in the logs.

      1 Reply Last reply Reply Quote 0
      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Are you able to check whatever it's connected to to see if it is actually dropping the link for some reason?

        Steve

        T 1 Reply Last reply Reply Quote 1
        • T
          TechGeek01 @stephenw10
          last edited by

          @stephenw10 The switch it's connected to does indeed show disconnect and reconnect log messages (Dell 5548). That's what initially led me to believe it's a hardware thing, but I've tried swapping with a known good cable, and same result.

          Still only happens really under load though. If I leave it idle while I'm at work, I don't really see the disconnect messages or anything.

          1 Reply Last reply Reply Quote 0
          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            You might check the stats from sysctl dev.em.1 and the settings in sysctl hw.em.

            Steve

            T 1 Reply Last reply Reply Quote 1
            • T
              TechGeek01 @stephenw10
              last edited by

              @stephenw10

              sysctl dev.em.1

              dev.em.1.wake: 0
              dev.em.1.interrupts.rx_overrun: 0
              dev.em.1.interrupts.rx_desc_min_thresh: 0
              dev.em.1.interrupts.tx_queue_min_thresh: 0
              dev.em.1.interrupts.tx_queue_empty: 0
              dev.em.1.interrupts.tx_abs_timer: 0
              dev.em.1.interrupts.tx_pkt_timer: 0
              dev.em.1.interrupts.rx_abs_timer: 0
              dev.em.1.interrupts.rx_pkt_timer: 0
              dev.em.1.interrupts.asserts: 216
              dev.em.1.mac_stats.tso_ctx_fail: 0
              dev.em.1.mac_stats.tso_txd: 0
              dev.em.1.mac_stats.tx_frames_1024_1522: 1495503167
              dev.em.1.mac_stats.tx_frames_512_1023: 16495480
              dev.em.1.mac_stats.tx_frames_256_511: 14855170
              dev.em.1.mac_stats.tx_frames_128_255: 25114525
              dev.em.1.mac_stats.tx_frames_65_127: 206999058
              dev.em.1.mac_stats.tx_frames_64: 85971859
              dev.em.1.mac_stats.mcast_pkts_txd: 441585
              dev.em.1.mac_stats.bcast_pkts_txd: 13798
              dev.em.1.mac_stats.good_pkts_txd: 1844939137
              dev.em.1.mac_stats.total_pkts_txd: 1844939231
              dev.em.1.mac_stats.good_octets_txd: 2312840063914
              dev.em.1.mac_stats.good_octets_recvd: 2129400200140
              dev.em.1.mac_stats.rx_frames_1024_1522: 1367756134
              dev.em.1.mac_stats.rx_frames_512_1023: 16606429
              dev.em.1.mac_stats.rx_frames_256_511: 5846534
              dev.em.1.mac_stats.rx_frames_128_255: 32959675
              dev.em.1.mac_stats.rx_frames_65_127: 346945240
              dev.em.1.mac_stats.rx_frames_64: 1566882
              dev.em.1.mac_stats.mcast_pkts_recvd: 1942808
              dev.em.1.mac_stats.bcast_pkts_recvd: 478650
              dev.em.1.mac_stats.good_pkts_recvd: 1771680901
              dev.em.1.mac_stats.total_pkts_recvd: 1771681360
              dev.em.1.mac_stats.xoff_txd: 1
              dev.em.1.mac_stats.xoff_recvd: 0
              dev.em.1.mac_stats.xon_txd: 1
              dev.em.1.mac_stats.xon_recvd: 0
              dev.em.1.mac_stats.coll_ext_errs: 0
              dev.em.1.mac_stats.alignment_errs: 0
              dev.em.1.mac_stats.crc_errs: 35
              dev.em.1.mac_stats.recv_errs: 22
              dev.em.1.mac_stats.recv_jabber: 22
              dev.em.1.mac_stats.recv_oversize: 295
              dev.em.1.mac_stats.recv_fragmented: 0
              dev.em.1.mac_stats.recv_undersize: 0
              dev.em.1.mac_stats.recv_no_buff: 91985175
              dev.em.1.mac_stats.missed_packets: 0
              dev.em.1.mac_stats.defer_count: 0
              dev.em.1.mac_stats.sequence_errors: 0
              dev.em.1.mac_stats.symbol_errors: 0
              dev.em.1.mac_stats.collision_count: 0
              dev.em.1.mac_stats.late_coll: 0
              dev.em.1.mac_stats.multiple_coll: 0
              dev.em.1.mac_stats.single_coll: 0
              dev.em.1.mac_stats.excess_coll: 0
              dev.em.1.queue_rx_1.rx_irq: 0
              dev.em.1.queue_rx_1.rxd_tail: 452
              dev.em.1.queue_rx_1.rxd_head: 455
              dev.em.1.queue_rx_0.rx_irq: 0
              dev.em.1.queue_rx_0.rxd_tail: 155
              dev.em.1.queue_rx_0.rxd_head: 158
              dev.em.1.queue_tx_1.tx_irq: 0
              dev.em.1.queue_tx_1.txd_tail: 609
              dev.em.1.queue_tx_1.txd_head: 605
              dev.em.1.queue_tx_0.tx_irq: 0
              dev.em.1.queue_tx_0.txd_tail: 533
              dev.em.1.queue_tx_0.txd_head: 532
              dev.em.1.fc_low_water: 16932
              dev.em.1.fc_high_water: 18432
              dev.em.1.rx_control: 67403806
              dev.em.1.device_control: 1477444168
              dev.em.1.watchdog_timeouts: 0
              dev.em.1.rx_overruns: 0
              dev.em.1.link_irq: 74
              dev.em.1.dropped: 72
              dev.em.1.eee_control: 1
              dev.em.1.itr: 488
              dev.em.1.tx_abs_int_delay: 66
              dev.em.1.rx_abs_int_delay: 66
              dev.em.1.tx_int_delay: 66
              dev.em.1.rx_int_delay: 0
              dev.em.1.rs_dump: 0
              dev.em.1.reg_dump: General Registers
                      CTRL     58100248
                      STATUS   00080783
                      CTRL_EXIT        80580000
              
              Interrupt Registers
                      ICR      80a00083
              
              RX Registers
                      RCTL     0404801e
                      RDLEN    00004000
                      RDH      000000a3
                      RDT      000000a2
                      RXDCTL   01050420
                      RDBAL    04414000
                      RDBAH    00000000
              
              TX Registers
                      TCTL     3103f0fa
                      TDBAL    057cc000
                      TDBAH    00000000
                      TDLEN    00004000
                      TDH      0000022d
                      TDT      0000022d
                      TXDCTL   0341011f
                      TDFH     00000b97
                      TDFT     0000137c
                      TDFHS    00000bea
                      TDFPC    0000000a
              
              
              dev.em.1.fc: 3
              dev.em.1.debug: -1
              dev.em.1.nvm: -1
              dev.em.1.iflib.rxq1.rxq_fl0.buf_size: 2048
              dev.em.1.iflib.rxq1.rxq_fl0.credits: 1023
              dev.em.1.iflib.rxq1.rxq_fl0.cidx: 520
              dev.em.1.iflib.rxq1.rxq_fl0.pidx: 519
              dev.em.1.iflib.rxq0.rxq_fl0.buf_size: 2048
              dev.em.1.iflib.rxq0.rxq_fl0.credits: 1023
              dev.em.1.iflib.rxq0.rxq_fl0.cidx: 166
              dev.em.1.iflib.rxq0.rxq_fl0.pidx: 165
              dev.em.1.iflib.txq1.r_abdications: 0
              dev.em.1.iflib.txq1.r_restarts: 0
              dev.em.1.iflib.txq1.r_stalls: 0
              dev.em.1.iflib.txq1.r_starts: 13052377
              dev.em.1.iflib.txq1.r_drops: 0
              dev.em.1.iflib.txq1.r_enqueues: 13058272
              dev.em.1.iflib.txq1.ring_state: pidx_head: 0699 pidx_tail: 0699 cidx: 0699 state: IDLE
              dev.em.1.iflib.txq1.txq_cleaned: 13726331
              dev.em.1.iflib.txq1.txq_processed: 13726379
              dev.em.1.iflib.txq1.txq_in_use: 50
              dev.em.1.iflib.txq1.txq_cidx_processed: 685
              dev.em.1.iflib.txq1.txq_cidx: 645
              dev.em.1.iflib.txq1.txq_pidx: 695
              dev.em.1.iflib.txq1.no_tx_dma_setup: 0
              dev.em.1.iflib.txq1.txd_encap_efbig: 0
              dev.em.1.iflib.txq1.tx_map_failed: 0
              dev.em.1.iflib.txq1.no_desc_avail: 0
              dev.em.1.iflib.txq1.mbuf_defrag_failed: 0
              dev.em.1.iflib.txq1.m_pullups: 1844
              dev.em.1.iflib.txq1.mbuf_defrag: 0
              dev.em.1.iflib.txq0.r_abdications: 0
              dev.em.1.iflib.txq0.r_restarts: 0
              dev.em.1.iflib.txq0.r_stalls: 0
              dev.em.1.iflib.txq0.r_starts: 679332
              dev.em.1.iflib.txq0.r_drops: 0
              dev.em.1.iflib.txq0.r_enqueues: 679664
              dev.em.1.iflib.txq0.ring_state: pidx_head: 1840 pidx_tail: 1840 cidx: 1840 state: IDLE
              dev.em.1.iflib.txq0.txq_cleaned: 1355334
              dev.em.1.iflib.txq0.txq_processed: 1355378
              dev.em.1.iflib.txq0.txq_in_use: 54
              dev.em.1.iflib.txq0.txq_cidx_processed: 626
              dev.em.1.iflib.txq0.txq_cidx: 586
              dev.em.1.iflib.txq0.txq_pidx: 640
              dev.em.1.iflib.txq0.no_tx_dma_setup: 0
              dev.em.1.iflib.txq0.txd_encap_efbig: 0
              dev.em.1.iflib.txq0.tx_map_failed: 0
              dev.em.1.iflib.txq0.no_desc_avail: 0
              dev.em.1.iflib.txq0.mbuf_defrag_failed: 0
              dev.em.1.iflib.txq0.m_pullups: 1912
              dev.em.1.iflib.txq0.mbuf_defrag: 0
              dev.em.1.iflib.override_nrxds: 0
              dev.em.1.iflib.override_ntxds: 0
              dev.em.1.iflib.separate_txrx: 0
              dev.em.1.iflib.core_offset: 1
              dev.em.1.iflib.tx_abdicate: 0
              dev.em.1.iflib.rx_budget: 0
              dev.em.1.iflib.disable_msix: 0
              dev.em.1.iflib.override_qs_enable: 0
              dev.em.1.iflib.override_nrxqs: 0
              dev.em.1.iflib.override_ntxqs: 0
              dev.em.1.iflib.driver_version: 7.6.1-k
              dev.em.1.%parent: pci2
              dev.em.1.%pnpinfo: vendor=0x8086 device=0x10d3 subvendor=0x15d9 subdevice=0x0000 class=0x020000
              dev.em.1.%location: slot=0 function=0 dbsf=pci0:2:0:0 handle=\_SB_.PCI0.RP05.PXSX
              dev.em.1.%driver: em
              dev.em.1.%desc: Intel(R) PRO/1000 Network Connection
              

              sysctl hw.em

              hw.em.max_interrupt_rate: 8000
              hw.em.eee_setting: 1
              hw.em.rx_process_limit: 100
              hw.em.sbp: 1
              hw.em.smart_pwr_down: 0
              hw.em.rx_abs_int_delay: 66
              hw.em.tx_abs_int_delay: 66
              hw.em.rx_int_delay: 0
              hw.em.tx_int_delay: 66
              hw.em.disable_crc_stripping: 0
              
              1 Reply Last reply Reply Quote 0
              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                Hmm, nothing jumps out there. Pretty much what I see on a test box here.

                At this point I would probably be swapping em0 and em1 to be sure it's not a hardware issue.

                Steve

                T 3 Replies Last reply Reply Quote 1
                • T
                  TechGeek01 @stephenw10
                  last edited by

                  @stephenw10 I will give that a shot when I get home.

                  Out of curiosity, if I'm understanding this correctly, it seems like the log messages are indicating a new WAN IP is triggering an interface reload. Am I correct there? Meaning swapping them would go from triggering a reload to the LAN just periodically dropping connectivity?

                  Also, the motherboard is a Supermicro X9SCM, so both NICs are on board, not that it makes much of a difference.

                  1 Reply Last reply Reply Quote 0
                  • T
                    TechGeek01 @stephenw10
                    last edited by

                    @stephenw10 Okay, so swapping the two, first thing I noticed is the clip on the WAN cable isn't broken off, but it doesn't click. I was planning on re-crimping that end, but out of curiosity, I let everything come back up after a config change (backup config, swap em0 and em1, and import modified config), and I unplugged, left, and plugged back in the WAN on em1.

                    I see the log is much shorter for hotplug on WAN compared to LAN, though I suspect that's just cause of all the LAN subinterfaces. I did noticed that instead of a "hotplug" message, this time the singular message is "HOTPLUG," and there's only one mention of it. I don't quite know if that's a different thing or not.

                    I'm currently hammering everything with a 240GB file transfer to see if I can get the error to pop up again on its own with the NIC swap. In the mean time, any idea why a physical unplug generates different log messages than what I was seeing before? Logs are as follows:

                    Mar  7 19:22:24 hydrogen check_reload_status[309]: Linkup starting em1
                    Mar  7 19:22:24 hydrogen kernel: em1: link state changed to DOWN
                    Mar  7 19:22:25 hydrogen php-fpm[270]: /rc.linkup: DEVD Ethernet detached event for wan
                    Mar  7 19:22:26 hydrogen check_reload_status[309]: Reloading filter
                    Mar  7 19:22:43 hydrogen rc.gateway_alarm[90687]: >>> Gateway alarm: WAN_DHCP (Addr:1.1.1.1 Alarm:1 RTT:696.418ms RTTsd:130.285ms Loss:26%)
                    Mar  7 19:22:43 hydrogen check_reload_status[309]: updating dyndns WAN_DHCP
                    Mar  7 19:22:43 hydrogen check_reload_status[309]: Restarting ipsec tunnels
                    Mar  7 19:22:43 hydrogen check_reload_status[309]: Restarting OpenVPN tunnels/interfaces
                    Mar  7 19:22:43 hydrogen check_reload_status[309]: Reloading filter
                    Mar  7 19:22:44 hydrogen php-fpm[83648]: /rc.openvpn: Gateway, NONE AVAILABLE
                    Mar  7 19:22:44 hydrogen php-fpm[83648]: /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
                    Mar  7 19:22:44 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (522789) There was an error trying to determine the public IP for interface - wan (em1 ).
                    Mar  7 19:22:45 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (home.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ).
                    Mar  7 19:22:46 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (mail.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ).
                    Mar  7 19:22:47 hydrogen php-fpm[271]: /rc.dyndns.update: Dynamic DNS (bitwarden.REDACTED) There was an error trying to determine the public IP for interface - wan (em1 ).
                    Mar  7 19:23:27 hydrogen kernel: ovpns2: link state changed to DOWN
                    Mar  7 19:23:27 hydrogen check_reload_status[309]: Reloading filter
                    Mar  7 19:23:53 hydrogen check_reload_status[309]: Linkup starting em1
                    Mar  7 19:23:53 hydrogen kernel: em1: link state changed to UP
                    Mar  7 19:23:54 hydrogen php-fpm[270]: /rc.linkup: DEVD Ethernet attached event for wan
                    Mar  7 19:23:54 hydrogen php-fpm[270]: /rc.linkup: HOTPLUG: Configuring interface wan
                    Mar  7 19:23:56 hydrogen check_reload_status[309]: rc.newwanip starting em1
                    Mar  7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: calling interface_dhcpv6_configure.
                    Mar  7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: Accept router advertisements on interface em1 
                    Mar  7 19:23:56 hydrogen php-fpm[270]: /rc.linkup: Starting rtsold process
                    Mar  7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: rc.newwanip: Info: starting on em1.
                    Mar  7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: rc.newwanip: on (IP address: 99.198.53.111) (interface: WAN[wan]) (real interface: em1).
                    Mar  7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: Accept router advertisements on interface em1 
                    Mar  7 19:23:57 hydrogen php-fpm[10424]: /rc.newwanip: Starting rtsold process
                    Mar  7 19:23:58 hydrogen php-fpm[270]: /rc.linkup: Gateway, NONE AVAILABLE
                    Mar  7 19:23:58 hydrogen check_reload_status[309]: Restarting ipsec tunnels
                    Mar  7 19:23:58 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied
                    Mar  7 19:23:59 hydrogen php-fpm[10424]: /rc.newwanip: The command '/usr/sbin/rtsold -1 -p /var/run/rtsold_em1.pid -M /var/etc/rtsold_em1_script.sh -O /var/etc/rtsold_em1_script.sh em1' returned exit code '1', the output was 'rtsold: failed to open pidfile: File exists' 
                    Mar  7 19:23:59 hydrogen check_reload_status[309]: Reloading filter
                    Mar  7 19:24:01 hydrogen check_reload_status[309]: updating dyndns wan
                    Mar  7 19:24:01 hydrogen php-fpm[270]: /rc.linkup: Removing static route for monitor 1.1.1.1 and adding a new route through [REDACTED]
                    Mar  7 19:24:01 hydrogen check_reload_status[309]: Reloading filter
                    Mar  7 19:24:02 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied
                    Mar  7 19:24:03 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (522789): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                    Mar  7 19:24:04 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (home.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                    Mar  7 19:24:05 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (mail.REDACTED): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                    Mar  7 19:24:06 hydrogen rtsold[90613]: <cap_rssend> sendmsg on em1: Permission denied
                    Mar  7 19:24:06 hydrogen php-fpm[83648]: /rc.dyndns.update: phpDynDNS (bitwarden.REDACTED![hr-H-T-123456-unload-standup-base.png](/assets/uploads/files/1615167452915-hr-h-t-123456-unload-standup-base.png) ): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
                    
                    1 Reply Last reply Reply Quote 0
                    • T
                      TechGeek01 @stephenw10
                      last edited by

                      @stephenw10 Just an update, I swapped the NICs, and re-crimped the end on the WAN cable, and have seen no issues at all since then.

                      Out of curiosity, to see if it was the cable or not, I swapped them back now that the cable was crimped, and when I hammered it with a file transfer, I saw the same issues pop up again.

                      So now the question is, these NICs I believe use the same controller. So if the controller had an issue, they'd both be weird. If it was physical, I would normally expect to barely get anything working, if at all, not for it to disconnect under load.

                      I know heat is a thing, but we're connecting to a switch here, so heat output of the clients wouldn't cause a link on the pfSense box to disconnect. That being said, is there some other thing you can think of that would cause this? Or does it sound correct in assuming that that NIC is disconnecting under load, almost like it's a software issue?

                      1 Reply Last reply Reply Quote 0
                      • stephenw10S
                        stephenw10 Netgate Administrator
                        last edited by

                        If it was a software issue though you would expect the NICs to behave the same. Swapping them would make no difference.

                        Hard to say what might be causing that.

                        Steve

                        1 Reply Last reply Reply Quote 1
                        • jimpJ
                          jimp Rebel Alliance Developer Netgate
                          last edited by

                          I agree, if it was software the behavior would be identical on all ports.

                          There are many subtle ways electronics can fail under load, it's difficult to speculate about why, but usually boils down to heat or some kind of electrical interference which can't be compensated for at high load vs low load.

                          Remember: Upvote with the 👍 button for any user/post you find to be helpful, informative, or deserving of recognition!

                          Need help fast? Netgate Global Support!

                          Do not Chat/PM for help!

                          1 Reply Last reply Reply Quote 1
                          • First post
                            Last post
                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.