Netgate Discussion Forum

    WAN NIC losing link on Intel(R) PRO/1000 (only on 2.2.x, not on 2.1.x)

    Problems Installing or Upgrading pfSense Software
    26 Posts 12 Posters 8.0k Views

      xtofh

      Hi all,

      Following up (with its own thread) on the WAN NIC losing its link since 2.2.x and not on 2.1.x.

      I've been pointed to a possible solution by cmb in a different thread:

      @cmb:

      This looks to match this:
      https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199174

      which should be worked around if you disable MSI and MSIX.
      https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards#MSI.2FMSIX

      I disabled MSIX and enabled MSI (disabling both leads to non-working interfaces). I can confirm that the setting took effect from the boot messages:

      em1: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0x2000-0x201f mem 0xf0900000-0xf091ffff,0xf0920000-0xf0923fff at device 0.0 on pci2
      em1: MSIX: insufficient vectors, using MSI
      em1: Using an MSI interrupt
      

      My current /boot/loader.conf.local contains:

      autoboot_delay="3"
      vm.kmem_size="536870912"
      vm.kmem_size_max="1073741824"
      kern.ipc.nmbclusters="512000"
      boot_multicons="YES"
      boot_serial="YES"
      comconsole_speed="115200"
      console="comconsole,vidconsole"
      hw.usb.no_pf="1"
      vfs.zfs.prefetch_disable="1"
      hw.pci.enable_msix="0"
      hw.pci.enable_msi="1"
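
A minimal sketch for checking which interrupt mode the driver reported for a given interface after a reboot. It assumes the em driver's message wording quoted in this post ("Using MSIX interrupts", "Using an MSI interrupt"); `irq_mode` is a hypothetical helper name, not part of pfSense.

```shell
#!/bin/sh
# irq_mode IFACE: read boot messages on stdin and print the interrupt mode
# (MSIX, MSI or unknown) that the driver reported for IFACE.
# Assumption: the message wording matches the em(4) lines quoted in this thread.
irq_mode() {
    awk -v ifc="$1" '
        index($0, ifc ": Using MSIX interrupts") > 0  { mode = "MSIX" }
        index($0, ifc ": Using an MSI interrupt") > 0 { mode = "MSI" }
        END { print (mode == "" ? "unknown" : mode) }
    '
}
```

On the firewall this would be fed from the boot log, for example `dmesg | irq_mode em1`. The loader tunables themselves can be read back with `sysctl hw.pci.enable_msix hw.pci.enable_msi`.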
      

      I just implemented this in production and will report back in 5-7 days. (or sooner in case the problem returns)

      Regards,
      Kristof.

        xtofh

        Just to follow up on this. (not really 5-7 days but I wanted to be sure we're good)

        I think this is solved, our current uptime (without losing connection) is: 15 Days 05 Hours 05 Minutes 08 Seconds

        Thanks to cmb who pointed out the solution. It was all due to driver changes and only on these Intel NICs.

        Regards,
        Kristof.

          xtofh

          Unfortunately the problem remains. I tried the different combinations of disabling MSI/MSIX; it does not make any difference.

          Anything else I can change that might affect this? (It's most likely a driver issue, because it worked in previous pfSense versions, 2.1.x.)

            johnpoz LAYER 8 Global Moderator

            So it runs for 15+ days, and you think it's a driver issue? That's a long time for a driver issue not to show itself.


              xtofh

              It might run that long; sometimes it takes only a few days.

              It's the only type of firewall we have (with Intel PRO/1000s) that has this issue, and it has already been replaced with similar hardware to rule out a hardware fault.

              I'm basing my opinion on the reports that show up in FreeBSD's Bugzilla:

              https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199174

              Other suggestions on how to troubleshoot/nuke this problem are appreciated.

                DK-Hector

                Hello,

                I have seen the same problem on my pfSense box (2.2.4, 32-bit).

                I use an Intel D2500CC with 4 GB of RAM (overkill, I know), a hifn 7955 card onboard, and a 30 GB IBM SSD (overkill too, but it was cheap  :D ). The BIOS version is CCCDT10N.86A.0037.2012.1217.1723.

                According to Intel's product guide, the onboard NICs are 82574L models: https://downloadmirror.intel.com/20718/eng/D2500CC_ProductGuide02_English.pdf

                The modem from my ISP shows errors on the port where the pfSense box is connected. Rebooting pfSense fixes the problem for a while: sometimes 2, 5, 10 or 15 minutes, sometimes 1-2 days.

                I reinstalled pfSense 2 days ago, and this evening the problem started again. After 5 reboots the WAN NIC did not connect to the modem at all. The WAN is configured for IPv4 DHCP. In the end not even a reboot could fix it, only a reinstall  :'(

                I have a backup of my settings, so a restore brings all of them back.

                I normally run this box without a monitor, but I have now hooked one up, and the console shows the following error when the problem starts:
                "ugen3.2: <unknown> at usbus3 disconnected"

                I run an IPsec and an OpenVPN connection to a remote site. I have also spoofed the MAC address on the WAN NIC to keep my ISP happy.

                A buddy of mine runs pfSense on exactly the same hardware configuration. He has also had the problem where the WAN NIC lost all connection.

                I have never seen this issue before. I don't remember whether it only happens on version 2.2.x.

                Any suggestions will be appreciated.

                  David_W

                  If the information in FreeBSD bugzilla is correct, turning off TSO will solve the problem for now. You can do this in System -> Advanced, Networking tab, check "Disable hardware TCP segmentation offload" and reboot.

                  I haven't checked GitHub to see if the final version of the patch has been added to pfSense 2.2.5-DEVELOPMENT. I think it unlikely, though, as it hasn't been MFCed to FreeBSD stable/10 yet.

                  If the patch makes it into stable/10 by the time pfSense 2.3 releases, it should appear in 2.3. (2.3 snapshots are definitely alpha grade at present, and not suitable for production use).

                    mer

                    When you say reboot, do you mean a power cycle or a warm reset?  Sometimes the hardware doesn't do a "from dead" restart unless it's power cycled.

                      David_W

                      A warm reset should be sufficient. Indeed, it should do the same job if you can get to a command prompt and:

                      ifconfig em0 -tso
                      ifconfig em0 -tso4
                      ifconfig em0 -tso6

                      Repeat this for any other em interfaces and ignore any errors - I can't remember which of these three flags em uses and don't have any boxes with an em interface to hand. The box I'm using has interfaces that use the later igb driver.

                      There are also various ways to do this with the PHP shell, or by saving and applying a change to each interface.
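
Repeating those three commands for every em interface can be scripted. The following is a sketch under assumptions: `tso_off_cmds` is a hypothetical helper that only prints the commands (a dry run), and it treats any interface whose name starts with `em` followed by a digit as using the em driver.

```shell
#!/bin/sh
# tso_off_cmds "IFACE LIST": print the ifconfig commands that would turn off
# TSO on every em interface in the space-separated list.
# Dry run only: nothing is changed until the output is executed.
tso_off_cmds() {
    for ifc in $1; do
        case "$ifc" in
            em[0-9]*)
                for flag in -tso -tso4 -tso6; do
                    echo "ifconfig $ifc $flag"
                done
                ;;
        esac
    done
}
```

On FreeBSD the interface list can come from `ifconfig -l`, so `tso_off_cmds "$(ifconfig -l)"` shows what would run and `tso_off_cmds "$(ifconfig -l)" | sh` applies it. A flag the driver does not accept just produces an error on that one line and the rest still run.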

                        deanot

                        I too have this issue, along with 2 other people I have been talking with on here. My box is a little different, as it is a Firebox using Marvell NICs. This only started once the 2.2.x upgrade went onto my CF card; it never did this before.

                        Something changed, and it also happens at odd times for me. I can normally guarantee it will happen if I try to access the GUI, so there is a serious bug in there somewhere. Unplugging the cable from the NIC and plugging it back in clears the issue; I don't know why, but it does.

                        Here is the topic I started, with some of the things I have done to try to fix it:

                        https://forum.pfsense.org/index.php?topic=100010.0

                        pfSense System Specs
                        --------------------
                        Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
                        4 CPUs: 1 package(s) x 4 core(s)
                        4 port HP Branded Intel Ethernet Card

                          David_W

                          @deanot:

                          I too have this issue, along with 2 other people that I have been talking with on here.  My box is a little different as it is a Firebox, using Marvell NICs.  This has only started since the 2.2.x upgrade went onto my CF, never has done this before now.

                          Something changed, and I also get the odd times it does it.

                          With respect, your issue is completely different - as it affects a different NIC using a different driver. The issue discussed in this thread is specific to em interfaces (older / lower spec Intel gigabit NICs - newer / higher spec Intel gigabit NICs use igb).

                          The only thing in common with your issue is that something changed between FreeBSD 8.3 (pfSense 2.1.x) and FreeBSD 10.1 (pfSense 2.2.x) that causes random NIC failure. As there has been a huge amount of development in FreeBSD between those two releases and you are not running hardware supported by ESF/Netgate, you have four choices - debug it yourself, pay someone with suitable experience to debug it for you, hope someone in the community has an answer or change hardware. If you insist on sticking with the hardware you have, turning off the NIC's hardware offloading features might help, as the driver/hardware issue might be to do with offloading.

                            julicravo

                            Hello guys,
                            I'm having the same problem as you.

                            I have a dual-port Intel 82546EB network adapter, and I use VMware ESXi 6.0.

                            The board is really very old.

                            Which dual-port card would you advise me to buy for my case?

                            I want to use the newest version of pfSense, 2.2.4.

                            I thank everyone!

                              julicravo

                              Guys,

                              I changed the configuration to disable TSO.
                              TSO was already disabled on the firewall under Advanced, Networking.
                              I still intend to change the network card, but for now this puts my mind at ease.

                              https://calomel.org/freebsd_network_tuning.html
                              http://www.peerwisdom.org/2013/04/03/large-send-offload-and-network-performance/

                              Thank you.

                              pfsense.png

                                Yowsers

                                I, too, have an Intel D2500CC with the onboard dual Intel 82574L NICs, running pfSense nanobsd 2.2.4-RELEASE (amd64). TSO is turned off via System -> Advanced, Networking tab, "Disable hardware TCP segmentation offload." I have not experienced any noticeable WAN disconnects, and certainly none that required me to reboot pfSense.

                                I stumbled upon this thread, saw the comment about the D2500CC and got curious about the issue. It turns out I already had TSO disabled. I am not sure whether that happened when I imported my m0n0wall config or whether it is disabled by default (maybe it is disabled on nanobsd?). I keep a pretty good record of the settings and tweaks I have applied to pfSense, yet I cannot say why TSO was disabled. I am glad it was, though, as I do not experience any WAN disconnects. Below are some system logs for comparison.

                                Oct 24 15:04:49 kernel: em1: Using MSIX interrupts with 3 vectors
                                Oct 24 15:04:49 kernel: em1: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0x2000-0x201f mem 0x80120000-0x8013ffff,0x80100000-0x8011ffff,0x80140000-0x80143fff irq 17 at device 0.0 on pci1
                                Oct 24 15:04:49 kernel: pci1: <ACPI PCI bus> on pcib2
                                Oct 24 15:04:49 kernel: pcib2: <ACPI PCI-PCI bridge> at device 28.1 on pci0
                                Oct 24 15:04:49 kernel: em0: Using MSIX interrupts with 3 vectors
                                Oct 24 15:04:49 kernel: em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0x3000-0x301f mem 0x80220000-0x8023ffff,0x80200000-0x8021ffff,0x80240000-0x80243fff irq 16 at device 0.0 on pci2
                                Oct 24 15:04:49 kernel: pci2: <ACPI PCI bus> on pcib1
                                Oct 24 15:04:49 kernel: pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0

                                Intel Desktop Board D2500CC
                                http://www.intel.com/content/www/us/en/motherboards/desktop-motherboards/desktop-board-d2500cc.html

                                Intel Atom D2500 CPU Specifications
                                http://ark.intel.com/products/59682/Intel-Atom-Processor-D2500-1M-Cache-1_86-GHz

                                  xtofh

                                  Well, a few days into 2.2.5 now, but the issue is not resolved yet.

                                  The Intel(R) PRO/1000 with the em driver keeps disconnecting after a random amount of time (anywhere from 1 day to 2-3 weeks).
                                  The Internet connection is a business account from our local cable provider (Telenet).

                                  I checked and these are the advanced settings:

                                  • Disable hardware checksum offload - Unchecked.

                                  • Disable hardware TCP segmentation offload - Checked.

                                  • Disable hardware large receive offload - Checked.

                                  But the problem remains after upgrading to 2.2.5.

                                  I temporarily put an identical model running OPNsense in place of this box. I will have to wait until pfSense is based on a more recent FreeBSD and will retry then.
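
The checkbox states listed above can be cross-checked against what the driver itself reports, since FreeBSD's `ifconfig` lists the enabled capabilities in its `options=...<...>` line. This is a minimal sketch; `offload_state` is a hypothetical helper, and it looks only at the capability names RXCSUM, TXCSUM, TSO4, TSO6 and LRO.

```shell
#!/bin/sh
# offload_state: read `ifconfig IFACE` output on stdin and report whether
# checksum offload, TSO and LRO appear among the enabled options.
offload_state() {
    awk '
        /^[ \t]*options=/ {
            # Keep only the comma-separated capability names between < and >.
            sub(/^.*</, ""); sub(/>.*$/, "")
            n = split($0, cap, ",")
            for (i = 1; i <= n; i++) on[cap[i]] = 1
        }
        END {
            print "checksum: " ((("RXCSUM" in on) || ("TXCSUM" in on)) ? "on" : "off")
            print "tso: " ((("TSO4" in on) || ("TSO6" in on)) ? "on" : "off")
            print "lro: " (("LRO" in on) ? "on" : "off")
        }
    '
}
```

Usage would be `ifconfig em1 | offload_state`. Whole capability names are compared, so entries such as VLAN_HWTSO or VLAN_HWCSUM are not mistaken for TSO or checksum offload on the interface itself.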

                                    NOYB

                                    Have you tried "Disable Gateway Monitoring"?  Or configure it to be less sensitive?

                                      xtofh

                                      Thanks for the suggestion, I have tried that in the past, same problem.

                                      The WAN interface keeps losing its ip.

                                        NOYB

                                        @xtofh:

                                        The WAN interface keeps losing its ip.

                                        Is it static IP or DHCP assignment?  If DHCP assigned then I'd be watching the renewals for failures / issues with a packet capture of ports 67/68.  What's the lifetime being handed out? (/var/db/dhclient.leases….)
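
One way to watch the renewals is to capture ports 67/68 and tally requests against replies. The sketch below assumes tcpdump's usual one-line summaries ("BOOTP/DHCP, Request ..." and "BOOTP/DHCP, Reply ..."); `dhcp_tally` is a hypothetical helper name.

```shell
#!/bin/sh
# dhcp_tally: read tcpdump text output on stdin and count DHCP requests
# and replies, based on tcpdump's BOOTP/DHCP summary lines.
dhcp_tally() {
    awk '
        /BOOTP\/DHCP, Request/ { req++ }
        /BOOTP\/DHCP, Reply/   { rep++ }
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}
```

For example, `tcpdump -ni em1 -c 50 port 67 or port 68 | dhcp_tally` summarizes the next 50 DHCP packets on em1. A request count that keeps growing while replies stay at zero points at traffic not coming back in on that interface.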

                                          xtofh

                                          @NOYB:

                                          Is it static IP or DHCP assignment?  If DHCP assigned then I'd be watching the renewals for failures / issues with a packet capture of ports 67/68.  What's the lifetime being handed out? (/var/db/dhclient.leases….)

                                          It's dhcp (but a fixed lease), I didn't see any failures on dhcp with tcpdump. When the issue occurs, there are dhcp requests but simply no replies. (and the provider is working) Also, unplugging the ethernet cable, waiting for a minute, plugging it back in, doesn't help. Plugging a different device into the provider gives me an ip immediately.

                                          I have to reboot the whole system (by logging in via the LAN or a different WAN interface).

                                          /var/db/dhclient.leases.em1:

                                            option dhcp-lease-time 7200;
                                            option dhcp-message-type 5;
                                            option dhcp-server-identifier 195.130.x.y;
                                            option dhcp-renewal-time 3600;
                                            option dhcp-rebinding-time 3660;
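
The timers in that excerpt can be pulled out of the leases file with a few lines of awk. This is a sketch only; `lease_timers` is a hypothetical helper, and it reports the values of the last lease found in the file.

```shell
#!/bin/sh
# lease_timers: read a dhclient leases file on stdin and print the lease,
# renewal (T1) and rebinding (T2) times of the last lease, in seconds.
lease_timers() {
    awk '
        $1 == "option" && $2 == "dhcp-lease-time"     { sub(/;/, "", $3); lease = $3 }
        $1 == "option" && $2 == "dhcp-renewal-time"   { sub(/;/, "", $3); t1 = $3 }
        $1 == "option" && $2 == "dhcp-rebinding-time" { sub(/;/, "", $3); t2 = $3 }
        END { printf "lease=%d renew=%d rebind=%d\n", lease, t1, t2 }
    '
}
```

Run as `lease_timers < /var/db/dhclient.leases.em1`. With the values above, the client starts renewing after 3600 s and the lease expires after 7200 s, so an address should be lost no later than two hours after the last reply was received.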
                                          
                                            NOYB

                                            @xtofh:

                                            I didn't see any failures on dhcp with tcpdump. When the issue occurs, there are dhcp requests but simply no replies.

                                            Why do you think that no replies is not an issue?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.