Netgate Discussion Forum

Randomly lost link on WAN interface

General pfSense Questions
  • F
    FrozenFiber
    last edited by May 15, 2020, 4:14 PM

    Hello again everyone

    For a good week or so now, my WAN interface has been spontaneously going down and never coming back. It either works fine, works but constantly loses inbound packets, or doesn’t lose any packets but goes up and down every five seconds. After checking the logs, I found the following sequence of entries constantly repeating:

    Gateway log:

    WAN_DHCP WAN Gateway IP: Alarm latency 414us stddev 32us loss 25% (only appears sometimes)

    WAN219_DHCP WAN Gateway IP: sendto error: 65 (repeats itself two or three times per second for 2-3 seconds)

    send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr WAN Gateway IP bind_addr IP in WAN network identifier "WAN219_DHCP "
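To make the alarm line easier to read against the monitor settings, here is a minimal Python sketch that parses both log lines quoted above and reports which threshold was exceeded. The function names are made up for illustration, and the "greater than" comparison is an assumption about how the gateway monitor decides to raise an alarm, not something taken from its source.

```python
import re

# Lines copied from the gateway log quoted in the post.
SETTINGS_LINE = (
    "send_interval 500ms loss_interval 2000ms time_period 60000ms "
    "report_interval 0ms data_len 0 alert_interval 1000ms "
    "latency_alarm 500ms loss_alarm 20%"
)
ALARM_LINE = "Alarm latency 414us stddev 32us loss 25%"


def parse_settings(line: str) -> dict:
    """Collect 'name <number>ms' and 'name <number>%' pairs from a settings line."""
    pairs = re.findall(r"(\w+) (\d+)(?:ms|%)", line)
    return {name: int(value) for name, value in pairs}


def parse_alarm(line: str) -> dict:
    """Extract latency (us), stddev (us) and loss (%) from an alarm line."""
    match = re.search(r"latency (\d+)us stddev (\d+)us loss (\d+)%", line)
    if match is None:
        raise ValueError("not an alarm line")
    latency_us, stddev_us, loss_pct = (int(g) for g in match.groups())
    return {"latency_us": latency_us, "stddev_us": stddev_us, "loss_pct": loss_pct}


def exceeded_thresholds(alarm: dict, settings: dict) -> list:
    """Name the thresholds the alarm values exceed (assumed '>' comparison)."""
    reasons = []
    if alarm["latency_us"] > settings["latency_alarm"] * 1000:  # ms -> us
        reasons.append("latency")
    if alarm["loss_pct"] > settings["loss_alarm"]:
        reasons.append("loss")
    return reasons


settings = parse_settings(SETTINGS_LINE)
alarm = parse_alarm(ALARM_LINE)
print(exceeded_thresholds(alarm, settings))
```

On these sample lines only the loss threshold is exceeded: 25% loss against a 20% limit, while 414us is far below the 500ms latency limit.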

    General log:

    kernel em0: link state changed to UP

    check_reload_status Linkup starting em0

    Some OpenVPN stuff

    check_reload_status Linkup starting em0

    kernel em0: link state changed to DOWN

    I also sometimes saw this message repeated several times per second:

    arpresolve: can't allocate llinfo for WAN Gateway IP on em0
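When a log is full of entries like the ones above, counting the link-state messages per interface gives a quick picture of how often the link actually flaps. The following is only a rough Python sketch; the regular expression is based on the kernel message format shown in this post, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Kernel messages in the form quoted above, e.g.
# "kernel em0: link state changed to DOWN"
LINK_RE = re.compile(r"(\w+): link state changed to (UP|DOWN)")

# Sample lines copied from the general log in the post.
SAMPLE_LOG = [
    "kernel em0: link state changed to UP",
    "check_reload_status Linkup starting em0",
    "check_reload_status Linkup starting em0",
    "kernel em0: link state changed to DOWN",
]


def link_transitions(lines):
    """Return (interface, state) for every link-state message, in order."""
    transitions = []
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            transitions.append((match.group(1), match.group(2)))
    return transitions


def count_link_downs(lines):
    """Count how many times each interface reported link DOWN."""
    return Counter(iface for iface, state in link_transitions(lines) if state == "DOWN")


print(link_transitions(SAMPLE_LOG))
print(count_link_downs(SAMPLE_LOG))
```

Running it over a full system log export (one entry per list element) would show whether the DOWN events cluster on one interface.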

    After a week of looking through the pfSense forums and other parts of the internet and changing out a lot of hardware and software, I am pretty much helpless and don’t know what to do at this point. It seems to be a configuration issue as everything works fine on factory defaults, at least for some time. A hardware issue is rather unlikely as I have tested four different, known-good cables and three different switches (the firewall WAN is connected to a switch), and I swapped the LAN and WAN interfaces to check whether any issues would occur when using the usual WAN interface as a LAN interface. The power supply also seems alright: I checked the output voltage while a dummy load drawing more than four times the power of the actual computer was connected to it. A second power supply also produced the issue. Unfortunately, none of these tests pointed to a faulty component: each time, either everything worked just fine or the same error appeared.

    On the software side, I have also tried almost everything I can possibly imagine. The basic things like lowering MTU and MSS and disabling all hardware offloading aside, I have also reset all gateway settings, disabled the firewall completely, manually forced speed and duplex, increased the size of the state table, toggled do not fragment compatibility, everything. Some things actually worked for some time, like reducing the MTU to 1000 B, but after a few hours the issue started occurring again.

    Interestingly enough, it seems like other devices might have something to do with it, as the issue occurs after a shorter amount of time when there is another device on the WAN side. When plugging the WAN port into an unmanaged switch without any other device connected, the interface stays up for a good ten minutes or so. As soon as I connect another device, say a laptop for example, the actual link physically goes down and comes back, a cycle repeating every five seconds or so. It’s like every other device randomly sends a kill signal to the WAN port, telling it to turn itself off.

    The most confusing thing for me is that it does not always happen. On several occasions, the WAN interface worked perfectly fine for almost 20 hours and transported several hundred gigabytes of data and millions of packets before the issues started to arise. If I then play around with the settings a little, like resetting to factory defaults, the issue is gone. When I restore the configuration five minutes later, it starts again. If I restore factory defaults again and wait a day or so, it also starts again.

    The computer running pfSense itself is highly reliable from what I can tell (it’s been running for several weeks nonstop before without any issues).

    Some more information about my specific configuration: The only package I have installed is iperf, which is not running most of the time. The firewall is usually running at idle, with at most 10 to 20 percent CPU load. Temperatures are also well below 40 °C. The computer pfSense is running on has two Intel NICs, one i211 and one i219-V. The drivers are igb and em respectively. I have used the i219 one as the WAN interface. There are usually six to seven devices connected to the firewall, each in its own VLAN and via a managed switch.

    So here’s a quick summary of what I have already tried with the result:

    Reboot -> No effect
    Disable and reenable Interface -> No effect
    Reinstall pfSense -> Temporary fix
    Reset to factory defaults -> Temporary fix
    Swap LAN and WAN NICs -> Temporary fix
    Disable VLANs -> No effect
    Change WAN MTU -> Temporary fix
    Change power supplies -> No effect
    Turn off all hardware offloading -> No effect
    Change WAN side switch -> No effect
    Change LAN side switch -> No effect
    Reset gateway settings -> No effect
    Forced gateway to be always considered up -> No effect
    Increased state table size -> No effect
    Disable firewall -> No effect
    Force Speed & Duplex -> No effect
    Change network cables -> No effect
    Toggled do not fragment compatibility -> No effect
    Toggled state killing on gateway failure -> No effect
    Toggled reset all state if WAN IP address changes -> No effect
    Disabled NTP, DHCP, DNS resolver and all OpenVPN clients -> No effect
    Wait for a new DHCP lease & IP on WAN -> No effect

    At this point, I really don’t have the slightest idea on how to solve this issue and I would be thankful for some ideas! Thanks.

    • S
      stephenw10 Netgate Administrator
      last edited by May 16, 2020, 2:39 PM

      @FrozenFiber said in Randomly lost link on WAN interface:

      sendto error: 65

      That error indicates there is no route to the gateway:
      https://docs.netgate.com/pfsense/en/latest/routing/gateway-monitoring-errors.html#sendto-error-65

      Since with a DHCP connection the gateway is almost always the next hop, it usually implies the WAN IP address has been lost. What does the system log show at or just before you see that error in the gateway log?
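For reference, the number in a "sendto error" line is a FreeBSD errno value, and 65 is EHOSTUNREACH ("No route to host"). The short Python sketch below decodes such a line with a small hard-coded table; the table is deliberately partial, and the function name is made up for illustration. The values are hard-coded because Python's own `errno` module reports the numbering of the machine it runs on, which differs between FreeBSD and, for example, Linux.

```python
import re

# A partial table of FreeBSD errno values that can show up in gateway logs.
FREEBSD_ERRNO = {
    13: ("EACCES", "Permission denied"),
    50: ("ENETDOWN", "Network is down"),
    51: ("ENETUNREACH", "Network is unreachable"),
    55: ("ENOBUFS", "No buffer space available"),
    64: ("EHOSTDOWN", "Host is down"),
    65: ("EHOSTUNREACH", "No route to host"),
}


def explain_sendto_error(log_line: str) -> str:
    """Turn 'sendto error: NN' into a readable description."""
    match = re.search(r"sendto error: (\d+)", log_line)
    if match is None:
        return "no sendto error found"
    code = int(match.group(1))
    name, text = FREEBSD_ERRNO.get(code, ("UNKNOWN", "not in this partial table"))
    return f"{code} {name}: {text}"


print(explain_sendto_error("WAN219_DHCP WAN Gateway IP: sendto error: 65"))
```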

      What is your WAN connection there exactly?

      Steve

      • F
        FrozenFiber
        last edited by May 16, 2020, 7:05 PM

        Thanks Steve for your answer. I looked through my logs again and found a good sample of repeating log entries, all related to the issue (otherwise the log is quiet for hours). The IP address the DHCP server gave me is 192.168.100.50, and the gateway is at 192.168.100.100.
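Since the gateway is expected to be the next hop, it can be useful to confirm that it sits inside the WAN subnet. The sketch below does that with Python's standard `ipaddress` module. Note that the prefix length is an assumption: the post does not state the netmask of the DHCP lease, so /24 is used purely for illustration, and the function name is hypothetical.

```python
import ipaddress


def gateway_is_on_link(wan_ip: str, prefix_len: int, gateway_ip: str) -> bool:
    """Return True if the gateway lies inside the subnet of the WAN address."""
    network = ipaddress.ip_interface(f"{wan_ip}/{prefix_len}").network
    return ipaddress.ip_address(gateway_ip) in network


# Addresses from the post; the /24 prefix length is an assumption.
print(gateway_is_on_link("192.168.100.50", 24, "192.168.100.100"))
```

If the real lease had a narrower mask, for example /26, the same gateway would fall outside the subnet and would no longer be directly reachable.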

        As for my WAN connection, it’s a simple 100Mb Ethernet link leading to my nearest unmanaged switch, which then in turn directly connects to the gateway router, which connects to the internet. This creates a double NAT situation, which is anything but ideal, but that has worked fine for months with pfSense by now.

        error log.txt

        • S
          stephenw10 Netgate Administrator
          last edited by May 17, 2020, 2:01 PM

          Your WAN interface is actually losing link:

          5/15/2020 19:13	php-fpm	12911		/rc.linkup: DEVD Ethernet detached event for wan
          5/15/2020 19:13	check_reload_status			Linkup starting em0
          5/15/2020 19:13	kernel			em0: link state changed to DOWN
          

          Which, now that I check back, was in your first post!

          It looks like you've swapped out enough things to eliminate a bad NIC, cable or switch port. It sure looks likely to be one of those things though.

          It's linked at 100M? You could try setting the link speed to 100M-FD fixed if the switch will allow it. That might eliminate any link negotiation issues. It should normally always be set to auto-negotiate though.

          How far is it to the switch? Can you add another switch much closer as a test?

          Steve

          • V
            valdask
            last edited by Apr 6, 2021, 10:57 AM

            I think that I have a very similar problem: WAN randomly loses link.

            I've put a switch in front of pfSense, and this issue seems to have disappeared. Using an i211 NIC and version 2.5.0.

            I suspect this has something to do with the length of the cable to my ISP, as it's quite long; maybe the signal is too weak, and using a switch to regenerate it helps?

            Or it could be something EEE related?

            • F
              f.meunier @valdask
              last edited by Apr 6, 2021, 2:37 PM

              Hi all
              I also experienced that kind of trouble (v2.5.0) last week:
              *WAN interface is UP in dashboard
              *gateway is up (connected to an "interconnect" dumb switch : if I connect a laptop it responds)
              *gateway monitor tells me it's down
              *ping from pfSense tells me it's down
              *arp status tells me incomplete resolution for gateway

              Had to reboot pfSense -> everything is OK

              Note: pfSense is set up for HA, but the second firewall has been down (intentionally) for roughly a month. Maybe it has something to do with CARP or HA sync?

              (mostly ZOTAC CI or CA nano barebones)

              • S
                stephenw10 Netgate Administrator
                last edited by Apr 10, 2021, 12:57 PM

                You can disable EEE on most NICs with a loader variable or sysctl. Worth trying if you think it might be that.

                Steve

                • V
                  valdask @stephenw10
                  last edited by Apr 10, 2021, 7:47 PM

                  @stephenw10 Do you happen to know how to do this for an Intel NIC? I believe the move from FreeBSD 11 to 12 made some changes, and I can't find any docs or guides on how to do it properly, as all the Google results are out of date.

                  • S
                    stephenw10 Netgate Administrator
                    last edited by Apr 10, 2021, 11:46 PM

                    What NIC is it?

                    • V
                      valdask @stephenw10
                      last edited by Apr 11, 2021, 6:24 AM

                      @stephenw10 It should be I211 ( https://ark.intel.com/content/www/us/en/ark/products/64404/intel-ethernet-controller-i211-at.html )

                      $ pciconf -lv | grep -A1 -B3 network
                      
                      igb0@pci0:1:0:0:        class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
                          vendor     = 'Intel Corporation'
                          device     = 'I211 Gigabit Network Connection'
                          class      = network
                          subclass   = ethernet
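The `chip=` field in the output above packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits, which is how the entry can be matched against Intel's part list. A small Python sketch of that decoding follows; the function name is hypothetical and the sample line is copied from the output in this post.

```python
import re

# First line of the pciconf output quoted in the post.
PCICONF_LINE = (
    "igb0@pci0:1:0:0:        class=0x020000 card=0x00008086 "
    "chip=0x15398086 rev=0x03 hdr=0x00"
)


def decode_chip(line: str) -> tuple:
    """Split the chip= field into (vendor_id, device_id)."""
    match = re.search(r"chip=0x([0-9a-fA-F]{8})", line)
    if match is None:
        raise ValueError("no chip= field found")
    value = int(match.group(1), 16)
    device_id = value >> 16      # upper 16 bits
    vendor_id = value & 0xFFFF   # lower 16 bits
    return vendor_id, device_id


vendor_id, device_id = decode_chip(PCICONF_LINE)
print(f"vendor=0x{vendor_id:04x} device=0x{device_id:04x}")
```

Here the vendor ID 0x8086 is Intel and the device ID 0x1539 matches the I211 named in the `device` line.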
                      
                      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.