WAN starts cycling link after Ethernet link loss



  • I've just installed pfSense on new hardware and am having trouble with the behavior of the WAN port. It works fine after I boot the machine UNTIL the first time it loses Ethernet link. This typically happens because the CMTS link goes down, causing my cable modem to drop the Ethernet link.

    Once the Ethernet link goes down the first time, I see the WAN port go down for about five seconds every 20 seconds or so. I usually associate this behavior with one side of the link running autonegotiation while the other side has a fixed setting. However, it negotiates the link just fine the FIRST time after the machine boots.

    If I put an Ethernet switch between the cable modem and the WAN port, the WAN port never loses Ethernet link and everything works fine.

    Any suggestions on a good way to get this to tolerate the Ethernet link going down?

    Thanks!

    I'm running the following versions:

    2.4.4-RELEASE-p1 (amd64)
    built on Mon Nov 26 11:40:26 EST 2018
    FreeBSD 11.2-RELEASE-p4

    Hardware is an Intel(R) Celeron(R) CPU J1900 @ 1.99GHz with Intel NICs running under the Intel PRO/1000 driver.



  • I'll also add that it falls into this state if I do anything that causes the WAN interface configuration to reload.

    ISP is Comcast with IPv6 enabled.



  • I had the same thing happen on my APU2C4s.

    This may sound strange, but the workaround there is to set WAN Speed and Duplex to 'Default' instead of 'Autoselect'.



  • @jitguy That seems to have fixed it, thank you!

    I'd call this a "bug"...it acts like setting "Autoselect" actually disables auto-negotiation!


  • LAYER 8 Netgate

    It runs another ifconfig to explicitly set autoselect. When it does, it bounces the link. That is probably freaking out whatever it is connected to, which then bounces the link too, initiating a negotiation loop.

    I wouldn't call it a bug in pfSense. It could be a bug in the modem. :)

    I would call it an incompatibility caused by someone that shouldn't be tweaking settings and checking boxes unnecessarily since default (usually autoselect) is the default for a reason.
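A toy model can make the loop described above concrete. This is a hypothetical Python sketch, not pfSense or modem code: count_bounces and its "reapplies" flags are illustrative names. It assumes that a side which re-applies its media setting does so after every link-down it did not cause itself, and that each re-apply bounces the link once more.

```python
"""Toy model of a link-bounce loop between two endpoints (hypothetical)."""
from collections import deque


def count_bounces(pfsense_reapplies: bool, modem_reapplies: bool,
                  max_events: int = 50) -> int:
    """Count link-down events that follow one initial link loss.

    Assumptions (illustrative only):
      * a side that "reapplies" reacts to each link-down it did not cause by
        re-running its media configuration, which bounces the link again;
      * a side that does not reapply simply lets autonegotiation recover;
      * each side ignores the bounce it caused itself.
    The loop is capped at max_events so an endless loop still terminates.
    """
    reapplies = {"pfsense": pfsense_reapplies, "modem": modem_reapplies}
    other = {"pfsense": "modem", "modem": "pfsense"}

    bounces = 1                            # initial loss, e.g. the CMTS link dropping
    pending = deque(["pfsense", "modem"])  # both sides see that first link-down
    while pending and bounces < max_events:
        side = pending.popleft()
        if reapplies[side]:
            bounces += 1                   # re-applying media takes the link down again
            pending.append(other[side])    # ...and only the peer reacts to it
    return bounces


if __name__ == "__main__":
    for p, m in [(True, True), (True, False), (False, True), (False, False)]:
        print(f"pfsense_reapplies={p}, modem_reapplies={m}: {count_bounces(p, m)} bounces")
```

With both sides reapplying, count_bounces(True, True) only stops because it hits the max_events cap. If either side leaves the link alone, the model settles after at most one extra bounce, which is consistent with the reports in this thread that switching the WAN from "Autoselect" to "Default" stops the cycling.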



  • @derelict It sounds like a timing/race condition between the script code that's trying to configure the port and the Ethernet driver.

    Why is the pfSense GUI running ifconfig commands against the live port configuration instead of updating the default configuration in the filesystem and then telling the port to reload defaults? If the current GUI settings match the filesystem defaults, then you don't need to touch the port. This also helps avoid potential race conditions between the operating system's initial setup of the Ethernet driver and the later configuration performed after pfSense has started.

    I've configured plenty of network devices in my time, and I've never before had explicitly setting a port to auto-negotiation take the port down while the other end of the link was also set to auto-negotiate. Saying that the "user shouldn't do that" doesn't make pfSense any more robust or reliable.


  • LAYER 8 Netgate

    If you really believe it is a bug and have steps to reproduce, a bug can be opened at https://redmine.pfsense.org/

    Of course, a pull request with a fix is always appreciated.



  • @derelict said in WAN starts cycling link after Ethernet link loss:

    It runs another ifconfig to explicitly set autoselect. When it does, it bounces the link. That is probably freaking out whatever it is connected to, which then bounces the link too, initiating a negotiation loop.

    I wouldn't call it a bug in pfSense. It could be a bug in the modem. :)

    I would call it an incompatibility caused by someone that shouldn't be tweaking settings and checking boxes unnecessarily since default (usually autoselect) is the default for a reason.

    Nice explanation. Not sure of the need to slam those of us that have run into that problem though. After all, the GUI text specifically says:

    "WARNING: MUST be set to autoselect (automatically negotiate speed) unless the port this interface connects to has its speed and duplex forced."

    A bit of a stretch to blame us for heeding the bold warning.



  • I can certainly confirm this; it's still happening. I spent two weeks wondering why my connection died whenever the cable modem cut off, even for a second. Every time the modem connection dropped, the WAN NIC started cycling up and down. I had autoselect selected in the WAN "Speed and Duplex" box. After selecting "Default (no preference, typically autoselect)", the problem is gone. This doesn't make any sense, since both are "autoselect". Version: 2.4.4-RELEASE-p3 (amd64). Device: Zotac ZBOX CI323 nano.


  • LAYER 8 Netgate

    It makes perfect sense when you realize that running ifconfig eth0 media autoselect causes an interface down/up, which triggers something in that modem that starts a link-loss loop, whereas running no ifconfig at all does not, and leaves the interface at the same autoselect setting anyway.

    The defaults are defaults for reasons.



  • @Derelict Saying "the defaults are defaults for reasons" doesn't get the user any closer to understanding how to obtain desirable behavior. It also contradicts the general wisdom in configuring networking gear that it is always better to explicitly specify the behavior you want.

    There are two problems with the current behavior: The first is that the documentation doesn't make it clear what the default behavior IS, so the temptation to explicitly configure the port is strong.

    The second problem is that pfSense needs to check the current configuration of the port and compare it to the desired settings, and shouldn't issue ifconfig commands if they aren't needed. Always doing the configuration would be acceptable IF it didn't take the port down in the process. When paired with a peer device that does the same, this behavior results in a deadly embrace that prevents normal operation of the interface.

    This is a problem that is easily avoided by the knowledgeable, but it is frustrating and show-stopping to the uninitiated. I personally believe that this should be considered a bug and the behavior of pfSense modified.
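The compare-before-apply idea in the post above can be sketched as follows. This is a hypothetical Python illustration, not pfSense's actual code; current_media, media_command, and the desired-setting strings are assumed names. It also assumes the FreeBSD ifconfig media line looks like "media: Ethernet autoselect (1000baseT <full-duplex>)", where the text before the parentheses is the configured setting and the parenthesized part is what was negotiated. The function returns an ifconfig argument list only when the configured setting differs from the desired one, and never touches the port when the desired setting is None (the "Default" choice).

```python
"""Compare-before-apply for interface media settings (hypothetical sketch)."""
import re
from typing import List, Optional

# Assumed shape of the FreeBSD `ifconfig <ifname>` media line, e.g.
#   "media: Ethernet autoselect (1000baseT <full-duplex>)"
# Text before the parentheses is the *configured* setting; the parenthesized
# part (if any) is the currently negotiated/active media.
_MEDIA_RE = re.compile(
    r"^\s*media:\s+\S+\s+(?P<configured>[^(\n]+?)\s*(?:\(.*\))?\s*$",
    re.MULTILINE,
)


def current_media(ifconfig_output: str) -> Optional[str]:
    """Return the configured media setting parsed from ifconfig output, if any."""
    match = _MEDIA_RE.search(ifconfig_output)
    return match.group("configured") if match else None


def media_command(ifname: str, ifconfig_output: str,
                  desired: Optional[str]) -> Optional[List[str]]:
    """Return the ifconfig argv needed to reach `desired`, or None to do nothing.

    desired is None for "Default", "autoselect", or a forced setting written
    the way ifconfig prints it, e.g. "100baseTX <full-duplex>".
    """
    if desired is None:
        return None  # "Default": leave whatever the driver negotiated alone
    if current_media(ifconfig_output) == desired:
        return None  # already configured as requested; don't bounce the link
    if desired == "autoselect":
        return ["ifconfig", ifname, "media", "autoselect"]
    # Forced media: "100baseTX <full-duplex>" -> media 100baseTX mediaopt full-duplex
    subtype, _, opts = desired.partition(" ")
    argv = ["ifconfig", ifname, "media", subtype]
    opts = opts.strip("<> ")
    if opts:
        argv += ["mediaopt", opts]
    return argv
```

For an interface that already reports autoselect, media_command(..., "autoselect") returns None, so no ifconfig is issued and the link is not bounced; only a real change produces a command.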


  • LAYER 8 Netgate

    Bug reports can be submitted at https://redmine.pfsense.org/

    Did you also contact Zotac to see why their device behaves in this manner? Did you open a bug report there regarding the frustrating and show-stopping behavior of their modem?



  • @Derelict I saw this same behavior on my non-Zotac device which uses a very common Intel Pro/1000 NIC. This isn't due to something strange or unreasonable that Zotac is doing.



  • @Derelict My other question is: Given the hostile and dismissive attitude I've encountered here, is it going to be a waste of time to file a formal bug report?

    I have the impression that the official attitude at Netgate is "we don't want to be bothered by this."


  • LAYER 8 Netgate

    @tgoltz Was that other side also configured to issue another down/up of the port? It must have been or there would have been no loop.

    The solution to this problem is very simple. If you do not want ifconfig eth0 media autoselect run on the interface during events (because it makes the device on the other side freak out and down/up the port again, resulting in a never-ending down/up loop), do not set that setting on the interface. Leave the interface at the default setting, "Default". Users should not be making changes they don't understand, especially when the default setting works fine; there is no problem to be solved by making an unnecessary change to the default configuration in the first place. In the world of gigabit Ethernet, this is what you want in almost all cases. The manual settings are for the exceptions, such as a Metro-E provider who instructs you to hard-set your port to 100/full.

    Given the above, you might very well get a "Not a bug. If it hurts, don't do that" response. I do not know; I am not a developer.

    Why should a bunch of code go in that might have to take different code paths for every single interface type to prevent the user from shooting himself in the foot when the problem can be fixed with a simple select box that is already present and works for everything and probably shouldn't have been changed from the defaults at all?



  • @Derelict In my case, the device on the other end was a Netgear CM600 DOCSIS3 cable modem. As is typical for DOCSIS devices, there are virtually no configuration options available to the end-user.

    I don't understand why pfSense appears to re-issue the ifconfig commands every time it sees carrier loss on the port. It would make sense to do this once at startup (and immediately following a configuration change). I can only figure that somebody was trying to work around a NIC that reset its configuration after link loss, but that workaround is now causing problems in its own right.

    It's not only link-loss that triggers the repeat of the ifconfig sequence: I tried placing an Ethernet switch between my pfSense machine and the cable modem so that the pfSense port wouldn't see a link loss when the cable modem dropped the Ethernet. With the port set explicitly to "autonegotiate", pfSense still cycled the NIC roughly every 20 seconds while the link to the cable modem remained stable.
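The event policy suggested above (configure media once at startup and after a configuration change, but not on carrier loss) could look something like the following. It is a hypothetical Python sketch: the Event names and the MediaPolicy class are illustrative and are not taken from pfSense. Instead of running ifconfig, it records which settings would have been applied.

```python
"""Apply media settings only at boot or on a config change (hypothetical)."""
from enum import Enum, auto
from typing import List, Optional


class Event(Enum):
    BOOT = auto()
    CONFIG_CHANGED = auto()
    LINK_DOWN = auto()
    LINK_UP = auto()


class MediaPolicy:
    """Track the desired media setting and decide when to (re)apply it.

    desired is None for "Default" (never touch the port), otherwise a setting
    such as "autoselect". Carrier transitions are deliberately ignored so a
    flapping peer cannot trigger another reconfiguration.
    """

    def __init__(self, desired: Optional[str]) -> None:
        self.desired = desired
        self.applied: Optional[str] = None
        self.history: List[str] = []  # settings that would have been pushed

    def handle(self, event: Event, new_desired: Optional[str] = None) -> None:
        if event is Event.CONFIG_CHANGED:
            self.desired = new_desired
        elif event is not Event.BOOT:
            return  # LINK_DOWN / LINK_UP: let autonegotiation recover by itself
        if self.desired is not None and self.desired != self.applied:
            self.history.append(self.desired)  # stand-in for running ifconfig
            self.applied = self.desired


if __name__ == "__main__":
    policy = MediaPolicy("autoselect")
    for ev in (Event.BOOT, Event.LINK_DOWN, Event.LINK_UP, Event.LINK_DOWN, Event.LINK_UP):
        policy.handle(ev)
    print(policy.history)  # ['autoselect']
```

Repeated link flaps leave the history at a single entry, and re-saving an unchanged setting is also a no-op; only a boot or a genuinely different setting would touch the port.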



  • There are two ways this could be handled:

    Change the behavior of the code.

    Update the documentation with a note that if you have "auto negotiation" set explicitly and you are seeing the port cycle link repeatedly, try resetting to "default".

