WAN starts cycling link after Ethernet link loss

Derelict

If you really believe it is a bug and have steps to reproduce, a bug can be opened at https://redmine.pfsense.org/

Of course, a pull request with a fix is always appreciated.

jitguy

@derelict said in WAN starts cycling link after Ethernet link loss:

It runs another ifconfig to explicitly set autoselect. When it does it bounces the link. That is probably freaking out what it is connecting to and it is bouncing the link too, initiating a negotiation loop.

I wouldn't call it a bug in pfSense. It could be a bug in the modem. :)

I would call it an incompatibility caused by someone that shouldn't be tweaking settings and checking boxes unnecessarily since default (usually autoselect) is the default for a reason.

Nice explanation. Not sure of the need to slam those of us that have run into that problem though. After all, the GUI text specifically says:

"WARNING: MUST be set to autoselect (automatically negotiate speed) unless the port this interface connects to has its speed and duplex forced."

A bit of a stretch to blame us for heeding the bold warning.

jonesko

I can certainly confirm this. It's still happening. I spent two weeks wondering why my connection died whenever the cable modem cut off, even for a second. Every time the modem connection dropped, the WAN NIC started cycling up and down. I had autoselect selected in the WAN "Speed and Duplex" box. After selecting "Default (no preference, typically autoselect)" the problem is gone. This doesn't make any sense, since they are both "autoselect". Version: 2.4.4-RELEASE-p3 (amd64) Device: Zotac ZBOX CI323 nano.

Derelict

It makes perfect sense when you realize that running ifconfig eth0 media autoselect results in an interface down/up and triggers something in that modem that starts a link loss loop whereas running no ifconfig at all does not and results in the same autoselect setting.

The defaults are defaults for reasons.

tgoltz

@Derelict Saying "the defaults are defaults for reasons" doesn't get the user any closer to understanding how to obtain desirable behavior. It also contradicts the general wisdom in configuring networking gear that it is always better to explicitly specify the behavior you want.

There are two problems with the current behavior: The first is that the documentation doesn't make it clear what the default behavior IS, so the temptation to explicitly configure the port is strong.

The second problem is that pfSense needs to check the current configuration of the port and compare it to the desired settings, and shouldn't issue ifconfig commands if they aren't needed. Always doing the configuration would be acceptable IF it didn't take the port down in the process. When paired with a peer device that does the same, this behavior results in a deadly embrace that prevents normal operation of the interface.
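To illustrate the "check before configuring" idea, here is a minimal Python sketch. It is not pfSense code. It assumes the FreeBSD-style `media:` line printed by `ifconfig`, and the function names and the sample interface `em0` are hypothetical. It only handles the autoselect case.

```python
import re

# Matches a FreeBSD-style line such as:
#   media: Ethernet autoselect (1000baseT <full-duplex>)
# Group 1 is the configured media, group 2 (optional) the active media.
MEDIA_RE = re.compile(
    r"^[ \t]*media:[ \t]*Ethernet[ \t]+(.+?)(?:[ \t]+\((.+)\))?[ \t]*$",
    re.MULTILINE,
)

def configured_media(ifconfig_output):
    """Return the configured media string, or None if no media line is found."""
    match = MEDIA_RE.search(ifconfig_output)
    return match.group(1) if match else None

def autoselect_command(iface, ifconfig_output):
    """Return the ifconfig command to run, or None when nothing needs to change.

    Hypothetical helper: skipping the command when the port is already set
    to autoselect avoids the down/up that the command would cause.
    """
    if configured_media(ifconfig_output) == "autoselect":
        return None
    return ["ifconfig", iface, "media", "autoselect"]
```

With a port that already reports `media: Ethernet autoselect (...)`, `autoselect_command("em0", output)` returns `None`, so no command is issued and the link is left alone. With a hard-set port it returns the command to run.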

This is a problem that is easily avoided by the knowledgeable, but is frustrating and show-stopping to the uninitiated. I personally believe that this should be considered a bug and the behavior of pfSense modified.

Derelict

Bug reports can be submitted at https://redmine.pfsense.org/

Did you also contact Zotac to see why their device behaves in this manner? Did you open a bug report there regarding the frustrating and show-stopping behavior of their modem?

tgoltz

@Derelict I saw this same behavior on my non-Zotac device which uses a very common Intel Pro/1000 NIC. This isn't due to something strange or unreasonable that Zotac is doing.

tgoltz

@Derelict My other question is: Given the hostile and dismissive attitude I've encountered here, is it going to be a waste of time to file a formal bug report?

I have the impression that the official attitude at Netgate is "we don't want to be bothered by this."

Derelict

@tgoltz Was that other side also configured to issue another down/up of the port? It must have been or there would have been no loop.

The solution to this problem is very simple. If you do not want ifconfig eth0 media autoselect run on the interface during events, because it makes the device on the other side freak out causing it to down/up the port again, resulting in a never-ending down/up loop, do not set that setting on the interface. Leave the interface at the default setting. Users should not be making changes they don't understand, especially when the default setting works fine - as in there is no problem to be solved by making an unnecessary change to the default configuration in the first place. In the world of gigabit Ethernet, the default will almost always be what you want. Cases outside of this are what the manual settings are for, as in a Metro-E provider who instructs you to hard-set your port to 100/full.

Given the above, you might very well get a "Not a bug. If it hurts, don't do that" response. I do not know; I am not a developer.

Why should a bunch of code go in that might have to take different code paths for every single interface type to prevent the user from shooting himself in the foot when the problem can be fixed with a simple select box that is already present and works for everything and probably shouldn't have been changed from the defaults at all?

tgoltz

@Derelict In my case, the device on the other end was a Netgear CM600 DOCSIS3 cable modem. As is typical for DOCSIS devices, there are virtually no configuration options available to the end-user.

I don't understand why pfSense appears to re-issue the ifconfig commands every time it sees carrier loss on the port. It would make sense to do this once at startup (and immediately following a configuration change). I can only figure that somebody was trying to work around a NIC that reset the configuration after link loss, but that workaround is now causing problems in its own right.

It's not only link-loss that triggers the repeat of the ifconfig sequence: I tried placing an Ethernet switch between my pfSense machine and the cable modem so that the pfSense port wouldn't see a link loss when the cable modem dropped the Ethernet. With the port set explicitly to "autonegotiate", pfSense still cycled the NIC roughly every 20 seconds while the link to the cable modem remained stable.
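For anyone trying to confirm the same symptom, a rough Python sketch of how repeated cycling could be told apart from a single drop. It assumes you have already collected link events as `(timestamp_seconds, state)` pairs, for example from the system log. The function name, the thresholds, and the event format are all hypothetical.

```python
from collections import deque

def is_flapping(events, max_downs=3, window=120.0):
    """Return True if at least max_downs DOWN events fall within any window.

    events: iterable of (timestamp_seconds, state) in time order,
            where state is "UP" or "DOWN".
    """
    recent = deque()  # timestamps of DOWN events inside the current window
    for ts, state in events:
        if state != "DOWN":
            continue
        recent.append(ts)
        # Drop DOWN events that are older than the window.
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= max_downs:
            return True
    return False
```

A link that drops roughly every 20 seconds trips the default thresholds on the third drop, while a single modem hiccup does not.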

tgoltz

There are two ways this could be handled:

Change the behavior of the code.

Update the documentation with a note that if you have "auto negotiation" set explicitly and you are seeing the port cycle link repeatedly, try resetting to "default".