WAN interfaces fail to return after power outage
-
Thanks Stephenw10 yes that's the correct gateway address for our WAN, it gets handed to us via DHCP from the ISP. There is no modem involved.
-
Hmm, well it appears it's not responding to ARP requests for some reason.
Are you spoofing your MAC address maybe?
When it's in that state try running
ifconfig -v igb0
And if possible run a packet capture on that interface to see what it's actually sending and what's coming back (if anything).
Steve
-
Thanks Steve.
I'm not spoofing my MAC address.
What exactly is it that's not responding to ARP requests?
I'll do some testing and try to replicate it so I can do a capture. Thanks again.
-
It's whatever is upstream. If you connect directly and pull a DHCP lease it's whatever that is giving you as a gateway, 100.65.48.1.
Steve
-
Seems that way, or might it be a problem with pfSense, since it was happening with both WANs, 2 completely different services? And if it were a problem upstream, it wouldn't go away from just unplugging and replugging the cable, right?
-
Please see my post about a similar issue, and another thread I responded to. I think I know what is happening here, and it needs to be documented and/or added as a default setting in new installs.
It seems that the dhclient code has been changed to respect the "interface-mtu" option that is being issued via DHCP by some CMTS equipment.
In the cases I have seen, the MTU is being set to 576, rather than being left at the pfSense default of 1500.
In addition, the interface-mtu option issued via DHCP takes precedence over an MTU explicitly set by the user. To override the bad interface-mtu being set via DHCP, dhclient must be instructed to ignore the option. This is done by setting supersede interface-mtu 0 in the "Option modifiers" section of "Lease Requirements and Requests".
When the MTU is set too small, it seems that DHCP renewals are failing. The failures coincide with errors like:
arpresolve: can't allocate llinfo for $GATEWAY on $INTERFACE
I think this is biting a lot of users on the upgrade to 2.4.4-RELEASE, and the symptoms are quirky. If left alone, the interface may stay up, but certain websites will become inaccessible due to packet fragmentation.
PRIOR POSTS ON SAME SUBJECT:
https://forum.netgate.com/topic/136089/solved-and-revised-2-4-4-release-arpresolve-can-t-allocate-llinfo-for-gateway-on-interface0-dhcp-mtu-576
https://forum.netgate.com/topic/136253/frequent-internet-loss-need-help-figuring-out-where-and-why-maybe-pfsense-modem-isp-or-all-3
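To see which MTU the DHCP server actually handed out, one option is to look at the lease that dhclient recorded. The following is only a minimal sketch in Python: the sample lease text is made up for illustration (its address values are not from a real system), the function names are hypothetical, and the location of the lease file (on FreeBSD typically /var/db/dhclient.leases.<interface>) is an assumption you should verify on your own box.

```python
import re

# Illustrative lease text in dhclient.leases(5) style. The values are
# invented for this sketch and do not come from a real system.
SAMPLE_LEASE = '''
lease {
  interface "igb0";
  fixed-address 100.65.48.23;
  option subnet-mask 255.255.240.0;
  option routers 100.65.48.1;
  option interface-mtu 576;
}
'''

# Matches lines such as:  option interface-mtu 576;
MTU_OPTION_RE = re.compile(r"^\s*option\s+interface-mtu\s+(\d+)\s*;", re.MULTILINE)


def lease_mtus(lease_text):
    """Return every interface-mtu value found in the lease text, in order."""
    return [int(value) for value in MTU_OPTION_RE.findall(lease_text)]


def undersized_mtus(lease_text, expected=1500):
    """Return the MTU values that are smaller than the expected MTU."""
    return [mtu for mtu in lease_mtus(lease_text) if mtu < expected]


if __name__ == "__main__":
    print("MTUs offered in lease:", lease_mtus(SAMPLE_LEASE))
    print("Smaller than expected:", undersized_mtus(SAMPLE_LEASE))
```

To use it against a real lease, read the file contents into a string and pass that to undersized_mtus() instead of SAMPLE_LEASE.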
-
Wow, nice catch. And what are they doing sending 576?!
That does seem like it could be related at least. I'll be watching with anticipation...
Steve
-
Actually that should have been resolved already in https://redmine.pfsense.org/issues/8507
If that does work, then the fix has somehow missed your install.
Steve
-
From what I have read, I see references to the 576 MTU related to dialup connections. This might be an old fall-back that is being exposed only because dhclient now respects the option interface-mtu value being sent by the DHCP server. The value shows up in the issued lease. The changes in dhclient upstream are now exposing this.
This is worth exploring in connection with reports of WAN interface disconnections and unpredictable website connectivity, and it may affect things like name resolution. When combined with the "IP Do-Not-Fragment compatibility" option in System/Advanced/Firewall&NAT, the small MTU breaks connectivity with some websites. I saw problems with the iHeart Radio website and streams, and with loading newyorker.com. Please propagate this up the chain. My earlier post has links to the issues as they are discussed in the FreeBSD development system.
https://forum.netgate.com/topic/136089/solved-and-revised-2-4-4-release-arpresolve-can-t-allocate-llinfo-for-gateway-on-interface0-dhcp-mtu-576
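As a rough illustration of why a 576 MTU hurts, the arithmetic below compares how much TCP payload fits in a single unfragmented IPv4 packet at each MTU. This is a sketch only: it assumes minimum-size IPv4 and TCP headers (20 bytes each, no options), and the function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options (assumption)
TCP_HEADER_BYTES = 20   # minimum TCP header, no options (assumption)


def tcp_payload_per_packet(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def needs_fragmentation(packet_bytes, mtu):
    """True if an IP packet of this size cannot cross a link with this MTU whole."""
    return packet_bytes > mtu


if __name__ == "__main__":
    for mtu in (576, 1500):
        print("MTU", mtu, "-> TCP payload per packet:", tcp_payload_per_packet(mtu))
    # A full-size 1500-byte packet does not fit through a 576-byte MTU.
    print("1500-byte packet needs fragmentation at MTU 576:",
          needs_fragmentation(1500, 576))
```

With these assumptions the payload per packet drops from 1460 bytes at MTU 1500 to 536 bytes at MTU 576, and any full-size packet has to be fragmented or dropped.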
-
The fix discussed in Redmine doesn't seem to have made it into 2.4.4-RELEASE.
Rather, with the patched version of dhclient now in the FreeBSD tree, the fix is that the user must issue "supersede interface-mtu 0" to ignore the option 26 information sent by the server. Dhclient is still requesting option 26 info from the DHCP server. The patch makes it possible to supersede option 26 as issued with the lease.
-
I have opened a Redmine account, and posted in the relevant thread.
-
I agree, it's certainly worth exploring. It could explain a number of threads here.
The value
supersede interface-mtu 0
should be in the dhclient conf files in /var/etc by default. It is on everything I've just checked. If some connections are still seeing a 576 MTU then there must be some combination of factors that prevent it being added. If that is the case we need to find out what they are and stop that happening.
Steve
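For anyone who wants to check their own install, here is a minimal Python sketch that tests whether a dhclient configuration contains the supersede statement. The sample configuration text is illustrative rather than copied from a real pfSense box, and the function name is hypothetical. The sketch only looks for the statement outside of # comments; it does not parse the file fully.

```python
import re

# Matches the statement itself, e.g. "supersede interface-mtu 0;"
SUPERSEDE_RE = re.compile(r"\bsupersede\s+interface-mtu\s+0\b")

# Illustrative dhclient.conf-style text, not taken from a real system.
SAMPLE_CONF_WITH_FIX = '''
interface "igb0" {
  timeout 60;
  retry 15;
  supersede interface-mtu 0;
}
'''

SAMPLE_CONF_WITHOUT_FIX = '''
interface "igb0" {
  timeout 60;
  retry 15;
}
'''


def ignores_dhcp_mtu(conf_text):
    """True if an uncommented 'supersede interface-mtu 0' statement is present."""
    for line in conf_text.splitlines():
        statement = line.split("#", 1)[0]  # drop anything after a comment marker
        if SUPERSEDE_RE.search(statement):
            return True
    return False


if __name__ == "__main__":
    print("with fix:", ignores_dhcp_mtu(SAMPLE_CONF_WITH_FIX))
    print("without fix:", ignores_dhcp_mtu(SAMPLE_CONF_WITHOUT_FIX))
```

To check a real system, read each dhclient conf file under /var/etc into a string and pass it to ignores_dhcp_mtu().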
-
I have a sneaking suspicion that a prior manual setting of MTU on the interface may be interfering with the setting of supersede interface-mtu 0 in dhclient.conf on upgrade. I know that I have previously hard set the MTU to 1500 on a number of boxes as a matter of course. In this instance, the hard set MTU will not be respected if supersede interface-mtu 0 is not making it into dhclient.conf.
-
I think you're right. Working on something now....
-
Jim Pingle, the developer working on this, has entered a new diff. Apparently, checking the advanced options checkbox, saving and applying the config with no other changes entered, and then upgrading to 2.4.4-RELEASE is enough to disrupt the fix the developers had put in place for the option 26 interface-mtu bug introduced by the new dhclient.
-
I added a new note with a workaround to the Upgrade Guide: https://www.netgate.com/docs/pfsense/install/upgrade-guide.html#upgrading-from-versions-older-than-pfsense-2-4-4
A patch is available that can be added with the System Patches package.
The fix is discussed on https://redmine.pfsense.org/issues/8507
-
Thank you! This is wonderful.
-
I think this bug also applies to fresh installs using a restored config, not just on in-place upgrades. That is the case for the system I encountered this on.
-
@bfeitell said in WAN interfaces fail to return after power outage:
I think this bug also applies to fresh installs using a restored config, not just on in-place upgrades. That is the case for the system I encountered this on.
Since it is a setting in the configuration and not a problem on the filesystem, that is correct. If you restore a config with advanced or custom options set there, it would fail this way.
-
It is an insidious bug. I triggered the DHCP renewal problems by saving and applying on the WAN with or without changes. Unless triggered by the user, it will lurk until the next DHCP renewal fails, and that may not happen for 30 minutes or more. Looking through recent forum posts, I suspect this bug is in play whenever a user notices arpresolve: can't allocate llinfo for $GATEWAY on $INTERFACE errors.
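To find out whether a box is hitting this, one approach is to scan the system log for the arpresolve message and count the hits per gateway and interface. The Python sketch below assumes the log has already been read into a string. The sample log lines, including the timestamps, hostname and process ID, are invented for illustration, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern for the kernel message quoted in this thread.
ARPRESOLVE_RE = re.compile(
    r"arpresolve: can't allocate llinfo for (?P<gateway>\S+) on (?P<interface>\S+)"
)

# Invented sample lines, for illustration only.
SAMPLE_LOG = """\
Oct  2 12:00:01 fw kernel: arpresolve: can't allocate llinfo for 100.65.48.1 on igb0
Oct  2 12:00:02 fw dhclient[1234]: DHCPREQUEST on igb0 to 255.255.255.255 port 67
Oct  2 12:00:03 fw kernel: arpresolve: can't allocate llinfo for 100.65.48.1 on igb0
"""


def arpresolve_failures(log_text):
    """Count arpresolve failures, keyed by (gateway, interface)."""
    counts = Counter()
    for match in ARPRESOLVE_RE.finditer(log_text):
        counts[(match.group("gateway"), match.group("interface"))] += 1
    return counts


if __name__ == "__main__":
    for (gateway, interface), hits in arpresolve_failures(SAMPLE_LOG).items():
        print(f"{hits} failure(s) for gateway {gateway} on {interface}")
```

Any hits would be a reason to check the interface MTU and the dhclient configuration as discussed above.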