WAN flapping since upgrading to 2.4.5
-
So I decided to upgrade my old hardware, which was running 2.4.4_3, to something newer. On the old hardware [Solana Tech Mini ITX pfSense firewall router with 4x Intel NICs, a 2 GHz J1900 CPU, 4 GB RAM and a 32 GB SSD] everything was working great; I just needed to move to something that supported AES-NI, as I have been setting up more tunnels. I first tried one of the Protectli boxes from Amazon, installed 2.4.5, and immediately noticed my WAN going up and down every 15 minutes. I decided to return the hardware and try something else, and ended up finding a used Jetway JBC385. I installed 2.4.5 and again my WAN goes up and down. I put my old hardware back in place and there was not a single issue. I believe I've tried everything I can to troubleshoot this at this point. Looking for help.
-
Oct 26 23:28:32 check_reload_status Reloading filter
Oct 26 23:28:32 check_reload_status Linkup starting igb0
Oct 26 23:28:32 kernel igb0: link state changed to UP
Oct 26 23:28:32 kernel arpresolve: can't allocate llinfo for 174.109.176.1 on igb0
Oct 26 23:28:32 kernel arpresolve: can't allocate llinfo for 174.109.176.1 on igb0
Oct 26 23:28:33 kernel arpresolve: can't allocate llinfo for 174.109.176.1 on igb0
Oct 26 23:28:33 php-fpm 713 /rc.openvpn: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Oct 26 23:28:33 php-fpm 713 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Oct 26 23:28:33 php-fpm 713 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
Oct 26 23:28:33 kernel arpresolve: can't allocate llinfo for 174.109.176.1 on igb0
Oct 26 23:28:33 kernel arpresolve: can't allocate llinfo for 174.109.176.1 on igb0
Oct 26 23:28:33 php-fpm 713 /rc.linkup: DEVD Ethernet attached event for wan
Oct 26 23:28:33 php-fpm 713 /rc.linkup: HOTPLUG: Configuring interface wan
Oct 26 23:28:33 check_reload_status rc.newwanip starting igb0
Oct 26 23:28:33 php-fpm 713 /rc.linkup: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
Oct 26 23:28:33 php-fpm 713 /rc.linkup: Gateway, none 'available' for inet6, use the first one configured. ''
Oct 26 23:28:33 check_reload_status Restarting ipsec tunnels
Oct 26 23:28:34 php-fpm 714 /rc.newwanip: pfSense package system has detected an IP change or dynamic WAN reconnection - 174.109.182.X -> 174.109.182.X - Restarting packages.
Oct 26 23:28:34 check_reload_status Starting packages
Oct 26 23:28:34 php-fpm 98247 /rc.newwanip: rc.newwanip: Info: starting on igb0.
Oct 26 23:28:34 php-fpm 98247 /rc.newwanip: rc.newwanip: on (IP address: 174.109.182.X) (interface: WAN[wan]) (real interface: igb0).
Oct 26 23:28:35 php-fpm 98247 /rc.newwanip: Gateway, none 'available' for inet6, use the first one configured. ''
Oct 26 23:28:35 php-fpm 714 /rc.start_packages: Restarting/Starting all packages.
Oct 26 23:28:36 php-fpm 98247 /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1603754916] unbound[19136:0] error: bind: address already in use [1603754916] unbound[19136:0] fatal error: could not open ports'
Oct 26 23:28:37 kernel igb0: link state changed to DOWN
Oct 26 23:28:37 check_reload_status Linkup starting igb0
Oct 26 23:28:37 check_reload_status updating dyndns wan
Oct 26 23:28:37 check_reload_status Reloading filter
Oct 26 23:28:38 php-fpm 713 /rc.linkup: DEVD Ethernet detached event for wan
Oct 26 23:28:40 kernel arpresolve: can't allocate llinfo for 174.109.176.1 on igb0
Oct 26 23:28:40 rc.gateway_alarm 64625 >>> Gateway alarm: WAN_DHCP (Addr:174.109.176.1 Alarm:1 RTT:16.676ms RTTsd:3.783ms Loss:37%)
Oct 26 23:28:40 check_reload_status updating dyndns WAN_DHCP
Oct 26 23:28:40 check_reload_status Restarting ipsec tunnels
Oct 26 23:28:40 check_reload_status Restarting OpenVPN tunnels/interfaces
Oct 26 23:28:40 check_reload_status Reloading filter
Oct 26 23:28:57 dpinger WAN_DHCP 174.109.176.1: sendto error: 65
Oct 26 23:28:58 dpinger WAN_DHCP 174.109.176.1: sendto error: 65
Oct 26 23:28:58 dpinger WAN_DHCP 174.109.176.1: sendto error: 65
Oct 26 23:29:00 dpinger send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 174.109.176.1 bind_addr 174.109.182.150 identifier "WAN_DHCP "
I restored from a previous backup, but did not include my packages. When I started having these troubles, I also removed my VPN tunnels and went to a barebones install.
-
Your NIC is detecting (or thinks it is detecting) a link state change. That is then triggering automatic processes (scripts, actually) to run within pfSense.
So the root cause is the link state change. As to what is causing that, the first place to look usually is hardware. But in your case you say you have tried two different pieces of new hardware with the same result. When you swap in the old hardware for testing, are you using the exact same Cat5 cable and switch port that you use for the new hardware? Don't forget that cables and switch ports can get flaky.
pfSense-2.4.5 is based on FreeBSD-11.3/STABLE while pfSense-2.4.4 was based on FreeBSD-11.2/RELEASE. So it's not outside the realm of possibility that some change to the NIC driver or something else in FreeBSD-11.3/STABLE is causing you grief with your new hardware. That said, the igb driver is very popular, so any underlying issues with it in FreeBSD-11.3/STABLE would show up quickly and result in lots of complaints, and that has not been the case. I have igb NIC drivers in use in my Netgate SG-5100 appliance on pfSense-2.4.5_p1 with no issues.
-
So old hardware has intel 82583v.
New hardware is 1x intel 219-LM, 1x intel 211-AT, 4x intel 350-AM4. I may try to figure out which port is which model and then move the WAN to a different port. Also, someone just suggested placing a dumb switch between the modem and the firewall to see if the issue goes away, so I'll be trying that.
-
@larold42 said in WAN flapping since upgrading to 2.4.5:
So old hardware has intel 82583v.
New hardware is 1x intel 219-LM, 1x intel 211-AT, 4x intel 350-AM4. I may try to figure out which port is which model and then move the WAN to a different port. Also, someone just suggested placing a dumb switch between the modem and the firewall to see if the issue goes away, so I'll be trying that.
Swapping ports around is a good idea. As I said, the root cause of your problem is the NIC driver within pfSense "thinks" the link state is changing (going down and then coming back up) -- same as if someone unplugged the Cat5 cable and then plugged it back in.
-
@bmeeks The only thing is, the protectli box has the same NIC as my old hardware intel 82583v and that had the same problem.
-
@larold42 said in WAN flapping since upgrading to 2.4.5:
@bmeeks The only thing is, the protectli box has the same NIC as my old hardware intel 82583v and that had the same problem.
You might be seeing an impact of this bug which was reportedly fixed: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=235147. Do you perhaps have the older suggested workaround for this bug still in your configuration? If so, try removing it. Here is the pfSense Redmine bug report associated with the FreeBSD bug report I linked earlier: https://redmine.pfsense.org/issues/9414.
-
@bmeeks Huh, I didn't even know about this bug. On the Jetway I don't have that Ethernet controller, and the NIC it does have is recognized and loads. But I'm wondering if I should put in a bug now. The problem is I really don't have the time to troubleshoot this anymore. This is the only part that sucks about not having support.
-
@bmeeks I'm almost tempted to try the 2.5 dev version, but I feel like that will only muddy the problem with likely more issues.
-
@larold42 said in WAN flapping since upgrading to 2.4.5:
@bmeeks Huh, I didn't even know about this bug. On the Jetway I don't have that Ethernet controller, and the NIC it does have is recognized and loads. But I'm wondering if I should put in a bug now. The problem is I really don't have the time to troubleshoot this anymore. This is the only part that sucks about not having support.
If you submit a bug report, it most likely should be to the FreeBSD bunch and not pfSense. The pfSense team does not do anything with regards to drivers. That is all taken "as-is" from upstream FreeBSD.
What is different with pfSense-2.4.5 is the newer version of FreeBSD. Things like drivers get various fixes and adjustments with new OS versions. Some of those are good and fix things, but others can "break" things through unintentional regressions of one kind or another.
-
@larold42 said in WAN flapping since upgrading to 2.4.5:
@bmeeks I'm almost tempted to try the 2.5 dev version, but I feel like that will only muddy the problem with likely more issues.
2.5 is based on FreeBSD-12.2/STABLE, so it is newer still. But it does have all of the latest NIC driver "fixes". The really big change in terms of NIC drivers in FreeBSD-12 is the move to the iflib wrapper API. That is a big change to the way NIC manufacturers write their hardware drivers.
-
This one, maybe not related, is also an issue that has to be checked:
@larold42 said in WAN flapping since upgrading to 2.4.5:
Oct 26 23:28:36 php-fpm 98247 /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1603754916] unbound[19136:0] error: bind: address already in use [1603754916] unbound[19136:0] fatal error: could not open ports'
"could not open ports" == another (probably unbound) instance was already running, or is still running (?). A new one can't be launched, as the ports it needs, like '53', are still occupied.
If I recall correctly (2.4.4-p3 is rather old already), there was a timing issue with unbound: the "stop" takes some time, and the restart came in too fast.
Check the unbound logs to see why it failed.
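To illustrate the "address already in use" failure: a second process cannot bind a port that another process still holds. The following is only a minimal sketch in Python, not anything pfSense or unbound ships. The helper name `port_is_free` is made up for this example, and it only probes UDP on a single address.

```python
import errno
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a UDP socket can be bound to host:port, False if it is taken.

    Hypothetical helper for illustration; it is not part of pfSense or unbound.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            # Same condition unbound reported: "bind: address already in use".
            return False
        raise  # e.g. permission denied when probing a privileged port as non-root
    finally:
        sock.close()
```

Calling `port_is_free(53)` as root on the firewall would return False while a resolver is still holding UDP port 53 on that address. Non-root users get a permission error instead, because 53 is a privileged port.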
Also, take note that the gateway monitoring settings will take the interface (WAN) down and up (== 'flapping'?!) if the monitoring loses contact with the monitored gateway IP (my 87.98.136.xx).
Practical joke: many use 8.8.8.8 here, and 8.8.8.8 is not being paid to serve (reply to) ICMP packets; its job is serving DNS requests. So when 8.8.8.8 stops replying to (useless) ICMP, many WAN interfaces will fall.
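As a rough illustration of how the monitoring decides a gateway is in trouble, the numbers from the log above can be plugged into a simple threshold check. This is a simplified sketch, not dpinger's actual code. The function name `gateway_alarm` is hypothetical, and the defaults are taken from the dpinger line in the log (latency_alarm 500ms, loss_alarm 20%).

```python
def gateway_alarm(rtt_ms, loss_pct, latency_alarm_ms=500.0, loss_alarm_pct=20.0):
    """Return True when either averaged value crosses its alarm threshold.

    Simplified illustration only; thresholds default to the values
    shown in the dpinger log line (latency_alarm 500ms, loss_alarm 20%).
    """
    return rtt_ms > latency_alarm_ms or loss_pct > loss_alarm_pct


# Values from the "Gateway alarm" log entry: RTT 16.676 ms, Loss 37%.
# Latency is fine, but 37% loss is above the 20% threshold.
print(gateway_alarm(16.676, 37))
```

In the posted log, the alarm is raised right after the link went down, so the packet loss looks like a consequence of the flap and not necessarily its cause.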
In other words, your WAN connection will be as good as the "Monitor IP" being able to reply to pings. Temporary solution: disable the Gateway action to exclude this reason as a possible cause.
@larold42 said in WAN flapping since upgrading to 2.4.5:
and immediately noticed my WAN going up and down every 15 mins.
So it's down all the time (15 minutes) - then it goes UP :
Your log starts with:
Oct 26 23:28:32 kernel igb0: link state changed to UP
to go down 5 seconds later
Oct 26 23:28:37 kernel igb0: link state changed to DOWN
at the end of the log lines you showed - and it stays down for another 15 minutes ?
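The UP/DOWN timing can also be pulled out of a longer log mechanically, which helps when checking whether the flaps really follow a 15-minute pattern. Below is a small Python sketch. The function names are made up for this example. The year is an assumption, because syslog lines carry none (2020 matches the epoch timestamp in the unbound error).

```python
import re
from datetime import datetime

# Matches kernel lines such as:
# "Oct 26 23:28:32 kernel igb0: link state changed to UP"
LINK_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) kernel (\w+): link state changed to (UP|DOWN)"
)


def link_events(lines, year=2020):
    """Return (timestamp, interface, state) for every link state change."""
    events = []
    for line in lines:
        match = LINK_RE.match(line)
        if match:
            stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, match.group(2), match.group(3)))
    return events


def seconds_between(events):
    """Return (new_state, seconds since the previous change) for each transition."""
    return [
        (later[2], (later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


sample = [
    "Oct 26 23:28:32 kernel igb0: link state changed to UP",
    "Oct 26 23:28:33 php-fpm 713 /rc.linkup: DEVD Ethernet attached event for wan",
    "Oct 26 23:28:37 kernel igb0: link state changed to DOWN",
]
print(seconds_between(link_events(sample)))  # [('DOWN', 5.0)]
```

Run against the full system log, a long list of DOWN-to-UP gaps near 900 seconds would confirm the 15-minute pattern.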
-
@bmeeks So I checked the interface that was doing this:
igb0@pci0:1:0:0: class=0x020000 card=0x0000ffff chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base 0xdf160000, size 131072, enabled
    bar [18] = type I/O Port, range 32, base 0xe060, size 32, enabled
    bar [1c] = type Memory, range 32, base 0xdf18c000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
        Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
        link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 003018ffff0f0d21
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
        0 VFs configured out of 8 supported
        First VF RID Offset 0x0180, VF RID Stride 0x0004
        VF Device ID 0x1520
        Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 0018[1c0] = LTR 1
    ecap 000d[1d0] = ACS 1
So... I'm wondering how many other folks are having issues running I350s. This has to be a driver issue.
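For anyone matching ports to NIC models: the chip= field in the output above packs the PCI device ID into the upper 16 bits and the vendor ID into the lower 16 bits. A short Python sketch of the split follows (`split_chip` is a made-up helper name, not a pfSense or FreeBSD tool):

```python
def split_chip(chip):
    """Split a pciconf chip= value into (device_id, vendor_id).

    The upper 16 bits hold the device ID, the lower 16 bits the vendor ID.
    """
    value = int(chip, 16)
    return (value >> 16) & 0xFFFF, value & 0xFFFF


device_id, vendor_id = split_chip("0x15218086")
print(f"vendor=0x{vendor_id:04x} device=0x{device_id:04x}")  # vendor=0x8086 device=0x1521
```

Vendor 0x8086 is Intel and device 0x1521 is the I350, matching the vendor and device strings in the same output. Running the same check against each igb/em port shows which physical port sits on which controller.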
EDIT:
Here is the driver info as well
dev.igb.0.%desc: Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k
dev.igb.%parent:
-
Here is a link to the source code for the latest version of the Intel driver for what appears to be your card: https://downloadcenter.intel.com/download/15815/Intel-Network-Adapter-Driver-for-82575-6-and-82580-Based-Gigabit-Network-Connections-under-FreeBSD-?product=46827. This is only the C source code. To use this driver, you would need to create your own separate FreeBSD-11 virtual machine with the proper developer tools installed (compiler and linker) and then compile the source code into the binary driver module. Then copy that module over to your pfSense box and load it. That may be more effort than you wish to expend, though.
The one thing I've noticed over the years with FreeBSD is that the support of newer hardware seems to lag behind Linux. The drivers within FreeBSD-11 and earlier are maintained by a team of Intel folks who then submit the updates to FreeBSD. For FreeBSD-12 and later, as I mentioned in a previous post, FreeBSD has moved to a new wrapper API called iflib. That move has muddied the waters a bit in terms of NIC driver development and support as now the FreeBSD team has the iflib API part while hardware manufacturers write the pieces that need to directly manipulate widgets on their particular NIC.
It might be worth trying pfSense-2.5 DEVEL since it is based on FreeBSD-12.2/STABLE and will contain newer NIC driver versions.