WAN interface losing connectivity 5-10x daily - uber-thread!
-
Hey there, I'm an experienced pfSense user (6 years), and I'm pretty perplexed by an issue I've been experiencing. Over the last week my WAN connection or routes have dropped many times daily (and I have strong reason to believe it's not the internet service itself).
I've combed through dozens of threads from people losing WAN for unknown or mysterious reasons. Since this seems to be a recurring theme, I'm hoping I can pull some of them together here and find some lasting solutions. (Thanks in advance for any help!)
Alright, here's a quick rundown of my setup:
- Supermicro A1SRi-2558F with 4x Intel igb gigabit ports (no external / USB Ethernet in use), 4GB ECC RAM, Intel SSD
- pfSense 2.1.5 (amd64), production release
- Arris Surfboard SB6141 cable modem
- Astound cable service (dynamic IP), all-new coax to the modem
- 1x gigabit WAN connection (all defaults, autonegotiation)
- 1x gigabit LAN connection (multiple internal VLANs with known-working config, NAT / fw rules)
I previously ran this exact same stack at this location on a Supermicro X7SPA-H, where everything ran stably for months. Eventually we needed to upgrade that motherboard, which I did with a reinstall about 8 days ago. (The reinstall included restoring a backup of the NAT / fw rules and RRD graph data, nothing else; in particular, no network configuration was restored.)
I've read dozens of threads from people having similar WAN issues (linked below) and tried many different fixes:
- Ensuring WAN is not blocking private or bogon networks
- Ensuring IPv6 DHCP is disabled
- Ensuring no MAC address is entered
- Increasing MBUFs (currently set to 262144)
- Keeping hw.igb.num_queues low (currently 2)
- Not running any VPN services (I never did)
- Allowing apinger to monitor the gateway
- NOT allowing apinger to monitor the gateway
- Rebooting the firewall, the modem, or all of the hardware
Symptoms I'm noticing when the connection flaps:
- No correlation between usage and outages; WAN connectivity drops during off hours as well as during normal usage.
- Not much going on in the logs (especially with apinger disabled). Traffic simply stops routing outbound. (No firewall / NAT rule changes or schedules have been added lately.)
- The state table drops quite low, but not completely to zero; the same goes for MBUF usage.
- The system stays perfectly responsive, and the WAN interface still appears to be processing firewall rules.
Right now, the only thing that reliably restores the connection is a full reboot of the box. Sometimes the connection also comes back on its own after a while (usually 10-20 minutes), but not reliably. Disabling and re-enabling the WAN interface has no effect.
The cable connection and modem are fine: internet access resumes without any intervention on the modem once the pfSense box comes back up. I've totally ruled out cable issues; again, this same setup ran with precisely zero outages for as long as it operated at this location before this reinstall.
Perhaps it's unrelated, but early on with this box I had WAN on igb1 and LAN on igb0 (currently it's reversed), and I was having DHCP issues then, too; they went away for a while after I swapped the assignments. That makes me wonder whether this is a driver issue, although at this point I've chased down so many leads that I'm pretty much stumped.
Again, thanks in advance for any help!
Here's a sample log from earlier today (newest entries first), although it shows more activity than usually appears during an outage:
Dec 31 12:22:47 php: rc.start_packages: Restarting/Starting all packages.
Dec 31 12:22:45 check_reload_status: Reloading filter
Dec 31 12:22:45 check_reload_status: Starting packages
Dec 31 12:22:45 php: rc.newwanip: pfSense package system has detected an ip change x.x.x.x (my public IP) -> x.x.x.x (the same public IP lol) … Restarting packages.
Dec 31 12:22:43 php: rc.newwanip: Creating rrd update script
Dec 31 12:22:43 php: rc.newwanip: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:22:43 php: rc.newwanip: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:22:43 php: rc.newwanip: Resyncing OpenVPN instances for interface WAN.
Dec 31 12:22:42 php: /interfaces.php: Creating rrd update script
Dec 31 12:22:42 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:22:42 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:22:42 check_reload_status: Reloading filter
Dec 31 12:22:38 check_reload_status: updating dyndns wan
Dec 31 12:22:36 php: rc.newwanip: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
Dec 31 12:22:36 php: rc.newwanip: rc.newwanip: on (IP address: x.x.x.x) (interface: WAN[wan]) (real interface: igb0).
Dec 31 12:22:36 php: rc.newwanip: rc.newwanip: Informational is starting igb0.
Dec 31 12:22:33 php: /interfaces.php: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
Dec 31 12:22:33 check_reload_status: rc.newwanip starting igb0
Dec 31 12:22:33 php: /interfaces.php: Clearing states to old gateway x.x.x.1 (my ISP gateway).
Dec 31 12:22:30 check_reload_status: Syncing firewall
Dec 31 12:22:09 php: /interfaces.php: Creating rrd update script
Dec 31 12:22:09 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:22:09 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:22:09 check_reload_status: Reloading filter
Dec 31 12:22:06 check_reload_status: updating dyndns wan
Dec 31 12:22:05 php: rc.interfaces_wan_configure: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
Dec 31 12:22:05 check_reload_status: updating dyndns wan
Dec 31 12:22:03 check_reload_status: Configuring interface wan
Dec 31 12:22:03 php: rc.newwanip: rc.newwanip: Failed to update wan IP, restarting…
Dec 31 12:22:03 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: WAN[wan]) (real interface: igb0).
Dec 31 12:22:03 php: rc.newwanip: rc.newwanip: Informational is starting igb0.
Dec 31 12:22:02 php: rc.linkup: The command '/sbin/route change -inet default 'x.x.x.1 (my ISP gateway)'' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Network is unreachable change net default: gateway x.x.x.1 (my ISP gateway): Network is unreachable'
Dec 31 12:22:02 php: rc.linkup: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
Dec 31 12:22:02 php: rc.linkup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
Dec 31 12:22:02 php: rc.linkup: HOTPLUG: Configuring interface wan
Dec 31 12:22:02 php: rc.linkup: DEVD Ethernet attached event for wan
Dec 31 12:22:00 php: /interfaces.php: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
Dec 31 12:22:00 check_reload_status: rc.newwanip starting igb0
Dec 31 12:22:00 kernel: igb0: link state changed to UP
Dec 31 12:22:00 check_reload_status: Linkup starting igb0
Dec 31 12:21:59 php: rc.linkup: DEVD Ethernet detached event for wan
Dec 31 12:21:56 kernel: igb0: link state changed to DOWN
Dec 31 12:21:56 check_reload_status: Linkup starting igb0
Dec 31 12:21:37 check_reload_status: Syncing firewall
Dec 31 12:21:29 php: /interfaces.php: Creating rrd update script
Dec 31 12:21:29 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:21:29 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:21:29 check_reload_status: Reloading filter
Dec 31 12:21:24 php: /interfaces.php: Clearing states to old gateway x.x.x.1 (my ISP gateway).
Dec 31 12:21:22 check_reload_status: Syncing firewall
Dec 31 12:18:34 php: /interfaces.php: Creating rrd update script
Dec 31 12:18:34 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:18:34 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:18:34 check_reload_status: Reloading filter
Dec 31 12:18:32 check_reload_status: updating dyndns wan
Dec 31 12:18:30 php: rc.interfaces_wan_configure: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
Dec 31 12:18:30 check_reload_status: updating dyndns wan
Dec 31 12:18:28 check_reload_status: Configuring interface wan
Dec 31 12:18:28 php: rc.newwanip: rc.newwanip: Failed to update wan IP, restarting…
Dec 31 12:18:28 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: WAN[wan]) (real interface: igb0).
Dec 31 12:18:28 php: rc.newwanip: rc.newwanip: Informational is starting igb0.
Dec 31 12:18:28 php: rc.linkup: The command '/sbin/route change -inet default 'x.x.x.1 (my ISP gateway)'' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Network is unreachable change net default: gateway x.x.x.1 (my ISP gateway): Network is unreachable'
Dec 31 12:18:28 php: rc.linkup: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
Dec 31 12:18:28 php: rc.linkup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
Dec 31 12:18:27 php: rc.linkup: HOTPLUG: Configuring interface wan
Dec 31 12:18:27 php: rc.linkup: DEVD Ethernet attached event for wan
Dec 31 12:18:26 php: /interfaces.php: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
Dec 31 12:18:26 check_reload_status: rc.newwanip starting igb0
Dec 31 12:18:25 kernel: igb0: link state changed to UP
Dec 31 12:18:25 check_reload_status: Linkup starting igb0
Dec 31 12:18:24 php: rc.linkup: DEVD Ethernet detached event for wan
Dec 31 12:18:22 kernel: igb0: link state changed to DOWN
Dec 31 12:18:22 check_reload_status: Linkup starting igb0
Dec 31 12:18:19 check_reload_status: Syncing firewall
Dec 31 12:17:58 php: /interfaces.php: Creating rrd update script
Dec 31 12:17:58 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:17:58 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
Dec 31 12:17:58 check_reload_status: Reloading filter
Dec 31 12:17:54 php: /interfaces.php: Clearing states to old gateway x.x.x.1 (my ISP gateway).
Dec 31 12:17:51 check_reload_status: Syncing firewall
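To see the flaps at a glance, here is a minimal Python sketch that pulls them out of a log like this one. It matches only the kernel "link state changed to DOWN/UP" lines, sorts them oldest-first (the log view above lists newest entries first), and pairs each DOWN with the next UP on the same interface. The regex, the helper name `link_flaps`, and the placeholder year are my own assumptions, not anything pfSense provides. Syslog lines carry no year, so only the time differences are meaningful, and a log that spans New Year would sort incorrectly.

```python
import re
from datetime import datetime

# Matches kernel link-state lines such as
#   "Dec 31 12:22:00 kernel: igb0: link state changed to UP"
# An optional hostname before "kernel:" is tolerated (assumption about raw syslog format).
LINK_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?:\S+ )?kernel: "
    r"(?P<ifname>\w+): link state changed to (?P<state>UP|DOWN)$"
)


def link_flaps(lines, year=2000):
    """Return (interface, down_at, up_at, seconds_down) for each DOWN->UP pair.

    `year` is a placeholder because syslog timestamps omit it (hypothetical default).
    """
    events = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if m:
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["ifname"], m["state"]))

    # The pfSense log view is newest-first; put events in chronological order.
    events.sort()

    flaps, down_at = [], {}
    for ts, ifname, state in events:
        if state == "DOWN":
            down_at[ifname] = ts
        elif ifname in down_at:
            start = down_at.pop(ifname)
            flaps.append((ifname, start, ts, (ts - start).total_seconds()))
    return flaps


if __name__ == "__main__":
    sample = [
        "Dec 31 12:22:00 kernel: igb0: link state changed to UP",
        "Dec 31 12:21:56 kernel: igb0: link state changed to DOWN",
        "Dec 31 12:18:25 kernel: igb0: link state changed to UP",
        "Dec 31 12:18:22 kernel: igb0: link state changed to DOWN",
    ]
    for ifname, down, up, secs in link_flaps(sample):
        print(ifname, down.time(), up.time(), secs)
    # igb0 12:18:22 12:18:25 3.0
    # igb0 12:21:56 12:22:00 4.0
```

Run against the excerpt above, it shows igb0 losing link twice within about four minutes, for only a few seconds each time. Each flap is followed by the dhclient exit-code-1 and "Network is unreachable" route errors.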
Appendix of other WAN-related threads I've read in trying to solve this issue:
- WAN interface going down
  https://forum.pfsense.org/index.php?topic=84037.0
- Ethernet connection goes down every now & then
  https://forum.pfsense.org/index.php?topic=83702.0
- WAN unable to obtain IP address via DHCP
  https://forum.pfsense.org/index.php?topic=81987.0
- No internet connectivity on WAN with valid public IP
  https://forum.pfsense.org/index.php?topic=81943.0
- WAN port goes down and up but no internet connection - em card, 2.1.5 x64
  https://forum.pfsense.org/index.php?topic=81407.0
- check_reload_status at 100% + apinger messages
  https://forum.pfsense.org/index.php?topic=79812.0
- All packages restart after a DHCP renew
  https://forum.pfsense.org/index.php?topic=76597.0
- apinger exits when no useable targets but is not restarted
  https://forum.pfsense.org/index.php?topic=71908.0
- WAN-link "randomly" disconnects. pfSense 2.1
  https://forum.pfsense.org/index.php?topic=71624.0
- WAN down
  https://forum.pfsense.org/index.php?topic=70682.0
- WAN dropped connection on 2.1
  https://forum.pfsense.org/index.php?topic=70677.0
- pfSense 2.1 WAN interface DHCP problem
  https://forum.pfsense.org/index.php?topic=69904.0
- kernel: arpresolve: can't allocate llinfo for 192.168.100.1 (cable modem)
  https://forum.pfsense.org/index.php?topic=63474.0
- kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.xxx
  https://forum.pfsense.org/index.php/topic,62964.0
- WAN down
  https://forum.pfsense.org/index.php?topic=61785.0
- wan cable disconnect/reconnect causes interface drop no recover
  https://forum.pfsense.org/index.php?topic=61182
- pfSense loses default route after link flap
  https://forum.pfsense.org/index.php?topic=60886.0
- Since a few day's losing the wan (dhcp) ip address
  https://forum.pfsense.org/index.php?topic=58819.0
- Intel i350: recognized, but no traffic and no DHCP
  https://forum.pfsense.org/index.php?topic=55032.0
- Cannot obtain DHCP from WAN automatically, must be done manually.
  https://forum.pfsense.org/index.php?topic=53341.0
- Gateway status Offline
  https://forum.pfsense.org/index.php?topic=53187.0
- dhclient loosing WAN connection
  https://forum.pfsense.org/index.php?topic=48013.0
- WAN DHCP Does Not Work
  https://forum.pfsense.org/index.php?topic=31523.0
- Polling broken in current snapshot for igb interfaces
  https://forum.pfsense.org/index.php?topic=27126.0
- wan interface losing ip address
  http://lists.pfsense.org/pipermail/list/2012-July/002572.html
- others, less relevant
Bugs I've investigated:
- #3669
- #2704
- #2647
- #2919
- #1943
-
I know it's been a while, but were you able to figure this out?
-
All that the log extracts in the original post show is the link cycling on the igb0 interface, with the inevitable consequence that pfSense stops, starts, and reloads various services.
The original post is now 13 months old and refers to an obsolete, end-of-life version of pfSense based on an obsolete, end-of-life version of FreeBSD. If you have an issue with link cycling, it would be best to describe it afresh, enclosing the relevant log extracts.