WAN interface losing connectivity 5-10x daily - uber-thread!



  • Hey there, I'm an experienced pfSense user (6 years), and I'm pretty perplexed by this issue. Over the last week my WAN connection or routes have been dropping many times daily, and I have strong reason to believe the internet service itself isn't the cause.

    I've combed through dozens of threads from people losing WAN for unknown or mysterious reasons, and I'm hoping to pull some of them together here and find some lasting solutions, since this seems to be a recurring theme. (Thanks in advance for any help!)

    Alright, here's a quick rundown of my setup:

    • Supermicro A1SRi-2558F with 4x Intel igb gigabit connections (no external / USB ethernet being used), 4GB ECC RAM, Intel SSD

    • Production release (2.1.5 amd64)

    • Arris Surfboard SB6141

    • Astound cable service (dynamic IP), all new coax to the modem

    • 1x gigabit WAN connection (all default + autonegotiate)

    • 1x gigabit LAN connection (multiple internal VLANs with known-working config, NAT / fw rules)

    I previously ran this exact same stack at this location on a Supermicro X7SPA-H, where everything was stable and running perfectly for months. Eventually I determined we'd need to upgrade that motherboard, and did so with a reinstall about 8 days ago. (Reinstall included a backup restoration of NAT / fw rules and RRD graph data, nothing else – especially not network configurations.)

    I've read dozens of threads from people having similar WAN issues (linked below), and tried a number of different fixes:

    • Ensuring WAN is not blocking private or bogon networks

    • Ensuring IPv6 DHCP is disabled

    • Ensuring no MAC address is entered

    • Increasing MBUFs (currently set to 262144)

    • Keeping hw.igb.num_queues low (currently 2)

    • Not having any VPN services active (which I never did)

    • Allowing apinger to monitor the gateway

    • NOT allowing apinger to monitor the gateway

    • Rebooting the system, or modem, or all hardware

    Symptoms I'm noticing when connection flaps:

    • No correlation between usage and outages; WAN loss of connectivity occurs during off hours as well as during moments of normal usage.

    • Not much going on in logs (especially with apinger disabled). Traffic simply stops routing outbound. (No firewall / NAT rule changes or scheduling has been made lately.)

    • State table drops down pretty low, but not completely to zero; same with MBUF usage.

    • System is still perfectly responsive, and the WAN interface still appears to be processing firewall rules.

    Right now, the only thing that will reliably restore the connection is to completely reboot the box. Waiting a while (usually 10-20 minutes) sometimes also sees the connection restored on its own, but that isn't reliable. Enabling / disabling the WAN doesn't have any effect.

    Cable connection + modem is fine, and internet access resumes without any intervention on the modem once the pfSense box comes back up. I've totally ruled out cable issues – again, this same setup worked with precisely zero outages for as long as it operated at this location prior to this installation.

    Perhaps it's unrelated, but earlier on with this box I had the WAN interface on igb1 and LAN on igb0 (currently it's reversed), and was having DHCP issues then, too; they went away for a while after I swapped the two interfaces, which has me wondering whether this is a driver issue. At this point, though, I've chased down so many leads that I'm pretty much stumped.

    Again, thanks in advance for any help!

    Here's a sample log from earlier today, although this has more going on than what usually pops up during an outage:

    Dec 31 12:22:47 php: rc.start_packages: Restarting/Starting all packages.
    Dec 31 12:22:45 check_reload_status: Reloading filter
    Dec 31 12:22:45 check_reload_status: Starting packages
    Dec 31 12:22:45 php: rc.newwanip: pfSense package system has detected an ip change x.x.x.x (my public IP) -> x.x.x.x (the same public IP lol) … Restarting packages.
    Dec 31 12:22:43 php: rc.newwanip: Creating rrd update script
    Dec 31 12:22:43 php: rc.newwanip: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:22:43 php: rc.newwanip: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:22:43 php: rc.newwanip: Resyncing OpenVPN instances for interface WAN.
    Dec 31 12:22:42 php: /interfaces.php: Creating rrd update script
    Dec 31 12:22:42 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:22:42 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:22:42 check_reload_status: Reloading filter
    Dec 31 12:22:38 check_reload_status: updating dyndns wan
    Dec 31 12:22:36 php: rc.newwanip: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
    Dec 31 12:22:36 php: rc.newwanip: rc.newwanip: on (IP address: x.x.x.x) (interface: WAN[wan]) (real interface: igb0).
    Dec 31 12:22:36 php: rc.newwanip: rc.newwanip: Informational is starting igb0.
    Dec 31 12:22:33 php: /interfaces.php: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
    Dec 31 12:22:33 check_reload_status: rc.newwanip starting igb0
    Dec 31 12:22:33 php: /interfaces.php: Clearing states to old gateway x.x.x.1 (my ISP gateway).
    Dec 31 12:22:30 check_reload_status: Syncing firewall
    Dec 31 12:22:09 php: /interfaces.php: Creating rrd update script
    Dec 31 12:22:09 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:22:09 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:22:09 check_reload_status: Reloading filter
    Dec 31 12:22:06 check_reload_status: updating dyndns wan
    Dec 31 12:22:05 php: rc.interfaces_wan_configure: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
    Dec 31 12:22:05 check_reload_status: updating dyndns wan
    Dec 31 12:22:03 check_reload_status: Configuring interface wan
    Dec 31 12:22:03 php: rc.newwanip: rc.newwanip: Failed to update wan IP, restarting…
    Dec 31 12:22:03 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: WAN[wan]) (real interface: igb0).
    Dec 31 12:22:03 php: rc.newwanip: rc.newwanip: Informational is starting igb0.
    Dec 31 12:22:02 php: rc.linkup: The command '/sbin/route change -inet default 'x.x.x.1 (my ISP gateway)'' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Network is unreachable change net default: gateway x.x.x.1 (my ISP gateway): Network is unreachable'
    Dec 31 12:22:02 php: rc.linkup: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
    Dec 31 12:22:02 php: rc.linkup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
    Dec 31 12:22:02 php: rc.linkup: HOTPLUG: Configuring interface wan
    Dec 31 12:22:02 php: rc.linkup: DEVD Ethernet attached event for wan
    Dec 31 12:22:00 php: /interfaces.php: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
    Dec 31 12:22:00 check_reload_status: rc.newwanip starting igb0
    Dec 31 12:22:00 kernel: igb0: link state changed to UP
    Dec 31 12:22:00 check_reload_status: Linkup starting igb0
    Dec 31 12:21:59 php: rc.linkup: DEVD Ethernet detached event for wan
    Dec 31 12:21:56 kernel: igb0: link state changed to DOWN
    Dec 31 12:21:56 check_reload_status: Linkup starting igb0
    Dec 31 12:21:37 check_reload_status: Syncing firewall
    Dec 31 12:21:29 php: /interfaces.php: Creating rrd update script
    Dec 31 12:21:29 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:21:29 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:21:29 check_reload_status: Reloading filter
    Dec 31 12:21:24 php: /interfaces.php: Clearing states to old gateway x.x.x.1 (my ISP gateway).
    Dec 31 12:21:22 check_reload_status: Syncing firewall
    Dec 31 12:18:34 php: /interfaces.php: Creating rrd update script
    Dec 31 12:18:34 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:18:34 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:18:34 check_reload_status: Reloading filter
    Dec 31 12:18:32 check_reload_status: updating dyndns wan
    Dec 31 12:18:30 php: rc.interfaces_wan_configure: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
    Dec 31 12:18:30 check_reload_status: updating dyndns wan
    Dec 31 12:18:28 check_reload_status: Configuring interface wan
    Dec 31 12:18:28 php: rc.newwanip: rc.newwanip: Failed to update wan IP, restarting…
    Dec 31 12:18:28 php: rc.newwanip: rc.newwanip: on (IP address: ) (interface: WAN[wan]) (real interface: igb0).
    Dec 31 12:18:28 php: rc.newwanip: rc.newwanip: Informational is starting igb0.
    Dec 31 12:18:28 php: rc.linkup: The command '/sbin/route change -inet default 'x.x.x.1 (my ISP gateway)'' returned exit code '1', the output was 'route: writing to routing socket: No such process route: writing to routing socket: Network is unreachable change net default: gateway x.x.x.1 (my ISP gateway): Network is unreachable'
    Dec 31 12:18:28 php: rc.linkup: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
    Dec 31 12:18:28 php: rc.linkup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was ''
    Dec 31 12:18:27 php: rc.linkup: HOTPLUG: Configuring interface wan
    Dec 31 12:18:27 php: rc.linkup: DEVD Ethernet attached event for wan
    Dec 31 12:18:26 php: /interfaces.php: ROUTING: setting default route to x.x.x.1 (my ISP gateway)
    Dec 31 12:18:26 check_reload_status: rc.newwanip starting igb0
    Dec 31 12:18:25 kernel: igb0: link state changed to UP
    Dec 31 12:18:25 check_reload_status: Linkup starting igb0
    Dec 31 12:18:24 php: rc.linkup: DEVD Ethernet detached event for wan
    Dec 31 12:18:22 kernel: igb0: link state changed to DOWN
    Dec 31 12:18:22 check_reload_status: Linkup starting igb0
    Dec 31 12:18:19 check_reload_status: Syncing firewall
    Dec 31 12:17:58 php: /interfaces.php: Creating rrd update script
    Dec 31 12:17:58 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:17:58 php: /interfaces.php: RRD create failed exited with 1, the error is: ERROR: you must define at least one Data Source
    Dec 31 12:17:58 check_reload_status: Reloading filter
    Dec 31 12:17:54 php: /interfaces.php: Clearing states to old gateway x.x.x.1 (my ISP gateway).
    Dec 31 12:17:51 check_reload_status: Syncing firewall
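
    For anyone wanting to quantify link cycling in a log like the one above, here is a minimal Python sketch. It assumes the syslog line format shown in this extract (newest entries first, no year in the timestamp). The function names and the placeholder year are illustrative only and not part of pfSense.

```python
import re
from datetime import datetime

# Matches kernel link-state lines such as:
#   Dec 31 12:22:00 kernel: igb0: link state changed to UP
LINK_RE = re.compile(
    r"^\s*(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) kernel: (\w+): "
    r"link state changed to (UP|DOWN)\s*$"
)


def parse_link_events(lines, year=2000):
    """Return (timestamp, interface, state) tuples sorted oldest first.

    These syslog lines carry no year, so `year` is an assumed placeholder.
    """
    events = []
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2), m.group(3)))
    return sorted(events)


def flap_durations(events):
    """Pair each DOWN with the next UP on the same interface."""
    down_at = {}
    flaps = []
    for ts, iface, state in events:
        if state == "DOWN":
            down_at[iface] = ts
        elif iface in down_at:
            start = down_at.pop(iface)
            flaps.append((iface, start, (ts - start).total_seconds()))
    return flaps


# The four kernel lines from the extract above.
sample = [
    "Dec 31 12:22:00 kernel: igb0: link state changed to UP",
    "Dec 31 12:21:56 kernel: igb0: link state changed to DOWN",
    "Dec 31 12:18:25 kernel: igb0: link state changed to UP",
    "Dec 31 12:18:22 kernel: igb0: link state changed to DOWN",
]

for iface, start, seconds in flap_durations(parse_link_events(sample)):
    print(f"{iface} down at {start:%b %d %H:%M:%S} for {seconds:.0f}s")
```

    Run against the kernel lines in this extract, it reports two igb0 link drops lasting 3 and 4 seconds. A physical link drop that short is worth separating from the DHCP and routing errors that follow it in the log.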


    Appendix of other WAN-related threads I've read in trying to solve this issue:
    WAN interface going down
    https://forum.pfsense.org/index.php?topic=84037.0

    Ethernet connection goes down every now & then
    https://forum.pfsense.org/index.php?topic=83702.0

    WAN unable to obtain IP address via DHCP
    https://forum.pfsense.org/index.php?topic=81987.0

    No internet connectivity on WAN with valid public IP
    https://forum.pfsense.org/index.php?topic=81943.0

    WAN port goes down and up but no internet connection - em card, 2.1.5 x64
    https://forum.pfsense.org/index.php?topic=81407.0

    check_reload_status at 100% + apinger messages
    https://forum.pfsense.org/index.php?topic=79812.0

    All packages restart after a DHCP renew
    https://forum.pfsense.org/index.php?topic=76597.0

    apinger exits when no useable targets but is not restarted
    https://forum.pfsense.org/index.php?topic=71908.0

    WAN-link "randomly" disconnects. pfSense 2.1
    https://forum.pfsense.org/index.php?topic=71624.0

    WAN down
    https://forum.pfsense.org/index.php?topic=70682.0

    WAN dropped connection on 2.1
    https://forum.pfsense.org/index.php?topic=70677.0

    pfSense 2.1 WAN interface DHCP problem
    https://forum.pfsense.org/index.php?topic=69904.0

    kernel: arpresolve: can't allocate llinfo for 192.168.100.1 (cable modem)
    https://forum.pfsense.org/index.php?topic=63474.0

    kernel: arpresolve: can't allocate llinfo for xxx.xxx.xxx.xxx
    https://forum.pfsense.org/index.php/topic,62964.0

    WAN down
    https://forum.pfsense.org/index.php?topic=61785.0

    wan cable disconnect/reconnect causes interface drop no recover
    https://forum.pfsense.org/index.php?topic=61182

    pfSense loses default route after link flap
    https://forum.pfsense.org/index.php?topic=60886.0

    Since a few day's losing the wan (dhcp) ip address
    https://forum.pfsense.org/index.php?topic=58819.0

    Intel i350: recognized, but no traffic and no DHCP
    https://forum.pfsense.org/index.php?topic=55032.0

    Cannot obtain DHCP from WAN automatically, must be done manually.
    https://forum.pfsense.org/index.php?topic=53341.0

    Gateway status Offline
    https://forum.pfsense.org/index.php?topic=53187.0

    dhclient loosing WAN connection
    https://forum.pfsense.org/index.php?topic=48013.0

    WAN DHCP Does Not Work
    https://forum.pfsense.org/index.php?topic=31523.0

    Polling broken in current snapshot for igb interfaces
    https://forum.pfsense.org/index.php?topic=27126.0

    wan interface losing ip address
    http://lists.pfsense.org/pipermail/list/2012-July/002572.html

    • others, less relevant

    Bugs I've investigated:
    #3669
    #2704
    #2647
    #2919
    #1943



  • I know it's been a while, but were you able to figure this out?



  • All the log extracts in the original post show is the link cycling on the igb0 interface, with the inevitable consequences of pfSense stopping, starting and reloading various services.

    The original post is now 13 months old and refers to an obsolete, end-of-life version of pfSense that is based on an obsolete, end-of-life version of FreeBSD. If you have an issue with link cycling, it would be best to describe your issue afresh, enclosing relevant log extracts.
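
    As a rough aid for preparing such an extract, the Python sketch below keeps only the lines that mention link state or WAN reconfiguration, and masks IPv4 addresses the same way the original post did. The keyword list is an assumption taken from the log earlier in this thread, and the helper name is hypothetical; adjust both to your own logs.

```python
import re

# Assumed keywords, taken from the messages seen in the log in this thread.
KEYWORDS = ("link state changed", "dhclient", "rc.linkup", "rc.newwanip")

# Simple dotted-quad matcher; it masks private addresses as well as public ones.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def relevant_extract(lines, keywords=KEYWORDS):
    """Keep lines containing any keyword and replace IPv4 addresses with x.x.x.x."""
    extract = []
    for line in lines:
        if any(keyword in line for keyword in keywords):
            extract.append(IPV4_RE.sub("x.x.x.x", line.rstrip()))
    return extract


# Illustrative input; 203.0.113.7 is a documentation-range address, not real data.
log = [
    "Dec 31 12:22:45 check_reload_status: Reloading filter",
    "Dec 31 12:22:36 php: rc.newwanip: rc.newwanip: on (IP address: 203.0.113.7) (interface: WAN[wan]) (real interface: igb0).",
    "Dec 31 12:22:00 kernel: igb0: link state changed to UP",
]

for line in relevant_extract(log):
    print(line)
```

    The filter is deliberately narrow, so check the surrounding lines by hand before posting in case something relevant was dropped.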