WAN connection drops after 15 min high load [SOLVED]

  • Setup:
    2.1-RELEASE (i386) on an Intel D945GSEJT with an Intel dual-NIC PCI card (fxp, Intel 82558 Pro/100).
    WAN on fxp0, LAN on fxp1, OPT1 on re0 (RealTek 8168/8111)
    100/100 Mbit connection (DHCP)

    My trusty pfSense firewall has started losing its internet connection whenever I download heavily for 10+ minutes.
    WAN is still reported as up in pfSense, but no traffic gets through. LAN is fine.
    Restart seems to be the only way to bring back the connection.

    Limiting my download to 50 Mbit: I can go forever without incident.
    Running WAN on the built-in NIC (re0): same result.
    Bypassing the firewall by connecting a PC straight to the ISP: no problem, a 96 Mbit download ran for over an hour without incident.
    Updated pfSense to 2.1 (was running 2.0 RC).

    It used to work fine, and I can't pinpoint any change made around the time I first saw the problem.

    Any ideas?
    I'm just about to throw it out the window.  :o

  • Netgate Administrator

    Nothing in the system logs?


  • This shows up when it crashes:

    Mar 4 00:31:16 check_reload_status: Reloading filter
    Mar 4 00:31:16 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Mar 4 00:31:16 check_reload_status: Restarting ipsec tunnels
    Mar 4 00:31:16 check_reload_status: updating dyndns WAN_DHCP

  • Netgate Administrator

    That is more of a symptom than a cause. There are no apinger entries?

    What do the RRD quality graphs look like?

    You could try a 2.1.1 snapshot, I believe they have a number of fixes for apinger issues.


  • Nothing before this in the log, and nothing after until the reboot.

    Will try 2.1.1.

    I've never heard of RRD graphs before, so I just need to figure out what and where they are first  ;)

  • RRD Quality looks fine in general, but it's not hard to see when the WAN drops out.
    See attached.

    No packet loss except when the WAN is down.

  • Netgate Administrator

    Yet that doesn't show 100%. Could be the rounding that RRD does to fit the data.
    You might try disabling apinger altogether. Go to System: Routing: Gateways, edit the WAN gateway, and in the advanced section disable gateway monitoring.
    If that solves it you can instead tune apinger to better fit your line conditions.
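    Tuning mostly comes down to the loss and latency thresholds. The Python sketch below models how a loss alarm with hysteresis behaves: the gateway is marked down once loss over a window of recent probes rises above a high threshold, and only marked up again once it falls below a low one. It is a simplified, hypothetical model, not apinger's source; the 10%/20% thresholds and the 20-probe window are assumed values for illustration, so check the actual settings on your own gateway page.

```python
from collections import deque


class LossAlarm:
    """Hypothetical loss alarm with hysteresis (a model, not apinger's code)."""

    def __init__(self, percent_low=10.0, percent_high=20.0, window=20):
        # Assumed thresholds and window size, for illustration only.
        self.percent_low = percent_low
        self.percent_high = percent_high
        self.results = deque(maxlen=window)  # True = reply received, False = lost
        self.down = False

    def loss_percent(self):
        """Percentage of lost probes in the current window."""
        if not self.results:
            return 0.0
        lost = sum(1 for ok in self.results if not ok)
        return 100.0 * lost / len(self.results)

    def record(self, reply_received):
        """Record one probe result; return True while the gateway is considered down."""
        self.results.append(reply_received)
        loss = self.loss_percent()
        if not self.down and loss > self.percent_high:
            self.down = True   # crossed the high threshold: raise the alarm
        elif self.down and loss < self.percent_low:
            self.down = False  # fell below the low threshold: clear the alarm
        return self.down
```

    With a 10-probe window, three lost replies in a row push loss to 30% and raise the alarm, and it stays raised until every lost probe has aged out of the window. The gap between the two thresholds is what stops a borderline line from flapping up and down; on a line that is only briefly saturated, raising the thresholds or the down time makes the monitor more tolerant.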


  • Well that at least made a difference.
    Unfortunately not the difference I needed.

    After disabling apinger, the line crashed after 1-2 minutes of 100 Mbit download.
    I tried enabling it again, but now I only get 2 pings through after a reboot before it crashes again.

    Gateway log is now showing:
    Mar 4 01:17:40 apinger: Exiting on signal 15.
    Mar 4 01:17:21 apinger: ALARM: WAN_DHCP( *** down ***
    Mar 4 01:17:11 apinger: Starting Alarm Pinger, apinger(24614)
    Mar 4 01:17:10 apinger: Exiting on signal 15.
    Mar 4 01:15:41 apinger: ALARM: WAN_DHCP( *** down ***
    Mar 4 01:15:18 apinger: Starting Alarm Pinger, apinger(33575)
    Mar 4 01:11:43 apinger: SIGHUP received, reloading configuration.
    Mar 4 01:11:40 apinger: SIGHUP received, reloading configuration.
    Mar 4 01:11:39 apinger: Starting Alarm Pinger, apinger(24298)
    Mar 4 01:11:05 apinger: No usable targets found, exiting
    Mar 4 01:11:05 apinger: Starting Alarm Pinger, apinger(38845)
    Mar 4 01:10:28 apinger: Exiting on signal 15.
    Mar 4 01:10:23 apinger: ALARM: WAN_DHCP( *** down ***
    Mar 4 01:10:13 apinger: Starting Alarm Pinger, apinger(6083)

    Had to bypass the FW now to go online.
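    The gateway log above is newest-first, which makes the timing hard to read. A small Python helper can put it in chronological order and measure how long apinger ran after each start before raising the "down" alarm. This is a hypothetical helper for pasted log text. It assumes the "Mar 4 01:17:40 apinger: ..." format shown here, and because syslog omits the year it uses a placeholder year that only matters for the time differences.

```python
import re
from datetime import datetime

# Matches lines like "Mar 4 01:17:40 apinger: Exiting on signal 15."
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) apinger: (.*)$")


def parse_newest_first(lines):
    """Return (timestamp, message) pairs in chronological order.

    Assumes the input is newest-first, as in the log excerpt above.
    A placeholder year (2000) is prepended because syslog omits it.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime("2000 " + m.group(1), "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2)))
    events.reverse()  # newest-first -> oldest-first, keeping same-second order
    return events


def seconds_until_down(events):
    """Seconds from each apinger start to the first 'down' alarm after it."""
    delays = []
    started = None
    for ts, msg in events:
        if msg.startswith("Starting Alarm Pinger"):
            started = ts
        elif "*** down ***" in msg and started is not None:
            delays.append(int((ts - started).total_seconds()))
            started = None
    return delays
```

    Fed the excerpt above, this returns [10, 23, 10]: every time apinger was (re)started, it declared WAN_DHCP down within seconds.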

  • Netgate Administrator

    Hmm. You could try setting an alternative monitor IP, something publicly available like


  • Will try that tonight.
    Also prepped a LiveCD on USB to test with.

  • Running a fresh install with 2.1.1 embedded.
    This seems to have fixed the issue.

    Unfortunately I can't provide much information on what the actual problem was. For anyone else experiencing this, I can only recommend trying a fresh install of 2.1.1.

    Appreciate the help Steve!

  • I did read somewhere that this had to do with Multi-WAN logic running even on single-WAN setups. Maybe that was corrected. Would that make sense?
