WAN connection drops after 15 min high load [SOLVED]

nico_idskov

Setup:
2.1-RELEASE (i386) on an Intel D945GSEJT with an Intel dual-NIC PCI card (fxp - Intel 82558 Pro/100).
WAN on fxp0, LAN on fxp1, OPT1 on re0 (RealTek 8168/8111)
100/100 Mbit connection (DHCP)

Problem:
My trusty pfSense FW has started losing its internet connection whenever I download heavily for 10+ minutes.
WAN is still reported as up in pfSense, but no traffic gets through. LAN is fine.
Restart seems to be the only way to bring back the connection.

Tested:
Limiting my download to 50 Mbit = I can go forever without incident.
Running WAN on the built-in NIC (re0) = Same result.
Bypassing the FW by connecting a PC straight to the ISP = No problem. Ran a 96 Mbit download for over an hour without incident.
Updated pfSense to 2.1 (was running 2.0 RC) = Same result.

It used to work fine, and I can't pinpoint any change made around the time I first saw the problem.

Any ideas?
I'm just about to throw it out the window. :o

stephenw10

Nothing in the system logs?

Steve

nico_idskov

This shows up when it crashes:

Mar 4 00:31:16 check_reload_status: Reloading filter
Mar 4 00:31:16 check_reload_status: Restarting OpenVPN tunnels/interfaces
Mar 4 00:31:16 check_reload_status: Restarting ipsec tunnels
Mar 4 00:31:16 check_reload_status: updating dyndns WAN_DHCP

stephenw10

That is more of a symptom than a cause. There are no apinger entries?

What do the RRD quality graphs look like?

You could try a 2.1.1 snapshot, I believe they have a number of fixes for apinger issues.

Steve

nico_idskov

Nothing before this in the log, and nothing after until the reboot.

Will try 2.1.1.

Never heard of RRD graphs before, so I just need to figure out what and where they are first ;)

nico_idskov

RRD Quality looks fine in general, but it's not hard to see when the WAN drops out.
See attached.

No packet loss except when the WAN is down.

[Attached image: Capture.PNG, RRD quality graph]

stephenw10

Yet that doesn't show 100% loss. Could be the rounding that RRD does to fit the data.
You might try disabling apinger altogether. Go to System: Routing: Gateways, edit the WAN gateway, and in the advanced section disable gateway monitoring.
If that solves it you can instead tune apinger to better fit your line conditions.

Steve
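Tuning apinger comes down to how a gateway monitor turns a window of ping results into an up/down decision. The sketch below is a simplified, hypothetical model of that idea, not apinger's actual code: the `Thresholds` fields and their default values are assumptions chosen for illustration, so check the gateway's advanced settings in your own install for the real ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thresholds:
    # Hypothetical values for illustration only, not taken from pfSense.
    loss_high_pct: float = 20.0     # packet loss (%) above which the gateway counts as down
    latency_high_ms: float = 500.0  # average RTT (ms) above which the gateway counts as down
    down_after_s: float = 10.0      # seconds of consecutive lost probes before "down"


def classify(rtts: List[Optional[float]], interval_s: float, t: Thresholds) -> str:
    """Classify a gateway from a window of probe results.

    rtts has one entry per probe: the round-trip time in ms, or None if lost.
    interval_s is the time between probes in seconds.
    """
    if not rtts:
        return "unknown"

    # Count consecutive lost probes at the end of the window.
    trailing_lost = 0
    for r in reversed(rtts):
        if r is None:
            trailing_lost += 1
        else:
            break
    if trailing_lost * interval_s >= t.down_after_s:
        return "down"

    # Otherwise judge on overall loss and average latency in the window.
    replies = [r for r in rtts if r is not None]
    loss_pct = 100.0 * (len(rtts) - len(replies)) / len(rtts)
    avg_rtt = sum(replies) / len(replies) if replies else float("inf")
    if loss_pct > t.loss_high_pct or avg_rtt > t.latency_high_ms:
        return "down"
    return "up"
```

For example, a window where every fourth probe is lost (25% loss) is reported as down with the hypothetical defaults, but as up once `loss_high_pct` is raised to 30. Loosening thresholds like this is the kind of adjustment "tune apinger to better fit your line conditions" refers to, at the cost of reacting more slowly to a real outage.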

nico_idskov

Well that at least made a difference.
Unfortunately not the difference i needed.

After disabling apinger, the line crashed after 1-2 minutes of 100 Mbit download.
I tried enabling it again, but now I only get 2 pings through after a reboot before it crashes again.

Gateway log is now showing:
Mar 4 01:17:40 apinger: Exiting on signal 15.
Mar 4 01:17:21 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:17:11 apinger: Starting Alarm Pinger, apinger(24614)
Mar 4 01:17:10 apinger: Exiting on signal 15.
Mar 4 01:15:41 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:15:18 apinger: Starting Alarm Pinger, apinger(33575)
Mar 4 01:11:43 apinger: SIGHUP received, reloading configuration.
Mar 4 01:11:40 apinger: SIGHUP received, reloading configuration.
Mar 4 01:11:39 apinger: Starting Alarm Pinger, apinger(24298)
Mar 4 01:11:05 apinger: No usable targets found, exiting
Mar 4 01:11:05 apinger: Starting Alarm Pinger, apinger(38845)
Mar 4 01:10:28 apinger: Exiting on signal 15.
Mar 4 01:10:23 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:10:13 apinger: Starting Alarm Pinger, apinger(6083)

Had to bypass the FW now to go online.
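When reading a gateway log like the one above, it can help to tally how often apinger declared the gateway down and how often it was restarted. A minimal sketch, assuming the log lines have been copied into a plain text file (the filename `gateways.log` and the function name `summarize` are hypothetical) and follow the format shown in the excerpt:

```python
import re
from collections import Counter

# Matches apinger alarm lines such as:
#   Mar 4 01:17:21 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
ALARM_RE = re.compile(
    r"apinger: ALARM: (?P<gw>[^(\s]+)\((?P<ip>[^)]+)\) \*\*\* (?P<state>\w+) \*\*\*"
)


def summarize(lines):
    """Return (alarm counts keyed by (gateway, monitor IP, state), number of apinger starts)."""
    alarms = Counter()
    starts = 0
    for line in lines:
        m = ALARM_RE.search(line)
        if m:
            alarms[(m.group("gw"), m.group("ip"), m.group("state"))] += 1
        elif "Starting Alarm Pinger" in line:
            starts += 1
    return alarms, starts
```

Usage would be along the lines of `with open("gateways.log") as f: print(summarize(f))`. For the excerpt above it reports three "down" alarms for WAN_DHCP(95.109.99.1) and five apinger starts, which makes the restart loop easy to spot.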

stephenw10

Hmm. You could try setting an alternative monitor IP, something publicly reachable like 8.8.8.8.

Steve

nico_idskov

Will try that tonight.
Also prepped a LiveCD on USB to test with.

nico_idskov

Running a fresh install with 2.1.1 embedded.
This seems to have fixed the issue.

Unfortunately I cannot provide much information on what the actual problem was. For anyone else experiencing this, I can only recommend trying a fresh install of 2.1.1.

Appreciate the help Steve!

bryan.paradis

I did read somewhere that this was to do with Multi-WAN stuff happening even on single WAN setups. Maybe that was corrected. Would that make sense?