WAN connection drops after 15 min high load [SOLVED]
-
Setup:
2.1-RELEASE (i386) on Intel D945gsejt with Intel dual-nic PCI-card (fxp - Intel 82558 Pro/100).
WAN on fxp0, LAN on fxp1, OPT1 on re0 (RealTek 8168/8111)
100/100mbit connection (DHCP)
Problem:
My trusty pfSense FW has started losing its connection to the internet whenever I download hard for 10+ minutes.
WAN is still reported as up in pfSense, but no traffic gets through. LAN is fine.
A restart seems to be the only way to bring back the connection.
Tested:
Limit my download to 50mbit = I can go forever without incident.
Tried running WAN on the built-in NIC (re0) = Same result
Tried to bypass the FW by connecting a PC straight to the ISP = No problem. Running 96mbit download for over an hour without incident.
Update pfSense (was running 2.0 RC)
It used to work fine and I can't pinpoint a change made around the time when I first saw the problem.
Any ideas?
I'm just about to throw it out the window. :o
-
Nothing in the system logs?
Steve
-
This shows up when it crashes:
Mar 4 00:31:16 check_reload_status: Reloading filter
Mar 4 00:31:16 check_reload_status: Restarting OpenVPN tunnels/interfaces
Mar 4 00:31:16 check_reload_status: Restarting ipsec tunnels
Mar 4 00:31:16 check_reload_status: updating dyndns WAN_DHCP
-
That is more of a symptom than a cause. There are no apinger entries?
What do the RRD quality graphs look like?
You could try a 2.1.1 snapshot, I believe they have a number of fixes for apinger issues.
Steve
-
Nothing before this in the log, and nothing after until the reboot.
Will try 2.1.1.
Never heard of RRD graphs before so i just need to figure out what and where they are first ;)
-
RRD Quality looks fine in general but it's not hard to see when the WAN drops out.
See attached.
No packet loss except when the WAN is down.
-
Yet that doesn't show 100%. Could be the rounding that RRD does to fit the data.
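To illustrate the rounding point, here is a minimal sketch of how averaging samples into coarser buckets can hide a total outage. It assumes the graph uses an AVERAGE-style consolidation. The one-minute samples and the five-sample bucket are made-up numbers for illustration, not pfSense's actual RRD settings.

```python
def consolidate_average(samples, bucket_size):
    """Average consecutive samples into buckets, as an AVERAGE-style rollup would."""
    buckets = []
    for start in range(0, len(samples), bucket_size):
        chunk = samples[start:start + bucket_size]
        buckets.append(sum(chunk) / len(chunk))
    return buckets


# Hypothetical packet loss in percent, one sample per minute.
# Minutes 4 and 5 are a complete outage (100% loss).
loss_per_minute = [0, 0, 0, 100, 100, 0, 0, 0, 0, 0]

# Rolled up into 5-minute buckets, the outage no longer shows as 100%.
print(consolidate_average(loss_per_minute, 5))  # [40.0, 0.0]
```

So a two-minute outage at 100% loss would be drawn as 40% in this example, which would fit a graph that never reaches 100%.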
You might try disabling apinger altogether. Go to System: Routing: Gateways: Edit the WAN gateway, advanced section, disable gateway monitoring.
If that solves it you can instead tune apinger to better fit your line conditions.
Steve
-
Well that at least made a difference.
Unfortunately not the difference I needed.
After disabling apinger the line crashed after 1-2 min at 100mbit DL.
I tried enabling it again, but now I only get 2 pings through after a reboot before it crashes again.
Gateway log is now showing:
Mar 4 01:17:40 apinger: Exiting on signal 15.
Mar 4 01:17:21 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:17:11 apinger: Starting Alarm Pinger, apinger(24614)
Mar 4 01:17:10 apinger: Exiting on signal 15.
Mar 4 01:15:41 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:15:18 apinger: Starting Alarm Pinger, apinger(33575)
Mar 4 01:11:43 apinger: SIGHUP received, reloading configuration.
Mar 4 01:11:40 apinger: SIGHUP received, reloading configuration.
Mar 4 01:11:39 apinger: Starting Alarm Pinger, apinger(24298)
Mar 4 01:11:05 apinger: No usable targets found, exiting
Mar 4 01:11:05 apinger: Starting Alarm Pinger, apinger(38845)
Mar 4 01:10:28 apinger: Exiting on signal 15.
Mar 4 01:10:23 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:10:13 apinger: Starting Alarm Pinger, apinger(6083)
Had to bypass the FW now to go online.
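For anyone wanting to read a gateway log like this more systematically, below is a rough sketch that measures how long apinger ran before raising each "down" alarm. It assumes the log is listed newest first, as above. The function name and the placeholder year are my own (syslog lines carry no year), and only lines from the log above are used.

```python
import re
from datetime import datetime

# Excerpt of the gateway log from this post, newest entry first.
LOG = """\
Mar 4 01:17:40 apinger: Exiting on signal 15.
Mar 4 01:17:21 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:17:11 apinger: Starting Alarm Pinger, apinger(24614)
Mar 4 01:17:10 apinger: Exiting on signal 15.
Mar 4 01:15:41 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:15:18 apinger: Starting Alarm Pinger, apinger(33575)
Mar 4 01:11:39 apinger: Starting Alarm Pinger, apinger(24298)
Mar 4 01:11:05 apinger: No usable targets found, exiting
Mar 4 01:11:05 apinger: Starting Alarm Pinger, apinger(38845)
Mar 4 01:10:28 apinger: Exiting on signal 15.
Mar 4 01:10:23 apinger: ALARM: WAN_DHCP(95.109.99.1) *** down ***
Mar 4 01:10:13 apinger: Starting Alarm Pinger, apinger(6083)
"""

LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) apinger: (.*)$")


def seconds_until_down(log_text, assumed_year=2000):
    """Return seconds between each apinger start and the next 'down' alarm.

    assumed_year is an arbitrary placeholder; it only matters for parsing.
    """
    delays = []
    started = None
    # The log is newest first, so walk it backwards to get chronological order.
    for line in reversed(log_text.splitlines()):
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, message = match.groups()
        when = datetime.strptime(f"{assumed_year} {stamp}", "%Y %b %d %H:%M:%S")
        if message.startswith("Starting Alarm Pinger"):
            started = when
        elif "*** down ***" in message and started is not None:
            delays.append(int((when - started).total_seconds()))
            started = None
    return delays


print(seconds_until_down(LOG))  # [10, 23, 10]
```

On this excerpt the gateway was flagged down 10 to 23 seconds after each apinger start, which matches only a couple of pings getting through.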
-
Hmm. You could try setting an alternative monitor IP, something publicly available like 8.8.8.8.
Steve
-
Will try that tonight.
Also prepped a LiveCD on USB to test with.
-
Running a fresh install with 2.1.1 embedded.
This seems to have fixed the issue.
Unfortunately I cannot provide much information on what the actual problem was, so for anyone else experiencing this I can only recommend that you try a fresh install of 2.1.1.
Appreciate the help Steve!
-
I did read somewhere that this was to do with Multi-WAN stuff happening even on single WAN setups. Maybe that was corrected. Would that make sense?