pfSense double NAT in shared office - connection to internet drops, connection to modem is still alive

AlthalusAvan

Hey all, I've been chipping away at this problem for a while but I haven't been able to solve it yet.

We've just moved into a new office, and we were provided with a (crap) TP-Link wireless router for our space. This is connected to a Draytek Vigor 2860 somewhere downstairs.

The performance and security of the device we've been given aren't going to cut it, so I've been trying to set up a pfSense installation. Obviously double NAT isn't ideal, but it's worked fine on the TP-Link we were provided, and on the MR200 I brought in from home.

On two different pfSense machines (both insanely overpowered, currently a DL120 Gen9 with a Xeon E5-2620 v4 and 2x Intel Gigabit NICs) we've had intermittent network outages: every 5-10 minutes, traffic to the internet stops for about 10 seconds. During this time, connections to other devices on the LAN continue uninterrupted, I can connect to the pfSense router without issue (although the web interface is a bit slower), and the router can still ping the upstream modem / router.

In all cases, WAN was configured for DHCP and LAN as 10.0.0.1/8, with a DHCP range of 10.0.10.1 - 10.0.10.254.

I've solved as many issues as possible, and the only thing still showing up in system logs is something like this:

Dec 4 11:47:25 	LAN 	Default deny rule IPv4 (1000000103) 	10.0.10.12:57344		149.28.198.209:80		TCP:RA
Dec 4 11:47:17 	LAN 	Default deny rule IPv4 (1000000103) 	10.0.10.12:57343		104.251.212.114:80		TCP:RA
Dec 4 11:47:10 	LAN 	Default deny rule IPv4 (1000000103) 	10.0.10.12:57344		149.28.198.209:80		TCP:FA
Dec 4 11:47:06 	LAN 	Default deny rule IPv4 (1000000103) 	10.0.10.12:57343		104.251.212.114:80		TCP:FA
Dec 4 11:47:02 	LAN 	Default deny rule IPv4 (1000000103) 	10.0.10.12:57344		149.28.198.209:80		TCP:FA

I've been told these are innocuous and shouldn't be an issue, but I've also made a change that should mitigate them (set Firewall Optimization Options to Conservative).

Has anyone experienced anything like this? Any ideas?

stephenw10

Indeed, those are FIN/ACK and RST/ACK packets (the FA and RA in the log) for connections whose states the firewall has already closed. They should not be the cause of a problem.
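For anyone reading those filter log entries later, here is a minimal Python sketch that expands the flag letters in the last column (TCP:FA, TCP:RA) into flag names. The function name `decode_tcp_flags` and the table are my own for illustration and are not part of pfSense; only the six classic TCP flags are mapped, and anything else is reported as unknown.

```python
# Flag letters as they appear in the filter log's last column, e.g. "TCP:FA".
# Only the six classic TCP flags are mapped here (an assumption made to keep
# the sketch short); any other letter is reported as unknown.
TCP_FLAGS = {
    "F": "FIN",
    "S": "SYN",
    "R": "RST",
    "P": "PSH",
    "A": "ACK",
    "U": "URG",
}


def decode_tcp_flags(field):
    """Turn a log field such as 'TCP:RA' into ['RST', 'ACK']."""
    # Drop the "TCP:" prefix if it is present.
    letters = field.split(":", 1)[1] if ":" in field else field
    return [TCP_FLAGS.get(ch, "unknown(%s)" % ch) for ch in letters]


if __name__ == "__main__":
    for field in ("TCP:FA", "TCP:RA"):
        print(field, "->", "+".join(decode_tcp_flags(field)))
```

Both combinations in the log above belong to connection teardown (FIN+ACK and RST+ACK), which fits packets arriving after the state has already been removed.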

If there is no WAN connectivity, that will slow the web interface because it cannot resolve DNS. Try to ping out from the firewall itself during one of these outages. Check the system logs for any entries at that time.

If the WAN gateway still shows as up, try setting the monitoring IP to something other than the local IP of the Draytek device, such as 8.8.8.8. That way you can at least log when it happens.

Steve

AlthalusAvan

Hey @stephenw10, thanks for the suggestions.

I modified the monitoring IP to 8.8.8.8 as suggested, and got the following:

Dec 5 10:36:44 	dpinger 		send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr 192.168.90.29 identifier "WAN_DHCP "
Dec 5 10:35:52 	dpinger 		WAN_DHCP 8.8.8.8: Alarm latency 929878us stddev 2804739us loss 11%
Dec 4 23:00:01 	dpinger 		WAN_DHCP 8.8.8.8: Clear latency 3268us stddev 6690us loss 0%
Dec 4 22:59:07 	dpinger 		WAN_DHCP 8.8.8.8: Alarm latency 2642025us stddev 4669399us loss 9%
Dec 4 22:58:53 	dpinger 		WAN_DHCP 8.8.8.8: Alarm latency 1337958us stddev 3694440us loss 20%
Dec 4 22:58:50 	dpinger 		WAN_DHCP 8.8.8.8: Alarm latency 379323us stddev 1752080us loss 21%
Dec 4 19:38:30 	dpinger 		WAN_DHCP 8.8.8.8: Clear latency 2903us stddev 1352us loss 0%
Dec 4 19:37:35 	dpinger 		WAN_DHCP 8.8.8.8: Alarm latency 1982976us stddev 3634587us loss 8%
Dec 4 19:37:23 	dpinger 		WAN_DHCP 8.8.8.8: Alarm latency 679549us stddev 1775578us loss 21%
Dec 4 19:37:17 	dpinger 		WAN_DHCP 8.8.8.8: Alarm latency 535520us stddev 1488845us loss 11%
Dec 4 18:44:38 	dpinger 		WAN_DHCP 8.8.8.8: Clear latency 150929us stddev 634930us loss 0%
Dec 4 18:43:23 	dpinger 		WAN_DHCP 8.8.8.8: Alarm latency 1210216us stddev 2775341us loss 1%
Dec 4 18:35:19 	dpinger 		send_interval 100ms loss_interval 600ms time_period 60000ms report_interval 0ms data_len 0 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 8.8.8.8 bind_addr 192.168.90.29 identifier "WAN_DHCP "

As far as I can see, nothing else in the logs corresponds with these times. I would suspect the upstream router, if not for the fact that it works fine with off-the-shelf Netgear and TP-Link solutions. I might try a Linux-based firewall and see if it could be something to do with BSD's network drivers.
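To make the dpinger lines above easier to line up against other log entries, here is a rough Python sketch that parses the Alarm/Clear lines, converts the microsecond values to milliseconds, and pairs each first Alarm with the next Clear. The regex is written against the line layout pasted in this post only. The names `parse_dpinger` and `alarm_windows` and the `year` parameter are made up for illustration (the log lines carry no year, so a placeholder is used).

```python
import re
from datetime import datetime

# Matches only the Alarm/Clear lines in the layout pasted above;
# the dpinger parameter lines (send_interval ...) are skipped.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+dpinger\s+"
    r"(?P<gw>\S+)\s+(?P<dest>\S+):\s+(?P<state>Alarm|Clear)\s+"
    r"latency\s+(?P<lat>\d+)us\s+stddev\s+(?P<sd>\d+)us\s+loss\s+(?P<loss>\d+)%"
)


def parse_dpinger(line, year=2000):
    """Parse one Alarm/Clear line into a dict, or return None.

    `year` is a placeholder: the log lines do not include one.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = " ".join(m.group("ts").split())
    when = datetime.strptime("%d %s" % (year, stamp), "%Y %b %d %H:%M:%S")
    return {
        "time": when,
        "gateway": m.group("gw"),
        "state": m.group("state"),
        "latency_ms": int(m.group("lat")) / 1000.0,
        "stddev_ms": int(m.group("sd")) / 1000.0,
        "loss_pct": int(m.group("loss")),
    }


def alarm_windows(lines):
    """Return (start, end, peak_latency_ms) from first Alarm to next Clear."""
    events = sorted(
        (e for e in map(parse_dpinger, lines) if e is not None),
        key=lambda e: e["time"],
    )
    windows, start, peak = [], None, 0.0
    for e in events:
        if e["state"] == "Alarm":
            if start is None:
                start = e["time"]
                peak = 0.0
            peak = max(peak, e["latency_ms"])
        elif start is not None:  # a Clear closes the open window
            windows.append((start, e["time"], peak))
            start = None
    return windows


# A few lines copied from the log above (newest first, as pfSense shows them).
SAMPLE = [
    "Dec 4 19:38:30 dpinger WAN_DHCP 8.8.8.8: Clear latency 2903us stddev 1352us loss 0%",
    "Dec 4 19:37:35 dpinger WAN_DHCP 8.8.8.8: Alarm latency 1982976us stddev 3634587us loss 8%",
    "Dec 4 19:37:23 dpinger WAN_DHCP 8.8.8.8: Alarm latency 679549us stddev 1775578us loss 21%",
    "Dec 4 19:37:17 dpinger WAN_DHCP 8.8.8.8: Alarm latency 535520us stddev 1488845us loss 11%",
    "Dec 4 18:44:38 dpinger WAN_DHCP 8.8.8.8: Clear latency 150929us stddev 634930us loss 0%",
    "Dec 4 18:43:23 dpinger WAN_DHCP 8.8.8.8: Alarm latency 1210216us stddev 2775341us loss 1%",
]

if __name__ == "__main__":
    for start, end, peak in alarm_windows(SAMPLE):
        secs = (end - start).total_seconds()
        print("%s alarm for %.0fs, peak latency %.0f ms" % (start.time(), secs, peak))
```

Note that a window here is how long dpinger kept the gateway in alarm, which is not necessarily the same as how long traffic was actually interrupted.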

stephenw10

I would say that is actual latency you're seeing and that pfSense logs and acts on it where the TP-Link does not.

If you only have one WAN connection you can set 'Disable Gateway Monitoring Action' in the gateway settings. That will continue to monitor and log latency but will not take action when the WAN latency or loss goes outside the limits (your dpinger log shows the alarm thresholds as 500ms latency and 20% loss).

The 10s outage could be caused by pfSense reloading scripts when the WAN alarm is triggered.

Steve

AlthalusAvan

Thanks for the tip - I've applied it and we haven't had any drops in the 2 hours or so since. Will report back if it stays smoothed out!