Intermittent pfSense partial outage
-
I'm using a pfsense device for home/office use, with WAN port connected to Virgin Media ISP (UK) - using the Virgin Media router in modem mode.
Around twice a day on average, not at predictable times, I get an outage with the following characteristics :
- The pfsense router is reachable from the LAN
- Both VPN and WAN gateways are reported as up (WAN gateway is pinging 8.8.8.8). LAN, WAN and VPN interfaces look okay.
- All devices on the LAN are reachable from one another
- Incoming OpenVPN continues to operate correctly.
- There are no obvious error messages or firewall block messages reported in the logs
- Long running open states (like outbound VPN) remain operational.
- From the LAN I cannot ping or connect to any internet hostname or IP address (*)
    $ ping www.google.com
    ping: cannot resolve www.google.com: Unknown host
- From the LAN I can ping the ISP's DNS servers :
    $ ping 194.168.4.100
    PING 194.168.4.100 (194.168.4.100): 56 data bytes
    64 bytes from 194.168.4.100: icmp_seq=0 ttl=61 time=11.463 ms
- pfsense will not resolve internet hostnames :
    $ dig www.google.com
    ; <<>> DiG 9.10.6 <<>> www.google.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 25715
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 4096
    ;; QUESTION SECTION:
    ;www.google.com. IN A
    ;; Query time: 235 msec
    ;; SERVER: 192.168.1.1#53(192.168.1.1)
    ;; WHEN: Mon May 25 09:42:03 BST 2020
    ;; MSG SIZE rcvd: 43
- I can resolve internet hostnames by directly specifying the server :
    $ dig @194.168.4.100 www.google.com
    ; <<>> DiG 9.10.6 <<>> @194.168.4.100 www.google.com
    ; (1 server found)
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36979
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 512
    ;; QUESTION SECTION:
    ;www.google.com. IN A
    ;; ANSWER SECTION:
    www.google.com. 213 IN A 216.58.211.164
- I cannot ping any internet IP address from the LAN :
    $ ping 216.58.211.14
    PING 216.58.211.14 (216.58.211.14): 56 data bytes
    Request timeout for icmp_seq 0
- I CAN ping internet hostnames directly from pfsense using the WAN as the selected source address.
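The two dig results above (SERVFAIL via pfsense, NOERROR via the ISP server) are the key comparison, so it may help to capture it automatically the next time an outage hits. Below is a minimal sketch, not a tested diagnostic tool: `dig_status` and `classify` are hypothetical helper names, and the addresses in the usage comment are simply the ones from this post.

```shell
#!/bin/sh
# Hypothetical helpers for comparing resolver results during an outage.

# Read dig output on stdin and print the status field of the reply header
# (for example NOERROR or SERVFAIL). Prints nothing if no header is found.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Compare the status seen via the local resolver with the status seen
# via an upstream server queried directly, and print a short verdict.
classify() {
    local_status=$1
    upstream_status=$2
    if [ "$local_status" = "NOERROR" ]; then
        echo "local resolver OK"
    elif [ "$upstream_status" = "NOERROR" ]; then
        echo "local resolver failing, upstream reachable"
    else
        echo "both failing"
    fi
}

# Usage during an outage (addresses taken from the post):
#   classify "$(dig www.google.com | dig_status)" \
#            "$(dig @194.168.4.100 www.google.com | dig_status)"
```

A verdict of "local resolver failing, upstream reachable" would match the symptoms listed here. It only narrows the problem to the path between the resolver and the internet; it does not say whether the cause is unbound, routing or NAT.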
The only fixes I have found that consistently resolve this are :
- Reset the Virgin Media modem, or
- Reboot the router.
I'm at a bit of a loss on how to tackle this problem, so any insight would be gratefully received. I'm not sure whether this is some strange limiting that Virgin Media is doing, or whether this is a pfsense malfunction (e.g. something going wrong with NAT). Happy to share a dump of the pfsense config / logs as necessary.
-
Hi,
Do you use the Resolver ?
Check the Resolver logs.
Is it restarting at these very moments ? And if so, what is the time between 'stop' and 'started' log messages ?
    May 25 08:24:13 unbound 15536:0 info: service stopped (unbound 1.10.1).
    ....
    May 25 08:24:14 unbound 15536:0 notice: Restart of unbound 1.10.1.
Or is it a VM / multicore setup ?
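To measure the time between the stop and restart messages without reading the log by eye, something like the following could work. It is only a sketch: `unbound_restart_gaps` is a hypothetical name, the two message patterns are taken from the sample lines above and may differ on other unbound versions, and it assumes each stop/restart pair falls on the same day.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog-style resolver log lines on stdin and,
# for each "service stopped" line followed by a "Restart of unbound" line,
# prints the restart timestamp and the gap in seconds.
# Assumption: no midnight rollover between the two lines.
unbound_restart_gaps() {
    awk '
        # Convert HH:MM:SS to seconds since midnight.
        function secs(t,    p) {
            split(t, p, ":")
            return p[1] * 3600 + p[2] * 60 + p[3]
        }
        /unbound.*service stopped/ {
            stopped = secs($3)
            have = 1
            next
        }
        /unbound.*Restart of unbound/ && have {
            print $1, $2, $3, "gap:", secs($3) - stopped, "s"
            have = 0
        }
    '
}

# Usage (resolver.log is a placeholder for wherever the log was exported):
#   unbound_restart_gaps < resolver.log
```

Gaps of a second or so are probably harmless; long gaps that line up with the outages would point at the Resolver.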
Check out the forum for the "FreeBSD 11.3 ip (firewall)" issue.
-
Yes, using the Resolver. The resolver logs show configuration is reloaded every (approx) 5 to 15 minutes :
May 25 10:45:04 filterdns merge_config: configuration reload
But no service stop and restart.
The setup is using FreeBSD 11.3 :
I do see some similar filter reload activity in the system logs :
-
@rrab said in Intermittent pfSense partial outage:
Both LAN and WAN gateways are reported as up
There shouldn't be a LAN gateway.. Are you saying you can ping it so you know it's up? You don't set a gateway on LAN interfaces..
Did you manually create a downstream gateway.. Is there some router downstream on your LAN network? This wouldn't be set on the interface as a gateway.. And even if set up manually to get to a downstream router, this shouldn't be your LAN, that should be a transit network.. or you are going to run into asymmetrical routing issues if devices are on this LAN network.
-
Apologies, I mis-typed. I intended to say that both WAN and OpenVPN gateways report as online, and all interfaces report as okay.
-
@rrab said in Intermittent pfSense partial outage:
filterdns merge_config: configuration reload
Filterdns would run when there is an update to aliases you have created.. They by default run every 5 minutes.. this would/should not affect unbound being able to resolve..
What is more likely is you're having an issue with your vpn.. And you're trying to have unbound use your vpn to resolve? This gateway might have gone down, or your IP changed and unbound is not bound to it..
You can see log entry where your openvpn was restarted. Looks like you had a gateway alarm - why are you using 8.8.8.8 vs just your isp gateway to monitor?
The best thing you can do if you're wanting to resolve through a vpn is run your resolver downstream of pfsense - so you can policy route the traffic.. Vs unbound running on pfsense having to bind to the vpn interface to be able to use it to resolve. The other option is the outbound interface setting in unbound.. Set it to only use localhost.. This way its queries will be routed via your routing table... Pretty much all users setting up vpn services allow the vpn service to be default (uggghhh)..
-
I have no good reason for using 8.8.8.8; I did this as part of trying to diagnose this issue. The behaviour I was observing was that the internet was unavailable but the gateway was still reporting as up, so my supposition was that the ISP was prone to upstream failures. I can revert this.
I don't follow the recommendation (my lack of knowledge). Are you saying that the gateway alarm is resulting in openvpn restarting, and (for some reason) that this is resulting in me being unable to make connections outbound from LAN to WAN?