Intermittent freezes of web GUI, complete block of outgoing WAN traffic
-
Quick overview:
- pfSense virtual router/firewall running on Hyper-V
- two routed site-to-site IPsec tunnels (a FortiGate at one site, a very old Juniper at the other) [the Juniper will hopefully be replaced by a modern FortiGate today]
- periodic crashes/freezes of the pfSense web GUI, coupled with a complete block of outgoing WAN traffic
This pfSense virtual machine has had the problem from the start, which is around 2 months now. To rule out hypervisor issues, the VM was moved from one AMD EPYC-based server to another, and then to an Intel-based server. After that, a new 2.7.0 instance was installed and the config restored. Finally, yesterday evening I installed a fresh 2.7.1 machine (*1) and reconfigured all settings by hand (no part of the previous virtual machine or pfSense configuration was carried over).
In short: the VM has been moved, restored, and reinstalled multiple times across 3 different physical servers. The weird problems persist.
Currently the problems seem to occur around 2-3 times a day. What exactly happens?
- the pfSense web GUI becomes completely unavailable over WAN (the page simply won't load; timeout error)
- pfSense keeps responding to ping over WAN without packet loss
- pfSense will not ping any outside IP address or hostname from the console, but will ping LAN IP addresses
- no traffic moves across the IPsec tunnels
- gateways.log randomly reports "high" ping (with system.log and gateways.log using different units, I might add) and substantial packet loss (20% and up) [these seem to be bogus alerts, as there are no actual connectivity issues; see below]
Since at one point the problems seemed to coincide with the reported gateway issues, I disabled gateway monitoring and monitoring actions. It had no effect; the problems persist.
There are currently 5 other pfSense virtual instances running on the same physical server. I have checked all of them multiple times, and none of them report problems with the WAN gateway. No other client networks report any connectivity issues either. What's more, one of the other pfSense instances runs an almost identical setup with 100x more load and has gone 4+ months without a single issue.
The IPsec and system logs show nothing that points to a possible cause.
The Hyper-V virtual machine has plenty of resources provisioned: 4 vCPUs, 8 GB of RAM, and a 64 GB disk. The host has plenty of headroom too (Dell PowerEdge with Intel Xeon Gold series CPUs, etc.).
I'm running out of ideas here. At the moment I'm down to two possibilities:
- one or both IPsec tunnels are somehow causing these strange connectivity and freezing issues
- it's the WAN IP (I have no idea how or why that could be)
Really looking forward to some suggestions here.
(*1) It seems that 2.7.1 brought up the IPsec Phase 1 (P1) entries, but no traffic was able to move across them. I have no time to troubleshoot this at the moment, so we're sticking with 2.7.0 for now.
-
@OhYeah-0 said in Intermittent freezes of web GUI, complete block of outgoing WAN traffic:
pfSense will not ping any outside IP address or hostname from the console, but will ping LAN IP addresses
OK, I would concentrate on this. I assume 'LAN addresses' means other hosts inside the LAN subnet? Those are hosts it can reach directly as long as they respond to ARP. Anything outside its own local subnets requires routing, so I would check that it still has a default route and that it's the correct one. So check Diag > Routes in the GUI, or
netstat -rn
at the CLI.
If it's wrong, make sure the default IPv4 gateway is set specifically to the WAN gateway rather than Automatic in System > Routing > Gateways.
Steve
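Here is a minimal sketch of the reasoning above, using a simplified IPv4 model. Destinations inside a directly connected subnet are delivered on-link, so only ARP is needed. Everything else depends on the default route. The subnets and addresses are hypothetical documentation ranges. A real kernel also does a longest-prefix match over the whole routing table rather than checking connected subnets first.

```python
import ipaddress
from typing import List, Optional


def next_hop(dst: str, local_subnets: List[str], default_gw: Optional[str]) -> Optional[str]:
    """Return where a packet to `dst` is sent in a simplified IPv4 model.

    - Destinations inside a directly connected subnet are delivered on-link:
      the destination itself is the next hop, resolved via ARP.
    - Anything else needs a route; only the default route is modeled here.
    - None means "no route to host".
    """
    addr = ipaddress.ip_address(dst)
    for subnet in local_subnets:
        if addr in ipaddress.ip_network(subnet):
            return dst  # on-link: no routing needed, only ARP
    return default_gw  # off-link: depends entirely on the default route


# Hypothetical addressing (documentation ranges), not the poster's real networks.
LOCAL = ["192.168.10.0/24", "203.0.113.0/28"]
print(next_hop("192.168.10.25", LOCAL, None))          # LAN host, reachable without a route
print(next_hop("198.51.100.7", LOCAL, "203.0.113.1"))  # external host via the WAN gateway
print(next_hop("198.51.100.7", LOCAL, None))           # external host, no default route
```

The third call is the situation Steve suspects: LAN hosts remain reachable while every external address fails.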
-
I assume 'LAN addresses' means other hosts inside the LAN subnet? << Yes.
So Diag > Routes in the GUI or netstat -rn at the CLI. << Since the GUI cannot be reached, I will try it from the console the next time it happens. Thanks for the tip.
-
You should also try it now so you know what it looks like when it's working correctly.
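One way to make that comparison mechanical is to pull the IPv4 default route out of two `netstat -rn` snapshots: one taken while things work, one taken during an outage. This sketch assumes FreeBSD-style output with an "Internet:" section and a header row that names a "Netif" column. The sample snapshot, addresses, and interface names (hn0/hn1) are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical FreeBSD-style `netstat -rn` snapshot taken while things work.
BASELINE = """\
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            203.0.113.1        UGS         hn0
127.0.0.1          link#3             UH          lo0
192.168.10.0/24    link#2             U           hn1
203.0.113.0/28     link#1             U           hn0

Internet6:
Destination                       Gateway                       Flags     Netif Expire
::1                               link#3                        UHS         lo0
"""


def default_ipv4_route(netstat_output: str) -> Optional[Tuple[str, str]]:
    """Return (gateway, interface) for the IPv4 default route, or None."""
    in_ipv4 = False
    netif_col = None
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Internet:":
            in_ipv4 = True
        elif fields[0] == "Internet6:":
            in_ipv4 = False  # ignore IPv6 routes, including an IPv6 default
        elif in_ipv4 and fields[0] == "Destination":
            # Locate the interface column from the header row.
            netif_col = fields.index("Netif") if "Netif" in fields else None
        elif in_ipv4 and fields[0] == "default":
            has_netif = netif_col is not None and netif_col < len(fields)
            return fields[1], fields[netif_col] if has_netif else "?"
    return None


def compare_default_route(baseline: str, during_outage: str) -> str:
    """Describe how the IPv4 default route differs between two snapshots."""
    before = default_ipv4_route(baseline)
    after = default_ipv4_route(during_outage)
    if after is None:
        return "default route missing"
    if after != before:
        return f"default route changed: {before} -> {after}"
    return "default route unchanged"


print(default_ipv4_route(BASELINE))
```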
-
@stephenw10 said in Intermittent freezes of web GUI, complete block of outgoing WAN traffic:
You should also try it now so you know what it looks like when it's working correctly.
Just had another short blackout (5 minutes at most). I ran netstat -rn from the console and everything there was OK.
-
Can it ping anything in the WAN subnet?
Is the gateway in the ARP table?
There has to be some reason it stops being able to pass traffic on WAN.
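For the ARP question, here is a sketch that classifies the gateway's entry in `arp -an` output as missing, incomplete, or resolved. The line format in the regex is an assumption based on typical FreeBSD output, so adjust it if yours differs. The addresses and MACs in the sample are hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumed FreeBSD-style `arp -an` lines, e.g.
# "? (203.0.113.1) at 00:15:5d:0a:0b:0c on hn0 expires in 1187 seconds [ethernet]"
# "? (192.168.10.25) at (incomplete) on hn1 expired [ethernet]"
ARP_LINE = re.compile(r"^\S+ \((?P<ip>[0-9.]+)\) at (?P<mac>\S+) on (?P<iface>\S+)")


def parse_arp(output: str) -> Dict[str, Tuple[str, str]]:
    """Map IPv4 address -> (MAC or '(incomplete)', interface)."""
    table: Dict[str, Tuple[str, str]] = {}
    for line in output.splitlines():
        m = ARP_LINE.match(line.strip())
        if m:
            table[m.group("ip")] = (m.group("mac"), m.group("iface"))
    return table


def gateway_arp_status(output: str, gateway_ip: str) -> str:
    """Classify the gateway's ARP entry: 'missing', 'incomplete' or 'resolved <mac>'."""
    entry = parse_arp(output).get(gateway_ip)
    if entry is None:
        return "missing"
    mac, _iface = entry
    if mac == "(incomplete)":
        return "incomplete"  # ARP requests sent, no reply received
    return f"resolved {mac}"


SAMPLE = (
    "? (203.0.113.1) at 00:15:5d:0a:0b:0c on hn0 expires in 1187 seconds [ethernet]\n"
    "? (192.168.10.25) at (incomplete) on hn1 expired [ethernet]\n"
)
print(gateway_arp_status(SAMPLE, "203.0.113.1"))
```

Beyond presence, the MAC itself is worth noting. If it differs from the known gateway's MAC between a working state and an outage, that points away from routing.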
-
@stephenw10 said in Intermittent freezes of web GUI, complete block of outgoing WAN traffic:
Can it ping anything in the WAN subnet?
Is the gateway in the ARP table?
There has to be some reason it stops being able to pass traffic on WAN.
It doesn't ping anything, including our main firewall (which also runs pfSense).
What makes it stranger still is that it continues to reply to pings coming from outside.
As for the gateway's ARP entry, I will try to check it from the console next time. I did check whether the ARP entries on the main firewall were okay (thinking that could be the culprit), but all was fine there.
-
@OhYeah-0 said in Intermittent freezes of web GUI, complete block of outgoing WAN traffic:
What makes it stranger still is that it continues to reply to pings coming from outside.
A ping that is already running? That could indicate the state table is exhausted, though I'd expect to see log entries indicating that.
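If state exhaustion is the suspicion, the check is current entries versus the hard limit. This sketch assumes the `current entries` line of `pfctl -si` and the `states ... hard limit` line of `pfctl -sm`. The sample excerpts and the 90% threshold are hypothetical, not pfSense defaults.

```python
import re
from typing import Tuple

# Hypothetical excerpts; real `pfctl -si` / `pfctl -sm` output has more lines.
PFCTL_SI = """\
State Table                          Total             Rate
  current entries                      123
  searches                          884213          102.3/s
"""
PFCTL_SM = """\
states        hard limit   400000
src-nodes     hard limit   400000
frags         hard limit     5000
"""


def state_usage(si_output: str, sm_output: str) -> Tuple[int, int]:
    """Return (current state entries, state hard limit)."""
    current = re.search(r"current entries\s+(\d+)", si_output)
    limit = re.search(r"^states\s+hard limit\s+(\d+)", sm_output, re.MULTILINE)
    if not current or not limit:
        raise ValueError("unexpected pfctl output format")
    return int(current.group(1)), int(limit.group(1))


def near_exhaustion(current: int, limit: int, threshold: float = 0.9) -> bool:
    """True when the table is at or above `threshold` of its hard limit."""
    return current >= threshold * limit


cur, lim = state_usage(PFCTL_SI, PFCTL_SM)
print(cur, lim, near_exhaustion(cur, lim))
```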
-
@stephenw10 said in Intermittent freezes of web GUI, complete block of outgoing WAN traffic:
@OhYeah-0 said in Intermittent freezes of web GUI, complete block of outgoing WAN traffic:
What makes it stranger still is that it continues to reply to pings coming from outside.
A ping that is already running? That could indicate the state table is exhausted, though I'd expect to see log entries indicating that.
No, not only pings that were already running. The number of states is very low: we currently host only one server, with low to moderate usage by a couple of people.
-
Hmm, it still feels like a bad default route then. Because incoming traffic on WAN matches a rule with reply-to set, replies from the firewall have a route back even if no default route is present.
What error is shown when you try to ping something external from the firewall?
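Here is a toy model of the reply-to point above. A reply to a flow that arrived through a reply-to rule reuses the gateway recorded in its state. Traffic the firewall originates itself, such as a ping from the console, needs the default route. This is a deliberate simplification of how pf handles reply-to, and the flows and addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str


def egress_gateway(
    packet: Flow,
    default_gw: Optional[str],
    reply_to_states: Dict[Flow, str],
) -> Optional[str]:
    """Choose the gateway for a packet leaving via WAN in a toy model.

    reply_to_states maps an inbound flow (as it arrived) to the gateway that a
    reply-to rule recorded for it. Replies to such flows reuse that gateway and
    never consult the routing table. Everything else needs the default route;
    None means the packet cannot be routed.
    """
    inbound = Flow(src=packet.dst, dst=packet.src)
    if inbound in reply_to_states:
        return reply_to_states[inbound]
    return default_gw


# An outside host pinged the WAN address; the state remembers the WAN gateway.
states = {Flow("198.51.100.50", "203.0.113.10"): "203.0.113.1"}
reply = Flow("203.0.113.10", "198.51.100.50")   # echo reply to that host
own_ping = Flow("203.0.113.10", "198.51.100.99")  # ping started from the console
print(egress_gateway(reply, None, states))     # still answered
print(egress_gateway(own_ping, None, states))  # no route without a default
```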
-
Case solved.
The cause of the problems was somewhat embarrassing. It turns out the WAN IP had been in use previously (years ago) and was still configured as a virtual IP on another firewall/router on the same network, and I simply never managed to catch the ARP conflict live on the perimeter firewall.
My faith in pfSense is restored, but my faith in my own IQ is severely diminished. Thanks for the help and have a nice weekend.
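For anyone hitting the same symptoms: an address conflict shows up as one IP being claimed by more than one MAC address. This sketch flags such IPs from (ip, mac) observations, for example the sender fields of ARP replies captured on the shared WAN segment. How those observations are collected is left open, and all addresses here are hypothetical. Tools such as arpwatch report the same condition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ip_conflicts(observations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return every IP that has been claimed by more than one MAC address."""
    claims: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in observations:
        claims[ip].add(mac.lower())  # normalize case so one NIC counts once
    return {ip: macs for ip, macs in claims.items() if len(macs) > 1}


def foreign_claims(observations: Iterable[Tuple[str, str]], own_ip: str, own_mac: str) -> Set[str]:
    """Return MACs other than our own that answered for our own IP."""
    return {
        mac.lower()
        for ip, mac in observations
        if ip == own_ip and mac.lower() != own_mac.lower()
    }


# Hypothetical capture: the WAN IP .10 answered by pfSense and by an old router's VIP.
obs = [
    ("203.0.113.10", "00:15:5d:aa:00:01"),
    ("203.0.113.10", "00:0C:29:bb:00:02"),
    ("203.0.113.1", "00:15:5d:cc:00:03"),
    ("203.0.113.10", "00:15:5d:AA:00:01"),
]
print(ip_conflicts(obs))
print(foreign_claims(obs, "203.0.113.10", "00:15:5d:aa:00:01"))
```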
-
Ah, good result! Interesting that pfSense didn't log that. I would normally expect to see 'xxxx is using my IP address' entries.
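On FreeBSD, which pfSense is built on, the kernel typically logs a warning of the form "arp: <MAC> is using my IP address <IP> on <interface>!" when it receives an ARP packet from another host claiming one of its addresses. This sketch scans log lines for that message. The exact message shape is an assumption to verify against a real log, and the sample lines are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed message shape, based on FreeBSD's kernel ARP warning.
CONFLICT = re.compile(
    r"arp: (?P<mac>[0-9a-fA-F:]{17}) is using my IP address "
    r"(?P<ip>[0-9.]+) on (?P<iface>[^!\s]+)!"
)


def find_ip_conflicts(log_lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (mac, ip, interface) for every ARP conflict warning found."""
    hits = []
    for line in log_lines:
        m = CONFLICT.search(line)
        if m:
            hits.append((m.group("mac"), m.group("ip"), m.group("iface")))
    return hits


LOG = [
    "Nov 10 12:00:00 fw kernel: arp: 00:0c:29:bb:00:02 is using my IP address 203.0.113.10 on hn0!",
    "Nov 10 12:00:05 fw php-fpm[123]: unrelated message",
]
print(find_ip_conflicts(LOG))
```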