pfSense on Proxmox loses connection to LAN at random
-
I have been running a pfSense VM as my router/firewall for around 18 months now, and for as long as I've used it, it has randomly lost connection to my LAN. Sometimes it happens as often as twice a week, sometimes as rarely as once every two months. There is no pattern to it.
This morning it happened again. I woke up to no Internet throughout the house. As I couldn't use the web GUI to log into the device, I connected to my Proxmox host, opened the console, and did an option 5 reboot. After it came back up, everything was fine again. Before rebooting, I pinged a few internal devices from the pfSense console and got no response, but pinging 1.1.1.1, for example, worked just fine. From my desktop it was the other way round: pinging the LAN interface got a response, but 1.1.1.1 did not. I've checked all the logs I know how to find and there is no reference to any failure other than the reboot.
Checking the stats of the device around the time of the incident shows nothing that jumps out at me as a problem. The gap at 08:20 is when my Entuity server lost connection to the device.
Here is the WAN utilisation:
Here is the LAN utilisation:
The specs of the VM:
I'm currently running 2.7, but the same issue happened with 2.6. I have very little in the way of firewall rules, so I'm confident it's nothing to do with those.
Sadly I won't be able to do any active testing as I can't replicate the issue at will. But I'd appreciate any tips on methods to troubleshoot this.
-
At the risk of coming off as rude, and I hate to say it, but this is one of the reasons many professionals recommend never virtualizing a firewall: weird issues come up, and they can be really, really hard to diagnose when the firewall is virtualized.
But that doesn't really help your issue lol, so I'll see what I can do, because it should work. I don't use Proxmox myself (not recently), but I virtualize a lot of pfSense instances for testing on my XCP-ng cluster and haven't seen anything like this on them.
When you console in, do you see any errors in the pfSense output? I think the first thing to check is whether something is crashing on pfSense itself.
Does anything else running on this Proxmox host behave odd at similar times? Any logs from the host about the VM?
I'd also consider giving pfSense a bit more in the way of specs. What you have meets the minimum, but 2GB and 1 core is pretty minimal even for a firewall with low speed requirements.
Also, do you have any packages running? Maybe try disabling a few to see if a specific package is crashing it, though of course that's hard to verify.
And my final question for now, any idea if maybe a ton of traffic is happening during these crashes? Maybe Steam is running background game updates or something? That would point to maybe not enough "hardware" assigned to the VM.
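Since this only shows up at random, it might help to leave a small probe running on a LAN machine so you get an exact timestamp to line up against the pfSense and Proxmox logs and the traffic graphs. Below is a rough Python sketch of that idea, not a polished tool. It pings the pfSense LAN address and an outside address and records a line only when the state changes. The `192.168.1.1` gateway address, the log file name, and the function names are all placeholders I made up, so adjust them to your setup.

```python
import platform
import subprocess
import time
from datetime import datetime

# Placeholder targets (assumptions): swap in your own addresses.
TARGETS = {
    "lan_gateway": "192.168.1.1",  # assumed pfSense LAN interface address
    "internet": "1.1.1.1",
}


def ping(host, timeout_s=3):
    """Return True if the system ping command reports a reply from host."""
    # Windows uses -n for the packet count, most other systems use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Note: the exit code is a coarse signal; some platforms return 0
    # for "destination unreachable" replies.
    return result.returncode == 0


def classify(results):
    """Map {target name: reachable} to a short state label."""
    gateway, internet = results["lan_gateway"], results["internet"]
    if gateway and internet:
        return "ok"
    if gateway:
        return "gateway up, no internet"
    if internet:
        return "gateway down, internet reachable"
    return "all down"


def append_to_file(line, path="pfsense_probe.log"):
    """Default sink: append one line to a local log file."""
    with open(path, "a") as fh:
        fh.write(line + "\n")


def watch(interval_s=30, probe=ping, emit=append_to_file,
          sleep=time.sleep, max_cycles=None):
    """Probe all targets every interval_s seconds; emit only on state changes."""
    last_state = None
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        state = classify({name: probe(host) for name, host in TARGETS.items()})
        if state != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            emit(f"{stamp} {state}")
            last_state = state
        cycles += 1
        sleep(interval_s)
```

Calling `watch()` with no arguments runs until interrupted. What you described seeing from your desktop (LAN interface answers, 1.1.1.1 doesn't) would show up here as "gateway up, no internet", and the timestamp on that line is the moment to look for in the logs.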
-
@abide I had a similar issue with a hardware switch. Under some loads the switch stopped passing traffic correctly: traffic was not sent to the required port on the switch (most commonly while ShadowProtect ImageManager was doing its monthly consolidation with concurrent verification). I confirmed this by running Wireshark on one switch port configured to monitor another port on the switch. Once the fault was isolated, my solution was to stop ImageManager from immediately verifying images.
In your case you are using Proxmox as a software switch / bridge. Perhaps the Proxmox bridge is locking up in some circumstances.
Btw I run pfSense under Proxmox but pass through all the NICs used by pfSense.
-
@planedrop I understand. I would never recommend that one of my clients virtualise a firewall. This is just my home network and I enjoy the fact that I can contain everything in one host.
> When you console in, do you see any errors in the pfSense output? I think the first thing to check is whether something is crashing on pfSense itself.
When I log into the console the only messages that show are login attempts.
> Does anything else running on this Proxmox host behave odd at similar times? Any logs from the host about the VM?
No, everything else performs fine. For clarity, on the host I also have a Windows 11 desktop, a Windows Server 2019 server, a Docker container with some lightweight apps, an Entuity server, a file server, and the pfSense box. All of these were spun up at different times, and none of them seems to have had any effect on the pfSense issue.
> I'd also consider giving pfSense a bit more in the way of specs. What you have meets the minimum, but 2GB and 1 core is pretty minimal even for a firewall with low speed requirements.
I'll take that on board and look at giving it some more resources.
> Also, do you have any packages running? Maybe try disabling a few to see if a specific package is crashing it, though of course that's hard to verify.
I don't have anything wildly custom on this router, but I'll do some digging.
-
@Patch I hadn't considered it could be to do with my switch. I have a 10-port Zyxel switch. I'll consider that when I do my troubleshooting.
For clarity, I have 2 NICs passed through to the VM, one for LAN and one for WAN. I use the Mgt interface on the host for everything else.