pfSense freezes when WAN is down, due to IPsec
-
Hello everyone,
I have a stability problem with pfSense IPSec when the WAN is down.
- pfSense 2.1.4 on VMware vSphere 5.5 with the Open VM Tools package
- 10x E1000 virtual NICs (vmxnet3 is not offered)
- em0 is the WAN
- 13x IPsec VPN tunnels
Everything works well, but when the Internet goes down on the operator's side (VOO), pfSense grinds to a halt (freeze/lag, iSCSI down).
On reboot, it hangs for a long time on "Starting IPsec" and "Starting syslog" if the WAN is not up (gateway unavailable), and the GUI responds with latencies of several minutes.
As a test, I disabled all 13 VPN tunnels, and then pfSense boots normally and is stable (no slowdown on IPsec or syslog).
Obviously the tunnels go down during an Internet outage, but pfSense should not crash; it should keep providing internal routing …
Best Regards,
-
Are those IPsec connections reliant on DNS? I believe IPsec waits a long time on DNS during boot in that case, maybe much longer than it should.
In general, losing a WAN or your only WAN has no impact on what else the system will do. It'll still route traffic internally where the networks remain reachable and functional, it'll still be able to be managed no problem, and it definitely won't freeze. It sounds like maybe DNS timing out is causing delays in IPsec and maybe in the web interface as well.
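One quick way to confirm that theory is to check whether each DynDNS peer name still resolves while the WAN is down, with a bound on how long the check itself can wait. A minimal sketch, assuming illustrative peer names and a 2-second bound; note that timeout(1) is not in the FreeBSD 8.3 base that pfSense 2.1.x runs on, so the sketch treats it as optional:

```shell
# check_peer: return 0 if the hostname resolves, non-zero otherwise.
# getent(1) exists on both FreeBSD and Linux; timeout(1) may not exist
# on older pfSense/FreeBSD releases, so it is used only if present.
check_peer() {
    if command -v timeout >/dev/null 2>&1; then
        timeout 2 getent hosts "$1" >/dev/null 2>&1
    else
        getent hosts "$1" >/dev/null 2>&1
    fi
}

# Example usage: pass the DynDNS endpoint names as arguments.
for peer in "$@"; do
    if check_peer "$peer"; then
        echo "$peer: resolves"
    else
        echo "$peer: DNS FAILED - a tunnel keyed on this name will stall"
    fi
done
```

If the names stop resolving the moment the WAN drops, that would line up with racoon blocking on DNS during startup.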
You mentioned iSCSI being down; what's the relation between that iSCSI and the firewall? Is it routing the ESXi host's iSCSI traffic for its own disk?
-
Thank you for your reply.
Yes, 3 of the 13 tunnels use "DynDNS" hostnames as endpoints.
It is easy to demonstrate: if I unplug the WAN network cable, pfSense freezes:
- The web interface is unresponsive
- SSH is very slow
- No communication between the ESXi host (VLAN 1) and the vCenter Server VM (VLAN 3)
Many VMs access an iSCSI disk whose traffic is routed through pfSense, but nothing of great importance (Veeam, Ahsay … backup VMs).
On the other hand, the ESXi host is on one VLAN and vCenter Server is on another, and routing between them is not performed when the WAN line is down.
I should also mention that I use this setting: Force Reload on IPsec Failover.
I will try disabling the "DynDNS"-type VPN tunnels and simulating a WAN outage to see what happens.
-
Seeing the output of "top -SH" run from a command prompt at the console (or via SSH) while it's sluggish would probably be helpful.
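For reference, a capture along these lines would do it; the flags are FreeBSD's top(1), the log path is just an example, and the guard makes the sketch a no-op anywhere other than the pfSense box itself:

```shell
# On FreeBSD (pfSense's base), -S includes system/kernel threads,
# -H lists each thread on its own line, -b runs in batch (script-
# friendly) mode, and -d 2 takes two snapshots before exiting.
if [ "$(uname -s)" = "FreeBSD" ]; then
    top -SHb -d 2 > /tmp/top-wan-down.log 2>&1
else
    echo "run this on the pfSense box itself" > /tmp/top-wan-down.log
fi
head -n 5 /tmp/top-wan-down.log
```

A thread stuck at or near 100% CPU in that output would point at what is actually stalling while the WAN is down.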
This is probably something most effectively worked through via commercial support, and we give a break on time billed when the cause is a bug in the base system, so you don't have to worry about racking up time on that. At least if you need a really quick resolution. I'll continue to work with you here as time permits.
On the other hand, the ESXi host is on one VLAN and vCenter Server is on another, and routing between them is not performed when the WAN line is down.
That's one of the main scenarios I talked about in our last hangout where having virtual firewalls is a bad idea. If your hypervisors can't be properly managed without the virtual firewall, don't use a virtual firewall in that circumstance. You will inevitably end up in a situation where you have to jump through hoops to fix something because you can't route to some portion of your virtualization management. It's happened to us internally, and to at least a handful of our customers: something happens, some or all of your VMs end up powered off, and you need to get in to power them back on. Except you need that firewall running in order to reach the system you must reach to turn it on. Oh…
Virtual firewalls are a great fit for many things, as I discussed in depth in the hangout (recording available for Gold subscribers who missed it). For routing the traffic required to manage your virtualization environment? That's not one of them. :)
-
Seeing the output of "top -SH" run from a command prompt at the console (or via SSH) while it's sluggish would probably be helpful.
I tried htop … I do not know FreeBSD well ;-)
This is probably something most effectively worked through via commercial support, and we give a break on time billed when the cause is a bug in the base system, so you don't have to worry about racking up time on that. At least if you need a really quick resolution. I'll continue to work with you here as time permits.
I'll consider it! Respect for the team's work, too. For now, I'm experimenting with pfSense on a small infrastructure …
That's one of the main scenarios I talked about in our last hangout where having virtual firewalls is a bad idea. If your hypervisors can't be properly managed without the virtual firewall, don't use a virtual firewall in that circumstance. You will inevitably end up in a situation where you have to jump through hoops to fix something because you can't route to some portion of your virtualization management. It's happened to us internally, and to at least a handful of our customers: something happens, some or all of your VMs end up powered off, and you need to get in to power them back on. Except you need that firewall running in order to reach the system you must reach to turn it on. Oh…
Indeed, as a workaround I have access to VLAN 1 via the standalone VMware console, but this is not practical, and what about remote access in case of problems …
I have to upgrade the infrastructure to test HA and dual WAN.
For HA, I still hesitate between pfSense HA and VMware HA. Which approach do you recommend?
-
HA within the VMs is always better than hypervisor-level HA; where you can cluster anything inside the VM, that's best. Hypervisor-level HA almost always reacts more slowly on failover (in pfSense scenarios at least), and it does nothing for you with upgrades or other maintenance needs within the VM. Most people don't bother with any kind of hypervisor HA for pfSense; they just set up their environment such that the primary and secondary firewalls are always on different physical hosts.
To clarify a bit: generally people do have the VMs set to start on another host if their host dies, and one might consider that a form of "HA". I was more referring to features in certain hypervisors where the VM can run simultaneously on two physical hosts and quickly pick up if one host fails. That level of HA is a waste of hardware resources in almost all cases, IMO.
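For context on the pfSense-side option: an HA pair shares virtual IPs via CARP, with config and state synced over pfsync. On the FreeBSD 8.x base of pfSense 2.1.x, the primary's virtual IP amounts to something like the fragment below; the interface, VHID, passphrase, and address are all hypothetical, and pfSense generates the real equivalent from the GUI (Firewall > Virtual IPs), so this is only a sketch of what sits underneath:

```shell
# Hypothetical sketch only - do not run by hand on a production box.
# Primary node: advskew 0 makes it the preferred CARP master.
ifconfig carp0 create
ifconfig carp0 vhid 1 advskew 0 pass s3cret 192.168.1.1/24
# Secondary node: same VHID and passphrase, higher advskew, so it
# takes over only when the primary stops advertising:
# ifconfig carp0 vhid 1 advskew 100 pass s3cret 192.168.1.1/24
```

Because failover here is sub-second CARP advertisement logic rather than restarting a VM, this is the "HA within the VMs" approach described above.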