pfSense stops working for about 2 minutes after applying any new or changed configuration
-
I'm running an HA installation of pfSense Community on Hyper-V. I've been doing this for several years and, as far as I remember, since the upgrade from 2.5.2 to 2.6.0, every time I change the configuration, whether adding a new port forwarding rule or changing an IP on an existing one, everything stops working. I start to receive timeouts from the external website monitoring tools I use for the websites published through pfSense, I lose connectivity to the GUI, my VPN connections drop, internet access for my client computers stops working, and everything stays like this for about two minutes. The standby pfSense doesn't kick in and I have to wait for everything to start working again. It takes longer than a failover does when I take the active VM down and the standby one comes up.
After the upgrade, last night I decided to rebuild the entire environment using a fresh installation for both the active and passive pfSense nodes, and even with the fresh installation the problem remains.
I have no idea where to go from here. Has anybody seen anything like this or have any suggestions?
Thank you. -
Are you using VLANs?
If so check this: https://forum.netgate.com/topic/169884/after-upgrade-inter-v-lan-communication-is-very-slow-on-hyper-v?page=1 -
@stephenw10
Thanks for the reply. I use VLANs, but my inter-VLAN routing happens mostly on an L3 switch. On my pfSense I have very specific rules only for a single VLAN that I use for a guest Wi-Fi network.
It's a more general problem. I don't know whether the state table gets reset when the configuration is changed, whether that's default behavior, or whether it's something else. As in the case you mentioned, I also think it started after the upgrade from 2.5.2 to 2.6.0, but I'd rather not go back. It would be a huge hassle... Well, I can go back and go through the entire configuration process again, but what's the purpose of using an old version of the software? I really need to understand what's gone wrong here and fix it. That would be the best scenario. -
If it's in Hyper-V, did you not snapshot the VMs before the upgrade?
Proving it's actually a regression after the update would certainly help in troubleshooting it.
But otherwise I would run
top -HaSP
at the console and see what happens when you save and apply a change. Is there nothing logged when the delay happens?
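To put timestamps on the outage while watching the console, a small probe run from a client machine can record when connectivity drops and when it returns. This is only a minimal sketch: the function names `probe`, `outage_windows` and `monitor` are made up for illustration, and the target host and port are whatever published service or GUI address you want to watch.

```python
import socket
import time


def probe(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (start, end) spans of failed probes.

    start is the first failed sample; end is the first successful sample
    after it, or the last failed sample if connectivity never came back.
    """
    windows = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, last_fail))
    return windows


def monitor(host, port, duration=300, interval=1.0):
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    deadline = time.time() + duration
    while time.time() < deadline:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return outage_windows(samples)
```

Start something like `monitor("192.0.2.1", 443)` (a placeholder address), apply a change in the GUI, and compare the returned windows with the timestamps in the system log.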
Steve
-
@stephenw10
Hi Stephen. I don't work with production snapshots. The only thing I see in the logs is the configuration refresh. I decided to roll back to pfSense 2.5.2 with the same configuration to see what happens. It was working before, so I suppose it'll work again. There's no specific feature in version 2.6.0 that I'm interested in. I updated just to get the most recent version. I'll let you know what happens after the "forced" rollback. -
@rafael9908 said in pfSense stops working for about 2 minutes after applying any new or changed configuration:
The only thing I see in the logs is the configuration refresh
But you do see log entries with a 2min delay?
-
@stephenw10
No, I don't. It's like everything is working as it should, but it takes two minutes or more for the connections to get established again. I don't see any gaps like that in the logs. Things just take longer to come back. -
Hmm, OK. Well let's see if it's repeatable.
Steve
-
@rafael9908
A Netgate partner recommended that I go back to 2.5.2, saying there were some incompatibilities between the new version of FreeBSD and Hyper-V... -
Yes, that thread I pointed to. The RSC support added in the FreeBSD hn(4) driver had a pretty large bug! https://redmine.pfsense.org/issues/12873
It's fixed now in 22.05 and 2.7 snapshots.
Steve
-
@stephenw10
I'm running an old version of Hyper-V. I have a 2012 R2 cluster. As far as I understood, that doesn't apply to me. I'm not in a good situation at all here...lol -
Ah, maybe. You could be hitting some other issue I'm not aware of then....
-
@stephenw10
After the rollback to 2.5.2 I'll let you know what happened. -
@rafael9908
After the rollback to 2.5.2, everything is working as it should!!! -
Hmm, OK. And that was a clean 2.5.2 install with the configs restored?
Are you able to snapshot that and test the upgrade again? Or even test a 2.7 snap?
Steve
-
Usually if something stops after applying changes, it's because the option to kill states when a gateway is down is enabled, and a gateway is down. Check under System > Advanced on the Misc tab to see if that option is enabled.
It takes a while for your browser to realize the state is gone and make a new connection.
If you try again, check the gateway status to see if maybe one of the gateways is showing as down on the newer version where it didn't on the older one.
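One rough way to tell whether states really are being dropped on apply is to compare the state count reported by `pfctl -si` just before and just after the change. The sketch below only parses text you have already captured; it assumes the output contains a `current entries` line in its State Table section, and the helper names and the 50% threshold are arbitrary choices for illustration.

```python
import re


def current_states(pfctl_si_output):
    """Extract the first 'current entries' count from `pfctl -si` text.

    Returns None if no such line is found.
    """
    match = re.search(r"^\s*current entries\s+(\d+)", pfctl_si_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def states_dropped(before, after, threshold=0.5):
    """Return True if the state count fell by more than `threshold` (a fraction)."""
    if not before:
        return False
    return (before - after) / before > threshold
```

If the count collapses at the moment the change is applied, the states are being flushed; if it stays roughly level while traffic still stalls, the cause is more likely elsewhere.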
-
@stephenw10
It's a clean install with the config restored.
At least for now I'm not going to do further tests with it. I had too much trouble. I'm gonna rest for a bit. -
@jimp
I came across a post talking about this option. My configuration already had it disabled.