Strange CARP behavioral change/bug in HA setup after upgrade from 2.6.0 to 2.7.0
-
I just upgraded an HA setup to the latest 2.7.0. The upgrade apparently went well and everything seems OK on the first pfSense, except that I can no longer reach the second (backup) pfSense box through my IPsec tunnel. With 2.6 I had a local gateway pointing to the CARP gateway IP, plus a static route telling the backup pfSense where to send replies to IPsec-originated packets, since the active tunnel is on the first box and the second would not otherwise know that route, which is normal. Everything was working fine, but now after the update I can see the packets reach the backup pfSense and it doesn't reply. Most importantly, I can no longer ping any CARP IP active on the first pfSense from the second box (I can ping the active pfSense's own IP, but not any of the CARP IPs it currently holds), which is very strange (CARP status is consistent on both boxes). At first I thought it was a routing/filter issue, but after running some packet captures it is evident that IPsec is fine and routing is fine until the packets reach the destination box; since that box cannot reach the active CARP IP, the packets have nowhere to return to.
Question: why is the standby pfSense no longer able to ping any active CARP IP of the first? Nothing has changed regarding switching (these are two VMs on the same vSwitch in promiscuous mode and have been so for years). I didn't find any mention of changes in the release notes apart from a couple of fixes. Any ideas?
Seems similar to this:
https://redmine.pfsense.org/issues/14026
-
When attempting to communicate between HA nodes, you should never use the CARP addresses since you have no idea where those will land. Especially if it's something hardcoded going between the nodes.
Even the suggested workaround in the docs specifically calls out to not use the VIPs:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/ha-vpn-secondary.html
The CARP behavior you describe is known but doesn't affect how the system works in practice when used appropriately.
-
@jimp Thanks for the prompt response. I understand the implications and the suggested methodology, but the fact remains that something has changed since 2.6: I never had any issue before using CARP IPs between nodes. Mind you, it's not a big deal (the core functionality is untouched), it was just a surprise that got me investigating. The ongoing problem, though, is that even when I use the real LAN IP of the master box as the gateway for the static route, packets from the IPsec tunnel still don't return. They reach the second pfSense's LAN but are then lost into oblivion, so at the moment my second pfSense is unreachable directly through my IPsec tunnel.
Question: is it normal that after upgrading from 2.6 to 2.7.0 I can't ping any CARP IP active on the master from the backup, while up until 2.6 everything was pingable and reachable from both nodes?
-
For the VPN issue make sure you also have the outbound NAT rule(s) described in the linked doc. Without that, the return traffic would get caught by the IPsec policy and would be dropped.
For the CARP ping thing, yes, read the Redmine issue I linked. It's known/expected. Something upstream in FreeBSD changed there.
-
@jimp Yes, I found that issue too (I linked it in my OP), which is why I asked for confirmation. I guess there is an open issue that has to be resolved (or not?) at the filter level. Luckily I upgraded only my lab boxes. I will wait for a possible resolution before upgrading other HA installs, as 2.6 has been working well on my several installs, so there is no hurry. What also apparently changed because of this is the CARP behaviour of the pfBlocker interface. In an HA setup you select CARP instead of VIP, and I usually set a /29 subnet so that it detects master or backup correctly (otherwise they both stay active, at least on 2.6). Now, whatever I do, they are always both active (and set to /32). It's no big issue, but that's also a change and probably one to be addressed. Anyhow, thanks for the feedback!
-
If you can, try setting
net.inet.ip.source_address_validation=0
on your lab HA pair. Let us know if that allows the previous behaviour.
Steve
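For anyone wanting to try this from a shell first, here is a minimal sketch, assuming root shell access on the lab nodes. The function names `current_value` and `disable_validation` are made up for illustration; only the sysctl OID comes from the suggestion above. A value set this way applies at runtime and does not persist across a reboot by itself.

```shell
#!/bin/sh
# OID named in the suggestion above.
OID="net.inet.ip.source_address_validation"

# Print the current value of the tunable, or "unavailable" when the OID
# does not exist on this system (for example on a non-FreeBSD machine).
current_value() {
    sysctl -n "$OID" 2>/dev/null || echo "unavailable"
}

# Hypothetical helper: turn source address validation off at runtime.
# Needs root; the change is lost on reboot unless it is also saved
# as a persistent tunable.
disable_validation() {
    sysctl "$OID=0"
}

echo "$OID is currently: $(current_value)"
# On the lab pair, call: disable_validation
```

Checking the value before and after makes it easy to confirm the change actually took effect before re-testing the ping from the backup node.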
-
@stephenw10 Hi Steve, I can confirm, preliminarily, that setting net.inet.ip.source_address_validation to 0 does in fact restore the previous behaviour, even without a reboot (the firewall apparently picks it up at runtime after a brief unresponsive period). I am now able to ping the active CARP addresses from the backup machine. I haven't done any further testing, but I will leave that setting at 0 for now and see if everything behaves as it should.
Cheers
-
OK, great. We need to confirm what changed there and what effects resetting that might have, but it looks like we will probably set that by default.
-
@stephenw10 Probably not directly related, but I noticed the pfb_dnsbl service no longer starts unless I manually edit its CARP interface subnet to, for example, /29. Unfortunately this only lasts until the next update, so it isn't much of a resolution. I posted this in the pfBlocker section but haven't had any feedback on the matter. I suppose that when the pfBlocker interface is set to CARP mode and listening on the LAN in an HA setup, the interface subnet should be anything other than /32. I don't think this is related specifically to the kernel update, though again something changed here too.
-
Mmm, that does seem like an unrelated issue, but obviously that shouldn't happen.
-
@stephenw10 Don't they have a Redmine issue open on this?
https://redmine.pfsense.org/issues/14524
https://redmine.pfsense.org/issues/14026
Is this the same issue?
-
@JonathanLee said in Strange CARP behavioral change/bug in HA setup after upgrade from 2.6.0 to 2.7.0:
https://redmine.pfsense.org/issues/14026
It's for sure related to Redmine issue 14026, which I linked in my OP, and this setting resolves it. I'm not sure 14524 is directly related, though, as it specifically seems to be a UI issue, not a core issue. But take that with a grain of salt.
-
@stephenw10
Thanks, I had the same issue after upgrading to 2.7.0.
After adding this System Tunable setting, I can now ping the CARP VIP from the backup node.