NAT to WAN CARP IP loses connectivity on failover
Hello everyone!
Let's cut straight to the case.
I have a two-box HA/redundancy setup with pfSense. The boxes are identical, and almost everything works, including failover and state synchronization.
I also have net.inet.carp.preempt set to 1.
On the WAN, each pfSense box has its own public IP (x.x.x.1 and x.x.x.2). There is also a CARP virtual IP, x.x.x.3, which is used for internet connectivity, plus two more CARP VIPs on the WAN interface: x.x.x.4 (NAT for the VPN LAN) and x.x.x.5 (NAT for LAN1). Both of these fail over just fine.
When a test box in LAN1 sends a file to LAN2, failover works and the connection stays up, because the states are synchronized.
When the same test box downloads an .iso file from the internet, outbound NAT to x.x.x.5 is used and the file arrives.
During this download I can see the states (marked as NAT) being synchronized to the slave. However, when failover happens, the connection stalls (I use wget for the download, and it reports a stalled connection). The download does not continue through the slave pfSense box even though it holds the synchronized state, but as soon as the master comes back, the connection resumes and the download finishes.
So the question is:
A NATed TCP download through the WAN CARP VIP halts on failover, even though the states are synchronized and appear on the slave.
What could be at fault, and how can it be fixed?
I can provide more information if needed, but I don't think the config is at fault here, since the only difference between the two tests is that one uses NAT.
The same problem persists when I change the NAT address to an IP alias on top of the WAN CARP VIP.
Also, when failover happens, the new master drops the state from its state table, while the previous master keeps it. After failback, the slave that had dropped the state has it again.
I also tried having only one CARP IP on the WAN interface and NATing to that, but it still doesn't fail over. The moment I put the master into persistent CARP maintenance mode, the other box drops the NAT states.
What could be at fault?
I managed to resolve this myself: the Snort package, which I had installed and configured on WAN, was dropping the state on the slave (the box that becomes master on failover). The ongoing download had been initiated through the previous master, so only the Snort instance on that box had seen the start of the connection. The Snort instance on the slave treated the mid-connection packets as intrusive and blocked the connection.
That also explains why everything worked again on the master box after failing back.
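To illustrate why a synchronized pf state is not enough when a separate stateful inspector sits on the WAN, here is a deliberately simplified toy model in Python. It is not Snort's actual logic; it assumes a hypothetical sensor that only passes packets for flows whose opening SYN it observed itself, and the `ToySensor` class and all addresses are made-up placeholders (203.0.113.5 stands in for the x.x.x.5 NAT VIP).

```python
from dataclasses import dataclass, field

# Hypothetical flow key: (src_ip, src_port, dst_ip, dst_port, proto).
Flow = tuple[str, int, str, int, str]


@dataclass
class ToySensor:
    """Hypothetical stateful sensor, NOT Snort: it only trusts flows
    whose TCP handshake it has observed on its own box."""
    name: str
    tracked: set[Flow] = field(default_factory=set)

    def inspect(self, flow: Flow, flags: str) -> bool:
        """Return True to pass the packet, False to drop it."""
        if flags == "SYN":
            self.tracked.add(flow)  # connection start seen: begin tracking
            return True
        # Mid-stream packet: pass it only if this sensor saw the SYN.
        return flow in self.tracked


# Placeholder addresses: NAT VIP -> remote download server.
flow: Flow = ("203.0.113.5", 50000, "198.51.100.7", 443, "tcp")
master = ToySensor("master")
backup = ToySensor("backup")

# The download starts while the master is active, so only it sees the SYN.
master.inspect(flow, "SYN")
print(master.inspect(flow, "ACK"))  # master knows the flow

# Failover: pfsync copied the pf state, but the backup's sensor has no record.
print(backup.inspect(flow, "ACK"))  # dropped as an unknown mid-stream packet

# Failback: the original master still tracks the flow, so traffic resumes.
print(master.inspect(flow, "ACK"))
```

Running it prints `True`, `False`, `True`, mirroring the behaviour in the post: the stall on failover and the recovery on failback. Whether the real fix is to exclude the WAN from blocking, whitelist the peer firewall, or tune Snort's stream handling depends on the Snort configuration and is not shown here.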
Hope this helps someone, and sorry for the wasted time!