CARP with one public IP, outbound NAT crashes backup node


  • Hi All

    I have found a number of articles about setting up CARP with only a single public IP on the WAN side. I followed the instructions, using 192.168.254.241/29 and 192.168.254.242/29 for the two physical interfaces connected to the internet, then a CARP IP of the public IP (let's say 1.2.3.4) and a gateway of 1.2.3.1.

    It worked brilliantly at first, but the secondary/backup node stops responding after a few minutes. I initially thought pfsync was the cause and disabled it, but the problem persisted.

    I've narrowed it down to the outbound NAT rule, about which I've read this warning:
    "NOTE: Never add outbound NAT rules that could match the WAN/Public IP addresses of the cluster. This includes both rules that have the public IP addresses listed explicitly and also rules that have any set as a source. These NAT rules will cause other problems/unintended behavior, and will break outbound connectivity from the secondary node when it is in a BACKUP state."

    So I tried creating the outbound NAT rules as:
    1. Where source is the CARP internet IP (1.2.3.4), destination is *, DO NOT NAT
    2. Where the source is 192.168.254.240/29 (the local IPs I'm using on the WAN interfaces), DO NOT NAT
    3. If source is RFC1918 (an alias I have for 10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12) then NAT using the CARP address as the translation IP

    I hoped this would comply with the above warning about never outbound NATing the WAN IP, but the web UI on the secondary still became completely unresponsive, and I got a heap of XMLRPC sync failure errors on the primary.
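
    To sanity-check the rule order, the three rules above can be modeled in a few lines of Python. This is only a rough sketch: the `RULES` structure and the `translate` helper are hypothetical and are not pfSense's real rule format, and it assumes rules are evaluated top-down with the first match winning.

```python
import ipaddress

# Hypothetical model of the three outbound NAT rules from the post.
# Each rule is (source networks, translation address); None means "do not NAT".
CARP_IP = "1.2.3.4"
RFC1918 = ["10.0.0.0/8", "192.168.0.0/16", "172.16.0.0/12"]

RULES = [
    (["1.2.3.4/32"], None),            # rule 1: CARP IP, do not NAT
    (["192.168.254.240/29"], None),    # rule 2: WAN interface IPs, do not NAT
    (RFC1918, CARP_IP),                # rule 3: RFC1918 sources, NAT via CARP IP
]


def translate(source_ip, rules=RULES):
    """Return the source address after NAT, assuming first match wins."""
    addr = ipaddress.ip_address(source_ip)
    for networks, nat_to in rules:
        if any(addr in ipaddress.ip_network(net) for net in networks):
            return nat_to if nat_to is not None else source_ip
    return source_ip  # no rule matched, so the address is left alone
```

    One thing the model makes visible: the 192.168.254.x WAN addresses are themselves inside the RFC1918 alias, so rule 2 only protects them if it sits above rule 3. Calling `translate("192.168.254.242", RULES[:1] + RULES[2:])` (rule 2 removed) returns the CARP address.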

    The public IP for the WAN side is actually in a /29, and I had two spare IPs that I could repurpose for a test. I switched from the 192.168 addresses to those spare IPs in the public subnet (1.2.3.2 on the primary, 1.2.3.3 on the secondary, and 1.2.3.4 as the CARP IP), and everything works perfectly with a single NAT rule of "source: RFC1918, destination: any, NAT via CARP". So the problem appears to be related to using a different subnet on the WAN side.
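
    The difference between the two layouts can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch using the placeholder addresses from this post; the `gateway_on_link` helper is hypothetical, and the check only shows whether the gateway sits inside each interface's own subnet. It does not by itself prove that this is the cause of the failure.

```python
import ipaddress

# Placeholder gateway from the post.
GATEWAY = ipaddress.ip_address("1.2.3.1")


def gateway_on_link(interface_cidr, gateway=GATEWAY):
    """True if the gateway falls inside the interface's own subnet."""
    return gateway in ipaddress.ip_interface(interface_cidr).network


# Layout 1: private addresses on the physical WAN interfaces.
private_layout = ["192.168.254.241/29", "192.168.254.242/29"]
# Layout 2: spare public addresses from the same /29 as the CARP IP.
public_layout = ["1.2.3.2/29", "1.2.3.3/29"]

private_result = [gateway_on_link(cidr) for cidr in private_layout]
public_result = [gateway_on_link(cidr) for cidr in public_layout]

print("private layout, gateway on-link:", private_result)
print("public layout, gateway on-link:", public_result)
```

    In the private layout the gateway is outside the subnet of both interface addresses, while in the public layout it is on-link for both nodes.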

    Has anyone else experienced this, or is there something I'm missing that would make this work? I don't want to "waste" the two external IPs because of this hurdle.

    Thanks for any help in advance