NAT Trouble with CARP VIP in WAN

ttmcmurry

UPDATE - fixed, in the post below.

Hello - I'm having an issue which I would think is easy to solve, but I'm stumped. I'll provide more details below. Simply put, in manual outbound NAT, as long as I keep the translation address set to "interface address" (the WAN address) for the WAN interface, NAT works properly. However, the moment I swap the WAN address for the CARP WAN VIP, I lose all connectivity to the internet. Changing back to the WAN interface address restores connectivity.

I have not yet added in the 2nd pfSense firewall, so we're just working with basics here.

pfSense Version 2.4.2

There are two networks: LAN 10.10.10.251 and WAN 172.16.0.251. Gateway is 172.16.0.253.
There are two CARP VIPs: LAN 10.10.50.254 and WAN 172.16.0.254.

I have firewall rules on the LAN for both the interface address and the CARP VIPs, so I can reach the firewall either through the VIP (whichever node is master) or through its own IP. These are working fine. Internet is also working fine.

The only plugin I'm using is pfBlockerNG. Snort is installed but disabled on the LAN.

Between the cable modem (DHCP WAN) and pfSense is a Ubiquiti EdgeRouter-X. This provides the "poor man's NAT" to the internet, since pfSense can't do CARP failover on a WAN that doesn't have static IPs.

This same design was working fine on 2.3, before 2.4.

When I change the NAT translation address for LAN to the CARP WAN VIP, internet stops working. I checked the EdgeRouter and it is showing the correct source and destination IPs in the NAT translation table. That is, once the CARP WAN VIP is the translation address, the router shows 172.16.0.254 as the IP sending and receiving data. The EdgeRouter is relaying data back to pfSense, but once it hits pfSense, it goes nowhere. I've also tried power cycling the EdgeRouter, but it made no difference.

The EdgeRouter has the simplest possible policy on it. eth0 is the WAN in DHCP mode; eth1 is 172.16.0.253. The firewall policy has two rules on eth0: (1) allow established/related and (2) drop invalid state. There are no firewall rules on eth1. The NAT policy is "masquerade" for all source/dest on eth0. The router's only purpose is to provide a static gateway IP for pfSense.

There doesn't seem to be an obvious reason why pfSense's WAN_Interface IP works for NAT, while the WAN VIP doesn't. I've never seen pfSense behave like this before.

I also tried to see if the EdgeRouter would create a new flow before/after changing the NAT translation address in pfSense.

1) Start a continuous "ping -t 8.8.8.8"
2) Pings succeed
3) change the LAN NAT Translation address from WAN_Address to CARP VIP
4) Pings now fail
5) Check router's translation table
6) Flow still exists for .251
7) On the router, issue a "clear connection-tracking" (for the Cisco guys, it's the same as "clear xlate")
8) Pings still fail
9) On pfSense, change the LAN NAT Translation address back to WAN_Address
10) Pings still fail
11) Check router's translation table
12) Flow still exists for .254
13) Issue a "clear connection-tracking"
14) Pings now succeed

I can also set the WAN Interface IP to any number within the /24 it's configured for, and it works as expected. There is no loss of connectivity to the internet. The problem only happens with the CARP WAN VIP.
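As a quick sanity check on the addressing, here is a minimal Python sketch that confirms the WAN interface address, the CARP WAN VIP, and the gateway all sit in the same /24. The network 172.16.0.0/24 is inferred from the addresses and the "/24" mentioned above; the helper name on_wan is just for illustration.

```python
import ipaddress

# WAN subnet inferred from the post (172.16.0.x addresses on a /24).
WAN_NET = ipaddress.ip_network("172.16.0.0/24")


def on_wan(addr: str) -> bool:
    """Return True if addr belongs to the WAN /24 (hypothetical helper)."""
    return ipaddress.ip_address(addr) in WAN_NET


# Addresses taken from the post.
for label, addr in [
    ("WAN interface", "172.16.0.251"),
    ("CARP WAN VIP", "172.16.0.254"),
    ("Gateway (EdgeRouter eth1)", "172.16.0.253"),
]:
    print(f"{label:28} {addr:15} on WAN /24: {on_wan(addr)}")
```

All three come back True, so the VIP is a perfectly valid address on that segment and the subnetting itself is not the problem.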

This all seems to be normal behavior from the router. I just can't explain what pfSense is doing.

ttmcmurry

Found my answer. Of course it took writing an entire post before I went back and asked myself, "did you check the fundamentals?"

I'm running on ESXi 6.5. While I had enabled promiscuous mode, forged transmits, and MAC address changes on the LAN vSwitch, I had not done the same on the WAN vSwitch.
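A rough sketch of why those vSwitch settings matter: a CARP VIP does not use the vNIC's own MAC address. It uses a virtual MAC of the form 00:00:5e:00:01:XX, where XX is the VHID in hex. With the vSwitch security settings left at "reject", ESXi can drop frames sent from or addressed to a MAC that doesn't belong to the vNIC, which matches the symptom of traffic reaching pfSense's WAN and going nowhere. The VHID of 1 below is an assumption (the VHID in use isn't stated here), and the function name is only illustrative.

```python
def carp_virtual_mac(vhid: int) -> str:
    """Build the virtual MAC that CARP uses for a given VHID (1-255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    # Fixed prefix 00:00:5e:00:01, last octet is the VHID in hex.
    return f"00:00:5e:00:01:{vhid:02x}"


# Assumed VHID; substitute the one configured on the CARP WAN VIP.
print(carp_virtual_mac(1))
```

Comparing that MAC with the one on the VM's WAN vNIC shows why traffic for the interface address passed while traffic for the VIP did not.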

Once I made that change, this worked. Let that be a lesson. It's not always the firewall, but it's almost always the user. :)