Strange NAT problem with multi-WAN and CARP/non-CARP
2 pfSense boxes (named pf1 and pf2) at version 2.3.1-RELEASE-p1
CARP setup for LAN, WAN1, WAN2 but not WAN3
WAN3 is a satellite connection that uses DHCP for addressing. It has a different IP address on pf1 and pf2.
I have a NAT rule defined for the "WAN3 address" to an internal address, with the associated firewall rule.
I can connect through that NAT rule on pf1 but not pf2. >:(
I can ping out and in on WAN3 on both pfSense boxes.
I've checked that the NAT and firewall rules are the same on pf1 and pf2. Both use aliases, and the aliases are the same.
The firewall log does show plenty of blocks from "Default deny rule IPv4 (1000000103)" on the WAN3 interface, but none for the source IP address that I am using to try this.
My theories so far:
1. Routing problem with the ISP. Probably not this, since I can ping in and out.
2. Maybe the fact that WAN3 is not CARP and the others are is somehow messing with the definition of "WAN3 address" on pf2?
Any ideas? I'm stumped.
I attempted to test theory #2 by adding a NAT rule and firewall rule that use the literal IP address instead of "WAN3 address". It did not help.
I created a pass+logging firewall rule for this one source IP address, and the firewall logs show that the traffic is passing the firewall, yet the connection still times out.
My current theory: pf2 is attempting to pass the internal/LAN traffic over the CARP link to the internal address, but cannot since pf1 is the CARP master. This is supported by the fact that from pf2 I can ping the internal address over the LAN interface but not the LAN/CARP interface. There doesn't seem to be a spot in the NAT rule to specify which interface to use to reach the "Redirect target IP".
Diagnostics -> States shows:
WAN3 tcp correct_internal_ip_and_port (correct_wan3_ip_and_port) <- my_testing_ip:48497 TIME_WAIT:TIME_WAIT 7 / 0 424 B / 0 B
I had to open a support ticket to get this fixed. Here is the reply from the technician:
Upon my initial reading here is what I think is happening:
1. Inbound connection arrives on pf2:WAN3.
2. pf2 forwards the connection to the internal host.
3. The internal host replies, but its default gateway should be the LAN interface's CARP VIP, which is currently on pf1.
4. pf1 does not know what to do with the traffic, so it is dropped.
The typical workaround for this would be an outbound NAT entry on LAN, so that all traffic going to the inside host appears to come from the interface address on LAN. That makes the reply traffic same-subnet, so the default gateway on the target host does not need to be used.
The downside is you lose the ability to see the actual outside source addresses in the logs/connections on the inside host. This might or might not be important to you.
This turned out to be exactly the problem. Adding an "outbound NAT" entry solved this.
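For anyone who finds this later, here is a rough sketch of why the reply went to the wrong box and why the outbound NAT entry fixes it. It is only a toy model of the internal host's routing decision, not anything pfSense actually runs. All the IP addresses and the function names (reply_next_hop, reply_reaches_pf2) are made up for illustration; substitute your own LAN subnet, CARP VIP, and pf2 LAN address.

```python
import ipaddress

# All addresses below are hypothetical placeholders, not values from this thread.
LAN_NET = ipaddress.ip_network("192.168.1.0/24")
CARP_VIP = ipaddress.ip_address("192.168.1.1")   # LAN CARP VIP, held by pf1 (master)
PF2_LAN = ipaddress.ip_address("192.168.1.3")    # pf2's own LAN interface address
CLIENT = ipaddress.ip_address("203.0.113.50")    # outside test client


def reply_next_hop(source_seen_by_host):
    """Return where the internal host sends its reply.

    On-link destinations are reached directly; everything else goes
    to the host's default gateway, which is the LAN CARP VIP.
    """
    if source_seen_by_host in LAN_NET:
        return source_seen_by_host
    return CARP_VIP


def reply_reaches_pf2(source_seen_by_host):
    """pf2 forwarded the connection, so the reply only completes the
    connection if it is delivered back to pf2."""
    return reply_next_hop(source_seen_by_host) == PF2_LAN


# Without outbound NAT on LAN, the host sees the real client address.
without_nat = reply_reaches_pf2(CLIENT)
# With outbound NAT on LAN, the host sees pf2's LAN address instead.
with_nat = reply_reaches_pf2(PF2_LAN)

print("without outbound NAT:", without_nat)
print("with outbound NAT:", with_nat)
```

In the first case the reply is handed to the CARP VIP (pf1), which never saw the inbound connection. In the second case the source is on the host's own subnet, so the reply goes straight back to pf2. The trade-off the technician mentioned still applies: the internal host only ever sees pf2's LAN address as the source.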