Nat reflection issues with Pure NAT

siteunfold

TL;DR: PureNAT reflection for a local website with an external IP address stops working once multiple VLANs are introduced. Why?

We're having some issues with PureNAT reflection. I'll start by describing the network we are using for testing:

WAN: 1.2.3.2/29 (public IPv4)
WAN VIP: 1.2.3.4/32 (public IPv4)
(we also run CARP, but ignoring that for simplicity)

LAN: 192.168.1.1/24 (management network, configured on every VM)
OPT1: 192.168.2.1/24 (nginx load balancer VM)
OPT2: 192.168.3.1/24 (backend web server VM)

I have a port forward configured for 1.2.3.4:443/tcp (WAN VIP) going to 192.168.2.2:443/tcp (OPT1). That host is a load balancer, which fetches the response from the web server on 192.168.3.2:443/tcp (OPT2).
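To make the layout easier to reason about, here is a minimal Python sketch of the setup described above. It is only a model, not pfSense configuration: the names `INTERFACES`, `PORT_FORWARDS`, `interface_for` and `apply_port_forward` are made up for illustration, and the WAN subnet 1.2.3.0/29 is simply the network derived from 1.2.3.2/29.

```python
import ipaddress

# Interface subnets as described in the post.
INTERFACES = {
    "WAN": ipaddress.ip_network("1.2.3.0/29"),
    "LAN": ipaddress.ip_network("192.168.1.0/24"),
    "OPT1": ipaddress.ip_network("192.168.2.0/24"),
    "OPT2": ipaddress.ip_network("192.168.3.0/24"),
}

# The single port forward from the post:
# (destination ip, destination port) -> (target ip, target port).
PORT_FORWARDS = {
    ("1.2.3.4", 443): ("192.168.2.2", 443),
}


def interface_for(ip):
    """Return the name of the interface whose subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in INTERFACES.items():
        if addr in net:
            return name
    return None


def apply_port_forward(dst_ip, dst_port):
    """Rewrite the destination if a port forward matches; otherwise leave it alone."""
    return PORT_FORWARDS.get((dst_ip, dst_port), (dst_ip, dst_port))
```

For example, `interface_for("192.168.1.123")` returns `"LAN"`, and `apply_port_forward("1.2.3.4", 443)` returns `("192.168.2.2", 443)`.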

Other than the web GUI anti-lockout rule on LAN, we have the same allow rule (source any, destination any, protocol any) on all interfaces. We are using automatic outbound NAT.

a) External requests SUCCEED taking the path (curl https://1.2.3.4/):

9.9.9.9 > 1.2.3.4:443 > 192.168.2.2:443 > 192.168.3.2:443 > return.

b) Internal requests SUCCEED so long as the default gateway is OPT1/OPT2 (curl https://1.2.3.4/):

192.168.2.123 (OPT1) > 1.2.3.4:443 > 192.168.2.2:443 > 192.168.3.2:443 > return.

c) Internal requests FAIL if the gateway is not OPT1/OPT2 (curl https://1.2.3.4/):

192.168.1.123 (LAN) > 1.2.3.4:443 > 192.168.2.2:443 > 192.168.3.2:443 > return.

I can work around the problem by changing PureNAT to NAT+Proxy, but I don't understand why PureNAT would not be working.

I'm banging my head against the wall trying to understand at what point the packet is getting confused and rejected.

Any pointers on why PureNAT is not working here? Is NAT+Proxy bad? Is there a gotcha I am missing or something I need to read that better explains how this works?

viragomann

@siteunfold said in Nat reflection issues with Pure NAT:

c) Internal requests FAIL if the gateway is not OPT1/OPT2 (curl https://1.2.3.4/):

What does it mean, "the gateway is not OPT1/OPT2"?

Why don't you point your internal requests directly to the web server or even the load balancer?
When using host names to access the server, you could simply go with host overrides.
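The host override idea can be sketched roughly like this in Python. It is a conceptual model of split DNS, not how the pfSense resolver is implemented; the hostname `www.example.com`, the table `HOST_OVERRIDES` and the function `resolve` are hypothetical.

```python
# Hypothetical host override table: hostname -> internal address.
HOST_OVERRIDES = {
    "www.example.com": "192.168.2.2",  # load balancer on OPT1
}


def resolve(hostname, public_lookup):
    """Return the override if one exists, else fall back to the public lookup."""
    key = hostname.rstrip(".").lower()
    if key in HOST_OVERRIDES:
        return HOST_OVERRIDES[key]
    return public_lookup(hostname)
```

Internal clients asking for an overridden name get the internal address and never touch the WAN VIP; every other name falls through to `public_lookup`, which stands in for normal DNS resolution.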

siteunfold

@viragomann said in Nat reflection issues with Pure NAT:

What does it mean, "the gateway is not OPT1/OPT2"?

Each VM has a management interface (LAN) and a second network interface, OPT1 or OPT2.

the gateway is not OPT1/OPT2 === the gateway / default route is via the LAN / Management interface

Why don't you point your internal requests directly to the web server or even the load balancer?
When using host names to access the server, you could simply go with host overrides.

We are aware of other approaches like split DNS. We prefer NAT reflection because it shouldn't need ongoing maintenance the way split DNS would (e.g. when hostnames move to a different IP), especially in a setup that hosts many websites/hostnames.

viragomann

@siteunfold said in Nat reflection issues with Pure NAT:

Each VM has a management interface (LAN) and a second network interface, OPT1 or OPT2.
the gateway is not OPT1/OPT2 === the gateway / default route is via the LAN / Management interface

Any meaningful reason for having two network interfaces on the VMs? Management and services can also securely go over a single interface. The VMs are connected to a firewall, which can control and restrict all access to and from the VMs.
Multi-homing often ends in routing issues.

Also consider that if both devices are connected to the LAN switch, they can communicate with each other's LAN IP unchecked.

If the default route points to the LAN, the packets between both subnets have to pass the LAN interface of pfSense. Hence you need a rule on LAN to allow this traffic. Do you have one?
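The routing point can be illustrated with a small Python sketch of how a multi-homed VM might choose the egress interface for a packet, using plain longest-prefix matching. The routing table is an assumption: the thread only says every VM has a LAN leg, so the NIC names and the choice of OPT1 as default gateway for the load balancer are hypothetical.

```python
import ipaddress


def pick_route(routes, dst_ip):
    """Longest-prefix match: return (interface, next_hop) of the most specific route."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [r for r in routes if dst in r[0]]
    if not matches:
        raise LookupError("no route to " + dst_ip)
    net, iface, next_hop = max(matches, key=lambda r: r[0].prefixlen)
    return iface, next_hop


# Assumed routing table of the load balancer VM (192.168.2.2 on OPT1).
# next_hop None means "on-link": delivered directly, without passing pfSense.
LB_ROUTES = [
    (ipaddress.ip_network("192.168.1.0/24"), "lan-nic", None),
    (ipaddress.ip_network("192.168.2.0/24"), "opt1-nic", None),
    (ipaddress.ip_network("0.0.0.0/0"), "opt1-nic", "192.168.2.1"),
]
```

Under these assumptions, `pick_route(LB_ROUTES, "192.168.1.123")` selects the directly connected LAN leg, while `pick_route(LB_ROUTES, "9.9.9.9")` goes via the default gateway. If the real VMs behave this way, a reply to a LAN client would leave over the LAN NIC and never pass back through pfSense. That is one possible reading of case (c), not something confirmed in this thread.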

siteunfold

@viragomann

Any meaningful reason for having two network interfaces on the VMs?

No.

If the default route points to the LAN, the packets between both subnets have to pass the LAN interface of pfSense. Hence you need a rule on LAN to allow this traffic. Do you have one?

For testing purposes we have the rules on all three interfaces (LAN/OPT1/OPT2) as source: any, dest: any etc. As permissive as it gets.

All VMs can communicate (ssh, curl, etc.) with the other VMs on both interfaces if I use the LAN, OPT1 or OPT2 IPs directly. The issue only shows itself when going through the NAT port forward on the WAN VIP.

I'm still struggling to understand why enabling the NAT+Proxy mode solves this issue. Do you have any idea on this?

viragomann

@siteunfold
In proxy mode, pfSense itself accesses the destination device. This overrides all other firewall rules.
But since you say you have already allowed any, this might not be the reason. Do you possibly have floating block rules?
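As a rough illustration of the difference between the two modes, the following Python sketch models what the port forward target would see in each case. It is a simplification under stated assumptions: `PFSENSE_OPT1` (192.168.2.1) is assumed to be the address pfSense connects from in proxy mode, any outbound NAT that may also apply is ignored, and the function names are invented for this example.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "src_ip dst_ip dst_port")

TARGET = ("192.168.2.2", 443)   # port forward target from the thread
PFSENSE_OPT1 = "192.168.2.1"    # assumption: address the proxy connects from


def pure_nat(pkt):
    """Rewrite only the destination; the client's source address is preserved."""
    return pkt._replace(dst_ip=TARGET[0], dst_port=TARGET[1])


def nat_proxy(pkt):
    """pfSense opens its own connection, so the target sees pfSense as the source."""
    return Packet(PFSENSE_OPT1, TARGET[0], TARGET[1])


client = Packet("192.168.1.123", "1.2.3.4", 443)
```

In this model `pure_nat(client).src_ip` is still `192.168.1.123`, so the target replies to the LAN client by whatever route it has to that address. `nat_proxy(client).src_ip` is `192.168.2.1`, so the reply goes back to pfSense. That difference may be why the two modes behave differently here, though it has not been verified against the actual setup.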