BINAT for IPSEC Only Works Outbound
-
Greetings,
I am using the PFSense 2.3.4 Azure Image (FreeBSD 10.3-RELEASE-p19) as an IPSec VPN Gateway for a VM that I have running on Azure. The PFSense appliance is not handling any traffic to/from the VM besides IPSec-based traffic.
General Network Layout is as follows:
Front-end Subnet: This is where PFSense Appliance lives
- 10.200.11.0/24
- PFSense is 10.200.11.4/32
- PFSense has a single Interface (WAN) on Azure, which is assigned the 10.200.11.4/32 IP Address (This is how the PFSense Image works on Azure)
- Public Access to the PFSense Appliance is via an Azure Public IP Address; only used as IPSec Peer
Back-end Subnet: This is where the VM lives
- 10.200.10.0/24
- VM is 10.200.10.4/32
We have a Site-to-Site VPN Tunnel configured to one of our partners, and they have a requirement to only allow non-RFC1918 IP Addresses for communication over this tunnel. Since the Native Azure VPN Gateway does not support this, we decided to use PFSense for this environment. We provisioned (but did not allocate) a new Public IP in Azure 52.229.X.X, with the plan to use this as the Source NAT for the IP of the VM (10.200.10.4) on our side of the tunnel.
We successfully brought up a Site to Site tunnel from our Azure environment to the Customer's environment configured as follows:
Local Network/Type: Address
Local Network/Address: 10.200.10.4/32
NAT/BINAT/Type: Address
NAT/BINAT/Address: 52.229.X.X
Remote Network/Type: Address
Remote Network/Address: 199.116.X.X
Firewall/Rules:
We have allowed all traffic over the tunnel via IPSec Firewall Rule for troubleshooting purposes
Windows Firewall is disabled on the VM for troubleshooting purposes.
The tunnel is established successfully and we are able to access the remote side of the tunnel via 199.116.X.X, and can see that the traffic is successfully NAT'd via 52.229.X.X - this is working well. However, when the remote side tries to access us from 199.116.X.X, the traffic comes into PFSense, passes firewall rules, and then never makes it to the VM at 10.200.10.4/32 - the TCP state immediately goes to CLOSED:SYN_SENT (see below):
Interface Protocol Source (Original Source) -> Destination (Original Destination) State Packets Bytes
IPsec tcp 199.116.X.X:13876 -> 10.200.10.4:6012 (52.229.X.X:6012) CLOSED:SYN_SENT 3 / 0 180 B / 0 B
When I use the Port Test utility in PFSense, I found the following, which seems to indicate inbound BINAT is not working:
1. I can connect to 10.200.10.4 on TCP 6012 as long as I don't set the source address to 199.116.X.X (Remote IP)
2. I cannot connect to 52.229.X.X (NAT IP of Local VM) regardless of which Source IP I use, despite the fact that I can ping the IP from PFSense and have appropriate NAT rules in place.
Can someone please answer the following questions:
1. With PFSense 2.3.4, should I be able to use a non-RFC1918 IP for NAT for both inbound and outbound communication over my IPSec tunnel or is this ONLY for Outbound traffic?
2. Do we need to set the non-RFC1918 addresses as Virtual IP's in PFSense? I have tried with and without Virtual IP's, but it works the same
3. Is Outbound NAT required if BINAT is configured in IPSec tunnel's Phase 2?
4. Are there any tools that I can use to show me why traffic is not being NAT'd from 52.229.X.X to 10.200.10.4? tcpdump, test port, etc. do not seem to give me the missing information.
Please let me know if there is any additional information I can provide to help with understanding this scenario.
-
1. With PFSense 2.3.4, should I be able to use a non-RFC1918 IP for NAT for both inbound and outbound communication over my IPSec tunnel or is this ONLY for Outbound traffic?
The type of address does not matter at all to NAT. It just does what you tell it to do.
2. Do we need to set the non-RFC1918 addresses as Virtual IP's in PFSense? I have tried with and without Virtual IP's, but it works the same
No.
3. Is Outbound NAT required if BINAT is configured in IPSec tunnel's Phase 2?
No.
4. Are there any tools that I can use to show me why traffic is not being NAT'd from 52.229.X.X to 10.200.10.4? tcpdump, test port, etc. do not seem to give me the missing information.
Just the state table. The way IPsec+NAT works, you won't see everything you'd expect when capturing on the IPsec interface. You should see one state entering on IPsec and another state leaving on LAN.
You could also capture on LAN to see if the traffic ever exits the firewall.
CLOSED:SYN_SENT is the state you'll see for TCP when one side has sent the first packet of a TCP handshake; it's the normal starting state for a TCP connection. It just means the side that shows CLOSED has not yet responded to the SYN, usually because it didn't receive it (blocked, dropped, etc.).
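If you'd rather watch this from a shell than the GUI, pfctl can dump the state table directly. A quick sketch (the port filter is just an example matching your test traffic; the exact state line format varies by pf version):

```shell
# List every state pf is tracking and pick out the ones for the test port.
# For NAT'd states, pfSense shows the pre-translation address in parentheses.
pfctl -ss | grep 6012

# Verbose output also shows packet/byte counters and the creating rule,
# which helps confirm which rule the state was created by:
pfctl -vss | grep -A2 6012
```

Watching that while you re-run the port test shows whether the state ever advances past CLOSED:SYN_SENT.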
You have masked out a bit so I can't offer much in the way of advice, but be sure that if your local IP address is /32, the NAT address should also be /32. The most common way for NAT to appear to only work in one direction with IPsec is when it thinks it should be doing outbound style NAT.
-
Thanks for the reply, jimp. I appreciate your confirmations regarding Virtual IP's and NAT - I am currently using only the Phase 2 BINAT setting to handle this; all NAT rules have been removed from the configuration and I am no longer using Virtual IP's, but I still have one-way traffic.
You are correct - when I capture on the IPSec interface, it just shows that the packet is received as expected, but nothing ever arrives at the destination on my side when I capture on the LAN. The State Table seems to indicate that NAT should be forwarding successfully to 10.200.10.4, but this does not occur (see attached screenshot). This screenshot also shows a successfully established TCP Handshake on the same port in the opposite direction (Outbound) for comparison.
Also, just to confirm, we are using /32 addresses for both Local and NAT IP's on both sides of the tunnel.
How can I debug the missing piece of the equation and determine what is stopping PFSense from NAT'ing the received traffic to the specified destination? Everything looks correct in the State Table from a NAT perspective, right?
-
Why is that leaving WAN? Is that your only interface?
From the looks of that state table, pfSense is passing that traffic out and 10.200.10.4 is not accepting it.
Run a packet capture on WAN and see if you see it leaving, note the destination MAC address, make sure it's the MAC address of 10.200.10.4
Run a packet capture on 10.200.10.4 and see if you see it inbound.
I suspect a local firewall on 10.200.10.4 is not allowing the packets in, not pfSense.
It's also possible you could be hitting a quirk of pf/routing due to this happening on WAN and not LAN. There are still a few special cases where WAN behaves differently (e.g. route-to and reply-to on automatic rules)
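If you want to run that WAN capture from the shell rather than Diagnostics > Packet Capture, something like this works (hn0 is my assumption for the Hyper-V NIC name on the Azure image; adjust to your actual interface):

```shell
# -e prints the link-level (Ethernet) headers so the destination MAC
# is visible; -n skips name resolution. Filter on the forwarded port
# so only the interesting SYNs show up.
tcpdump -eni hn0 'tcp port 6012'
```

Compare the destination MAC on each SYN against the MAC of 10.200.10.4.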
-
Yes, WAN is my only interface - this is just how the Azure Image is built. You can see that the successful connection in the other direction does the exact same thing. For clarity, "IPSec" does not actually show up as an interface under Interfaces for some reason.
I do see the traffic leaving the WAN interface; pointed at the correct destination:
10:30:04.776677 IP 199.116.X.X.3017 > 10.200.10.4.6012: tcp 0
10:30:05.775065 IP 199.116.X.X.3017 > 10.200.10.4.6012: tcp 0
10:30:07.775142 IP 199.116.X.X.3017 > 10.200.10.4.6012: tcp 0
10:30:39.788917 IP 199.116.X.X.8931 > 10.200.10.4.6012: tcp 0
10:30:40.787104 IP 199.116.X.X.8931 > 10.200.10.4.6012: tcp 0
10:30:42.787110 IP 199.116.X.X.8931 > 10.200.10.4.6012: tcp 0
However, when I do the same capture on the IPSec interface, the traffic appears to be pointed at the NAT IP instead of the actual IP:
10:33:34.858652 (authentic,confidential): SPI 0xcd5f4dff: IP 199.116.X.X.11372 > 52.229.X.X.6012: tcp 0
10:33:35.856153 (authentic,confidential): SPI 0xcd5f4dff: IP 199.116.X.X.11372 > 52.229.X.X.6012: tcp 0
10:33:37.856325 (authentic,confidential): SPI 0xcd5f4dff: IP 199.116.X.X.11372 > 52.229.X.X.6012: tcp 0
Opening the WAN Interface capture in Wireshark, I can see that the MAC address in the capture does NOT match the MAC of 10.200.10.4 - could this be some kind of ARP issue? Any ideas? The IPSec Interface capture does not show the Ethernet information, likely because the payload is encrypted.
Also, how could I determine if we're running into some strange routing issue due to the fact that the Azure Image only uses a single WAN interface? Do we need any special Routing rules/configs?
10.200.10.4 does not see ANY traffic on this port from this source - there is no Local Firewall or antivirus running on 10.200.10.4 while we troubleshoot this issue, so this is not the culprit.
-
The capture on IPsec shows what I'd expect to see, ignore that.
System > Advanced, Firewall/NAT tab, check "Disable reply-to"
Firewall > Rules, Floating tab, add a new rule to pass, quick, outbound, on WAN from a source of 'any' to a destination of 10.200.10.0/24 and make sure there is no gateway set.
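For reference, the generated rule in /tmp/rules.debug should come out looking roughly like this (interface name and exact options are examples, not the literal output):

```
pass out quick on hn0 inet from any to 10.200.10.0/24 keep state
```

The important parts are that it is `pass out quick` with no `route-to` attached.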
-
Enabled "Disable reply-to" as advised.
I actually already had a pretty wide-open Floating Rule in place, which is where my states were being captured - I have adjusted the rule to match what you described above, and re-tested, but I am still seeing the exact same behavior (see states2.png)
The WAN Interface capture still looks the same as well, unfortunately:
11:05:05.598301 IP 199.116.X.X.8162 > 10.200.10.4.6012: tcp 0
11:05:06.596571 IP 199.116.X.X.8162 > 10.200.10.4.6012: tcp 0
11:05:08.596303 IP 199.116.X.X.8162 > 10.200.10.4.6012: tcp 0
-
So the firewall is delivering the packets out, which is good. Change the detail to "full" and see what the MAC is there. If it's your gateway, it still isn't sending it quite right. If it's the MAC of your target box, then something else is odd on your local network.
The floating rule should have kicked it to not use route-to in that instance.
Is 10.200.10.x your WAN subnet? Or is it on a routed network away from WAN?
If that is your WAN subnet, something else you could do is go to Interfaces > WAN and make sure there is no gateway selected there. If it's your only interface you don't need that set there. The gateway being defined under System > Routing is enough.
-
The detail of the capture was actually already set to full - I checked the MAC address again, but it is still not the MAC Address of my target box (it did not change).
Confirmed that the MAC Address in the capture is not the MAC Address of PFSense or any other device we own. In fact, I found that the MAC in the capture shows as a Locally Administered Address, and it appears to be a dummy value (12:34:56:78:9a:bc) - weird, right? Any idea where this could be coming from? Is this because the Interfaces > WAN > MAC Address field is using the default value (blank)?
Note: When I modified my floating rule, I found that it does not give me an option to set NO Gateway - I either have to set "Default" or select the WAN Interface's Gateway. Should I have left it at Default (I did)?
10.200.10.x can be considered the LAN subnet - it is the Back-end Network where the Virtual Machine (10.200.10.4) resides, and it is a routed network away from the WAN. PFSense resides in the 10.200.11.0/24 Subnet (PFSense is 10.200.11.4).
Interfaces > WAN does not actually have a "Gateway" field because the Azure Image uses DHCP for the interface. The Gateway under System > Routing is set to 10.200.11.1, which matches the Default Gateway reported by FreeBSD via netstat -rn - I presume this goes directly to the Azure Fabric for WAN/LAN routing.
-
Maybe that's all a quirk specific to Azure then. That MAC address may also be specific to Azure. Does it show up under Diagnostics > ARP on pfSense? If so, what system is it? The default gateway on WAN? If WAN is DHCP there isn't a way to nudge pfSense to not treat that interface as a WAN.
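Incidentally, the "locally administered" part can be read off the first octet alone. A minimal sketch checking the relevant bits of 0x12 (the first octet of 12:34:56:78:9a:bc):

```shell
# 0x12 = binary 00010010. The 0x02 bit set means "locally administered"
# (software-assigned, not a vendor's burned-in OUI); the 0x01 bit clear
# means it's a unicast address.
first_octet=0x12
if [ $(( first_octet & 0x02 )) -ne 0 ]; then
  echo "locally administered"
else
  echo "globally unique (OUI)"
fi
```

So that MAC was never going to match any real hardware; it is assigned by whatever sits between pfSense and the fabric.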
With rules, "default" for a gateway means no gateway, so that's OK.
If 10.200.10.x is reached through another router, perhaps you need a static route set as well.
Either way though it does not appear this problem is specific to IPsec, so perhaps you'd get a better response under virtualization. pfSense is delivering the packets where it thinks they should go, they are making it out of the firewall, but it may need some other nudging in routing or Azure to flow the way you want on the inbound path.
-
Good idea! Yes it does show up under Diagnostics > ARP on pfSense - it is the MAC for the Default Gateway IP (10.200.11.1), which seems to support the idea that this MAC is specific to Azure.
10.200.10.x is reached from pfSense through the Azure Fabric, which is hidden from me. However, I am able to hit pfSense (10.200.11.4) from the VM (10.200.10.4) and vice versa without issue.
I'll try re-opening this ticket under virtualization and getting some eyes at Azure looking into this issue, if you don't have any other ideas.
-
Usually the floating rule will work around that. If your WAN subnet is 10.200.11.x then it should be contacting those hosts directly and not sending the traffic to the gateway. So it's still hitting a route-to somehow.
Did you make sure the floating rule has "quick" checked?
What does that floating rule look like in /tmp/rules.debug? Maybe see what other rules in /tmp/rules.debug have route-to set.
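From the shell, a quick way to spot the offenders (assuming the standard ruleset path on pfSense):

```shell
# Any rule carrying route-to forces matching packets to a specific
# gateway, bypassing the routing table; look for one that could match
# the traffic headed to 10.200.10.4.
grep -n 'route-to' /tmp/rules.debug
```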
-
I'm not even sure it is possible for 10.200.11.x to hit 10.200.10.x without going through the Azure Gateway - these are all just Virtual Machines, so it's not like there is a server directly connected to PFSense that can bypass the Default Gateway. Right?
Yes, the Floating Rule has "quick" checked.
I have attached an anonymized version of rules.debug for your review.
-
OK, I was a bit blind back there; I thought it was all the same subnet.
If they are different subnets, pfSense is doing the right thing. It's handing that off to its gateway. After that, it's 100% up to Azure to deliver it to the correct destination. Maybe there are ACLs/routing rules you have to set up in Azure to allow that connection to flow back the other way.
If you capture on your target, do you ever see the traffic arrive?
-
The ACL/Routing in Azure is allowing all traffic between pfSense and the VM for troubleshooting purposes, so it's definitely not getting blocked there.
I've had a capture running on the target for the last 24 hours and have not received a single packet from the remote side - it is getting lost in the Azure fabric, it seems.
I have opened a ticket with Azure referencing this Forum Post to see if we can get some debugging done between the Azure Gateway and our target VM.