NAT issues with multi-WAN
-
I did say that servers were on my LAN, and they are. There are also servers that are on the same network as the OPT1 interface, with public IPs assigned directly to them, and they communicate with the internet without going through this pfSense box at all.
This is in a data center, and we have lots of servers, so there's a lot going on that has nothing to do with this setup that is left out, so it might seem like this configuration is smaller than it is.
I do indeed have advanced outbound NAT set up so that the LAN traffic is NATed through both WAN and OPT1. This works just fine.
To reiterate what I'm trying to do:
The servers on this pfSense's LAN interface use this box as a gateway. The purpose is to let me forward ports from my fast cable modem connection AND from the slower but more reliable T1 connection to the same LAN IP.
This already works completely on the WAN (cable modem interface).
The problem arises with OPT1. If a gateway is not specified on OPT1, then incoming connections on OPT1 from outside of OPT1's subnet will not work. This is to be expected, at least from my understanding because there is no gateway for the return traffic to get back out to the internet. I can make incoming connections on OPT1 from the servers that are on the same subnet because of course a gateway is not required in this case. This too is expected, based on my understanding.
If I do put in a gateway (which is probably how this is supposed to be set), then port forwarding from out on the internet does work correctly. The problem here is that it no longer works from servers within OPT1's subnet. The incoming connection does actually work: I have confirmed this by logging the firewall connections and I can see them being allowed. By doing a packet capture, I confirmed my suspicions that what is going on here is that the return traffic comes from the server on the LAN into pfSense (as it should), and then pfSense sends it to the gateway specified on OPT1 instead of sending it directly out of OPT1 (since the ultimate destination of this packet is on the same subnet as OPT1).
The T1 router does not do any NAT. As I said, public IPs on that network are assigned directly to the servers. So when I try to SSH from the server with the IP 12.34.56.241 to 12.34.56.25 (a virtual IP on pfSense on OPT1 that is forwarded to 10.20.30.89), it works when no gateway is given in OPT1. It fails when a gateway is specified in OPT1 because the return traffic gets sent to the gateway on OPT1 (12.34.56.1). It's not just SSH. ICMP packets (pings) do this too.
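To spell out the behavior I expect, here is a minimal Python sketch of the next-hop decision for the return packet. It only illustrates the on-link test; it is not how pfSense or FreeBSD actually implements forwarding. The function name `expected_next_hop` is made up for this example, the /24 mask is an assumption about OPT1's subnet, and 203.0.113.10 is just a stand-in for some host out on the internet.

```python
import ipaddress

def expected_next_hop(dst, subnet, gateway):
    """Return the IP address the frame should be delivered to at layer 2.

    Hypothetical helper: a destination inside the interface's own subnet
    should be reached directly; anything else goes to the gateway.
    """
    dst_ip = ipaddress.ip_address(dst)
    if dst_ip in ipaddress.ip_network(subnet):
        return dst_ip                      # on-link: ARP for the host itself
    return ipaddress.ip_address(gateway)   # off-link: hand it to the gateway

# Assumed OPT1 subnet (12.34.56.0/24) and the gateway from this post.
print(expected_next_hop("12.34.56.241", "12.34.56.0/24", "12.34.56.1"))
print(expected_next_hop("203.0.113.10", "12.34.56.0/24", "12.34.56.1"))
```

Under those assumptions, the reply to 12.34.56.241 should go straight to that server, and only the reply to the outside host should go to 12.34.56.1. What I am seeing is the second behavior applied to both cases.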
I appreciate your patience. Thanks for the help.
-
Thank you for reformulating your problem.
I think I understand it completely now.
I see that you are using 1:1 NAT for the 12.34.56.x subnet.
Could you try this with a normal port forward? Just as a test.
Also, what does the firewall rule on your LAN look like?
Did you create an AON (advanced outbound NAT) rule for your VIP too?
You shouldn't have to, since you use the VIP in the 1:1 NAT rule.
-
I did try to use the interface address with port forwarding previously, and it gave exactly the same results. I have done that again at your request, and the results are the same. I also did a new packet capture and tried to format it a bit so it's more easily readable:
http://sa.briantist.com/NATissue/cap.html
The key is like this:
Any MAC or IP address in red belongs to the T1 gateway (shouldn't come into play here)
Any MAC or IP address in green belongs to the machine between pfSense and the T1 router, which is assigned a public IP directly and is trying to access the NATed machine through port forwarding (or 1:1 NAT).
Any MAC or IP address in blue belongs to pfSense on OPT1, to be translated via NAT to 10.20.30.89 (doesn't appear in capture).
My firewall log confirms that NAT is working and traffic is being allowed through to 10.20.30.89. You can see in the capture that traffic incoming to OPT1 goes through fine, and traffic being sent out of OPT1 that appears to be destined for 12.34.56.241 is actually sent to the MAC address of the gateway (12.34.56.1).
In the capture you will see ICMP traffic as well as TCP traffic destined for port 22, and the behavior is the same.
Also to clear things up, the interface IP of OPT1 is 12.34.56.24 and the virtual IP that I was also using in 1:1 NAT is 12.34.56.25.
-
It might be a netmask problem. Any chance that one of the netmask/gateway combinations on the servers in the OPT subnet is wrong, or overlaps/conflicts with the VIP defined on the firewall?
-
I'm not sure exactly what you mean, sai. The OPT1 interface is set to 12.34.56.24/24. The VIP is defined as 12.34.56.25/32 (Single Address; in this case I assumed that the netmask is used to determine the quantity of IP addresses for which you want ProxyARP). Really the VIP config should be moot here because the problem behaves exactly the same on OPT1's interface address.
The server that is assigned a public IP directly which I am using for testing, has 12.34.56.241/24 (and .242/24 but that seems irrelevant here). Its gateway is 12.34.56.1 but again this machine is correctly sending its packets to pfSense; it's the return path where the packets are being sent to the gateway when they shouldn't be.
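For what it's worth, this is how I understand the netmask suggestion. A rough Python sketch, assuming a wrong mask would show up as the test server falling outside the interface's subnet. The helper name `on_link` is made up, and the /25 entry is a deliberately wrong mask for comparison only; it is not configured anywhere here.

```python
import ipaddress

def on_link(host, interface_cidr):
    """True if host falls inside the subnet configured on the interface."""
    network = ipaddress.ip_interface(interface_cidr).network
    return ipaddress.ip_address(host) in network

test_server = "12.34.56.241"

# /24 is the real OPT1 setting, /32 is the VIP, /25 is a hypothetical wrong mask.
for cidr in ("12.34.56.24/24", "12.34.56.25/32", "12.34.56.24/25"):
    verdict = "on-link" if on_link(test_server, cidr) else "via gateway"
    print(cidr, "->", verdict)
```

With the /24 that is actually configured, the test server is on-link, so a mask problem on the pfSense side would have to come from somewhere other than the OPT1 interface setting itself.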
Is there anything else I should check specifically?
-
"it's the return path where the packets are being sent to the gateway when they shouldn't be."
That generally means the netmask is wrong, but I am not sure how VIPs (and their associated netmasks) work in this…
Now this is weird:
11:54:47.266968 00:50:8b:cf:a2:6a > 00:30:94:01:e7:90, length 62: (tos 0x0, ttl 63, id 38111, flags [DF], proto: TCP (6), length: 48) 12.34.56.24.22 > 12.34.56.241.3490: S, cksum 0xbcfc (correct), 3571569917:3571569917(0) ack 2188962038 win 5840 <mss 1460,nop,nop,sackOK>
11:54:49.492704 00:50:8b:cf:a2:6a > 00:20:ed:91:f7:04, length 62: (tos 0x0, ttl 63, id 1468, flags [DF], proto: TCP (6), length: 48) 12.34.56.24.22 > 12.34.56.241.3479: S, cksum 0x6e76 (correct), 2180526778:2180526778(0) ack 2586252549 win 5840 <mss 1460,nop,nop,sackOK>
Two consecutive packets, both from 12.34.56.24, both to 12.34.56.241, but (looking at the MAC addresses) one goes to the gateway (i.e. the T1 modem) and the other goes to the correct server.
All I can say is "wtf?". This problem really needs a guru.
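If it helps to find more packets like these, here is a small Python sketch that scans tcpdump output captured with link-level headers and flags packets whose destination MAC does not match the MAC expected for the destination IP. It is only a sketch: the regular expression is written against the two lines above (shortened here), it only handles lines with ports (so TCP/UDP, not the ICMP ones), and the names `LINE_RE`, `KNOWN_MACS`, `SAMPLE` and `misdirected` are made up for this example.

```python
import re

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
LINE_RE = re.compile(
    rf"(?P<src_mac>{MAC}) > (?P<dst_mac>{MAC}),.*?"
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+) > "
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+):"
)

# IP -> MAC, as identified in the capture above (server, then T1 gateway).
KNOWN_MACS = {
    "12.34.56.241": "00:20:ed:91:f7:04",
    "12.34.56.1": "00:30:94:01:e7:90",
}

# The two packets quoted above, with the tail of each line cut off.
SAMPLE = [
    "11:54:47.266968 00:50:8b:cf:a2:6a > 00:30:94:01:e7:90, length 62: "
    "(tos 0x0, ttl 63) 12.34.56.24.22 > 12.34.56.241.3490: S",
    "11:54:49.492704 00:50:8b:cf:a2:6a > 00:20:ed:91:f7:04, length 62: "
    "(tos 0x0, ttl 63) 12.34.56.24.22 > 12.34.56.241.3479: S",
]

def misdirected(lines, known_macs):
    """Return (dst_ip, dst_port, dst_mac) for frames sent to an unexpected MAC."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a TCP/UDP line in the expected format
        expected = known_macs.get(m["dst_ip"])
        if expected and m["dst_mac"] != expected:
            hits.append((m["dst_ip"], m["dst_port"], m["dst_mac"]))
    return hits

for hit in misdirected(SAMPLE, KNOWN_MACS):
    print(hit)
```

Run against the full capture, a list like this would show whether the packet that reached the correct server is a one-off or part of a pattern.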
-
:sigh: I always get the crazy problems no one has ever heard of.
Keep in mind this doesn't really relate to the VIP/ProxyARP/1:1 NAT, since the problem happens on the interface address.
I have looked at the routing table, and it seems fine to me. I don't pretend to understand all of the fields, but it looks okay to me:
12.34.56        link#3             UC    0  0      1500  fxp2
12.34.56.1      00:30:94:01:e7:90  UHLW  1  24400  1500  fxp2  1179
12.34.56.3      00:20:ed:66:79:34  UHLW  1  124    1500  fxp2  1148
12.34.56.241    00:20:ed:91:f7:04  UHLW  1  26009  1500  fxp2  1192
(This is just the part of the routing table with 12.34.56 addresses.)
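Here is how I read those entries, as a minimal Python sketch of a longest-prefix-match lookup. This is my understanding of how a routing table is normally consulted, not a claim about pfSense internals. The function name `lookup` is made up, and the default route entry is an assumption added for contrast, since I did not paste that part of the table.

```python
import ipaddress

def lookup(dst, routes):
    """Longest-prefix match: return (network, next_hop) of the most specific route."""
    dst_ip = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in routes if dst_ip in net]
    if not matches:
        return None
    return max(matches, key=lambda route: route[0].prefixlen)

routes = [
    # Assumed default route, not part of the excerpt above.
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway (assumed)"),
    # Entries from the excerpt above.
    (ipaddress.ip_network("12.34.56.0/24"), "link#3"),
    (ipaddress.ip_network("12.34.56.1/32"), "00:30:94:01:e7:90"),
    (ipaddress.ip_network("12.34.56.3/32"), "00:20:ed:66:79:34"),
    (ipaddress.ip_network("12.34.56.241/32"), "00:20:ed:91:f7:04"),
]

net, hop = lookup("12.34.56.241", routes)
print(net, hop)
```

By that reading, the host entry for 12.34.56.241 points at the server's own MAC, which is why the table looks fine to me even though the packets end up at the gateway's MAC.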
-
I'll set up a test network this evening and try to recreate your problem.
This seems to be a really strange problem.
-
Thanks a lot; I really appreciate it. If you have any additional questions, you can contact me directly at brian-NATissue@briantist.com. That goes right to my phone as well, so I can probably reply quickly if I don't need to be at a computer to answer your question.
-
@sai:
I didn't see your edit until now. That is weird; I didn't notice it before. I also checked the original packet capture data to make sure I didn't accidentally paste an incorrect MAC address, and I have confirmed that I did not (what you're seeing is correct). This makes it all the more weird though; I thought I had a consistent, repeatable issue here (and I do, kind of), but this one little packet is seriously making me wonder… I think it increases the likelihood that this problem is something I've done (whether it's in pfSense or not) to cause this problem. I can't think of what that might be though. Again, the help is really appreciated. Thanks everyone.
-
Sorry for not writing back sooner.
I'm having some problems with faulty hardware (part of my network test environment just died) and I haven't had the time to replace it.
I'll be on holiday for a week now.
I hope the replacement parts I ordered are here when I get home.
-
That sucks, sorry to hear about your hardware. Thanks for the update though, I've been checking the thread multiple times per day.
-
GruensFroeschli, please tell me you haven't forgotten about me! :'(
Anyone have any ideas? Any experiences with this?
-
No, I haven't forgotten about you.
In fact I'm working on it right now ;)
But I think I've run into another problem with NAT.
Trying to reproduce it right now >_<
Will probably write back later today (if I don't go crazy).
-
OK, I think I tried everything.
But I haven't been able to reproduce your problem.
Here is what I did:
          (WAN - public IP)
          ADSL-modem/router
       (LAN - 192.168.20.1/29)
                  |
                  |
192.168.20.4/29   |
test-client-----switch--------------------------(WAN - 192.168.20.5/29)
                  |                                    pfSense2
                  |                             (LAN - 192.168.40.1/24)
                  |                                        |
                  |                                        |
       (WAN - 192.168.20.6/29)                             |
              pfSense1 (OPT1 - 192.168.40.2/24)---------switch----------test-client/server
         (LAN - 10.0.0.1/24)                                            192.168.40.200/24
                  |
                  |
                server
             10.0.0.10/24
I can access the server from the 192.168.20.4 test-client as expected if I connect to 192.168.20.6.
I can access the server as well if I connect to 192.168.20.5.
What you described is that if the gateway on OPT1 is set, you can no longer access the server from (in my case) the 192.168.40.x/24 range.
This worked for me.
I'm not sure what the problem in your case could be >_<
-
Interface: 192.168.40.200 --- 0x3
Physical Address . . . . . . : 00-03-25-09-91-19    <<--- test-client/server
Internet Address    Physical Address     Type
192.168.40.1        00-02-44-8f-03-ae    dynamic    <<--- Gateway
192.168.40.2        00-0d-b9-05-67-25    dynamic    <<--- OPT interface
@OK:
22:54:31.781241 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 62: (tos 0x0, ttl 128, id 56430, offset 0, flags [DF], proto: TCP (6), length: 48) 192.168.40.200.1596 > 192.168.40.2.80: S, cksum 0x30be (correct), 1713509472:1713509472(0) win 65535
22:54:31.782478 00:0d:b9:05:67:25 > 00:02:44:8f:03:ae, ethertype IPv4 (0x0800), length 62: (tos 0x0, ttl 127, id 31917, offset 0, flags [DF], proto: TCP (6), length: 48) 192.168.40.2.80 > 192.168.40.200.1596: S, cksum 0x7865 (correct), 1117681065:1117681065(0) ack 1713509473 win 65535
@OK:
22:54:31.782905 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 128, id 56431, offset 0, flags [DF], proto: TCP (6), length: 40) 192.168.40.200.1596 > 192.168.40.2.80: ., cksum 0xa461 (correct), 1:1(0) ack 1 win 65535
@OK:
22:54:31.819768 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 591: (tos 0x0, ttl 128, id 56432, offset 0, flags [DF], proto: TCP (6), length: 577) 192.168.40.200.1596 > 192.168.40.2.80: P, cksum 0x6236 (correct), 1:538(537) ack 1 win 65535
22:54:31.822462 00:0d:b9:05:67:25 > 00:02:44:8f:03:ae, ethertype IPv4 (0x0800), length 299: (tos 0x0, ttl 127, id 4765, offset 0, flags [DF], proto: TCP (6), length: 285) 192.168.40.2.80 > 192.168.40.200.1596: P, cksum 0xc9bd (correct), 1:246(245) ack 538 win 64998
@OK:
22:54:31.945686 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 128, id 56435, offset 0, flags [DF], proto: TCP (6), length: 40) 192.168.40.200.1596 > 192.168.40.2.80: ., cksum 0xa248 (correct), 538:538(0) ack 246 win 65290
Obviously I can reproduce that traffic is being sent to the wrong MAC.
But I'm wondering why it's working here…
-
OK, I went at it with Wireshark again (see attachment).
The downloaded content is the page on http://psymia.mine.nu
Everything seems to be in order… (see frame 6)
But now I'm wondering why the capture from pfSense itself differs from the capture with Wireshark.
-
GruensFroeschli, thanks so much for putting time into this. I'm sorry I haven't been able to do anything with it; I had to move suddenly, and things have been really crazy. I have had no time whatsoever to devote to this, and it might be a while before I can try again. I will resume working on this problem though, and I'll let you know how it turns out. Thanks again!
-
Can any of you describe what the issue is, if any, so I can take a look at it?
-
ermal, I'm confused. What information do you need that is not in the thread? I think we've been really descriptive.