NAT issues with multi-WAN



  • I am using pfSense 1.2.

    I am dealing with three networks here. A LAN on 10.20.30.0/24, a cable connection with a static IP (1.2.3.142, this is the WAN in pfSense) and on my OPT1 interface I have 12.34.56.25/24 (a class C of public IPs from T1 lines). I have several servers already that are assigned public IPs from the 12.34.56 range, that sit on the other side of this particular firewall.

    On the LAN side of this, I have a few servers that have LAN IPs and will have ports forwarded to them from both the cable modem and the T1 IPs. The ultimate purpose is that these hosts use the cable modem for internet access by default, which gives them fast downloads, while I port forward services such as mail from both public IPs. That way I can have two MX records pointing to two different public IPs on two different networks/connections, and both reach the same box.
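    The two-MX idea relies on standard SMTP behaviour: a sending server sorts a domain's MX records by preference and tries the lowest value first, falling back to the next host if that one is unreachable. A minimal Python sketch of that ordering, assuming two hypothetical host names (mx-cable.example.com for the cable IP, mx-t1.example.com for the T1 IP) and hypothetical preference values:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MX:
    preference: int  # lower value is tried first
    host: str


def delivery_order(records: list[MX]) -> list[str]:
    """Return MX hosts in the order a sender tries them.

    Simplification: hosts with equal preference keep their input order here,
    whereas RFC 5321 says senders should randomize among equal preferences.
    """
    return [r.host for r in sorted(records, key=lambda r: r.preference)]


# Hypothetical records: cable connection preferred, T1 as the fallback.
records = [MX(20, "mx-t1.example.com"), MX(10, "mx-cable.example.com")]
print(delivery_order(records))  # ['mx-cable.example.com', 'mx-t1.example.com']
```

    Which preference values to use (or whether to make them equal) is a policy choice the thread does not specify.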

    The problem I'm having is with that OPT1 interface. If I give it a gateway (making it another WAN interface as far as pfSense sees it) then I can't access NAT'ed services from any of the hosts with an address in the 12.34.56 range, but I can access them from outside on the internet.

    If I take the gateway out, the opposite is true: I can access NAT'ed services from hosts within the 12.34.56 range, but not from anything outside.

    I have confirmed with a packet capture that when the gateway is in place on OPT1, packets coming from the LAN that are destined for the 12.34.56 range are being sent to the gateway on that interface, even though the destination is within OPT1's own subnet. I can't understand why, or how to resolve it.
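    For reference, the behaviour expected here is ordinary IP forwarding: a destination inside an interface's connected subnet is delivered directly (ARP for the destination's own MAC), and only off-subnet destinations go to the gateway. A minimal Python model of that decision, using the OPT1 addresses from this post; 203.0.113.10 is a hypothetical outside host. It models only a plain routing-table lookup, not any per-rule policy routing a firewall may apply on top of it:

```python
import ipaddress
from typing import Optional


def next_hop(dst: str, iface: str, gateway: Optional[str]) -> Optional[str]:
    """Pick the IP whose MAC address a frame to `dst` should be sent to.

    iface   -- interface address with prefix, e.g. "12.34.56.24/24"
    gateway -- the interface's gateway, or None if none is configured
    Returns None when `dst` is off-link and there is no gateway.
    """
    network = ipaddress.ip_interface(iface).network
    if ipaddress.ip_address(dst) in network:
        return dst  # on-link: ARP for the destination itself
    return gateway  # off-link: hand the packet to the gateway (if any)


OPT1 = "12.34.56.24/24"
print(next_hop("12.34.56.241", OPT1, "12.34.56.1"))  # 12.34.56.241
print(next_hop("203.0.113.10", OPT1, "12.34.56.1"))  # 12.34.56.1
print(next_hop("203.0.113.10", OPT1, None))          # None
```

    Without a gateway, off-subnet replies have nowhere to go (the None case), which matches what happens when the gateway is removed; with a gateway, on-link replies should still be delivered directly (the first case), which is not what the capture shows.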

    I have been going nuts trying to figure this out. Any help is appreciated.



  • Go to System > Advanced > Network Address Translation and check "Disable NAT Reflection".



  • Thank you for the response, sai. I have now disabled NAT reflection, but the problem remains. I even reset the states to make sure. Any other ideas?



  • Could you show screenshots of the firewall-rules you have on WAN/LAN/OPT1?



  • Sure thing. I already took these screenshots previously (and modified them to use the fake IPs from my post, for consistency). I don't have one for LAN, but I will describe it. The first rule allows any source to 127.0.0.1 on ports 8000:8030 for TCP/UDP (this is for the FTP helper). Other than that, I'm allowing all traffic from LAN net to any.

    [Screenshot: WAN (1.2.3.142/29) firewall rules]

    [Screenshot: ServerWAN [OPT1] (12.34.56.24/24) firewall rules]

    [Screenshot: NAT Port Forwarding]

    [Screenshot: NAT 1:1]

    I do have a virtual IP defined for the IP used in 1:1 NAT. I also tried it with the interface address, and it behaves exactly the same way. In this case all testing is being done with port 22 (SSH) to the 10.20.30.89 machine, so you can ignore the entry for BTIassp1.
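    As a rough model of what the 1:1 NAT entry does to the SSH test traffic: the destination of inbound packets is rewritten from the VIP to the LAN host, and the source of the replies is rewritten back. This is a stateless Python sketch for illustration only (pf actually tracks connection state); client port 3490 is borrowed from the capture later in the thread:

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int


# 1:1 mapping from this thread: public VIP on OPT1 <-> private LAN host.
ONE_TO_ONE = {"12.34.56.25": "10.20.30.89"}
REVERSE = {inside: outside for outside, inside in ONE_TO_ONE.items()}


def translate_inbound(pkt: Packet) -> Packet:
    """Rewrite the destination of a packet arriving from outside."""
    return pkt._replace(dst=ONE_TO_ONE.get(pkt.dst, pkt.dst))


def translate_reply(pkt: Packet) -> Packet:
    """Rewrite the source of the internal host's reply back to the public IP."""
    return pkt._replace(src=REVERSE.get(pkt.src, pkt.src))


request = Packet("12.34.56.241", 3490, "12.34.56.25", 22)
print(translate_inbound(request))
reply = Packet("10.20.30.89", 22, "12.34.56.241", 3490)
print(translate_reply(reply))
```

    Note that the reply's destination (12.34.56.241) is untouched by NAT, so which MAC it is sent to is purely a forwarding decision on OPT1.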



  • Hmmm. I'm still not sure I understand your setup.

    You have your servers in the subnet of OPT1,
    and at the same time you use the OPT1 interface as a WAN?

                 LAN
                  |
                  |
               pfSense
               /     \
              /       \ OPT1
           WAN         \______ server
            |              |
        internet       internet

    Is this correct?



  • Yes, that's correct, but it's more like this:

                 LAN
                  |
                  |
               pfSense
               /     \
              /       \ OPT1
           WAN         \______ switch ____ servers (these have public IPs assigned directly)
            |              |
        internet    T1 gateway/router
                           |
                       internet

    Does this make more sense?



  • The last thing I don't really understand is: what are you trying to achieve, and from where?

    Quoting your first post: "On the LAN side of this, I have a few servers that have LAN IPs and will have ports forwarded to them from both the cable modem and the T1 IPs."

    In that description the servers are on the LAN.
    I don't see a problem with that.

    I think I remember that by default pfSense NATs traffic from all interfaces to WAN.
    I'm not sure, though, whether that changes if you set a gateway on the OPT interface.
    Most probably it does influence it, since you are experiencing problems with exactly this.

    In case it does influence it, you have to enable Advanced outbound NAT.
    With AoN you can specify manually how traffic should be NATed.

    In your case I think you want traffic from LAN NATed to WAN and OPT1, and nothing else.
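    A minimal sketch of how such outbound NAT rules are matched, assuming one rule per WAN that translates the LAN subnet to that interface's address (addresses from this thread; the rule list itself is hypothetical, not an export of the actual pfSense config). As in pf, the first matching translation rule wins:

```python
import ipaddress
from typing import Optional

# Hypothetical AoN rules: (source network, egress interface, translated source).
AON_RULES = [
    ("10.20.30.0/24", "WAN", "1.2.3.142"),
    ("10.20.30.0/24", "OPT1", "12.34.56.24"),
]


def outbound_nat(src: str, egress: str) -> Optional[str]:
    """Return the translated source address, or None if no rule matches."""
    src_ip = ipaddress.ip_address(src)
    for network, iface, translated in AON_RULES:
        if iface == egress and src_ip in ipaddress.ip_network(network):
            return translated  # first matching rule wins
    return None  # no rule matched: the source is left untranslated


print(outbound_nat("10.20.30.89", "WAN"))    # 1.2.3.142
print(outbound_nat("10.20.30.89", "OPT1"))   # 12.34.56.24
print(outbound_nat("12.34.56.241", "OPT1"))  # None
```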

    Just set the gateway on the OPT1 config page, and create rules on the LAN to use the different WANs accordingly.

    Also, are you sure that the servers in your 12.34.56.x subnet can reach the IP you have on the OPT1 interface?
    Does your T1 gateway/router do any NAT too?



  • I did say that servers were on my LAN, and they are. There are also servers that are on the same network as the OPT1 interface, with public IPs assigned directly to them, and they communicate with the internet without going through this pfSense box at all.

    This is in a data center, and we have lots of servers, so there's a lot going on that has nothing to do with this setup that is left out, so it might seem like this configuration is smaller than it is.

    I do indeed have advanced outbound NAT setup so that the LAN traffic is NATed through both WAN and OPT1. This works just fine.

    To reiterate what I'm trying to do:

    The servers on this pfSense's LAN interface use this box as a gateway. The purpose is so that I forward ports from my fast cable modem connection AND from the slower but more reliable T1 connection to the same LAN IP.

    This already works completely on the WAN (cable modem interface).

    The problem arises with OPT1. If a gateway is not specified on OPT1, then incoming connections on OPT1 from outside of OPT1's subnet will not work. This is to be expected, at least from my understanding because there is no gateway for the return traffic to get back out to the internet. I can make incoming connections on OPT1 from the servers that are on the same subnet because of course a gateway is not required in this case. This too is expected, based on my understanding.

    If I do put in a gateway (which is probably how this is supposed to be set), then port forwarding from out on the internet does work correctly. The problem here is that it no longer works from servers within OPT1's subnet. The incoming connection does actually work: I have confirmed this by logging the firewall connections and I can see them being allowed. By doing a packet capture, I confirmed my suspicions that what is going on here is that the return traffic comes from the server on the LAN into pfSense (as it should), and then pfSense sends it to the gateway specified on OPT1 instead of sending it directly out of OPT1 (since the ultimate destination of this packet is on the same subnet as OPT1).

    The T1 router does not do any NAT. As I said, public IPs on that network are assigned directly to the servers. So when I try to SSH from the server with the IP 12.34.56.241 to 12.34.56.25 (a virtual IP on pfSense on OPT1 that is forwarded to 10.20.30.89), it works when no gateway is given in OPT1. It fails when a gateway is specified in OPT1 because the return traffic gets sent to the gateway on OPT1 (12.34.56.1). It's not just SSH. ICMP packets (pings) do this too.

    I appreciate your patience. Thanks for the help.



  • Thank you for reformulating your problem.
    I think I understand it completely now.

    I see that you are using 1:1 NAT for the 12.34.56.x subnet.
    Could you try this with a normal port forward?
    Just as a test.

    Also, what does the firewall rule on your LAN look like?

    Did you create an AoN rule for your VIP too?
    You shouldn't have to, since you use the VIP in the 1:1 NAT rule.



  • I did try to use the interface address with port forwarding previously, and it gave exactly the same results. I have done that again at your request, and the results are the same. I also did a new packet capture and tried to format it a bit so it's more easily readable:
    http://sa.briantist.com/NATissue/cap.html

    The color key is as follows:
    Any MAC or IP address in red belongs to the T1 gateway (shouldn't come into play here)
    Any MAC or IP address in green belongs to the machine between pfSense and the T1 router, which is assigned a public IP directly and is trying to access the NATed machine through port forwarding (or 1:1 NAT).
    Any MAC or IP address in blue belongs to pfSense on OPT1, to be translated via NAT to 10.20.30.89 (doesn't appear in capture).

    My firewall log confirms that NAT is working and traffic is being allowed through to 10.20.30.89. You can see in the capture that traffic incoming to OPT1 goes through fine, and traffic being sent out of OPT1 that appears to be destined for 12.34.56.241 is actually sent to the MAC address of the gateway (12.34.56.1).

    In the capture you will see ICMP traffic as well as TCP traffic destined for port 22, and the behavior is the same.

    Also to clear things up, the interface IP of OPT1 is 12.34.56.24 and the virtual IP that I was also using in 1:1 NAT is 12.34.56.25.



  • Might be a netmask problem. Any chance that one of the netmask/gateway combinations on the server in the OPT1 subnet is wrong, or overlaps/conflicts with the VIP defined on the firewall?



  • I'm not sure exactly what you mean, sai. The OPT1 interface is set to 12.34.56.24/24. The VIP is defined as 12.34.56.25/32 (Single Address; in this case I assumed that the netmask is used to determine the quantity of IP addresses for which you want ProxyARP). Really the VIP config should be moot here because the problem behaves exactly the same on OPT1's interface address.

    The server I am using for testing, which is assigned a public IP directly, has 12.34.56.241/24 (and .242/24, but that seems irrelevant here). Its gateway is 12.34.56.1, but again, this machine is correctly sending its packets to pfSense; it's the return path where the packets are being sent to the gateway when they shouldn't be.
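    To make sai's suggestion concrete, here is a small Python check for the kinds of netmask/gateway mistakes that usually send replies to the gateway: host and firewall disagreeing on the subnet, a gateway outside the host's subnet, or a VIP range that covers the host's own address. The function name and the particular checks are illustrative assumptions:

```python
import ipaddress


def config_problems(host: str, gateway: str, firewall_if: str, vip: str) -> list[str]:
    """List netmask/gateway inconsistencies between a server and the firewall.

    host, firewall_if -- "address/prefix" strings, e.g. "12.34.56.241/24"
    vip               -- the VIP as a network, e.g. "12.34.56.25/32"
    """
    problems = []
    host_if = ipaddress.ip_interface(host)
    fw_if = ipaddress.ip_interface(firewall_if)
    vip_net = ipaddress.ip_network(vip)
    if host_if.network != fw_if.network:
        problems.append("host and firewall disagree on the subnet")
    if ipaddress.ip_address(gateway) not in host_if.network:
        problems.append("gateway is outside the host's subnet")
    if host_if.ip in vip_net:
        problems.append("VIP range covers the host's own address")
    return problems


# Values from this post: no problems are reported.
print(config_problems("12.34.56.241/24", "12.34.56.1", "12.34.56.24/24", "12.34.56.25/32"))
# A /32 VIP covers exactly one address, consistent with "Single Address".
print(ipaddress.ip_network("12.34.56.25/32").num_addresses)
```

    With the values posted here the check comes back empty, which fits the observation that the problem also occurs on the interface address itself.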

    Is there anything else I should check specifically?



  • it's the return path where the packets are being sent to the gateway when they shouldn't be.

    That generally means the netmask is wrong, but I am not sure how VIPs (and their associated netmask) work in this…

    Now this is weird:

    11:54:47.266968 00:50:8b:cf:a2:6a > 00:30:94:01:e7:90, length 62: (tos 0x0, ttl  63, id 38111, flags [DF], proto: TCP (6), length: 48) 12.34.56.24.22 > 12.34.56.241.3490: S, cksum 0xbcfc (correct), 3571569917:3571569917(0) ack 2188962038 win 5840 <mss 1460,nop,nop,sackOK>
    11:54:49.492704 00:50:8b:cf:a2:6a > 00:20:ed:91:f7:04, length 62: (tos 0x0, ttl  63, id 1468, flags [DF], proto: TCP (6), length: 48) 12.34.56.24.22 > 12.34.56.241.3479: S, cksum 0x6e76 (correct), 2180526778:2180526778(0) ack 2586252549 win 5840 <mss 1460,nop,nop,sackOK>

    Two consecutive packets, both from 12.34.56.24 and both to 12.34.56.241, but (looking at the MAC addresses) one goes to the gateway (i.e. the T1 router) and the other goes to the correct server.

    All I can say is "wtf?". This problem really needs a guru.
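    Comparing MAC addresses by eye is error-prone, so here is a rough Python sketch that parses tcpdump lines in the format quoted above and flags frames whose destination MAC is not the ARP entry for the destination IP. The regular expression only targets this particular output layout, and the ARP entries are copied from the routing table posted later in the thread. Parsing also makes one detail easy to see: the two packets carry different client ports (3490 and 3479), so they belong to two separate connection attempts about two seconds apart:

```python
import re
from typing import NamedTuple


class Frame(NamedTuple):
    time: str
    src_mac: str
    dst_mac: str
    src_ip: str
    sport: int
    dst_ip: str
    dport: int


# Matches: "<time> <src mac> > <dst mac>, ...) <src ip>.<port> > <dst ip>.<port>:"
LINE_RE = re.compile(
    r"^(\S+) ([0-9a-f:]{17}) > ([0-9a-f:]{17}), .*?\) "
    r"(\d+(?:\.\d+){3})\.(\d+) > (\d+(?:\.\d+){3})\.(\d+):"
)

# IP -> MAC entries from the pfSense routing table posted in this thread.
ARP = {
    "12.34.56.1": "00:30:94:01:e7:90",    # T1 gateway
    "12.34.56.3": "00:20:ed:66:79:34",
    "12.34.56.241": "00:20:ed:91:f7:04",  # test server
}


def parse(line: str) -> Frame:
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised tcpdump line: {line[:40]!r}")
    time, smac, dmac, sip, sport, dip, dport = m.groups()
    return Frame(time, smac, dmac, sip, int(sport), dip, int(dport))


def misdirected(frame: Frame) -> bool:
    """True if the frame is not addressed to the MAC that owns its destination IP."""
    expected = ARP.get(frame.dst_ip)
    return expected is not None and frame.dst_mac != expected


# The two packets quoted above, truncated after the TCP flags.
lines = [
    "11:54:47.266968 00:50:8b:cf:a2:6a > 00:30:94:01:e7:90, length 62: (tos 0x0, ttl  63, id 38111, flags [DF], proto: TCP (6), length: 48) 12.34.56.24.22 > 12.34.56.241.3490: S",
    "11:54:49.492704 00:50:8b:cf:a2:6a > 00:20:ed:91:f7:04, length 62: (tos 0x0, ttl  63, id 1468, flags [DF], proto: TCP (6), length: 48) 12.34.56.24.22 > 12.34.56.241.3479: S",
]
for line in lines:
    f = parse(line)
    print(f.time, f.dst_ip, f.dport, "misdirected" if misdirected(f) else "ok")
```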



  • :sigh: I always get the crazy problems no one has ever heard of.

    Keep in mind this doesn't really relate to the VIP/ProxyARP/1:1 NAT, since the problem happens on the interface address.

    I have looked at the routing table, and it seems fine to me, although I don't pretend to understand all of the fields:

    Destination    Gateway            Flags  Refs  Use    Mtu   Netif  Expire
    12.34.56       link#3             UC     0     0      1500  fxp2
    12.34.56.1     00:30:94:01:e7:90  UHLW   1     24400  1500  fxp2   1179
    12.34.56.3     00:20:ed:66:79:34  UHLW   1     124    1500  fxp2   1148
    12.34.56.241   00:20:ed:91:f7:04  UHLW   1     26009  1500  fxp2   1192

    (this is just the part of the routing table with 12.34.56 addresses)
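    The entries carrying the L flag in this output are ARP-derived host routes (on FreeBSD: U up, H host, L valid link-layer translation, W cloned), so the table can be turned into an IP-to-MAC map for checking the capture. A minimal Python sketch, assuming the column order Destination, Gateway, Flags, and so on:

```python
def arp_entries(lines: list[str]) -> dict[str, str]:
    """Map destination IP -> MAC for host routes carrying link-layer info."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        destination, gateway, flags = fields[0], fields[1], fields[2]
        # 'H' = host route, 'L' = valid link-layer (ARP) translation.
        if "H" in flags and "L" in flags:
            table[destination] = gateway
    return table


ROUTES = """\
12.34.56 link#3 UC 0 0 1500 fxp2
12.34.56.1 00:30:94:01:e7:90 UHLW 1 24400 1500 fxp2 1179
12.34.56.3 00:20:ed:66:79:34 UHLW 1 124 1500 fxp2 1148
12.34.56.241 00:20:ed:91:f7:04 UHLW 1 26009 1500 fxp2 1192
""".splitlines()

arp = arp_entries(ROUTES)
print(arp["12.34.56.1"])    # 00:30:94:01:e7:90 (T1 gateway)
print(arp["12.34.56.241"])  # 00:20:ed:91:f7:04 (test server)
```

    The entry for 12.34.56.241 matches the MAC of the correctly delivered packet, which suggests the misdirected frames are not explained by a stale ARP entry.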



  • I'll set up a test network this evening and try to recreate your problem.
    This seems to be a really strange problem.



  • Thanks a lot; I really appreciate it. If you have any additional questions, you can contact me directly at brian-NATissue@briantist.com. That goes right to my phone as well, so I can probably reply quickly if I don't need to be at a computer to answer.



  • @sai:

    I didn't see your edit until now. That is weird; I hadn't noticed it before. I also checked the original packet capture data to make sure I didn't paste an incorrect MAC address by accident, and I have confirmed that I did not (what you're seeing is correct). This makes it all the more weird: I thought I had a consistent, repeatable issue here (and I do, kind of), but this one packet is seriously making me wonder… I think it increases the likelihood that the problem is caused by something I've done (whether in pfSense or not), though I can't think of what that might be. Again, the help is really appreciated. Thanks, everyone.



  • Sorry for not writing back sooner.
    I'm having some problems with faulty hardware (part of my network test environment just died) and I haven't had time to replace it.
    I'll be on holiday for a week now.
    I hope the replacement parts I ordered are here when I get home.



  • That sucks, sorry to hear about your hardware. Thanks for the update though, I've been checking the thread multiple times per day.



  • GruensFroeschli, please tell me you haven't forgotten about me!  :'(

    Anyone have any ideas? Any experiences with this?



  • No, I haven't forgotten about you.
    In fact, I'm working on it right now ;)

    But I think I've run into another problem with NAT.
    Trying to reproduce it right now >_<

    Will probably write back later today (if I don't go crazy).



  • OK, I think I've tried everything,
    but I haven't been able to reproduce your problem.

    Here is what I did:

    
    		(WAN - public IP)
    		ADSL-modem/router
    		(LAN - 192.168.20.1/29)
    		   |
    		   |
    192.168.20.4/29	   |
    test-client-----switch--------------------------(WAN - 192.168.20.5/29)
    		   |				pfSense2
    		   |				(LAN - 192.168.40.1/24)
    		   |				   |
    		   |				   |
    	(WAN - 192.168.20.6/29)			   |
    	pfSense1 (OPT1 - 192.168.40.2/24)--------switch----------test-client/server
    	(LAN - 10.0.0.1/24)					192.168.40.200/24
    		   |
    		   |
    		server
    		10.0.0.10/24
    
    

    I can access the server from the 192.168.20.4 test client as expected if I connect to 192.168.20.6.
    I can also access the server if I connect to 192.168.20.5.

    What you described is that if the gateway on OPT1 is set, you can no longer access the server from (in my case) the 192.168.40.x/24 range.
    This worked for me.

    I'm not sure what the problem in your case could be >_<



  • Interface: 192.168.40.200 --- 0x3
    Physical Address. . . . . . : 00-03-25-09-91-19   <-- test-client/server
      Internet Address      Physical Address      Type
      192.168.40.1          00-02-44-8f-03-ae     dynamic   <-- gateway
      192.168.40.2          00-0d-b9-05-67-25     dynamic   <-- OPT interface

    @OK:

    22:54:31.781241 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 62: (tos 0x0, ttl 128, id 56430, offset 0, flags [DF], proto: TCP (6), length: 48) 192.168.40.200.1596 > 192.168.40.2.80: S, cksum 0x30be (correct), 1713509472:1713509472(0) win 65535

    @STRANGE:

    22:54:31.782478 00:0d:b9:05:67:25 > 00:02:44:8f:03:ae, ethertype IPv4 (0x0800), length 62: (tos 0x0, ttl 127, id 31917, offset 0, flags [DF], proto: TCP (6), length: 48) 192.168.40.2.80 > 192.168.40.200.1596: S, cksum 0x7865 (correct), 1117681065:1117681065(0) ack 1713509473 win 65535

    @OK:

    22:54:31.782905 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 128, id 56431, offset 0, flags [DF], proto: TCP (6), length: 40) 192.168.40.200.1596 > 192.168.40.2.80: ., cksum 0xa461 (correct), 1:1(0) ack 1 win 65535

    @OK:

    22:54:31.819768 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 591: (tos 0x0, ttl 128, id 56432, offset 0, flags [DF], proto: TCP (6), length: 577) 192.168.40.200.1596 > 192.168.40.2.80: P, cksum 0x6236 (correct), 1:538(537) ack 1 win 65535

    @STRANGE:

    22:54:31.822462 00:0d:b9:05:67:25 > 00:02:44:8f:03:ae, ethertype IPv4 (0x0800), length 299: (tos 0x0, ttl 127, id 4765, offset 0, flags [DF], proto: TCP (6), length: 285) 192.168.40.2.80 > 192.168.40.200.1596: P, cksum 0xc9bd (correct), 1:246(245) ack 538 win 64998

    @OK:

    22:54:31.945686 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 128, id 56435, offset 0, flags [DF], proto: TCP (6), length: 40) 192.168.40.200.1596 > 192.168.40.2.80: ., cksum 0xa248 (correct), 538:538(0) ack 246 win 65290

    So I can reproduce traffic being sent to the wrong MAC.

    But I'm wondering why it still works here…
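    The @OK/@STRANGE labels above can be derived mechanically by comparing each frame's destination MAC with the test client's ARP table. A rough Python sketch, assuming the tcpdump layout shown above (destination MAC as the fourth whitespace-separated token); the sample lines are truncated after the TCP flags. It only labels frames and does not explain why the connection still succeeds in this test:

```python
import re


def win_mac(mac: str) -> str:
    """Convert Windows 'aa-bb-cc-dd-ee-ff' notation to 'aa:bb:cc:dd:ee:ff'."""
    return mac.lower().replace("-", ":")


# Test client's ARP data as posted above, converted to tcpdump's notation.
ARP = {
    "192.168.40.1": win_mac("00-02-44-8f-03-ae"),    # gateway (pfSense2 LAN)
    "192.168.40.2": win_mac("00-0d-b9-05-67-25"),    # pfSense1 OPT1 interface
    "192.168.40.200": win_mac("00-03-25-09-91-19"),  # the test client/server itself
}

# "<src ip>.<port> > <dst ip>.<port>:" as printed by tcpdump.
IP_PAIR_RE = re.compile(r"(\d+(?:\.\d+){3})\.\d+ > (\d+(?:\.\d+){3})\.\d+:")


def label(line: str) -> str:
    """'OK' if the destination MAC owns the destination IP, else 'STRANGE'."""
    dst_mac = line.split()[3].rstrip(",")
    m = IP_PAIR_RE.search(line)
    if m is None:
        raise ValueError("no IPv4 address pair found in line")
    return "OK" if ARP.get(m.group(2)) == dst_mac else "STRANGE"


capture = [
    "22:54:31.781241 00:03:25:09:91:19 > 00:0d:b9:05:67:25, ethertype IPv4 (0x0800), length 62: (tos 0x0, ttl 128, id 56430, offset 0, flags [DF], proto: TCP (6), length: 48) 192.168.40.200.1596 > 192.168.40.2.80: S",
    "22:54:31.782478 00:0d:b9:05:67:25 > 00:02:44:8f:03:ae, ethertype IPv4 (0x0800), length 62: (tos 0x0, ttl 127, id 31917, offset 0, flags [DF], proto: TCP (6), length: 48) 192.168.40.2.80 > 192.168.40.200.1596: S",
]
for line in capture:
    print(label(line))  # OK, then STRANGE
```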



  • OK, I went at it with Wireshark again (see attachment).
    The downloaded content is the page on http://psymia.mine.nu
    Everything seems to be in order… (see frame 6)

    But now I'm wondering why the capture from pfSense itself differs from the capture with Wireshark.

    ??? ??? ???

    test_capture.pcap.txt



  • GruensFroeschli, thanks so much for putting time into this. I'm sorry I haven't been able to do anything with it; I had to move suddenly, and things have been really crazy. I have had no time whatsoever to devote to this, and it might be a while before I can try again. I will resume working on this problem though, and I'll let you know how it turns out. Thanks again!



  • Can any of you describe what the issue is, if any, so I can take a look at it?



  • ermal, I'm confused. What information do you need that is not in the thread? I think we've been really descriptive.

