1:1 NAT problem - Outgoing traffic uses general Outbound NAT
I've set up this scenario with 1:1 NAT
VIP1 (WAN iface) <--> internal Postfix eth0 iface
VIP2 (OPT1 iface) <--> internal Postfix eth0.1 iface (same server, virtual iface)
Postfix has default gw to pfSense's LAN iface via eth0
Incoming SMTP traffic works without problems, but outgoing SMTP traffic from the Postfix instance bound to the eth0.1 iface is NATed to pfSense's WAN IP. However, in pfSense's firewall log I can see Postfix's eth0.1 IP address as the source (this is the IP set in the 1:1 NAT rule).
Oops, I hope I've described the problem clearly...
I do not know what more to check to be sure I've set it up correctly... I'd appreciate any help!
I've just tried moving the second Postfix instance to the physical iface eth3 - no change :-( In pfSense's log there is Postfix's eth3 IP as the source…
If OPT1 is another WAN, you are probably hitting this:
Thanks, jimp! That linked problem seems very similar; I'll go through it and let you know. I just noticed the problem is related to version 2.0 - I'm still using 1.2.3. Sorry for omitting this…
The problem existed on 1.2.3 also, I believe. But all development focuses on 2.0.
I understand the development focus, I know it :)
I've just gone through the linked bug and I think it's a different problem (or I do not know that I have it too :-))
I noticed this detail, I do not know if it's significant:
Both the WAN and OPT1 ifaces are tagged on the same physical NIC. However, in ifconfig I can see this:
vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:40:63:fd:b4:34
        inet6 fe80::240:63ff:fefd:b438%vlan0 prefixlen 64 scopeid 0xa
        inet x.y.58.24 netmask 0xffffffe0 broadcast x.y.58.31
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        vlan: 401 parent interface: vge2
vlan1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:40:63:fd:b4:34
        inet6 fe80::240:63ff:fefd:b438%vlan1 prefixlen 64 scopeid 0xb
        inet x.y.58.90 netmask 0xffffffe0 broadcast x.y.58.95
        media: Ethernet autoselect (1000baseTX <full-duplex>)
        status: active
        vlan: 403 parent interface: vge2
vlan0 is WAN, vlan1 is OPT1, and the difference is that OPT1 (vlan1) has an extra 'PROMISC' flag - it's the only iface with this flag…
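As a quick sanity check on the ifconfig output above, the hex netmask 0xffffffe0 can be decoded to confirm that WAN and OPT1 sit in two different /27 subnets. This is only a minimal sketch: the helper name `subnet_bounds` is made up for illustration, and because the first two octets are masked as x.y in the thread, it works on the last octet only (so it is valid only for masks of /24 or longer).

```python
def subnet_bounds(last_octet, hex_netmask):
    """Return (prefix_length, network_last_octet, broadcast_last_octet).

    Hypothetical helper: only the last octet is used because the first
    two octets are masked in the thread. Assumes the mask is /24 or longer.
    """
    mask = int(hex_netmask, 16)           # accepts the "0x" prefix
    prefix_len = bin(mask).count("1")     # number of set bits in the mask
    octet_mask = mask & 0xFF              # mask bits covering the last octet
    network = last_octet & octet_mask
    broadcast = network | (~octet_mask & 0xFF)
    return prefix_len, network, broadcast


# WAN (vlan0) and OPT1 (vlan1) addresses taken from the ifconfig output
print(subnet_bounds(24, "0xffffffe0"))
print(subnet_bounds(90, "0xffffffe0"))
```

The computed broadcast octets agree with the `broadcast` fields ifconfig reports for vlan0 and vlan1, so the two interfaces do not overlap.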
I'm not too familiar with the vge (VIA gigabit ethernet) cards, you could possibly be hitting a driver bug of some kind.
It would be worth trying a 2.0 snapshot to see if the behavior is similar with the updated drivers therein.
Do you think it's a driver-related issue? Could it be a configuration-related problem? I cannot reproduce this scenario on 2.0 with the vge cards; the 1.2.3 box is a SPoF :( I'll ask our "Virtual" team whether we can simulate this scenario in a virtual environment, just to make sure it's not a configuration or concept issue. Would that be helpful?
Well the promisc issue might be a driver bug, or you could be using something that is putting that interface into promisc mode (rate/traffic graph view, packet capture, etc)
Oops, I've just noticed the PROMISC mode is gone :-o The only thing I've done was save the configuration of the OPT1 iface (no change, I just clicked the save button by mistake). However, there is no change in the behavior… I'm not sure if the virtual environment test will be possible, because I'm not sure there is support for tagged traffic in the VM - at least I'll make a proof of concept with standard (untagged) interfaces...
Maybe I ran the ifconfig command while a tcpdump command was running in the second window - that would explain the PROMISC mode, wouldn't it?
I've done some more research and it seems that pfSense "absobloodylutely" ignores the second 1:1 NAT rule and "transforms" it into just PAT (port forwarding). I'll try to describe the problem in more detail.
The Postfix box has two interfaces (eth0, eth3) in the same VLAN/subnet. The default route goes through the eth0 iface to the IP of pfSense's LAN iface. There are two instances of Postfix - the first, default one bound to the eth0 iface [192.168.100.5, 127.0.0.1], and the second, modified one bound to the eth3 iface [192.168.100.12]. On pfSense there are two 1:1 NAT rules:
VIP1 (WAN iface) [x.y.58.6] <--> internal Postfix eth0 iface [192.168.100.5]
VIP2 (OPT1 iface) [x.y.58.70] <--> internal Postfix eth3 iface [192.168.100.12]
In this state, the SMTP flow from the default Postfix instance (on eth0) is correctly NATed to VIP1, but the SMTP flow from the second Postfix instance is incorrectly NATed by the default Outbound NAT to pfSense's WAN iface IP [x.y.58.24].
But when I changed the default gw on the Postfix server to go through eth3, all outbound traffic from the Postfix server was NATed to pfSense's WAN iface IP [x.y.58.24] and moreover - incoming traffic from the internet stopped working (with no entry in pfSense's firewall log)….
I'm really confused by this behaviour and cannot find any explanation with my knowledge... :-(
That sheds some light on what might be the problem.
Run tcpdump on the postfix box. I bet it's sending the traffic back out the wrong IP.
Having two interfaces on the same subnet doesn't work like you might expect. It might work better with two IPs on a single interface, but you also need to consider that outbound NAT does not direct traffic out a certain way, you need a policy route rule to direct traffic as well. Usually reply-to would handle traffic from incoming connections, so I suspect the OS on that mail server may not be doing what you think it should be doing.
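To illustrate the point about outbound NAT not steering traffic, here is a toy Python model. It is not pfSense code and not a confirmed diagnosis of this thread. It assumes a simplified picture in which the firewall first picks the egress interface (default route, unless a policy route overrides it) and only then looks for a 1:1 NAT rule on that interface, falling back to the interface address otherwise. The names `ONE_TO_ONE`, `INTERFACE_IP`, `DEFAULT_EGRESS` and `translate` are invented for the sketch; the addresses are the ones quoted earlier in the thread.

```python
# Toy model only: a simplified, assumed view of "route first, then NAT".

# (egress interface, internal source IP) -> external 1:1 NAT address
ONE_TO_ONE = {
    ("WAN", "192.168.100.5"): "x.y.58.6",     # VIP1
    ("OPT1", "192.168.100.12"): "x.y.58.70",  # VIP2
}

# Interface addresses used by the general Outbound NAT fallback
INTERFACE_IP = {"WAN": "x.y.58.24", "OPT1": "x.y.58.90"}

DEFAULT_EGRESS = "WAN"  # assumption: the firewall's default route uses WAN


def translate(src_ip, policy_routes=None):
    """Return (egress interface, translated source address) for src_ip."""
    policy_routes = policy_routes or {}
    # Step 1: routing decides where the packet leaves the firewall.
    egress = policy_routes.get(src_ip, DEFAULT_EGRESS)
    # Step 2: only NAT rules bound to that egress interface can match.
    translated = ONE_TO_ONE.get((egress, src_ip), INTERFACE_IP[egress])
    return egress, translated


print(translate("192.168.100.5"))
print(translate("192.168.100.12"))
print(translate("192.168.100.12", {"192.168.100.12": "OPT1"}))
```

In this model the second instance only gets VIP2 once a policy route sends its traffic out OPT1; without one it leaves via WAN and falls through to the WAN address, which matches the reported symptom.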
tcpdump shows the SMTP traffic from the second Postfix instance with source 192.168.100.12, as expected - the instance is bound to eth3 with that IP. I'm going to temporarily remove the 1:1 NAT rule for the first Postfix instance late this evening to check the second 1:1 NAT on its own, and I'll let you know.
By "two IPs on a single iface" do you mean a virtual iface like eth0.1? I had it configured this way before, with the same behaviour…
I've just tried removing the first NAT rule and the second still doesn't work. Maybe I found the reason - both Postfix addresses resolve to the same MAC address (eth0's). I've tried to make a static entry in the ARP table with the correct MAC, but it doesn't work - no ping reply :(
Well, pfSense seems to choose the appropriate NAT rule according to the L2/ethernet header source (and then ARP table entry) instead of the L3/IP header source. Am I right? Is there any way to solve this?
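The duplicate-MAC observation can be checked mechanically. The sketch below scans ARP table text for IP addresses that share one MAC address. It assumes the usual BSD `arp -an` line layout (`? (ip) at mac on iface ...`); the sample lines are hypothetical, not output from the actual box: the MACs come from the documentation range, and the interface name `vge0` and the address 192.168.100.1 are placeholders.

```python
import re
from collections import defaultdict

# Matches the "(ip) at mac" part of a BSD-style `arp -an` line.
ARP_LINE = re.compile(
    r"\((?P<ip>[\d.]+)\) at (?P<mac>[0-9a-f:]{17})", re.IGNORECASE
)


def shared_macs(arp_output):
    """Return {mac: [ips]} for every MAC that appears with more than one IP."""
    by_mac = defaultdict(list)
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match:
            by_mac[match.group("mac").lower()].append(match.group("ip"))
    return {mac: ips for mac, ips in by_mac.items() if len(ips) > 1}


# Hypothetical sample: MACs, interface name and the .1 host are placeholders.
SAMPLE = """\
? (192.168.100.5) at 00:00:5e:00:53:01 on vge0 [ethernet]
? (192.168.100.12) at 00:00:5e:00:53:01 on vge0 [ethernet]
? (192.168.100.1) at 00:00:5e:00:53:02 on vge0 [ethernet]
"""

print(shared_macs(SAMPLE))
```

Feeding it the real ARP table from the firewall would show whether 192.168.100.5 and 192.168.100.12 are still learned with one MAC.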
As I expected, creating a dedicated Postfix server to replace the second instance on the first server solved the problem (the ARP entries on pfSense for [192.168.100.5] and [192.168.100.12] are now different).
However, I still do not understand the principles of how pfSense builds outgoing NAT. Jimp, can you please explain it to me?