2.0.3->2.1, IPSEC/NAT changes?
-
I have just upgraded four firewalls from 2.0.3 to 2.1.
A/B = my office
Y/Z = colocation center
Both A/B and Y/Z are master/slave and share IP with CARP.
I have an IPSEC tunnel from A/B to Y/Z for three subnets:
A/B: 10.97/16 <-> Y/Z: 192.168.97/24
A/B: 10.24/16 <-> Y/Z: 192.168.97/24
A/B: 10.240/16 <-> Y/Z: 192.168.97/24
All three of these have been set up and working on 2.0.3 since August 2013 or so. When I would connect from 10.24.6.5 to 192.168.97.85, the machine at 192.168.97.85 would see the incoming connection as originating from 10.24.6.5. The same was true for all three phase 2s.
This did not "break" until upgrading Y/Z to 2.1. (A/B was upgraded to 2.1 first. With A/B on 2.1 and Y/Z on 2.0.3, everything still worked as expected.)
After the upgrade to 2.1 (did "13: Upgrade from console"/auto) - when connecting from 10.24.6.5 -> 192.168.97.85; 192.168.97.85 sees the incoming connection as originating from 192.168.97.2 (the CARP master "Y"). This broke a few things - most notably some of our monitoring software.
The oddest part to me is that the 10.240/16 -> 192.168.97.85 still shows the source connection as 10.240.3.2 (or whichever individual machine initiated the connection).
On IRC I have been advised to look into /tmp/rules.debug - where it did indeed (On Y/Z) have the following:
# Outbound NAT rules
# Subnets to NAT
tonatsubnets = "{ 10.24.0.0/16 10.97.0.0/16 192.168.0.0/16 172.19.0.0/24 127.0.0.0/8 0.0.0.0 }"
nat on $WAN from $tonatsubnets port 500 to any port 500 -> 8.11.97.254/32 port 500
nat on $WAN from $tonatsubnets to any -> 8.11.97.254/32 port 1024:65535
I don't understand where "tonatsubnets" is derived from - nor why it would have my 10.24 and 10.97 - but not the 10.240. It would however explain why 10.240 is still working as I'd expect.
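To illustrate why 10.240 escapes these rules, here is a minimal sketch that checks source addresses against the tonatsubnets list quoted above. The function name matching_nat_subnets is made up for this example; the subnets are copied from the rules.debug excerpt. Note that 10.24.0.0/16 only covers 10.24.x.x, so a 10.240.x.x address does not fall inside it.

```python
import ipaddress

# Subnets copied from the tonatsubnets line in /tmp/rules.debug above.
TONATSUBNETS = [
    "10.24.0.0/16", "10.97.0.0/16", "192.168.0.0/16",
    "172.19.0.0/24", "127.0.0.0/8", "0.0.0.0",
]

def matching_nat_subnets(source_ip, subnets=TONATSUBNETS):
    """Return the subnets from the list that contain source_ip.

    Hypothetical helper for illustration; it only mimics how a
    'from $tonatsubnets' match would select a source address.
    """
    addr = ipaddress.ip_address(source_ip)
    return [s for s in subnets if addr in ipaddress.ip_network(s)]

# Sources taken from the connections described in this thread.
for ip in ("10.24.6.5", "10.240.3.2"):
    print(ip, matching_nat_subnets(ip))
```

An empty list for 10.240.3.2 is consistent with that phase 2 still showing the true source address.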
In pftop, connections from 10.24 get a gateway (192.168.97.2) assigned, while connections initiated from 10.240 do not:
PR DIR SRC DEST PKTS BYTES STATE AGE EXP RATE PEAK AVG RU GW
tcp In 10.24.6.9:45096 192.168.97.85:22 44 1700K 4:4 9036 38738 86235 *
tcp Out 10.24.6.9:45096 192.168.97.85:22 44 1700K 4:4 9036 38738 86235 * 192.168.97.2:56600
Rule
PR DIR SRC DEST PKTS BYTES STATE AGE EXP RATE PEAK AVG RU GW
tcp In 10.240.3.2:57772 192.168.97.85:22 42 6174 4:4 13 86393 474 *
tcp Out 10.240.3.2:57772 192.168.97.85:22 42 6174 4:4 13 86393 474 70
Can anyone point me in the right direction to solve this and make it so that each end of the IPSEC sees the source-IP as the true source-IP?
-
To append, one of the rare scenarios where our Colo is allowed to talk back to the office:
PR DIR SRC DEST RATE PEAK AVG BYTES STATE PKTS AGE EXP RU GW
tcp In 192.168.97.85:35449 10.24.6.9:80 420 4204 9:9 12 10 170 *
tcp Out 192.168.97.85:35449 10.24.6.9:80 420 4204 9:9 12 10 170 *
…does not get assigned a gateway either, and does the expected behaviour of showing the source (in 10.24.6.9's logs) as 192.168.97.85
-
More reporting:
I'm presuming this has something to do with the new "nat before ipsec"?
# pfctl -sa -vv | grep nat | grep 10.24 | less
@3 nat on bge0 inet from 10.24.0.0/16 port = isakmp to any port = isakmp -> 8.11.97.254 port 500
@9 nat on bge0 inet from 10.24.0.0/16 to any -> 8.11.97.254 port 1024:65535
@17 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.80 -> 192.168.97.2 port 1024:65535
@21 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.74 -> 192.168.97.2 port 1024:65535
@25 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.67 -> 192.168.97.2 port 1024:65535
@29 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.85 -> 192.168.97.2 port 1024:65535
@33 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.50 -> 192.168.97.2 port 1024:65535
@37 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.51 -> 192.168.97.2 port 1024:65535
@41 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.52 -> 192.168.97.2 port 1024:65535
@45 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.53 -> 192.168.97.2 port 1024:65535
@49 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.54 -> 192.168.97.2 port 1024:65535
@53 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.55 -> 192.168.97.2 port 1024:65535
This goes on and on for every WAN IP that is NAT 1:1… These rules are not in any of the XML when I do a backup.... where is this defined so I can turn it off and go back to the old/expected behaviour? (or perhaps this is just the NAT 1:1 reflection?)
Either way, I see these rules for 10.97.0.0/16 (which is being forced through a gateway/translated to 'source 192.168.97.2') as well as 10.24.0.0/16 (which has the same behaviour) but not for 10.240.0.0/16 (the third IPSEC phase 2 which is still working as expected).
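With this many rules it can help to collapse the listing into one line per interface and source subnet. The sketch below is only an illustration: the regular expression is written against the rule shape shown in this post (it is not a general pf parser), and summarize_nat is a made-up helper name. The sample lines are copied from the output above.

```python
import re
from collections import defaultdict

# Matches only the rule shape seen in this thread:
#   nat on IFACE inet from SRC [port = X] to DST [port = X] -> TARGET ...
NAT_RULE = re.compile(
    r"nat on (?P<iface>\S+) inet from (?P<src>\S+)"
    r"(?: port = \S+)? to (?P<dst>\S+)(?: port = \S+)? -> (?P<to>\S+)"
)

def summarize_nat(lines):
    """Group translation targets by (interface, source subnet)."""
    summary = defaultdict(set)
    for line in lines:
        m = NAT_RULE.search(line)
        if m:
            summary[(m["iface"], m["src"])].add(m["to"])
    return dict(summary)

sample = [
    "@3 nat on bge0 inet from 10.24.0.0/16 port = isakmp to any port = isakmp -> 8.11.97.254 port 500",
    "@9 nat on bge0 inet from 10.24.0.0/16 to any -> 8.11.97.254 port 1024:65535",
    "@29 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.85 -> 192.168.97.2 port 1024:65535",
]

for (iface, src), targets in sorted(summarize_nat(sample).items()):
    print(iface, src, "->", sorted(targets))
```

Run over the full saved pfctl output, a source subnet that never appears as a key (10.240.0.0/16 in my case) is one that is not being translated.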
-
Thanks to the IRC channel, it was pointed out that it appeared as if my 10.24 and 10.97 were seen as LANs to my Colo Y/Z pfsense machines.
Searching the XML for those two subnets only showed them up in a few places - DNS Forwarder entries, the IPSEC phase 2, and an alias I had named "LAN_OFFICE". LAN_OFFICE was defined as 10.24/16, 10.97/16 - AHA! This may explain why the 10.240 was still working right!
I tried to delete the alias for LAN_OFFICE because I didn't see it referenced anywhere, but it could not be deleted because of: "Cannot delete alias. Currently in use by IPSEC Office-to-Colo"
I couldn't find that reference anywhere at first - but there it was under
<staticroutes><route><network>LAN_OFFICE</network> <gateway>LANG</gateway> <disabled></disabled></route></staticroutes>
Note the "disabled" here, but - after deleting (not just disabling) this static route, as well as the alias - everything went back to "normal" - and connections are now again showing their proper sources.
To see if it was this disabled route - or the LAN_-named alias - I created another LAN_ alias, "LAN_TEST", and assigned it 192.168.5.0/24 just to test… This alias did not show up in my pfctl output (pfctl -sa), neither in NAT rules nor anywhere else. I had presumed that perhaps there was an error in parsing aliases that have "LAN" in them - but that does not seem to be the issue. I am still unsure whether to blame an errantly named alias or a disabled static route - but either way - it IS working (as expected) again now.
-
It looks like there is an issue in a number of places where static routes are processed - the various functionality gets implemented even if the static route is disabled. It does not look to be fixed in 2.1.1 or 2.2 current code, so I raised a bug report: https://redmine.pfsense.org/issues/3560
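For anyone wanting to check a config backup for the same situation, here is a minimal sketch that lists static routes carrying a "disabled" tag. It assumes the staticroutes/route/network/gateway/disabled layout quoted earlier in this thread; the wrapping pfsense root element, the second (enabled) route and the function name disabled_static_routes are made up for the example.

```python
import xml.etree.ElementTree as ET

def disabled_static_routes(config_xml):
    """Return (network, gateway) for every static route with a <disabled> tag.

    Assumes the <staticroutes><route>... layout shown in this thread.
    """
    root = ET.fromstring(config_xml)
    found = []
    for static in root.iter("staticroutes"):
        for route in static.findall("route"):
            # An empty <disabled></disabled> element still marks the route.
            if route.find("disabled") is not None:
                found.append(
                    (route.findtext("network"), route.findtext("gateway"))
                )
    return found

# First route is the one from this thread; the second is a hypothetical
# enabled route added only to show that it is skipped.
SAMPLE = """
<pfsense>
  <staticroutes>
    <route><network>LAN_OFFICE</network><gateway>LANG</gateway><disabled></disabled></route>
    <route><network>192.168.5.0/24</network><gateway>LANG</gateway></route>
  </staticroutes>
</pfsense>
"""

print(disabled_static_routes(SAMPLE))
```

Any route it reports is a candidate for deleting outright rather than leaving disabled, at least until the bug above is fixed.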