2.0.3->2.1, IPSEC/NAT changes?



  • I have just upgraded four firewalls from 2.0.3 to 2.1.

    A/B = my office
    Y/Z = colocation center

    Both A/B and Y/Z are master/slave pairs and share an IP via CARP.

    I have an IPSEC tunnel from A/B to Y/Z for three subnets:

    A/B: 10.97/16  <-> Y/Z: 192.168.97/24
    A/B: 10.24/16  <-> Y/Z: 192.168.97/24
    A/B: 10.240/16 <-> Y/Z: 192.168.97/24

    All three of these have been set up and working on 2.0.3 since August 2013 or so. When I connected from 10.24.6.5 to 192.168.97.85, the machine at 192.168.97.85 saw the incoming connection as originating from 10.24.6.5. The same was true for all three phase 2s.

    This did not "break" until I upgraded Y/Z to 2.1. (A/B was upgraded to 2.1 first. With A/B on 2.1 and Y/Z on 2.0.3, everything still worked as expected.)

    After the upgrade to 2.1 (done via "13: Upgrade from console", auto), when connecting from 10.24.6.5 to 192.168.97.85, 192.168.97.85 sees the incoming connection as originating from 192.168.97.2 (the CARP master "Y"). This broke a few things, most notably some of our monitoring software.

    The oddest part to me is that the 10.240/16 -> 192.168.97.85 still shows the source connection as 10.240.3.2 (or whichever individual machine initiated the connection).

    On IRC I was advised to look at /tmp/rules.debug, which (on Y/Z) did indeed contain the following:

    # Outbound NAT rules
    
    # Subnets to NAT
    tonatsubnets    = "{ 10.24.0.0/16 10.97.0.0/16 192.168.0.0/16 172.19.0.0/24 127.0.0.0/8 0.0.0.0  }"
    nat on $WAN  from $tonatsubnets port 500 to any port 500 -> 8.11.97.254/32 port 500
    nat on $WAN  from $tonatsubnets to any -> 8.11.97.254/32 port 1024:65535
    

    I don't understand where "tonatsubnets" is derived from - nor why it would have my 10.24 and 10.97 - but not the 10.240. It would however explain why 10.240 is still working as I'd expect.
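
    To make the observation above concrete, here is a minimal sketch that checks which source addresses fall inside the tonatsubnets macro quoted from /tmp/rules.debug. It only tests subnet membership; it does not model how pf actually evaluates its rules. The function name and the 10.97.0.1 test host are made up for illustration.

```python
import ipaddress

# Copied from the tonatsubnets macro in the /tmp/rules.debug excerpt above.
TONAT_SUBNETS = (
    "10.24.0.0/16 10.97.0.0/16 192.168.0.0/16 "
    "172.19.0.0/24 127.0.0.0/8 0.0.0.0"
)


def matching_subnets(addr, subnets=TONAT_SUBNETS):
    """Return the macro entries (as strings) that contain addr."""
    ip = ipaddress.ip_address(addr)
    # A bare address such as "0.0.0.0" is parsed as a /32 network.
    nets = [ipaddress.ip_network(s) for s in subnets.split()]
    return [str(n) for n in nets if ip in n]


if __name__ == "__main__":
    # 10.97.0.1 is a hypothetical host; the other two appear in the thread.
    for src in ("10.24.6.5", "10.97.0.1", "10.240.3.2"):
        hits = matching_subnets(src)
        print(src, "->", hits if hits else "not in tonatsubnets")
```

    Under these assumptions, 10.24.6.5 matches an entry while 10.240.3.2 matches none, which is consistent with only the 10.240 traffic keeping its original source.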

    In pftop, connections from 10.24 are translated via 192.168.97.2 (first listing), while connections from 10.240 have no gateway assigned (second listing):

    
    PR    DIR SRC                   DEST                    PKTS  BYTES STATE   AGE   EXP   RATE   PEAK    AVG RU GW
    tcp   In  10.24.6.9:45096       192.168.97.85:22                        44  1700K  4:4    9036 38738 86235  *
    tcp   Out 10.24.6.9:45096       192.168.97.85:22                        44  1700K  4:4    9036 38738 86235  * 192.168.97.2:56600
    
    


    
    PR    DIR SRC                   DEST                    PKTS  BYTES STATE   AGE   EXP   RATE   PEAK    AVG RU GW
    tcp   In  10.240.3.2:57772      192.168.97.85:22          42   6174  4:4     13 86393                  474  *
    tcp   Out 10.240.3.2:57772      192.168.97.85:22          42   6174  4:4     13 86393                  474 70 
    
    

    Can anyone point me in the right direction to solve this, so that each end of the IPsec tunnel sees the true source IP of a connection?



  • To add to the above, here is one of the rare scenarios where our Colo is allowed to talk back to the office:

    
    PR    DIR SRC                   DEST                    RATE   PEAK    AVG  BYTES STATE   PKTS   AGE   EXP RU GW
    tcp   In  192.168.97.85:35449   10.24.6.9:80                           420   4204  9:9      12    10   170  *
    tcp   Out 192.168.97.85:35449   10.24.6.9:80                           420   4204  9:9      12    10   170  *
    
    

    …does not get assigned a gateway either, and behaves as expected, showing the source (in 10.24.6.9's logs) as 192.168.97.85.



  • More reporting:

    I'm presuming this has something to do with the new "nat before ipsec"?

    
    # pfctl -sa -vv | grep nat | grep 10.24 | less
    @3 nat on bge0 inet from 10.24.0.0/16 port = isakmp to any port = isakmp -> 8.11.97.254 port 500
    @9 nat on bge0 inet from 10.24.0.0/16 to any -> 8.11.97.254 port 1024:65535
    @17 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.80 -> 192.168.97.2 port 1024:65535
    @21 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.74 -> 192.168.97.2 port 1024:65535
    @25 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.67 -> 192.168.97.2 port 1024:65535
    @29 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.85 -> 192.168.97.2 port 1024:65535
    @33 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.50 -> 192.168.97.2 port 1024:65535
    @37 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.51 -> 192.168.97.2 port 1024:65535
    @41 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.52 -> 192.168.97.2 port 1024:65535
    @45 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.53 -> 192.168.97.2 port 1024:65535
    @49 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.54 -> 192.168.97.2 port 1024:65535
    @53 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.55 -> 192.168.97.2 port 1024:65535
    
    

    This goes on and on for every WAN IP that is NAT 1:1. These rules are not in any of the XML when I do a backup. Where is this defined, so I can turn it off and go back to the old/expected behaviour? (Or perhaps this is just the NAT 1:1 reflection?)

    Either way, I see these rules for 10.97.0.0/16 (which is being forced through a gateway/translated to 'source 192.168.97.2') as well as 10.24.0.0/16 (which has the same behaviour) but not for 10.240.0.0/16 (the third IPSEC phase 2 which is still working as expected).
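
    As a rough aid for this kind of hunt, the sketch below parses nat lines in the format shown in the pfctl output above and lists the rules whose source and destination could match a given connection. It is an illustration only: the regular expression covers just the line shapes quoted in this thread, port conditions are ignored, and pf's real evaluation order is not modelled. The function names are made up.

```python
import ipaddress
import re

# Matches only the nat line shapes quoted in this thread (an assumption).
NAT_RE = re.compile(
    r"^@(?P<num>\d+) nat on (?P<iface>\S+) inet "
    r"from (?P<src>\S+)(?: port = \S+)? "
    r"to (?P<dst>\S+)(?: port = \S+)? "
    r"-> (?P<xlat>\S+)"
)

# Three of the lines from the pfctl output above.
SAMPLE = [
    "@3 nat on bge0 inet from 10.24.0.0/16 port = isakmp to any port = isakmp -> 8.11.97.254 port 500",
    "@9 nat on bge0 inet from 10.24.0.0/16 to any -> 8.11.97.254 port 1024:65535",
    "@29 nat on bge1 inet from 10.24.0.0/16 to 192.168.97.85 -> 192.168.97.2 port 1024:65535",
]


def _contains(spec, addr):
    """True if addr is inside spec, where spec is 'any', a host, or a CIDR."""
    if spec == "any":
        return True
    return ipaddress.ip_address(addr) in ipaddress.ip_network(spec, strict=False)


def candidate_nat_rules(lines, src, dst, iface=None):
    """Return (rule number, interface, translated address) for matching lines."""
    hits = []
    for line in lines:
        m = NAT_RE.match(line.strip())
        if not m:
            continue
        if iface is not None and m["iface"] != iface:
            continue
        if _contains(m["src"], src) and _contains(m["dst"], dst):
            hits.append((int(m["num"]), m["iface"], m["xlat"]))
    return hits


if __name__ == "__main__":
    print(candidate_nat_rules(SAMPLE, "10.24.6.5", "192.168.97.85", iface="bge1"))
    print(candidate_nat_rules(SAMPLE, "10.240.3.2", "192.168.97.85", iface="bge1"))
```

    Feeding it the full output of the pfctl command above, rather than the three sample lines, would show every rule that could rewrite a given source on a given interface.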



  • Thanks to the IRC channel, it was pointed out that my 10.24 and 10.97 subnets appeared to be treated as LANs by my Colo Y/Z pfSense machines.

    Searching the XML for those two subnets showed them in only a few places: DNS Forwarder entries, the IPSEC phase 2, and an alias I had named "LAN_OFFICE". LAN_OFFICE was defined as 10.24/16, 10.97/16 - AHA! This may explain why the 10.240 was still working right!

    I tried to delete the LAN_OFFICE alias because I didn't see it referenced anywhere, but the deletion was refused with: "Cannot delete alias. Currently in use by IPSEC Office-to-Colo"

    I couldn't find that reference anywhere at first, but there it was, under the static routes:

    
        <staticroutes>
            <route>
                <network>LAN_OFFICE</network>
                <gateway>LANG</gateway>
                <disabled></disabled>
            </route>
        </staticroutes>
    
    

    Note the "disabled" here. Even so, after deleting (not just disabling) this static route, as well as the alias, everything went back to "normal", and connections are now showing their proper sources again.

    To see whether the cause was the disabled route or the LAN_-prefixed alias name, I created another LAN_ alias, "LAN_TEST", and assigned it 192.168.5.0/24 as a test. It did not show up in my pfctl output (pfctl -sa) with NAT redirection or anything else. I had presumed there might be an error in parsing aliases that have "LAN" in their name, but that does not seem to be the issue. I am still unsure whether to blame an errantly named alias or a disabled static route, but either way it IS working (as expected) again now.
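
    For anyone wanting to check a config backup for the same situation, here is a minimal sketch that lists static routes carrying a disabled element. It assumes the backup has the staticroutes/route layout shown in the fragment above, nested under a single root element (called pfsense here, which is an assumption). The function name and sample document are made up.

```python
import xml.etree.ElementTree as ET

# Sample built from the fragment quoted above; the <pfsense> root is assumed.
SAMPLE = """<pfsense>
  <staticroutes>
    <route>
      <network>LAN_OFFICE</network>
      <gateway>LANG</gateway>
      <disabled></disabled>
    </route>
  </staticroutes>
</pfsense>"""


def disabled_static_routes(xml_text):
    """Return (network, gateway) for each static route with a <disabled> tag."""
    root = ET.fromstring(xml_text)
    found = []
    for route in root.findall(".//staticroutes/route"):
        # The mere presence of the element marks the route as disabled,
        # even when the element is empty as in the fragment above.
        if route.find("disabled") is not None:
            found.append((route.findtext("network"), route.findtext("gateway")))
    return found


if __name__ == "__main__":
    for network, gateway in disabled_static_routes(SAMPLE):
        print("disabled static route:", network, "via", gateway)
```

    Any network value it reports that is an alias name rather than a literal subnet is worth a second look, given what happened here.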



  • It looks like there is an issue in a number of places where static routes are processed: the associated functionality is still applied even if the static route is disabled. It does not look to be fixed in the 2.1.1 or 2.2 current code, so I raised a bug report: https://redmine.pfsense.org/issues/3560

