NAT stops working suddenly after a couple of packets



  • Hi all,

    I have a small private network in front of my public network.
    No NAT is done for the small network, so for the firewall itself to be able to ping outside I need to set a public IP on its outgoing packets.
    On the 1st firewall this works properly: the packets are always NATed.
    On the 2nd firewall, one or two packets are correctly NATed, and then it suddenly stops working.

    Config:

    10.199.240.204 / 11.22.3.17 VIP for the two firewalls
    10.199.240.205 / 11.22.3.18 1st firewall
    10.199.240.206 / 11.22.3.19 2nd firewall
    

    nat rules:

    nat on lagg1.11 inet from 10.199.240.205 to ! 10.199.240.200/29 -> 11.22.3.18 port 1024:65535
    nat on lagg1.11 inet from 10.11.0.0/16 to any -> 11.22.3.17 port 1024:65535
    nat on lagg1.11 inet from 192.168.6.0/24 to any -> 11.22.3.17 port 1024:65535
    [...]
    binat on lagg1.11 inet from 10.199.240.206 to any -> 11.22.3.19
    

    Test: tcpdump in background + ping to 8.8.4.4

    [root@pf2-lipi ~]# ping 8.8.4.4
    PING 8.8.4.4 (8.8.4.4): 56 data bytes
    17:02:33.137595 IP 11.22.3.19 > 8.8.4.4: ICMP echo request, id 8505, seq 0, length 64
    17:02:33.144200 IP 8.8.4.4 > 10.199.240.206: ICMP echo reply, id 8505, seq 0, length 64
    64 bytes from 8.8.4.4: icmp_seq=0 ttl=55 time=6.698 ms
    17:02:34.143564 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 1, length 64
    17:02:35.145871 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 2, length 64
    17:02:36.146453 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 3, length 64
    17:02:37.156435 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 4, length 64
    ^C
    --- 8.8.4.4 ping statistics ---
    5 packets transmitted, 1 packets received, 80.0% packet loss
    round-trip min/avg/max/stddev = 6.698/6.698/6.698/0.000 ms
    [root@pf2-lipi ~]#
    

    As you can see, the first packet is NATed, the second is not.

    filter logs

    May 26 17:12:00 pf2-lipi filterlog: 97,,,1554678266,lagg1.11,match,pass,out,4,0x0,,64,34277,0,none,1,icmp,84,11.22.3.19,8.8.4.4,request,32992,064
    May 26 17:12:00 pf2-lipi filterlog: 97,,,1554678266,lagg1.11,match,pass,in,4,0x0,,55,0,0,none,1,icmp,84,8.8.4.4,10.199.240.206,reply,32992,064
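
The two filterlog entries above can be compared field by field. The sketch below is only an illustration: the field positions are inferred from these two sample lines (IPv4 with ICMP), and `parse_filterlog_ipv4` is a hypothetical helper name, not part of pfSense.

```python
# Field positions are inferred from the two sample lines in this post
# (IPv4 + ICMP). Other protocols and IPv6 entries are laid out differently.
SAMPLE = [
    "97,,,1554678266,lagg1.11,match,pass,out,4,0x0,,64,34277,0,none,1,icmp,84,11.22.3.19,8.8.4.4,request,32992,064",
    "97,,,1554678266,lagg1.11,match,pass,in,4,0x0,,55,0,0,none,1,icmp,84,8.8.4.4,10.199.240.206,reply,32992,064",
]


def parse_filterlog_ipv4(payload):
    """Split the CSV part of a filterlog line into the fields of interest."""
    f = payload.split(",")
    return {
        "interface": f[4],
        "action": f[6],
        "direction": f[7],
        "proto": f[16],
        "src": f[18],
        "dst": f[19],
        "icmp_type": f[20],
    }


entries = [parse_filterlog_ipv4(line) for line in SAMPLE]
for e in entries:
    print(e["direction"], e["icmp_type"], e["src"], "->", e["dst"])
```

Read this way, the outbound request already carries the translated source 11.22.3.19, while the inbound reply is logged with the private destination 10.199.240.206.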
    

  • Netgate Administrator

    Can we see a diagram here showing exactly how this is connected?

    Steve



  • Hi Steve,

    Sure,

    uplink #1          uplink #2
    10.199.240.201     10.199.240.202   (VIP: 10.199.240.203)
          |                  |
          +-------+  +-------+
                  |  |
                switch (VLAN 11, private interconnect 10.199.240.200/29)
                  |  |
          +-------+  +-------+
          |                  |
    10.199.240.205     10.199.240.206
    fw1                fw2
    
    
    10.199.240.200/29 interconnect network
    10.199.240.201 uplink #1
    10.199.240.202 uplink #2
    10.199.240.203 uplink VIP (VIP among the two uplink: 10.199.240.203, which is the default gw for fw)
    10.199.240.204 fw VIP(VIP among the two fw: 10.199.240.204, which is the gw for uplink to 10.11.0.0/16)
    10.199.240.205 fw1
    10.199.240.206 fw2
    
    10.11.0.0/16 public network (fake ip)
    11.22.3.17 fw VIP public IP
    11.22.3.18 fw1 public IP
    11.22.3.19 fw2 public IP
    

    As you noticed (through email, I guess), there is a particular line in the output which shows that the reply comes back addressed to 10.199.240.206.

    My best guess is that, given the peculiarity of this network, fw1 is contacted from upstream, and therefore there is asymmetric routing.

    Thanks,
    Daniele


  • Netgate Administrator

    Ah, OK, that looks like what I had assumed. The only confusing part here is that you also showed the public /28 subnet (here as 11.22.3.16/28) on a DMZ interface directly. Is that the case?

    I assume fw2 has 10.199.240.203 as its default route?

    And that whatever is upstream is routing 11.22.3.16/28 to 10.199.240.204?

    That seems like a problem as replies to 11.22.3.19 will be sent back to fw1.

    Check the MAC addresses in that packet capture. Do the one or two replies you see actually come from the upstream gateway?

    Steve
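
The routing concern above can be sketched as a tiny lookup. This is a simplified model, not a description of the real upstream device: the single route and the assumption that fw1 currently holds the fw VIP come from this thread, and the names `UPSTREAM_ROUTES`, `OWNER`, `next_hop` and `receiving_firewall` are made up for illustration.

```python
import ipaddress

# Upstream routing as described in the thread: the public /28 is routed
# to the fw VIP on the interconnect network.
UPSTREAM_ROUTES = {
    ipaddress.ip_network("11.22.3.16/28"): "10.199.240.204",
}

# Which firewall answers for each interconnect address.
# Assumption: fw1 is currently master and therefore holds the VIP.
OWNER = {
    "10.199.240.204": "fw1",
    "10.199.240.205": "fw1",
    "10.199.240.206": "fw2",
}


def next_hop(dst):
    """Longest-prefix match of dst against the upstream routes."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in UPSTREAM_ROUTES if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return UPSTREAM_ROUTES[best]


def receiving_firewall(dst):
    """Firewall that receives a packet sent from upstream to dst."""
    return OWNER.get(next_hop(dst))


for ip in ("11.22.3.18", "11.22.3.19"):
    print(ip, "via", next_hop(ip), "lands on", receiving_firewall(ip))
```

Under these assumptions a reply addressed to fw2's public IP 11.22.3.19 is handed to the VIP and therefore arrives on fw1, not on fw2.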



  • @stephenw10 said in NAT stops working suddenly after a couple of packets:

    Ah, OK, that looks like what I had assumed. The only confusing part here is that you also showed the public /28 subnet (here as 11.22.3.16/28) on a DMZ interface directly. Is that the case?

    Correct, 11.22.3.16/28 is defined as DMZ directly (but of course from the schema perspective behind the WAN).

    I assume fw2 has 10.199.240.203 as its default route?

    Correct, both fw1 and fw2 have 10.199.240.203 as their default gw.

    And that whatever is upstream is routing 11.22.3.16/28 to 10.199.240.204?

    Correct.
    And "whatever" is the correct wording (no CDP, no LLDP, no nothing).

    That seems like a problem as replies to 11.22.3.19 will be sent back to fw1.

    Check the MAC addresses in that packet capture. Do the one or two replies you see actually come from the upstream gateway?

    from fw1 arp table

    ? (10.199.240.203) at 00:00:5e:00:01:01 on lagg1.11 expires in 294 seconds [vlan]
    ? (10.199.240.205) at 00:08:a2:0e:cb:99 on lagg1.11 permanent [vlan]
    ? (10.199.240.206) at 00:08:a2:0e:cf:e1 on lagg1.11 expires in 747 seconds [vlan]
    

    fw2

    PING 8.8.4.4 (8.8.4.4): 56 data bytes
    00:49:10.053861 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 11.22.3.19 > 8.8.4.4: ICMP echo request, id 2129, seq 0, length 64
    00:49:10.062017 00:08:a2:0e:cb:99 > 00:08:a2:0e:cf:e1, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 8.8.4.4 > 10.199.240.206: ICMP echo reply, id 2129, seq 0, length 64
    64 bytes from 8.8.4.4: icmp_seq=0 ttl=55 time=8.259 ms
    00:49:11.062450 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 10.199.240.206 > 8.8.4.4: ICMP echo request, id 2129, seq 1, length 64
    00:49:12.071027 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 10.199.240.206 > 8.8.4.4: ICMP echo request, id 2129, seq 2, length 64
    00:49:13.077363 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 10.199.240.206 > 8.8.4.4: ICMP echo request, id 2129, seq 3, length 64
    00:49:14.080101 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 10.199.240.206 > 8.8.4.4: ICMP echo request, id 2129, seq 4, length 64
    

    fw1

    00:49:09.733513 00:a3:8e:3a:0c:3f > 00:00:5e:00:01:15, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 8.8.4.4 > 11.22.3.19: ICMP echo reply, id 2129, seq 0, length 64
    00:49:09.733559 00:08:a2:0e:cb:99 > 00:08:a2:0e:cf:e1, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 8.8.4.4 > 10.199.240.206: ICMP echo reply, id 2129, seq 0, length 64
    

    For the record, these are (I think) the most relevant options in the current configuration:
    Firewall Optimization Options -> conservative
    Bypass firewall rules for traffic on the same interface -> checked
    Anti lock out -> enabled

    HTH
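
To answer the MAC question directly, the source MAC of each captured frame can be looked up against the ARP table in this post. A minimal sketch, assuming the `tcpdump` line layout shown above; `KNOWN_MACS`, `LINE_RE` and `describe` are illustrative names.

```python
import re

# MAC -> host, taken from the fw1 ARP table quoted in this post.
KNOWN_MACS = {
    "00:00:5e:00:01:01": "uplink VIP",
    "00:08:a2:0e:cb:99": "fw1",
    "00:08:a2:0e:cf:e1": "fw2",
}

# Matches the link-level and IP parts of the capture lines shown above.
LINE_RE = re.compile(
    r"(?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}),.*?"
    r"IPv4, (?P<src_ip>[\d.]+) > (?P<dst_ip>[\d.]+): "
    r"ICMP echo (?P<kind>request|reply)"
)


def describe(line):
    """Return the parsed fields plus the sender's name, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["sender"] = KNOWN_MACS.get(fields["src_mac"], "unknown")
    return fields


# The single echo reply seen on fw2, copied from the capture above.
REPLY_ON_FW2 = (
    "00:49:10.062017 00:08:a2:0e:cb:99 > 00:08:a2:0e:cf:e1, ethertype 802.1Q "
    "(0x8100), length 102: vlan 11, p 0, ethertype IPv4, "
    "8.8.4.4 > 10.199.240.206: ICMP echo reply, id 2129, seq 0, length 64"
)

info = describe(REPLY_ON_FW2)
print(info["kind"], "to", info["dst_ip"], "was sent by", info["sender"])
```

For the reply seen on fw2, the sender resolves to fw1 and not to the uplink VIP, which matches the fw1 capture where the same reply is first received and then forwarded.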


  • Netgate Administrator

    OK so the reply does go back to fw1 and then gets sent back from there. Presumably because the NAT state is sync'd across so it is able to reverse the translation and then send it directly.
    But why then do we see no further replies on either fw2 or fw1.... 🤔

    It may not matter as TCP traffic would be broken by asymmetry anyway.

    If the upstream provider can route that IP separately that would work but I doubt that's possible.

    There are some ugly workarounds for providing a valid route to the secondary while it's backup with only one IP, and they would probably apply here, like switching the default route to the primary LAN. Ugly!

    Steve



  • @stephenw10 said in NAT stops working suddenly after a couple of packets:

    OK so the reply does go back to fw1 and then gets sent back from there. Presumably because the NAT state is sync'd across so it is able to reverse the translation and then send it directly.
    But why then do we see no further replies on either fw2 or fw1.... 🤔

    Could it also be something related to the state sync?
    Otherwise NAT (1:1 here, but the same holds for outbound NAT) should translate the source IP anyway.

    There are some ugly workarounds for providing a valid route to the secondary while it's backup with only one IP, and they would probably apply here, like switching the default route to the primary LAN. Ugly!

    Of course, the best option would simply be to have a public transport subnet...

    Still, it is interesting to think about how pfSense handles such a situation.

    Does an existing state really take precedence over NAT?
    Or is it maybe because Anti lock out is enabled and (I presume) keep state is used for that rule?

    For the people following the thread (not too many, I think 😂), my feeling is that this can be an interesting corner case...


  • Netgate Administrator

    Mmm, the only reason you would ever not see that traffic NAT'd when the NAT rule is present and correct is if it cannot create the NAT state due to one already existing.
    I suspect a state is synced from fw1 somehow and it prevents the correct state being re-created. If there are no replies to the pings that doesn't happen so you see the outbound ping requests all NAT'd correctly.
    You might be able to prevent that happening with stateless rules for example.... but you need the NAT state synced to fw1 in order for it to send the replies back to fw2. You might be able to use a port forward for that maybe.

    All pretty ugly! And you would need to replicate whatever you put in place so that fw1 has connectivity when fw2 is master.

    Steve
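
The hypothesis above can be illustrated with a toy model. It rests on one general property of pf: a packet that matches an existing state is handled by that state and the NAT rules are not evaluated again. Everything else here is an assumption made for illustration (the `ToyFirewall` class, its state key, and the way a synced state is imported), so this is not a description of how pf or pfsync actually store states.

```python
# Toy model only: real pf state keys and pfsync behaviour are more involved.
# The single translation mirrors the binat rule from the first post.
NAT_RULES = {"10.199.240.206": "11.22.3.19"}


class ToyFirewall:
    def __init__(self):
        # key: (pair of endpoint addresses, ICMP id)
        # value: translated source address, or None if the state has no NAT
        self.states = {}

    @staticmethod
    def key(src, dst, icmp_id):
        return (frozenset((src, dst)), icmp_id)

    def send(self, src, dst, icmp_id):
        """Return the source address the packet leaves with."""
        k = self.key(src, dst, icmp_id)
        if k in self.states:
            # Existing state wins: NAT rules are not consulted again.
            return self.states[k] or src
        translated = NAT_RULES.get(src)
        self.states[k] = translated
        return translated or src

    def import_synced_state(self, src, dst, icmp_id):
        """Hypothetical: a peer syncs a state that carries no translation."""
        self.states[self.key(src, dst, icmp_id)] = None


fw2 = ToyFirewall()
first = fw2.send("10.199.240.206", "8.8.4.4", 2129)
# fw1 forwards the reply to 10.199.240.206 and (by assumption) syncs that state.
fw2.import_synced_state("8.8.4.4", "10.199.240.206", 2129)
second = fw2.send("10.199.240.206", "8.8.4.4", 2129)
print("seq 0 leaves as", first)
print("seq 1 leaves as", second)
```

In this model the first request is translated and later ones are not, which is the pattern in the captures. A firewall that never imports such a state keeps translating every request, consistent with the remark that all requests are NAT'd when no replies come back.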

