NAT stops working suddenly after a couple of packets

Rocco83

Hi all,

I have a small private network in front of my public network.
No NAT is done for the small network, so for my firewalls to be able to ping outside I need to set a public IP on their outgoing packets.
On the 1st firewall this works properly and the packets are always NATed.
On the 2nd firewall, one or two packets are NATed correctly, and then it suddenly stops working.

Config:

10.199.240.204 / 11.22.3.17 VIP for the two firewalls
10.199.240.205 / 11.22.3.18 1st firewall
10.199.240.206 / 11.22.3.19 2nd firewall

nat rules:

nat on lagg1.11 inet from 10.199.240.205 to ! 10.199.240.200/29 -> 11.22.3.18 port 1024:65535
nat on lagg1.11 inet from 10.11.0.0/16 to any -> 11.22.3.17 port 1024:65535
nat on lagg1.11 inet from 192.168.6.0/24 to any -> 11.22.3.17 port 1024:65535
[...]
binat on lagg1.11 inet from 10.199.240.206 to any -> 11.22.3.19

Test: tcpdump in background + ping to 8.8.4.4

[root@pf2-lipi ~]# ping 8.8.4.4
PING 8.8.4.4 (8.8.4.4): 56 data bytes
17:02:33.137595 IP 11.22.3.19 > 8.8.4.4: ICMP echo request, id 8505, seq 0, length 64
17:02:33.144200 IP 8.8.4.4 > 10.199.240.206: ICMP echo reply, id 8505, seq 0, length 64
64 bytes from 8.8.4.4: icmp_seq=0 ttl=55 time=6.698 ms
17:02:34.143564 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 1, length 64
17:02:35.145871 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 2, length 64
17:02:36.146453 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 3, length 64
17:02:37.156435 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 4, length 64
^C
--- 8.8.4.4 ping statistics ---
5 packets transmitted, 1 packets received, 80.0% packet loss
round-trip min/avg/max/stddev = 6.698/6.698/6.698/0.000 ms
[root@pf2-lipi ~]#

As you can see, the first packet (seq 0) is NATed to 11.22.3.19; from seq 1 onward the requests leave with the untranslated source 10.199.240.206.
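
The pattern in the capture above can also be checked mechanically. What follows is a minimal sketch in Python, assuming tcpdump's default one-line IPv4 output as pasted in this post. The function name `classify_requests` and the verdict labels are made up for illustration and are not part of any tool.

```python
import re

# Addresses taken from this post: fw2's private address and its binat target.
PRIVATE_SRC = "10.199.240.206"
PUBLIC_SRC = "11.22.3.19"

# Matches "IP <src> > <dst>: ICMP echo request, id N, seq M" in tcpdump output.
REQUEST_RE = re.compile(
    r"IP (?P<src>[\d.]+) > (?P<dst>[\d.]+): "
    r"ICMP echo request, id \d+, seq (?P<seq>\d+)"
)


def classify_requests(lines):
    """Return {seq: verdict} for every ICMP echo request found in lines."""
    result = {}
    for line in lines:
        m = REQUEST_RE.search(line)
        if m is None:
            continue  # echo replies and ping's own output are ignored
        src = m.group("src")
        if src == PUBLIC_SRC:
            verdict = "natted"
        elif src == PRIVATE_SRC:
            verdict = "not natted"
        else:
            verdict = "other"
        result[int(m.group("seq"))] = verdict
    return result


# First lines of the capture quoted in this post.
capture = """\
17:02:33.137595 IP 11.22.3.19 > 8.8.4.4: ICMP echo request, id 8505, seq 0, length 64
17:02:33.144200 IP 8.8.4.4 > 10.199.240.206: ICMP echo reply, id 8505, seq 0, length 64
17:02:34.143564 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 1, length 64
17:02:35.145871 IP 10.199.240.206 > 8.8.4.4: ICMP echo request, id 8505, seq 2, length 64
""".splitlines()

verdicts = classify_requests(capture)
for seq, verdict in sorted(verdicts.items()):
    print(seq, verdict)
```

On these four lines, seq 0 comes out as "natted" and seq 1 and 2 as "not natted", which is the behaviour described here.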

filter logs

May 26 17:12:00 pf2-lipi filterlog: 97,,,1554678266,lagg1.11,match,pass,out,4,0x0,,64,34277,0,none,1,icmp,84,11.22.3.19,8.8.4.4,request,32992,064
May 26 17:12:00 pf2-lipi filterlog: 97,,,1554678266,lagg1.11,match,pass,in,4,0x0,,55,0,0,none,1,icmp,84,8.8.4.4,10.199.240.206,reply,32992,064

stephenw10

Can we see a diagram here showing exactly how this is connected?

Steve

Rocco83

Hi Steve,

Sure,

   uplink #1          uplink #2
10.199.240.201     10.199.240.202     (VIP: 10.199.240.203)
      |                  |
      +-------+  +-------+
              |  |
        switch (VLAN 11, private interconnect 10.199.240.200/29)
              |  |
      +-------+  +-------+
      |                  |
10.199.240.205     10.199.240.206
     fw1                fw2

10.199.240.200/29 interconnect network
10.199.240.201 uplink #1
10.199.240.202 uplink #2
10.199.240.203 uplink VIP (shared by the two uplinks; this is the default gw for the firewalls)
10.199.240.204 fw VIP (shared by the two firewalls; this is the uplinks' gw to 10.11.0.0/16)
10.199.240.205 fw1
10.199.240.206 fw2

10.11.0.0/16 public network (fake ip)
11.22.3.17 fw VIP public IP
11.22.3.18 fw1 public IP
11.22.3.19 fw2 public IP

As you noticed (through email, I guess), there is a particular line in the capture which shows the reply coming back addressed to 10.199.240.206.

My best guess is that, given the peculiarity of this network, fw1 is the one contacted from upstream, and therefore the routing is asymmetric.

Thanks,
Daniele

stephenw10

Ah, OK, that looks like what I had assumed. The only confusing part here is that you also showed the public /28 subnet (here as 11.22.3.16/28) directly on a DMZ interface. Is that the case?

I assume fw2 has 10.199.240.203 as its default route?

And that whatever is upstream is routing 11.22.3.16/28 to 10.199.240.204?

That seems like a problem as replies to 11.22.3.19 will be sent back to fw1.
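
To make the asymmetry concrete, here is a small longest-prefix-match sketch in Python. The routing table is hypothetical: it only encodes the two assumptions made in this post (the upstream is directly connected to 10.199.240.200/29 and routes 11.22.3.16/28 to the firewall VIP 10.199.240.204). The names `ROUTES` and `next_hop` are illustrative, not taken from any real device.

```python
import ipaddress

# Hypothetical upstream routing table, built only from the assumptions above.
ROUTES = [
    (ipaddress.ip_network("10.199.240.200/29"), "directly connected"),
    (ipaddress.ip_network("11.22.3.16/28"), "10.199.240.204"),  # fw VIP
]


def next_hop(dst):
    """Longest-prefix match over ROUTES; returns None when nothing matches."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(next_hop("11.22.3.19"))      # reply addressed to fw2's public IP
print(next_hop("10.199.240.206"))  # fw2's own interconnect address
```

Under those assumptions a reply to 11.22.3.19 is handed to 10.199.240.204, that is, to whichever firewall currently holds the VIP, while only traffic addressed to 10.199.240.206 itself would reach fw2 directly.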

Check the MAC addresses in that packet capture. Do the one or two replies you see actually come from the upstream gateway?

Steve

Rocco83

@stephenw10 said in NAT stops working suddenly after a couple of packets:

Ah, OK, that looks like what I had assumed. The only confusing part here is that you also showed the public /28 subnet (here as 11.22.3.16/28) directly on a DMZ interface. Is that the case?

Correct, 11.22.3.16/28 is defined directly as a DMZ (but of course, from the diagram's perspective, it sits behind the WAN).

I assume fw2 has 10.199.240.203 as its default route?

Correct, both fw1 and fw2 have 10.199.240.203 as their default gw.

And that whatever is upstream is routing 11.22.3.16/28 to 10.199.240.204?

Correct.
And "whatever" is the correct wording (no CDP, no LLDP, no nothing).

That seems like a problem as replies to 11.22.3.19 will be sent back to fw1.

Check the MAC addresses in that packet capture. Do the one or two replies you see actually come from the upstream gateway?

from fw1 arp table

? (10.199.240.203) at 00:00:5e:00:01:01 on lagg1.11 expires in 294 seconds [vlan]
? (10.199.240.205) at 00:08:a2:0e:cb:99 on lagg1.11 permanent [vlan]
? (10.199.240.206) at 00:08:a2:0e:cf:e1 on lagg1.11 expires in 747 seconds [vlan]

fw2

PING 8.8.4.4 (8.8.4.4): 56 data bytes
00:49:10.053861 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 11.22.3.19 > 8.8.4.4: ICMP echo request, id 2129, seq 0, length 64
00:49:10.062017 00:08:a2:0e:cb:99 > 00:08:a2:0e:cf:e1, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 8.8.4.4 > 10.199.240.206: ICMP echo reply, id 2129, seq 0, length 64
64 bytes from 8.8.4.4: icmp_seq=0 ttl=55 time=8.259 ms
00:49:11.062450 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 10.199.240.206 > 8.8.4.4: ICMP echo request, id 2129, seq 1, length 64
00:49:12.071027 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 10.199.240.206 > 8.8.4.4: ICMP echo request, id 2129, seq 2, length 64
00:49:13.077363 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 10.199.240.206 > 8.8.4.4: ICMP echo request, id 2129, seq 3, length 64
00:49:14.080101 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 10.199.240.206 > 8.8.4.4: ICMP echo request, id 2129, seq 4, length 64

fw1

00:49:09.733513 00:a3:8e:3a:0c:3f > 00:00:5e:00:01:15, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 8.8.4.4 > 11.22.3.19: ICMP echo reply, id 2129, seq 0, length 64
00:49:09.733559 00:08:a2:0e:cb:99 > 00:08:a2:0e:cf:e1, ethertype 802.1Q (0x8100), length 102: vlan 11, p 0, ethertype IPv4, 8.8.4.4 > 10.199.240.206: ICMP echo reply, id 2129, seq 0, length 64
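
Reading the source MACs against the ARP table answers the question of where the reply seen on fw2 comes from. Below is a minimal Python sketch, assuming the `tcpdump -e` style output pasted above. The MAC-to-host mapping is copied from the fw1 ARP table in this post; `describe` is an illustrative helper name, not an existing tool.

```python
import re

# MAC-to-host mapping copied from the fw1 ARP table quoted in this post.
KNOWN_MACS = {
    "00:00:5e:00:01:01": "uplink VIP (10.199.240.203)",
    "00:08:a2:0e:cb:99": "fw1 (10.199.240.205)",
    "00:08:a2:0e:cf:e1": "fw2 (10.199.240.206)",
}

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"
IPV4 = r"\d+\.\d+\.\d+\.\d+"
FRAME_RE = re.compile(
    rf"(?P<smac>{MAC}) > (?P<dmac>{MAC}),.*?"
    rf"(?P<src>{IPV4}) > (?P<dst>{IPV4}): ICMP echo (?P<kind>request|reply)"
)


def describe(line):
    """Return (kind, sender, src_ip, dst_ip) for one captured ICMP frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    sender = KNOWN_MACS.get(m.group("smac"), "unknown")
    return (m.group("kind"), sender, m.group("src"), m.group("dst"))


# The first request/reply pair from the fw2 capture above.
fw2_capture = [
    "00:49:10.053861 00:08:a2:0e:cf:e1 > 00:00:5e:00:01:01, ethertype 802.1Q (0x8100), "
    "length 102: vlan 11, p 0, ethertype IPv4, 11.22.3.19 > 8.8.4.4: "
    "ICMP echo request, id 2129, seq 0, length 64",
    "00:49:10.062017 00:08:a2:0e:cb:99 > 00:08:a2:0e:cf:e1, ethertype 802.1Q (0x8100), "
    "length 102: vlan 11, p 0, ethertype IPv4, 8.8.4.4 > 10.199.240.206: "
    "ICMP echo reply, id 2129, seq 0, length 64",
]

frames = [describe(line) for line in fw2_capture]
for frame in frames:
    print(frame)
```

For this pair, the request is sent by fw2 and the reply arrives from fw1's MAC address rather than from the uplink VIP MAC.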

For the record, these are (I think) the most relevant options in the current configuration:
Firewall Optimization Options -> conservative
Bypass firewall rules for traffic on the same interface -> checked
Anti-lockout -> enabled

HTH

stephenw10

OK, so the reply does go back to fw1 and then gets forwarded to fw2 from there, presumably because the NAT state is synced across, so fw1 is able to reverse the translation and then send it on directly.
But why then do we see no further replies on either fw2 or fw1....

It may not matter as TCP traffic would be broken by asymmetry anyway.

If the upstream provider can route that IP separately that would work but I doubt that's possible.

There are some ugly workarounds, used when there is only one IP, for giving the secondary a valid route while it's backup, and they would probably apply here. Like switching the default route to the primary LAN. Ugly!

Steve

Rocco83

@stephenw10 said in NAT stops working suddenly after a couple of packets:

OK, so the reply does go back to fw1 and then gets forwarded to fw2 from there, presumably because the NAT state is synced across, so fw1 is able to reverse the translation and then send it on directly.
But why then do we see no further replies on either fw2 or fw1....

Could it also be something related to the state sync?
Otherwise the NAT rule (1:1 here, but the same holds for outbound NAT) should translate the source IP anyway.

There are some ugly workarounds, used when there is only one IP, for giving the secondary a valid route while it's backup, and they would probably apply here. Like switching the default route to the primary LAN. Ugly!

Of course, the best option would simply be to have a public transport subnet...

Still, it is interesting to think about how pfSense can handle such a situation.

Does the state really count for more than the NAT rule?
Or is it maybe the fact that anti-lockout is enabled and (I presume) keep state is used for it?

For the people following the thread (not too many, I think), my feeling is that this could be an interesting corner case...

stephenw10

Mmm, the only reason you would ever not see that traffic NAT'd when the NAT rule is present and correct is if it cannot create the NAT state due to one already existing.
I suspect a state is synced from fw1 somehow and it prevents the correct state being re-created. If there are no replies to the pings that doesn't happen so you see the outbound ping requests all NAT'd correctly.
You might be able to prevent that happening with stateless rules for example.... but you need the NAT state synced to fw1 in order for it to send the replies back to fw2. You might be able to use a port forward for that maybe.

All pretty ugly! And you would need to replicate whatever you put in place so that fw1 has connectivity when fw2 is master.

Steve