Intermittent NAT failures
-
We're seeing a number of pfSense installs across various versions (2.2.x and 2.3.x) intermittently fail to NAT packets. It happens almost exclusively with UDP streams, but occasionally with ICMP or TCP. Outbound NAT rules are automatic, there's nothing in the log files, and the affected internal users have other TCP and UDP conversations being NATed properly at the same time.
We're scheduling maintenance windows to try an upgrade to 2.4 to see if the problem is still present there.
Anyone seen anything similar?
https://pastebin.com/e5sCn7PS
-
I haven't seen anything similar, but I'll keep a tcpdump running on my PPPoE interface towards the world and see if it picks up any un-NATed packets (2.4.2p1)
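For anyone wanting to do the same, a filter along these lines should do it, assuming RFC1918 space on the inside and pppoe0 as the WAN (adjust both to suit):

    tcpdump -ni pppoe0 'src net 10.0.0.0/8 or src net 172.16.0.0/12 or src net 192.168.0.0/16'

Anything it prints is a packet that left the WAN still carrying a private source address, i.e. one that escaped NAT.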
-
Kept a tcpdump going for ~8 hours yesterday and never saw any un-NATed packets egressing my PPPoE interface.
Realise this doesn't really help you, just another data point.
-
What are we looking at there?
What interface is em1?
What do the states look like?
What rule is creating them?
What are your Outbound NAT rules?
You mentioned TCP is not affected, but that pcap shows what are presumably outbound SYNs from 10.0.0.0/8 addresses. Hard to say whether those were translated or not, since the details weren't provided.
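If it helps, the states and the rules that created them can be pulled straight from the shell with pfctl; these are stock pf commands rather than anything pfSense-specific:

    pfctl -vvss                  # verbose state dump; each entry shows the rule that created it
    pfctl -vvsr                  # loaded rules with their numbers and per-rule packet/state counters
    pfctl -ss | grep ' 10\.'     # rough filter for states involving those 10.x sources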
-
Outbound NAT rules are automatic, so everything should be going through NAT. Firewall rules are a simple allow-all. em1 in the trace is my WAN port and the 10.10.0.0/20 network is on the LAN side. We're only seeing this on installs with a lot of traffic (e.g. consistently hitting 300-500 Mbps).
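The leaked packets show up on em1 with a capture along these lines; nothing with a 10.10.x source should ever leave the WAN if translation is working:

    tcpdump -ni em1 'src net 10.10.0.0/20'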
-
How many states? I am unsure what the behavior is if there is no available ephemeral source port for the outbound translation. You might need a pool of outbound NAT addresses if that is the case.
If you are truly seeing something intermittent there, that would be something I would certainly look at, especially if it only occurs during periods of high traffic. That would take tens of thousands of simultaneous connections, all to the same destination protocol:host:port, however, so it seems unlikely.
Have you done anything like setting static source ports, reducing the available ephemeral source ports, or maybe something else with outbound NAT?
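For reference, the live state count and the configured limit are also visible from the shell:

    pfctl -si | grep -i 'current entries'   # live state count
    pfctl -sm                               # hard limits, including the state table size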
-
I've not touched the outbound NAT rules; I expect it to just work! We're dealing with many thousands of states; I can't seem to find a count anywhere in the UI, but as I said, we're looking at pretty high traffic volumes. We'll give the additional NAT addresses a try and see how it works.
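If I'm reading the docs right, what a pool amounts to in raw pf.conf terms is roughly the line below, with em1 as our WAN and the 203.0.113.x addresses standing in for whatever virtual IPs we'd add:

    nat on em1 from 10.10.0.0/20 to any -> { 203.0.113.10, 203.0.113.11 } round-robin

In the GUI that should be Firewall > NAT > Outbound in hybrid or manual mode, with the extra addresses defined as virtual IPs and used as the translation target.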
-
We're dealing with many thousands of states; I can't seem to find a count anywhere in the UI…
It's on the Dashboard, look again.
-
Ha, never noticed that. I was looking at the state table page!
Turns out I solved this by modifying a firewall rule on an unrelated VLAN interface. I checked the state table and noticed states related to OPT1 traffic were showing up on OPT2. Changing the firewall rule on OPT2 from source "any" to source "OPT2 network" fixed the problem. It doesn't explain why that traffic was coming out the wrong interface, though…
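For anyone wanting to check for the same thing: the first column of pfctl -ss is the interface each state lives on, so filtering on the OPT2 NIC makes the strays obvious (em3 here is just a stand-in for whatever NIC backs OPT2):

    pfctl -ss | grep '^em3 '   # states on OPT2; look for source addresses from other segments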
-
Would need more details to be able to make a determination. Glad it's fixed.