No NAT processing for certain packets

stephenw10

Is that your bug linked above?

Before reopening that or opening a new one we need to see states and/or packet captures and compare that with the running ruleset.

What pfSense version are you running?

turrican64

Yes, the bug ticket is the linked one.

I took packet captures while the issue was happening and again afterwards, when I stopped the SIP traffic for a minute and started it again. That pause probably cleared the state in pfSense, because afterwards the local SIP client continued sending packets with exactly the same source and destination IP and port (black tcpdump screenshot in the ticket) and it has been working fine since.

I didn’t check the states, but I am waiting for this to happen again and will check the states then.

pfSense+ version is: 24.03

stephenw10

That's actually your bug report though, you opened it?

If so, yes, the first thing to do there is check the states and rules actually running when you see it.

turrican64

Hi @stephenw10

Yes, I opened that bug report.
The issue started happening again today. The 10.20.33.1 address is not being translated to the WAN IP address.

As you suggested I checked the states.
Filtering on 10.20.33.1:

Filtering on 103.140.134.2:

There are only automatic rules:

Can we reopen the bug ticket?

Thank you!
Best regards

stephenw10

Were there other open states on the WAN to the same remote IP address and port? Some other internal SIP device?

The most common cause of seeing outbound NAT not applied is that it conflicts with an existing state.

turrican64

Hi @stephenw10

@stephenw10 said in No NAT processing for certain packets:

Were there other open states on the WAN to the same remote IP address and port? Some other internal SIP device?

No. And, even if that was the case the source port randomization should resolve any conflict, in my understanding. Correct?
But the answer is no.

I kept pfSense in this state for days, but yesterday I had to restore the service. I just disconnected the SIP server computer's Ethernet cable for approximately 50 seconds, and when I reconnected it NAT was working properly.

stephenw10

@turrican64 said in No NAT processing for certain packets:

even if that was the case the source port randomization should resolve any conflict, in my understanding. Correct?

Yes, it should. Though only if NAT is working of course. Some old SIP devices only accept 5060 as a source port but that is clearly not the case here since it works fine after reconnecting.

So when this happens it appears to be spontaneous? Not associated with a filter reload perhaps?

The only other situation where I have seen something like this is when a state is somehow opened before the NAT rules are loaded. That should not normally be possible unless you have some custom startup items?

turrican64

@stephenw10 said in No NAT processing for certain packets:

So when this happens it appears to be spontaneous? Not associated with a filter reload perhaps?

Yes, spontaneous. Every time it has happened I was not even logged in to pfSense, so I did not initiate a filter reload.

There is no custom startup.

Thank you!
Best regards

tinfoilmatt

@turrican64 said in No NAT processing for certain packets:

I have active automatic rules

Can you post a screencap of the configuration/'Edit' page of one of these NAT rules? Trying to see the format you've used to define source networks.

stephenw10

Those are automatically generated rules, they cannot be edited. By default pfSense creates rules for each internal subnet as source to any interface that has a gateway defined. What is there should be fine.

tinfoilmatt

@stephenw10 Then something might be wacky with how the install script/wizard/whatever function created those 'auto-added' rules. I just referenced one of my own boxes and there's a separate "Auto created rule" for each defined subnet. This may simply be due to my own topology. But the way OP's source networks appear in their NAT rule table looks weird to me.

Regardless, the 'Edit' page can still be accessed even for 'auto-added' rules.

stephenw10

In which pfSense version? The actual auto-rules cannot be edited. If you select manual outbound NAT mode a set of user rules are added equivalent to the auto rules that you can edit. Those remain even if set back to auto mode but they only do anything in manual or hybrid mode.

tinfoilmatt

@stephenw10 said in No NAT processing for certain packets:

If you select manual outbound NAT mode a set of user rules are added equivalent to the auto rules that you can edit.

That's exactly what I'm seeing then. Disregard, OP!

turrican64

Hi @stephenw10

Can we reopen the bug ticket?

Thank you!
Best regards

stephenw10

Done. Though I still think it must be something causing a state conflict somehow. It's going to be very difficult to reproduce.

I assume you cannot produce this on demand? You just have to wait for it to happen?

turrican64

Hi @stephenw10

Thank you for reopening the ticket.

@stephenw10 said in No NAT processing for certain packets:

I still think it must be something causing a state conflict somehow

If you have something specific in mind that you would like to see, I am happy to share it.

@stephenw10 said in No NAT processing for certain packets:

I assume you cannot produce this on demand? You just have to wait for it to happen?

Correct, unfortunately I don't have a method to reproduce it, I have to wait for it to happen. Last time it took 4 weeks.

stephenw10

Nothing easy! What I expect is happening is that when the state tries to open, a conflicting state already exists. But to actually see that would require dumping the state table at that point. So we would need to script something to do it.
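
A minimal sketch, in Python, of what such a script might look for in a dump. It assumes the dump comes from `pfctl -ss` and that IPv4 lines look roughly like `iface proto src:port (orig:port) -> dst:port STATE`; the interface name `igb0`, the `10.20.33.0/24` subnet, the translated address, and the sample lines are made up for illustration. Check the layout against a real dump before relying on it.

```python
import ipaddress
import re

# Assumed IPv4 line layout of a state dump; verify against your own output.
STATE_RE = re.compile(
    r"^(?P<iface>\S+)\s+(?P<proto>\S+)\s+(?P<src>\S+?):(?P<sport>\d+)"
    r"(?:\s+\((?P<orig>[^)]+)\))?\s+(?P<dir>->|<-)\s+(?P<dst>\S+?):(?P<dport>\d+)"
)


def find_untranslated(lines, wan_iface, internal_net):
    """Return outbound states on wan_iface whose source is still an internal address."""
    net = ipaddress.ip_network(internal_net)
    hits = []
    for raw in lines:
        line = raw.strip()
        m = STATE_RE.match(line)
        if not m or m["iface"] != wan_iface or m["dir"] != "->":
            continue
        try:
            src = ipaddress.ip_address(m["src"])
        except ValueError:
            continue  # not a plain IPv4 address, skip
        if src in net:
            hits.append(line)
    return hits


# Hypothetical sample lines, not taken from the affected firewall.
SAMPLE = [
    "igb0 udp 198.51.100.7:41234 (10.20.33.1:5060) -> 103.140.134.2:5060 MULTIPLE:MULTIPLE",
    "igb0 udp 10.20.33.1:5060 -> 103.140.134.2:5060 MULTIPLE:MULTIPLE",
    "igb1 udp 10.20.33.1:5060 -> 103.140.134.2:5060 MULTIPLE:MULTIPLE",
]

if __name__ == "__main__":
    for hit in find_untranslated(SAMPLE, "igb0", "10.20.33.0/24"):
        print(hit)
```

Only the second sample line is reported: it is on the assumed WAN interface and still carries the internal source address.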

turrican64

Hi @stephenw10

How would it be possible to identify that moment and trigger the script?

I’d like to ask some questions regarding the bug ticket updates:

Like I mentioned a couple comments up, the way that happens is when something tries and fails to make a NAT state. Usually static port is the easiest way it happens

May I ask what static port means?

Another way is if they have so many connections to the same remote ip:port that they exhausted their pool of unique external source ports.

There’s only one computer talking to this remote IP:Port. But regardless I can’t imagine a scenario creating that many connections which could exhaust the pool of 65536 source ports. Is this a likely scenario?

Need to see things like their entire ruleset

How can I send it to the developers without exposing my ruleset publicly?

and state table at the exact moment a packet failed to get NAT applied.

I kept pfSense in this state for 3 days which means NAT was not applied to these packets for this much time and I posted the state table entry for this remote IP:Port. There was only one entry which means no conflict.

Is this alone not proof of a bug?

turrican64

Additionally I can summarise my assumptions in this way:

The NAT state was successfully created and worked well for 4 weeks.
The NAT state disappeared for some reason and its re-creation was not successful, so a faulty state was created.
Outgoing packets from the host kept arriving at a rate of 1 pps, which kept this faulty state alive in the state table.
Once packets stopped arriving for 50 seconds (because I shut down the computer’s switch port), the faulty state disappeared from the state table.
Once packets started arriving from the computer again after 50 seconds, NAT state creation was successful and it has been working since.
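
The keep-alive part of these assumptions can be illustrated with a toy model in Python. It is only a sketch: the 30-second idle timeout is a made-up value, not a confirmed pfSense setting, and the function name is hypothetical.

```python
def state_lifetimes(packet_times, idle_timeout):
    """Return (created, expires) pairs for a state that every packet refreshes.

    A new state is created whenever a packet arrives after the previous
    state has been idle for longer than idle_timeout seconds.
    """
    lifetimes = []
    created = last = None
    for t in sorted(packet_times):
        if created is None:
            created = last = t
        elif t - last > idle_timeout:
            # The old state expired during the gap; this packet opens a new one.
            lifetimes.append((created, last + idle_timeout))
            created = last = t
        else:
            last = t  # packet refreshes the existing state
    if created is not None:
        lifetimes.append((created, last + idle_timeout))
    return lifetimes


if __name__ == "__main__":
    # 1 pps for 10 seconds, a 50 second pause, then 1 pps again.
    times = list(range(0, 10)) + list(range(59, 63))
    print(state_lifetimes(times, idle_timeout=30))
```

In this model a steady 1 pps stream never lets the state expire, while a pause longer than the timeout forces a fresh state to be created by the next packet.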

My expectation would be that even if, for some reason (e.g. a conflict), a NAT state cannot be created at a certain point in time, a faulty state should not remain in the state table once the conflict no longer exists.

A log level that prints a message with the reason NAT failed would be helpful. Do you know if it is possible to set up such logging?

stephenw10

Yeah, good question. I imagine a floating outbound rule with logging enabled that passes traffic with a source of the internal subnet. That rule should never match, because the NAT rules should change the source before traffic reaches it. However, an untranslated source is exactly what's happening here. So that would at least log it to give an idea of when it happens and how often. But then we would want a script that dumps the state table when that log entry is created. I'm thinking about that.
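
One possible shape for such a script, as a rough Python sketch rather than a tested tool. It assumes log lines are fed in from somewhere (for example by following the filter log) and that `marker` is some substring that identifies entries from the logging rule; both are assumptions. `pfctl -ss` is the standard command to print the state table and needs root on the firewall.

```python
import subprocess
import time


def dump_states():
    """Return the current state table as text (runs `pfctl -ss`)."""
    result = subprocess.run(
        ["pfctl", "-ss"], capture_output=True, text=True, check=False
    )
    return result.stdout


def watch(lines, marker, dump=dump_states, min_interval=60.0, clock=time.monotonic):
    """Yield (log_line, state_dump) each time a line containing marker is seen.

    Dumps are rate limited: matches arriving less than min_interval seconds
    after the previous dump are skipped, so a 1 pps flow does not trigger
    a dump every second.
    """
    last = None
    for line in lines:
        if marker not in line:
            continue
        now = clock()
        if last is not None and now - last < min_interval:
            continue
        last = now
        yield line.rstrip("\n"), dump()
```

On the firewall this might be driven with something like `for entry, states in watch(sys.stdin, "<marker>")`, writing each dump to a timestamped file. The first dump is the interesting one, since it is taken closest to the moment the bad state was created.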

'Static port' there means a rule where source port randomisation is disabled. pfSense adds a rule for IPsec on port 500 for that, since a lot of IPsec servers will not accept connections from other source ports. Some VoIP devices are also commonly known to break with source port changes. So the most likely scenario we see this in is two VoIP phones trying to connect to the same remote PBX where they use the same outbound NAT (OBN) rule. The state already exists with SIP as both the source and destination port, resulting in a conflict and the second device not being NAT'd.
But since you only have one device and no static source port rules that cannot be happening here.
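
The static port conflict described above can be shown with a toy model in Python. This is a simplified illustration, not how pf is implemented: real port selection is randomised, whereas this sketch just takes the first free port, and all addresses, the pool size, and the function name are made up.

```python
def nat_outbound(states, wan_ip, src, dst, proto="udp",
                 static_port=False, pool=range(1024, 65536)):
    """Try to create a translated state; return (wan_ip, port) or None.

    states maps (proto, wan_ip, wan_port, dst_ip, dst_port) to the
    internal (ip, port) that owns that translated tuple.
    """
    src_ip, src_port = src
    # With static port the only candidate is the original source port.
    candidates = [src_port] if static_port else pool
    for port in candidates:
        key = (proto, wan_ip, port, dst[0], dst[1])
        if key not in states:
            states[key] = src
            return (wan_ip, port)
    return None  # every candidate tuple is already in use


if __name__ == "__main__":
    wan = "198.51.100.7"
    pbx = ("203.0.113.5", 5060)
    states = {}
    # Two phones, both using source port 5060, same remote PBX, static port on.
    print(nat_outbound(states, wan, ("192.168.1.10", 5060), pbx, static_port=True))
    print(nat_outbound(states, wan, ("192.168.1.11", 5060), pbx, static_port=True))
```

With `static_port=True` the second phone has no free tuple and gets no translation. With it off, the model simply hands out another port until the pool for that remote IP and port is used up, which is the exhaustion case mentioned in the ticket.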

The time point that matters is when the state was created. A conflicting state might expire seconds later and the bad state will remain.