BUG! Pfsense 2.1 - UDP port 4500 breaks outbound NAT
I've definitely found a bug in pfsense 2.1. At some point in time, pfsense will stop NAT'ing outbound udp packets going to port 4500, and instead send the packets out the WAN side using my internal IP address as the source.
The odd thing is that it starts doing this at random times, and a reboot does not fix it.
The device generating the outbound traffic to UDP/4500 is an AT&T microcell.
I had been running pfsense for several months with no problems at all. The only tweaking I did was playing around with traffic shaping rules to get my VoIP phone system working better. I did not do anything relating to the IP address of the microcell or port 4500. A couple of weeks ago, my microcell quit working. After wiresharking the WAN side of pfsense, I could see that UDP packets headed out of pfsense destined to port 4500 were going out with the source IP set to the internal address of my microcell, instead of pfsense's public IP address. A reboot of pfsense did not fix the problem.
Finally, I turned off automatic outbound rule generation, and created an explicit rule to NAT udp 4500 and moved the rule to the top of the list. This fixed the problem. For a while. A few days ago, I set up OpenVPN on pfsense using the wizard, and around the same time, outbound udp/4500 broke again.
I have a suspicion that UDP traffic to port 4500 is waking up some sort of IPSEC ALG code in pfsense which gets confused.
I am also using an AT&T microcell behind pfsense, along with OpenVPN and Racoon (the IPsec VPN). I have not seen this problem. We have used manual outbound NAT for a long time, though. Are you doing any sort of NAT reflection?
No. I'm not doing anything that should allow packets to go out with the source IP address set to the inside address. All I've been doing is a few things to make my Asterisk system talk to Vitelity (disable scrubbing, conservative firewall, etc).
I've re-installed pfsense and started over, and so far, I'm not seeing the problem. Keeping my fingers crossed.
Sorry for revisiting a slightly older post.
I'm encountering what seems to be the exact same problem as the OP. Only difference is that I'm running strongswan/ipsec on a dedicated vm. pfsense acts only as the gw/fw.
When it breaks, pfsense is sending out the 4500 traffic with the internal source IP of my ipsec server instead of the WAN IP. Verified using tcpdump on the console sniffing the wan adapter.
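For anyone who wants to check a WAN capture for the same symptom, here is a rough Python sketch that scans text output from tcpdump -n and flags packets to port 4500 whose source is still an RFC 1918 address. The function name find_unnatted, the sample lines, and the addresses in them are made up for illustration, and the line format is simplified, so the regex may need adjusting for your capture.

```python
import ipaddress
import re

# RFC 1918 private ranges. A packet leaving the WAN with one of these as its
# source address has not been translated.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Matches the "IP src.port > dst.port:" part of a `tcpdump -n` text line.
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def find_unnatted(lines, dst_port=4500):
    """Return (src, dst) pairs for packets to dst_port with a private source."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not an IPv4 line in the expected format
        src, _sport, dst, dport = m.groups()
        if int(dport) != dst_port:
            continue
        if any(ipaddress.ip_address(src) in net for net in RFC1918):
            hits.append((src, dst))
    return hits


# Hypothetical, simplified capture lines (not real traffic).
sample = [
    "12:00:00.000001 IP 192.168.1.50.4500 > 198.51.100.7.4500: UDP, length 1",
    "12:00:00.000002 IP 203.0.113.20.4500 > 198.51.100.7.4500: UDP, length 1",
]
print(find_unnatted(sample))
```

The first sample line is the broken case (internal source on the WAN side); the second is what a correctly NATed packet would look like.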
I've been running a linux fw for years, and wanted to give pfsense a try. Got it set up this weekend. Did have to set the fw optimization to conservative in order to allow ssh traffic to not get "stuck", not sure what that's about. But otherwise it seems to be running fine.
Usually the port 4500 problem happens when I'm configuring rules/nat options. If I keep the fw configuration very basic, it seems like it runs ok (though I haven't let it run for days w/o touching it – too much fun getting everything configured and exploring all the options, reports, etc).
Anyway, running the current build of 2.1, strongswan 5.1 on virtualbox.
That kind of thing can happen if you set static port on your outbound NAT for all ports or specifically udp/4500. You do not need nor want static port in that scenario.
If you have static port set and this happens:
A:4500 -> X:4500 -> D:4500
If client B then also tries to use B:4500 to reach D:4500, the connection from B cannot have NAT applied since the static source port is already in use.
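The collision described above can be sketched as a toy model in Python. This is only an illustration of the reasoning, not how pf actually implements its state table; the class name StaticPortNat and all addresses are hypothetical. With static port, the source port is never rewritten, so each (port, destination) pair can belong to only one internal host at a time.

```python
class StaticPortNat:
    """Toy model of outbound NAT with static port (source port is kept)."""

    def __init__(self, wan_ip):
        self.wan_ip = wan_ip
        # (wan_port, dst_ip, dst_port) -> internal (ip, port) owning the mapping
        self.table = {}

    def translate(self, src_ip, src_port, dst_ip, dst_port):
        """Return the translated (ip, port), or None if NAT cannot be applied."""
        key = (src_port, dst_ip, dst_port)
        owner = self.table.get(key)
        if owner is None:
            # First host to use this port toward this destination wins.
            self.table[key] = (src_ip, src_port)
            return (self.wan_ip, src_port)
        if owner == (src_ip, src_port):
            # Same host reusing its own mapping.
            return (self.wan_ip, src_port)
        # Another host already holds WAN port src_port toward this destination,
        # and static port forbids choosing a different one.
        return None


nat = StaticPortNat("203.0.113.1")  # X, the WAN address
a = nat.translate("192.168.1.10", 4500, "198.51.100.7", 4500)  # client A
b = nat.translate("192.168.1.11", 4500, "198.51.100.7", 4500)  # client B
print(a)
print(b)
```

Client A gets a mapping on the WAN address, while client B gets None because the only port it is allowed to use is already taken. Without static port, the NAT would simply pick another source port for B.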
Thanks for the feedback.
At this point, I can't replicate the problem. My VPN seems to be running. There are still some weird things happening, but at least the tunnel is staying up and passing traffic.
What you said makes sense, and I checked and I do not currently have static port configured for the port 4500 tunnel. Perhaps I did before when I was first configuring the box.
I've spent more time the past few days looking at a port range issue (another post I just made, think it may be a real bug).
Anyway, if I encounter this issue again, I'll update the post.
Of course, I spoke too soon. I was able to replicate this.
While messing around with my open udp port range forward problem, I enabled the "do not scrub" option, and the problem returned.
For some odd reason, when the scrubbing is off, the system routes the lan traffic through the wan interface. wth.
I also noticed that I can leave static port enabled and port 4500 can still be reused; it works great as long as scrubbing is enabled.
So, not sure what that's about. What exactly does the scrubbing have to do with the IP addr rewriting (or lack thereof)?
I'll bet it is the "do not scrub"! I was playing with this option trying to keep asterisk's SIP registrations alive with my VoIP provider.
Oddly, I got fed up, reinstalled pfsense 2.1 from scratch, set the bare minimum settings I needed, and I haven't seen any SIP or port 4500 problems since.
I don't know why it works that way – but it does. It would imply there is some sort of routing problem that the "scrubbing" is fixing behind the scenes.
I normally don't like that kinda stuff -- I want stuff to work right. But as of today, I have both of my problems fixed (my vpn issue and my asterisk issue) and my pfsense installation is working perfectly.
I did try a reinstall once -- but reloaded the saved configuration.
One thing I did that seems to have made my VPN solution work better is enabling the "sloppy" state setting for the VPN rule. I had a problem with asymmetric routing that was solved by changing the state mode for that rule.
As long as I don't need to disable the packet scrubbing, I'm good.