pfSense 2.2.2 crash on fragmentation
-
I recently upgraded to v2.2.2, and since then my pfSense box, a LEX NEO NE2301 running nanobsd, has become unreliable. I have had more than 7 sporadic reboots in the last 3 weeks, most of them around the same time (~01:05am). The previous version, 2.1.4/5, ran for months without any problems.
I noticed by chance that I can ping the pfSense to death: if I send a large ICMP packet ("ping hostname -l 2000") through one of my IPsec VPN tunnels from a branch office to the main office, the pfSense reproducibly crashes. I assume that something similar happens at night, when the branch-to-office backups are running.
I read about some sort of crash dump but I couldn't find one in the file system. Do I have to enable crash dumps manually? Where are they located?
THX, Daniel
-
That's probably fragmentation, not large packets. You're not sending a 2000 byte ping, it's multiple fragmented packets.
32-bit, I'm guessing? Adding the system tunable net.inet.ipsec.directdispatch, set to 0, under System > Advanced > System Tunables might fix that. Could be related to https://redmine.pfsense.org/issues/4537
nano versions don't have crash dumps because they don't have swap.
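To make the fragmentation point concrete, here is a minimal sketch of how an IPv4 sender would split the 2000 data bytes of "ping -l 2000". It is not taken from pfSense; the function name and its defaults are assumptions for illustration. It assumes an 8-byte ICMP header and a 20-byte IP header without options, and it ignores the extra IPsec overhead, which would lower the usable size inside the tunnel.

```python
def fragment_payloads(icmp_data_len, mtu, ip_header_len=20, icmp_header_len=8):
    """Return (offset, length) in bytes for each IPv4 fragment's payload.

    Hypothetical helper for illustration only.
    """
    total = icmp_data_len + icmp_header_len  # IP payload that has to be split
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_chunk = (mtu - ip_header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < total:
        length = min(max_chunk, total - offset)
        fragments.append((offset, length))
        offset += length
    return fragments


# Compare a plain Ethernet MTU with the usual PPPoE MTU.
for mtu in (1500, 1492):
    print(mtu, fragment_payloads(2000, mtu))
```

With an MTU of 1500 this gives two fragments carrying 1480 and 528 bytes; with 1492 it gives 1472 and 536 bytes. Either way the firewall has to handle fragments rather than one large packet.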
-
Yes, it's 32-bit, but I wasn't accessing the web GUI via VPN. Anyway, I'll give net.inet.ipsec.directdispatch=0 a try and let you know the results.
Many Thanks!
-
Does this setting need a reboot?
I made the change while I was connected via OpenVPN. Afterwards I accessed a branch office server and ran "ping -l 2000" against our main office server. It seems that this crashed the NAT? I was still able to access the pfSense GUI but none of the devices in the main office. I had to reboot the pfSense to get it working again.
Any idea how to track this issue down?
Thanks
Daniel
-
Even with net.inet.ipsec.directdispatch=0 and a reboot the problem still exists. Any ideas how to track down the cause?
Otherwise I'll have to go back to 2.1…
Thanks
Daniel
-
(You probably already know this, but...) There are three ways to set system tunables in pfSense: using sysctl at a command line (non-permanent, useful for testing), adding an entry to /etc/sysctl.conf (permanent but requires a reboot), or using the pfSense GUI to add the parameter (system -> advanced -> system tunables). Not sure which one you used, but if you used the command line then rebooted, the change is gone.
There seem to be a number of cases reported where the sysctl "net.inet.ipsec.directdispatch=0" fixes the issue, or is recommended as a fix:
https://forum.pfsense.org/index.php?topic=87946.0
https://forum.pfsense.org/index.php?topic=88606.0
https://redmine.pfsense.org/issues/4610
https://redmine.pfsense.org/issues/4537
https://redmine.pfsense.org/issues/4685
Bug 4685 has a few other system tunables that may help. Near the end of that entry, Ermal seems to find a common root cause, running a proxy arp. But it also seems that may not be the right root cause.
You may want to try the three system tunables suggested there, as well as the new choparp package (if applicable for your situation), before reverting to 2.1.5
-
Thanks for your Input!
Not sure which one you used, but if you used the command line then rebooted, the change is gone.
I did use "system -> advanced -> system tunables", so this tunable is still active on my box.
There seem to be a number of cases reported where the sysctl "net.inet.ipsec.directdispatch=0" fixes the issue, or is recommended as a fix:
https://forum.pfsense.org/index.php?topic=87946.0
https://redmine.pfsense.org/issues/4610
https://redmine.pfsense.org/issues/4537
https://redmine.pfsense.org/issues/4454
At first I thought these posts described a different problem, because my box crashed when I pinged a remote server across the VPN. But now I can confirm that it also crashes if I remotely access the web GUI.
https://forum.pfsense.org/index.php?topic=88606.0
I'm not using IP compression or cryptographic hardware acceleration. I can't use x64 because of the VIA Eden CPU. I'm also not using virtual IPs, so I assume the proxy arp / choparp hints are not relevant to me?
IP random id generation was enabled by default. I've switched it to "no" and also set net.bpf.zerocopy_enable=0 and net.isr.dispatch=deferred. The box is still crashing reproducibly :(
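One way to confirm that the tunables named in this thread are really live after a reboot is to compare the output of sysctl with the intended values. The sketch below is only an illustration: the helper names are made up, and it assumes the usual FreeBSD "name: value" output format. Python is not assumed to be present on the firewall, so the text can be copied to any workstation.

```python
# Tunables and target values mentioned in this thread.
WANTED = {
    "net.inet.ipsec.directdispatch": "0",
    "net.bpf.zerocopy_enable": "0",
    "net.isr.dispatch": "deferred",
}


def parse_sysctl(text):
    """Parse 'name: value' lines (FreeBSD sysctl output format) into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:  # skip lines that do not look like 'name: value'
            values[name.strip()] = value.strip()
    return values


def mismatches(live, wanted=WANTED):
    """Return {name: (live_value, wanted_value)} for every tunable that differs.

    A tunable missing from the live output is reported with a live value of None.
    """
    return {
        name: (live.get(name), target)
        for name, target in wanted.items()
        if live.get(name) != target
    }
```

Feed parse_sysctl() the text printed by "sysctl net.inet.ipsec.directdispatch net.bpf.zerocopy_enable net.isr.dispatch" on the box; an empty result from mismatches() means all three settings are active.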
I have to mention that the box doesn't crash completely every time: sometimes just NAT breaks and I have to reboot pfSense to get it working again. It also doesn't crash if I access the web GUI via OpenVPN, and it only crashes on pings from one specific branch office. All others are able to access the web GUI.
BTW: This specific branch office can successfully ping the gateway of my main office ISP with ICMP length 2000 but I don't get a reply on the next hop, which is the pfsense external IP!? It's the only branch office with fixed line and MTU 1500, all others are using PPPoE with MTU 1492.
As long as I can easily reproduce the crash: Any developer interested in additional tests? Otherwise I'll go back to 2.1.5 on friday.
-
I would be interested in the crash dump shown to you.
-
@cmb:
nano versions don't have crash dumps because they don't have swap.
@ermal: Sorry, no dump. Is there any other way to trace this crash on nano systems?
-
I'm now back on 2.1.5 (same config) and get no more crashes on "ping -l 2000" across the IPsec VPN.
Also interesting:
This specific branch office can successfully ping the gateway of my main office ISP with ICMP length 2000 but I don't get a reply on the next hop, which is the pfsense external IP!?
With 2.1.5 I get replies when pinging the pfSense external interface with fragmented packets, which did not work with v2.2.2.