PfSense 2.4.1 - ikev2 IPSEC tunnel under load crashes whole firewall VM



  • Hello pfSense team,

    Since opening a forum topic is preferred before filing a bug, I am doing so. After upgrading from the latest 2.3 release to 2.4.1, my pfSense XenServer VM keeps crashing whenever I transfer a larger amount of data through the IPsec tunnel. I would like to collect some crash data, but the system does not even seem able to create a crash file. Any hint on where to look for crash files is welcome.

    The tunnel is established between two pfSense VMs: one runs on ESXi 5.5 on a CPU without AES-NI, and the second one (the one that crashes) runs on XenServer 7.0 on a CPU with AES-NI. I can provide all details of the configuration. For now I have worked around the issue by using an OpenVPN tunnel instead.

    Logs from the time of the issue show a loss of connectivity and a simultaneous reboot:

    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense syslogd: kernel boot file is /boot/kernel/kernel
    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense kernel: Copyright (c) 1992-2017 The FreeBSD Project.
    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense kernel: The Regents of the University of California. All rights reserved.
    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense kernel: FreeBSD 11.1-RELEASE-p2 #6 r313908+7eae9364d25(RELENG_2_4): Sun Oct 22 17:32:35 CDT 2017
    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense kernel: root@buildbot2.netgate.com:/builder/ce-241/tmp/obj/builder/ce-241/tmp/FreeBSD-src/sys/pfSense amd64
    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense kernel: FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
    Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable
    Oct 31 10:46:19 pfSense kernel: VT(vga): text 80x25
    etc...

    Thanks,
    GyroK
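
The reboot in the excerpt above can be spotted mechanically: the first line of the FreeBSD boot banner shows up in the middle of otherwise ordinary log traffic. A minimal Python sketch, assuming the log has been saved as text in the `Mon DD HH:MM:SS host tag: message` layout shown above. The function name `find_boot_banners` is hypothetical and not part of pfSense.

```python
import re

# First line of the FreeBSD boot banner as it appears in the syslog excerpt.
# ".*" tolerates both "(c)" and a copyright sign after "Copyright".
BOOT_BANNER = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel: "
    r"Copyright .*The FreeBSD Project"
)

def find_boot_banners(lines):
    """Return the timestamp of every boot banner found in the log lines."""
    timestamps = []
    for line in lines:
        match = BOOT_BANNER.match(line.strip())
        if match:
            timestamps.append(match.group("ts"))
    return timestamps

sample = [
    "Oct 31 10:46:19 pfSense syslogd: sendto: Network is unreachable",
    "Oct 31 10:46:19 pfSense kernel: Copyright (c) 1992-2017 The FreeBSD Project.",
    "Oct 31 10:46:19 pfSense kernel: VT(vga): text 80x25",
]
print(find_boot_banners(sample))
```

Each returned timestamp marks a boot. Whether a given boot was a crash or a planned restart still has to be judged from the surrounding lines.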



  • I have the exact same issue. I think it also happened on 2.4.0.
    I am not running VMs; it happens on bare metal, on a Supermicro A1SRi-2558F.

    I can reproduce the problem by just copying some files through the IPSec tunnel.
    Luckily I do have crashdumps, there are three attached to this post. And of course, they have also been sent via the automatic crash dump thingy.

    crashdumps.zip



  • I disabled the AES-NI CPU-based crypto acceleration and rebooted. So far this seems to work.



  • I have the same issue on an SG-2440 unit.
    As soon as gigabytes start flowing through the IPsec tunnel, the unit crashes within a few minutes.
    On the SG-2440, too, disabling AES-NI (System/Advanced/Misc) seems to prevent the crashes.
    This behavior was introduced in version 2.4.0; release 2.3.4-P1 was working fine.



  • Hello pfSense team,

    I did some research and found the following bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219356

    I followed the behavior description, and it is the same bug: after changing the encryption from AES-GCM to AES, the tunnel is as stable as it was in pfSense 2.3.

    Looks like a regression …

    Regards,

    GyroK



  • Who should be able to fix this bug?
    Is it the pfsense team, or should this be fixed by the FreeBSD developers?



  • @RMB:

    Is it the pfsense team, or should this be fixed by the FreeBSD developers?

    It has been fixed, in FreeBSD 11-STABLE, so this particular fix might get imported into pfSense. Don't know.
    Doesn't seem to be much activity, so I'll dump this in the pfSense bug tracker, because, well, I think we can safely say it's a bug.



  • Great, thanks!



  • Any news on this bug?
    The problem is still there in version 2.4.2.
    I have to disable AES-NI to prevent a kernel panic during load through an IPSec tunnel.



  • Nope, but feel free to comment on the redmine bug repo:
    https://redmine.pfsense.org/issues/8070

    Or find someone with a support contract that can complain.  ::)



  • Having what I believe is this issue since moving to 2.4.x
    Here is a picture of the console with the Kernel crash.
    No log available.
    Reverted back to version 2.3.x and the problem has not occurred as of yet.




  • One clarification on my setup: I am using a Supermicro motherboard with pfSense installed directly on the hard drive. No VM software is involved.



  • Anyone know if this release has a fix for this issue?

    2.4.3-DEVELOPMENT (amd64)
    built on Tue Mar 13 10:14:21 CDT 2018
    FreeBSD 11.1-RELEASE-p7

    I see this is a patched version of FreeBSD, and there was a reference to ipsec fixes in the release notes, but it wasn't clear if this fixed this same issue.



  • @Tacoma:

    Anyone know if this release has a fix for this issue?

    2.4.3-DEVELOPMENT (amd64)
    built on Tue Mar 13 10:14:21 CDT 2018
    FreeBSD 11.1-RELEASE-p7

    I see this is a patched version of FreeBSD, and there was a reference to ipsec fixes in the release notes, but it wasn't clear if this fixed this same issue.

    Unfortunately, this bug is still present in the following software version:

    2.4.3-RELEASE (amd64)
    built on Mon Mar 26 18:02:04 CDT 2018
    FreeBSD 11.1-RELEASE-p7

    GCM mode cannot be used on machines with AES-NI.

    Regards,
    GyroK


  • Rebel Alliance Developer Netgate

    To claim it's unusable in general is untrue. The crash must be specific to a certain combination of hardware, traffic load, and/or pattern of traffic.

    Loads of people are using AES-NI and AES-GCM without crashing, including just about every Netgate employee from our home firewalls.



  • Can confirm this is occurring for me on two different systems.
    Both are running on ESXi 6.5, one on DL380 G8, the other on DL380 G9.
    NIC type is vmxnet3, open-vm-tools installed on both.
    Phase 1: AES128-GCM / 128 / SHA1 / DH2
    Phase 2: AES128-GCM / AES-XCBC / no PFS

    Hard crash with a reboot within 5 minutes of starting a continuous iperf run, sometimes on one side, sometimes on both.

    Switching to any non-AES-NI algorithms kills throughput, but doesn't hard crash.

    My
    ```
    dmesg | grep -i aes

    Features2=0xffba2203 <sse3,pclmulqdq,ssse3,cx16,pcid,sse4.1,sse4.2,x2apic,popcnt,tscdlt,aesni,xsave,osxsave,avx,f16c,rdrand,hv>
    aesni0: <aes-cbc,aes-xts,aes-gcm,aes-icm> on motherboard
    ```

    I'll do some more testing this weekend when there's not as much production traffic flowing but for right now I'm knocked back down to plain AES.
    
    It does indeed make pfSense unusable for installations requiring decent IPSec interconnect speeds. Considering this issue I'll likely move to VyOS for my concentrators.
    
    Has anyone tried the patch from the FreeBSD bug report linked earlier in this thread?
    
    Edit: both running 2.4.3-Release
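
The `dmesg` output in the post above lists the cipher modes the `aesni` driver announces. A minimal Python sketch for pulling that list out of saved `dmesg` text, assuming the `aesni0: <...>` line format shown in the post. The function name `aesni_modes` is hypothetical.

```python
import re

def aesni_modes(dmesg_text):
    """Return the cipher modes listed on the aesni0 device line, upper-cased."""
    match = re.search(r"aesni0:\s*<([^>]+)>", dmesg_text, flags=re.IGNORECASE)
    if match is None:
        return []  # no aesni0 line found in the text
    return [mode.strip().upper() for mode in match.group(1).split(",")]

sample = "aesni0: <aes-cbc,aes-xts,aes-gcm,aes-icm> on motherboard"
modes = aesni_modes(sample)
print(modes)
print("AES-GCM listed:", "AES-GCM" in modes)
```

The list only shows what the driver announces. It does not show which algorithm a given tunnel actually negotiated.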


  • @jimp we're experiencing the same problem. One client, using AES256-GCM, reliably crashes the SG-8860 running pfSense 2.4.3 when using e.g. speedtest.net (during the upload phase); another can't bring it down at all. Switching back to plain AES with SHA512 seems to fix it for now. All clients are MacBook Pros.
    Kind regards,
    Lukas



  • Is AES256-GCM better than AES?


  • Netgate

    AES-GCM is an authenticated cipher so you can eliminate the hashing step. If max IPsec performance is what you seek, AES-GCM is the way to go.
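
To illustrate the point about the hashing step: in strongSwan-style ESP proposal strings (strongSwan is the IPsec daemon pfSense uses), an AEAD cipher such as `aes128gcm16` is written without a separate integrity algorithm, while plain AES needs one. A minimal Python sketch; the helper `build_esp_proposal` is hypothetical, and the keyword set covers only AES-GCM variants with a 16-byte ICV.

```python
# AEAD ciphers authenticate the data themselves (assumed subset of keywords).
AEAD_CIPHERS = {"aes128gcm16", "aes192gcm16", "aes256gcm16"}

def build_esp_proposal(cipher, integrity=None, dh_group=None):
    """Build a hyphen-separated ESP proposal string.

    For an AEAD cipher the integrity algorithm is left out, because
    authentication is already part of the cipher. For a non-AEAD cipher
    an integrity algorithm is required.
    """
    parts = [cipher]
    if cipher not in AEAD_CIPHERS:
        if integrity is None:
            raise ValueError(f"{cipher} needs a separate integrity algorithm")
        parts.append(integrity)
    if dh_group is not None:
        parts.append(dh_group)  # optional PFS group
    return "-".join(parts)

print(build_esp_proposal("aes128gcm16", dh_group="modp2048"))
print(build_esp_proposal("aes256", integrity="sha512", dh_group="modp2048"))
```

The first call yields a proposal with no hash at all; the second needs `sha512` (or another integrity algorithm) to provide the authentication that GCM provides by itself.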



  • I have two sites connected via IPsec using AES. Both have a 100 Mb connection to the internet, and IPsec uses the whole 100 Mb of bandwidth on file transfers. So what is the limitation of AES compared to AES-GCM?


  • Netgate

    AES-GCM will consume fewer CPU cycles to accomplish the same task.



  • So basically, if you have a powerful enough CPU/PC, it doesn't matter which algorithm you use?


  • Netgate

    AES-GCM will use fewer CPU cycles to accomplish the same task.

    Fewer cycles means more cycles available for other tasks.

    You can waste them if you so desire.



  • ... as long as you can live with the occasional hard crash (at least on an SG-8860/Atom C2758 using AES-NI). I haven't yet found out what kind of traffic pattern causes this crash; switching to AES-CBC w/SHA512 removed the crashes reliably for us.



  • @jimp said in PfSense 2.4.1 - ikev2 IPSEC tunnel under load crashes whole firewall VM:

    To claim it's unusable in general is untrue. The crash must be specific to a certain combination of hardware, traffic load, and/or pattern of traffic.

    Loads of people are using AES-NI and AES-GCM without crashing, including just about every Netgate employee from our home firewalls.

    If that's the case, what are their hardware configurations? It seems to me that this issue is pretty common among users, and there isn't a pattern in hardware I can see.



  • Same issue with AES-GCM+AES-NI crashing the system:

    Version	2.4.3-RELEASE-p1 (amd64) 
    built on Thu May 10 15:02:52 CDT 2018 
    FreeBSD 11.1-RELEASE-p10 
    	 
    CPU Type	Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz
    Current: 3600 MHz, Max: 3601 MHz
    4 CPUs: 1 package(s) x 4 core(s)
    AES-NI CPU Crypto: Yes (active)
    

  • Netgate

    All kinds, but mostly Netgate devices such as the SG-2440, SG-4860, SG-3100.

    Intel(R) Atom(TM) CPU C2558 @ 2.40GHz
    4 CPUs: 1 package(s) x 4 core(s)
    AES-NI CPU Crypto: Yes (active)

    Uptime 122 Days 21 Hours 45 Minutes 01 Seconds

    AES_GCM_16
    MODP_2048
    IPComp: none

    I used TRex a few weeks ago to run terabytes and terabytes through AES-GCM IPsec trying to make it crash. Hundreds of megabits per second for days on end. Could not duplicate.
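
For a sense of scale, sustained rates in the hundreds of megabits per second do add up to terabytes within days. A minimal Python sketch of the arithmetic; the 300 Mbit/s rate and the day counts are illustrative assumptions, not figures from the test described above, and `terabytes_transferred` is a hypothetical helper.

```python
def terabytes_transferred(mbit_per_s, days):
    """Decimal terabytes moved at a sustained rate over a number of days."""
    seconds = days * 86_400                  # 24 h * 60 min * 60 s
    bits = mbit_per_s * 1_000_000 * seconds  # megabits to bits
    return bits / 8 / 1e12                   # bits to bytes to terabytes

# Assumed example rate: 300 Mbit/s sustained.
for days in (1, 3, 7):
    print(f"{days} day(s): {terabytes_transferred(300, days):.2f} TB")
```

At the assumed rate this comes to a little over 3 TB per day.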



  • @derelict a few things were mentioned in the thread

    1. 2.3 was unaffected yet 2.4 was crashing
    2. issues appeared on VM and bare metal
    3. there was a bug in the kernel
    4. this "issue" first appeared 10 months ago
    5. user @RMB is using a Netgate product, yet he was experiencing the issue

    May I suggest:

    1. Let us know if that particular patch was merged
    2. Can you try running AES-GCM with any EC group (say NIST ecp384)?
    3. What do you suggest we, the users, do to get to the bottom of this?

 
