pfSense Active CARP Member Crashed: aesni_process -> crypto_dispatch ...



  • Hello,

    One of our pfSense firewalls suffered a hard crash and reboot yesterday. Based on the debug information, we suspect an issue with IPsec/AES-NI. At the time, the firewall was processing ~70 Mb/s of inbound and ~30 Mb/s of outbound IPsec traffic. This is considerably below our peak, and the firewalls had been stable since we upgraded them to pfSense 2.4.4-p3 late last year.

    The firewall itself is a Dell R330 with an Intel(R) Xeon(R) CPU E3-1270 v5 @ 3.60GHz, and AES-NI CPU crypto is active. We are running a site-to-site IPsec configuration with AES256-CBC/SHA-256 for both Phase 1 and Phase 2.

    Confirmation of pfSense version:

    FreeBSD 11.2-RELEASE-p10 #9 4a2bfdce133(RELENG_2_4_4): Wed May 15 18:54:42 EDT 2019
        root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense
    

    Buffer:

    Fatal trap 12: page fault while in kernel mode
    
    
    cpuid = 0; 
    Fatal trap 12: page fault while in kernel mode
    
    Fatal trap 12: page fault while in kernel mode
    cpuid = 7; apic id = 07
    cpuid = 5; 
    apic id = 05
    
    Fatal trap 12: page fault while in kernel mode
    fault virtual address	= 0x1
    cpuid = 1; fault code		= supervisor read data, page not present
    apic id = 01
    instruction pointer	= 0x20:0xffffffff80f447a4
    stack pointer	        = 0x28:0xfffffe0000309e90
    frame pointer	        = 0x28:0xfffffe000030a0c0
    code segment		= base 0x0, limit 0xfffff, type 0x1b
    			= DPL 0, pres 1, long 1, def32 0, gran 1
    fault virtual address	= 0x1
    fault code		= supervisor read data, page not present
    instruction pointer	= 0x20:0xffffffff80f447a4
    stack pointer	        = 0x28:0xfffffe000031de90
    frame pointer	        = 0x28:0xfffffe000031e0c0
    fault virtual address	= 0x1
    processor eflags	= interrupt enabled, resume, IOPL = 0
    current process		= 12 (irq287: igb2:que 5)
    

    Stack trace:

    Tracing pid 12 tid 100126 td 0xfffff80004704620
    pf_test() at pf_test+0x1d24/frame 0xfffffe000030a0c0
    pf_check_out() at pf_check_out+0x1d/frame 0xfffffe000030a0e0
    pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe000030a170
    ip_output() at ip_output+0xb1d/frame 0xfffffe000030a2a0
    ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe000030a2f0
    esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe000030a350
    aesni_process() at aesni_process+0x151/frame 0xfffffe000030a400
    crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe000030a440
    esp_output() at esp_output+0x5cc/frame 0xfffffe000030a4e0
    ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe000030a580
    ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe000030a5b0
    ip_forward() at ip_forward+0x221/frame 0xfffffe000030a650
    ip_input() at ip_input+0x72a/frame 0xfffffe000030a6b0
    netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe000030a700
    ether_demux() at ether_demux+0x173/frame 0xfffffe000030a730
    ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe000030a790
    netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe000030a7e0
    ether_input() at ether_input+0x26/frame 0xfffffe000030a800
    igb_rxeof() at igb_rxeof+0x6e1/frame 0xfffffe000030a890
    igb_msix_que() at igb_msix_que+0x110/frame 0xfffffe000030a8e0
    intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe000030a920
    ithread_loop() at ithread_loop+0xe7/frame 0xfffffe000030a970
    fork_exit() at fork_exit+0x83/frame 0xfffffe000030a9b0
    fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000030a9b0
    

    Has this issue been seen before? The only similar issue we can find reported on the forum is this one:
    https://forum.netgate.com/topic/122238/pfsense-2-4-1-ikev2-ipsec-tunnel-under-load-crashes-whole-firewall-vm/27

    Thanks
    M
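
A DDB backtrace lists the most recent frame first, so reversing it gives the call order. The Python sketch below does that for the trace in this post. The helper names `parse_frames` and `call_order` are hypothetical, and the regular expression assumes the `func() at func+0xOFF/frame 0xADDR` format shown above. In call order, `esp_output_cb` follows directly after `aesni_process`, which suggests the AES-NI completion callback ran synchronously on the same igb interrupt thread and re-entered `ip_output()` and `pf_test()` from there, where the page fault occurred.

```python
import re

# DDB backtrace copied from the panic above (most recent frame first).
TRACE = """\
pf_test() at pf_test+0x1d24/frame 0xfffffe000030a0c0
pf_check_out() at pf_check_out+0x1d/frame 0xfffffe000030a0e0
pfil_run_hooks() at pfil_run_hooks+0x90/frame 0xfffffe000030a170
ip_output() at ip_output+0xb1d/frame 0xfffffe000030a2a0
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe000030a2f0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe000030a350
aesni_process() at aesni_process+0x151/frame 0xfffffe000030a400
crypto_dispatch() at crypto_dispatch+0x140/frame 0xfffffe000030a440
esp_output() at esp_output+0x5cc/frame 0xfffffe000030a4e0
ipsec4_perform_request() at ipsec4_perform_request+0x37f/frame 0xfffffe000030a580
ipsec4_forward() at ipsec4_forward+0x5a/frame 0xfffffe000030a5b0
ip_forward() at ip_forward+0x221/frame 0xfffffe000030a650
ip_input() at ip_input+0x72a/frame 0xfffffe000030a6b0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe000030a700
ether_demux() at ether_demux+0x173/frame 0xfffffe000030a730
ether_nh_input() at ether_nh_input+0x32b/frame 0xfffffe000030a790
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe000030a7e0
ether_input() at ether_input+0x26/frame 0xfffffe000030a800
igb_rxeof() at igb_rxeof+0x6e1/frame 0xfffffe000030a890
igb_msix_que() at igb_msix_que+0x110/frame 0xfffffe000030a8e0
intr_event_execute_handlers() at intr_event_execute_handlers+0xe9/frame 0xfffffe000030a920
ithread_loop() at ithread_loop+0xe7/frame 0xfffffe000030a970
fork_exit() at fork_exit+0x83/frame 0xfffffe000030a9b0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000030a9b0
"""

# One frame line: "<func>() at <func>+0x<offset>/frame 0x<frame pointer>".
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+0x([0-9a-f]+)/frame 0x([0-9a-f]+)$")


def parse_frames(text):
    """Return (function, offset, frame_address) tuples, most recent frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
    return frames


def call_order(frames):
    """Reverse the DDB listing so the thread entry point comes first."""
    return [name for name, _, _ in reversed(frames)]


if __name__ == "__main__":
    chain = call_order(parse_frames(TRACE))
    print(" -> ".join(chain))
    # Show which function was entered right after aesni_process in this trace.
    i = chain.index("aesni_process")
    print("called after aesni_process:", chain[i + 1])
```

The frame addresses increase toward older frames because the kernel stack grows downward, which is a quick sanity check that the trace was parsed completely and in order.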


  • Rebel Alliance Developer Netgate

    That doesn't quite look like the other thread you mentioned, which I believe was also tied to https://redmine.pfsense.org/issues/8070. Your crash has a somewhat different backtrace, and those reports were using AES-GCM, not AES-256-CBC.

    I'm not finding anything else that lines up exactly with what you are seeing here on current versions of FreeBSD.

    Does the behavior change if you disable AES-NI/cryptodev?

    Could you possibly try a 2.4.5-RC snapshot and see if it happens there?
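
The triage in the reply above weighs two things: whether the backtraces follow the same path and whether the tunnels use the same cipher. As a rough illustration only, that comparison can be written as a small Python sketch. `CrashReport`, `shared_prefix_len` and `differences` are hypothetical names, frames are assumed to be listed thread entry first, and the cipher strings are free-form labels rather than pfSense configuration values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrashReport:
    """Hypothetical summary of one panic: call chain (thread entry first) and ESP cipher."""
    frames: List[str]
    cipher: str


def crash_site(report: CrashReport) -> str:
    """The innermost frame, i.e. the function that faulted."""
    return report.frames[-1] if report.frames else ""


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading frames two call chains have in common."""
    n = 0
    for fa, fb in zip(a, b):
        if fa != fb:
            break
        n += 1
    return n


def differences(a: CrashReport, b: CrashReport) -> List[str]:
    """Reasons two reports are probably not the same bug (empty list if none found)."""
    reasons = []
    if crash_site(a) != crash_site(b):
        reasons.append(f"faulting function differs: {crash_site(a)} vs {crash_site(b)}")
    n = shared_prefix_len(a.frames, b.frames)
    if n < min(len(a.frames), len(b.frames)):
        reasons.append(f"call chains diverge after {n} shared frames")
    if a.cipher != b.cipher:
        reasons.append(f"cipher differs: {a.cipher} vs {b.cipher}")
    return reasons
```

An empty result only means these coarse checks found no difference; matching backtraces and ciphers are a reason to look closer, not proof of a duplicate bug.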



  • Thank you, Jimp.
    We had a recurrence last night, 12 days after the first incident. We are going to leave it another 12 days and then upgrade to 2.4.5.

