Netgate Discussion Forum

    pfSense crashes after update to 2.2.6

    • dogvoiceman

      I have an identical pair of Dell PowerEdge 610 firewalls that were recently upgraded to version 2.2.6-RELEASE (amd64).

      They are configured to synchronise with pfsync, with all traffic directed to CARP-managed VIPs that are held on the primary firewall by default.

      For a couple of days after the upgrade, both firewalls were stable. Then the secondary firewall, the one carrying a negligible amount of traffic, started crashing 3-4 times a day.
      The primary firewall, which carries all the traffic (>100Mb/s), has remained stable.

      They each have a built-in 4-port Broadcom interface:

      
      bce0: <QLogic NetXtreme II BCM5709 1000Base-T (C0)> mem 0xd6000000-0xd7ffffff irq 36 at device 0.0 on pci1
      

      and two Intel PCI cards (4-ports each):

      
      em0: 
      

      They each have the following lines in /boot/loader.conf, as recommended for these interfaces.

      
      kern.ipc.nmbclusters="1048576"
      hw.bce.tso_enable=0
      hw.pci.enable_msix=0
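
When comparing the two firewalls after an upgrade, it can help to confirm that these tunables are still present and identical on both. The following is a minimal sketch, not a pfSense tool: the function names and the simplified parsing rules are assumptions made for illustration, and the expected values are simply the three lines quoted above.

```python
import re

# Expected tunables, copied from the loader.conf lines quoted in the post.
EXPECTED = {
    "kern.ipc.nmbclusters": "1048576",
    "hw.bce.tso_enable": "0",
    "hw.pci.enable_msix": "0",
}

# Simplified assumption: one name=value pair per line, value optionally quoted.
LINE = re.compile(r'^\s*([\w.]+)\s*=\s*"?([^"#\s]*)"?')


def parse_loader_conf(text):
    """Return a dict of tunable name -> value; comment lines are skipped."""
    settings = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def mismatches(text, expected=EXPECTED):
    """Return the tunables whose value is missing (None) or differs."""
    found = parse_loader_conf(text)
    return {name: found.get(name)
            for name, value in expected.items()
            if found.get(name) != value}


sample = '''kern.ipc.nmbclusters="1048576"
hw.bce.tso_enable=0
hw.pci.enable_msix=0
'''
print(mismatches(sample))
```

An empty result means all three tunables are set as quoted; running it against the file contents from each firewall shows whether the pair still matches.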
      
      

      They have worked fine and stably with older versions of pfSense for the last 3 years.

      The crash dumps show two slightly different crashes, each apparently linked to one of the NIC drivers, bce and em. The excerpts below are what led me to think this: each shows the trap summary describing the nature of the crash, followed by the part of the backtrace whose frame matches the trap's frame pointer.

      This is extracted from one of the crash dumps associated with the em driver:

      
      Fatal trap 9: general protection fault while in kernel mode
      cpuid = 1; apic id = 22
      instruction pointer	= 0x20:0xffffffff80b2ee60
      stack pointer	        = 0x28:0xfffffe009f7dd680
      frame pointer	        = 0x28:0xfffffe009f7dd6a0
      code segment		= base 0x0, limit 0xfffff, type 0x1b
      			= DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags	= interrupt enabled, resume, IOPL = 0
      current process		= 0 (em0 que)
      FreeBSD 10.1-RELEASE-p25 #0 c39b63e(releng/10.1)-dirty: Mon Dec 21 15:20:13 CST 2015
          root@pfs22-amd64-builder:/usr/obj.RELENG_2_2.amd64/usr/pfSensesrc/src.RELENG_2_2/sys/pfSense_SMP.10
      
      ...
      
      db:0:kdb.enter.default>  bt
      Tracing pid 0 tid 100064 td 0xfffff800038b5920
      m_freem() at m_freem+0x20/frame 0xfffffe009f7dd6a0
      carp_input_c() at carp_input_c+0x24b/frame 0xfffffe009f7dd7a0
      ip_input() at ip_input+0x118/frame 0xfffffe009f7dd7f0
      netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe009f7dd860
      ether_demux() at ether_demux+0x149/frame 0xfffffe009f7dd890
      ether_nh_input() at ether_nh_input+0x347/frame 0xfffffe009f7dd8f0
      netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe009f7dd960
      ether_demux() at ether_demux+0xa5/frame 0xfffffe009f7dd990
      ether_nh_input() at ether_nh_input+0x347/frame 0xfffffe009f7dd9f0
      netisr_dispatch_src() at netisr_dispatch_src+0x62/frame 0xfffffe009f7dda60
      em_rxeof() at em_rxeof+0x40a/frame 0xfffffe009f7ddaf0
      em_handle_que() at em_handle_que+0x41/frame 0xfffffe009f7ddb30
      taskqueue_run_locked() at taskqueue_run_locked+0xe5/frame 0xfffffe009f7ddb80
      taskqueue_thread_loop() at taskqueue_thread_loop+0xa8/frame 0xfffffe009f7ddbb0
      fork_exit() at fork_exit+0x9a/frame 0xfffffe009f7ddbf0
      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009f7ddbf0
      --- trap 0, rip = 0, rsp = 0xfffffe009f7ddcb0, rbp = 0 ---
      
      

      This is extracted from one of the crash dumps associated with the bce driver:

      
      Fatal trap 9: general protection fault while in kernel mode
      cpuid = 0; apic id = 20
      instruction pointer     = 0x20:0xffffffff80b30a53
      stack pointer           = 0x28:0xfffffe009f7d0a60
      frame pointer           = 0x28:0xfffffe009f7d0a90
      code segment            = base 0x0, limit 0xfffff, type 0x1b
                              = DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags        = interrupt enabled, resume, IOPL = 0
      current process         = 12 (irq259: bce3)
      
      ...
      
      db:0:kdb.enter.default>  bt
      Tracing pid 12 tid 100063 td 0xfffff800037a2000
      m_cat() at m_cat+0x13/frame 0xfffffe009f7d0a90
      bce_intr() at bce_intr+0x4f9/frame 0xfffffe009f7d0b20
      intr_event_execute_handlers() at intr_event_execute_handlers+0xab/frame 0xfffffe009f7d0b60
      ithread_loop() at ithread_loop+0x96/frame 0xfffffe009f7d0bb0
      fork_exit() at fork_exit+0x9a/frame 0xfffffe009f7d0bf0
      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009f7d0bf0
      --- trap 0, rip = 0, rsp = 0xfffffe009f7d0cb0, rbp = 0 ---
      db:0:kdb.enter.default>  ps
        pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
      ...
         12     0     0     0  RL      (threaded)                  [intr]
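
The matching described earlier, pairing the trap's frame pointer with the backtrace entry that carries the same frame address, can be sketched in a few lines. This is an illustrative helper only: the function name and the regular expressions are assumptions based on the text layout of the two excerpts above, and it is not part of pfSense or FreeBSD.

```python
import re

# Assumed layouts, taken from the excerpts above:
#   "frame pointer = 0x28:0xfffffe009f7dd6a0"
#   "m_freem() at m_freem+0x20/frame 0xfffffe009f7dd6a0"
TRAP_FP = re.compile(r"^\s*frame pointer\s*=\s*0x[0-9a-f]+:(0x[0-9a-f]+)", re.M)
BT_LINE = re.compile(r"^\s*(\w+)\(\) at (\S+)/frame (0x[0-9a-f]+)\s*$", re.M)


def faulting_frame(dump_text):
    """Return (function, location) for the backtrace entry whose frame
    address equals the trap's frame pointer, or None if nothing matches."""
    trap = TRAP_FP.search(dump_text)
    if trap is None:
        return None
    frame_pointer = trap.group(1)
    for func, location, frame in BT_LINE.findall(dump_text):
        if frame == frame_pointer:
            return func, location
    return None


sample = """\
frame pointer           = 0x28:0xfffffe009f7dd6a0
db:0:kdb.enter.default>  bt
m_freem() at m_freem+0x20/frame 0xfffffe009f7dd6a0
carp_input_c() at carp_input_c+0x24b/frame 0xfffffe009f7dd7a0
"""
print(faulting_frame(sample))
```

Fed the full text of each dump, this picks out `m_freem` for the em crash and `m_cat` for the bce crash, both of which are mbuf routines rather than driver-specific code.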
      
      

      The iDRAC console on the crashing server shows that all of its health checks are good.

      I am puzzled as to what could have changed to cause this sort of failure only on the idle server.

      Any ideas?

      • cmb

        Using limiters? Can't combine pfsync and limiters in 2.2.x and newer.

        • dogvoiceman

          Thanks very much cmb. I do have a limiter configured. I'll disable it and report back.

          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.