Netgate Discussion Forum

    Fatal error every other day

    General pfSense Questions
      JohnWick

      First of all, thank you for all your effort; that a corporation like mine can rely on your products for two separate solutions is pretty awesome!
      (Sorry if the exclamation mark is the wrong message icon to use. I just found it appropriate for a crash :) )

      Now, to the details.

      My setup:

      I have a couple of pfSense boxes running on two Dell blades (iDracs), PowerEdge R210 II. Each has a virtual bridged interface between WAN and LAN and functions as a bridged firewall. They are redundantly configured via STP, so that the connection to the secondary firewall is cut whenever the primary firewall is responding with BPDU packets.

      Hardware:
      igb0-3 (the bridged interfaces):
      Intel(R) PRO/1000 Network Connection version - 2.4.0
      Using MSIX interrupts with 5 vectors

      Intel(R) Xeon(R) CPU E3-1220 V2 @ 3.10GHz
      Current: 3100 MHz, Max: 3101 MHz
      4 CPUs: 1 package(s) x 4 core(s)

      And my build:
      2.2.6-RELEASE (amd64)
      built on Mon Dec 21 14:50:08 CST 2015
      FreeBSD 10.1-RELEASE-p25

      The problem:

      Every other or third day, the primary firewall crashes, failing over to the secondary. I have attached a text-file with a dump.
      I take note of the following message, even though I am not 100% sure of how I should interpret it:

      Fatal trap 12: page fault while in kernel mode
      cpuid = 2; apic id = 04
      fault virtual address    = 0x1d
      fault code        = supervisor read data, page not present
      instruction pointer    = 0x20:0xffffffff80b904b7
      stack pointer            = 0x28:0xfffffe001a3d06c0
      frame pointer            = 0x28:0xfffffe001a3d0740
      code segment        = base 0x0, limit 0xfffff, type 0x1b
                  = DPL 0, pres 1, long 1, def32 0, gran 1
      processor eflags    = interrupt enabled, resume, IOPL = 0
      current process        = 12 (irq276: igb2:que 2)
      FreeBSD 10.1-RELEASE-p25 #0 c39b63e(releng/10.1)-dirty: Mon Dec 21 15:20:13 CST 2015
          root@pfs22-amd64-builder:/usr/obj.RELENG_2_2.amd64/usr/pfSensesrc/src.RELENG_2_2/sys/pfSense_SMP.10
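
For anyone reading the dump above, here is a minimal Python sketch of how the trap fields could be pulled apart. The function names `parse_trap` and `summarize` are made up for illustration, and the "near NULL" check is only a rule of thumb I am assuming: a fault address as small as 0x1d usually suggests a NULL pointer plus a small structure offset, but the dump alone does not prove that.

```python
import re

# Excerpt of the panic text quoted above (whitespace normalised).
PANIC_TEXT = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address    = 0x1d
fault code        = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff80b904b7
current process        = 12 (irq276: igb2:que 2)
"""


def parse_trap(text):
    """Collect the 'key = value' pairs of a trap report into a dict."""
    fields = {}
    for line in text.splitlines():
        # 'cpuid = 2; apic id = 04' carries two pairs on one line.
        for part in line.split(";"):
            key, sep, value = part.partition("=")
            if sep and key.strip():
                fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    """Extract the fault address and the faulting thread from parsed fields."""
    fault_addr = int(fields["fault virtual address"], 16)
    match = re.match(r"(\d+) \((.*)\)", fields["current process"])
    return {
        "fault_addr": fault_addr,
        # Assumed heuristic: an address inside the first 4 KiB page
        # hints at a NULL pointer dereference with a small offset.
        "near_null": fault_addr < 0x1000,
        "pid": int(match.group(1)),
        "thread": match.group(2),
    }


summary = summarize(parse_trap(PANIC_TEXT))
print(summary)
```

Read this way, the fault happened in the interrupt thread serving queue 2 of igb2, which says where the kernel was running at the time, not necessarily what caused the bad pointer.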

      Observations:

      I have monitored traffic on the inside (LAN) interfaces of the firewalls, and you can see two attached images of our primary and secondary firewalls.
      On the graphs, "outbound" means outbound from the firewall via the LAN-interface, i.e. from WAN to LAN.

      Firstly, I have attached an image of what I believe to be a precursor:

      Normally, I expect equal amounts of traffic on both firewalls, as they function as bridges and simply pass on all packets (firewalled, of course). Packets are blocked by STP on a later switch on the WAN side. On the "precursor" graphs, we see a sudden spike in traffic on only the primary firewall, after which traffic flows unevenly. The spike is around 200 Mbit/s, which is also observed in other "precursors".

      Next, I have attached an image of the actual crash:

      About an hour or two later, everything looks fine, except that the primary firewall just "disappears" on the graphs all of a sudden. This is because of the kernel crash.

      Now, I do not know if the spikes and the crashes are even related; they may not be. I just found it odd, especially since this abnormality has been observed more than once. See the file "another-crash".
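
To check the "precursor" idea against more than a couple of screenshots, something like the following Python sketch could be run over exported graph data. Everything in it is an assumption for illustration: the function names, the 100 Mbit/s threshold, and the sample values, which are invented and not taken from my graphs.

```python
def divergence_points(primary, secondary, threshold_mbit=100.0):
    """Return sample indices where primary exceeds secondary by more than the threshold."""
    return [
        i
        for i, (p, s) in enumerate(zip(primary, secondary))
        if p - s > threshold_mbit
    ]


def precedes_crash(spike_indices, crash_index, max_gap):
    """True if any spike happened at most max_gap samples before the crash."""
    return any(0 < crash_index - i <= max_gap for i in spike_indices)


# Hypothetical samples in Mbit/s, one per polling interval (not real data).
primary = [40.0, 42.0, 41.0, 240.0, 60.0, 58.0]
secondary = [40.0, 41.0, 41.0, 43.0, 42.0, 41.0]

spikes = divergence_points(primary, secondary)
print(spikes)                        # indices where only the primary spiked
print(precedes_crash(spikes, 5, 2))  # crash assumed at sample index 5
```

If spikes show up just as often without a crash following, the two are probably unrelated.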

      Diagnosis?:

      Since the crash report says "current process        = 12 (irq276: igb2:que 2)", I have wondered whether our queue length is insufficient on the WAN interface (igb2), and whether the queue filling up triggers the crash. The queue is set to a default of 1000, which can be turned up in case of heavy load. This guy (https://forum.pfsense.org/index.php?topic=68919.0) has done something similar, although he doesn't experience crashes as we do.
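
One way to test the queue theory before changing anything is to watch the drop counter. The sketch below assumes that the queue I mean is the IP input queue, i.e. the FreeBSD sysctls `net.inet.ip.intr_queue_maxlen` and `net.inet.ip.intr_queue_drops`; I have not confirmed that this is the setting the linked thread changes. The sample text is hypothetical output in the `name: value` form that `sysctl` prints, and the function names are made up.

```python
# Hypothetical output of:
#   sysctl net.inet.ip.intr_queue_maxlen net.inet.ip.intr_queue_drops
SAMPLE = """\
net.inet.ip.intr_queue_maxlen: 1000
net.inet.ip.intr_queue_drops: 0
"""


def parse_sysctl(text):
    """Parse 'name: value' lines with integer values into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = int(value.strip())
    return values


def queue_overflowed(values):
    """True if the (assumed) IP input queue has dropped any packets."""
    return values.get("net.inet.ip.intr_queue_drops", 0) > 0


values = parse_sysctl(SAMPLE)
print(values["net.inet.ip.intr_queue_maxlen"], queue_overflowed(values))
```

If the drop counter stays at 0 right up to a crash, a full queue seems an unlikely trigger, and raising the limit would probably not help.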

      I would love any feedback on this, as it is hard for me to troubleshoot this.
      Remember, I am not sure my "precursor"-observations are even relevant. It just seems odd.

      Cheers! :)
      firewall-precursor.PNG
      firewall-crash.PNG
      another-crash.PNG
      fw-1-panick.txt

      Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.