    PfSense Crash, cannot find root cause. Help!!

    General pfSense Questions
    • S
      scottys last edited by

      Hey everyone, I am hoping to find the root cause of why our pfSense crashed. I have the crash dumps, and I have googled just about everything in them, but none of it seems to fit. I have looked into mbuf exhaustion; however, with 32 GB of RAM, I find that extremely hard to believe. All hardware checks come back good, and I have tested all the NICs and their ports, and all of them are good. I have hit a wall and am hoping someone can find the cause of the crash from the dump. It doesn't look like a kernel panic, as there is no trace or anything.

      Full textdump.tar.0 is attached: textdump.tar.0

      info.0 file:

      Dump header from device: /dev/da1p2
        Architecture: amd64
        Architecture Version: 1
        Dump Length: 156160
        Blocksize: 512
        Dumptime: Mon Apr 29 16:08:00 2019
        Hostname: pfSense.mydomain.net
        Magic: FreeBSD Text Dump
        Version String: FreeBSD 11.2-RELEASE-p3 #17 e6b497fa0a3(RELENG_2_4_4): Thu Sep 20 09:04:45 EDT 2018
          root@buildbot3:/crossbuild/ce-244/obj/amd64/WvDslnYb/crossbuild/ce-244/pfSense/tmp/FreeBSD-src/sys/pfS
        Panic String: 
        Dump Parity: 2586087224
        Bounds: 0
        Dump Status: good
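
The empty Panic String in the header above is what makes this look unlike an ordinary panic. A minimal Python sketch, assuming the info.0 text is available as a string, that splits the header into fields so the panic string and dump status can be checked programmatically (the function name `parse_dump_header` is made up for illustration, and the sample is an abbreviated copy of the header above):

```python
def parse_dump_header(text):
    """Parse 'Key: value' lines from a textdump info.0 header into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.strip().partition(":")
        if sep and key:
            fields[key.strip()] = value.strip()
    return fields


sample = """\
Dump header from device: /dev/da1p2
  Architecture: amd64
  Dumptime: Mon Apr 29 16:08:00 2019
  Magic: FreeBSD Text Dump
  Panic String:
  Dump Status: good
"""

header = parse_dump_header(sample)
panic = header.get("Panic String", "")
print("panic string:", repr(panic))
print("dump status:", header["Dump Status"])
```

An empty panic string only says that no message was recorded in the header; the other files in the textdump archive are still worth reading.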
      

      Thank you!

      • KOM
        KOM last edited by

        Kernel panics are usually caused by misbehaving hardware. I'm not a FreeBSD tech but nobody else has replied yet. I may be totally off-base here.

        When your crash happens, it seems to be servicing the NIC:

        curthread    = 0xfffff8000b9ec620: pid 12 "irq296: igb4:que 0"
        current process		= 12 (irq296: igb4:que 0)
        

        Also:

        <7>sonewconn: pcb 0xfffff804073d21d0: Listen queue overflow: 193 already in queue awaiting acceptance (27 occurrences)
        

        which might be fixed by adding kern.ipc.somaxconn=4096 in System - Advanced - System Tunables.
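
To see how often the listen queue overflowed and for which socket, the repeated sonewconn lines can be tallied. A rough Python sketch, assuming the message buffer from the textdump has been read into a string; the function name `summarize_overflows` is hypothetical, and treating a line without an "(N occurrences)" suffix as a single event is an assumption:

```python
import re

# Matches the sonewconn message format quoted above.
OVERFLOW_RE = re.compile(
    r"sonewconn: pcb (0x[0-9a-f]+): Listen queue overflow: "
    r"(\d+) already in queue awaiting acceptance(?: \((\d+) occurrences\))?"
)


def summarize_overflows(log_text):
    """Return {pcb_address: total_occurrences} for listen queue overflows."""
    totals = {}
    for match in OVERFLOW_RE.finditer(log_text):
        pcb, _queued, occurrences = match.groups()
        # Assumption: a line with no occurrence count is one event.
        totals[pcb] = totals.get(pcb, 0) + int(occurrences or 1)
    return totals


sample = (
    "<7>sonewconn: pcb 0xfffff804073d21d0: Listen queue overflow: "
    "193 already in queue awaiting acceptance (27 occurrences)\n"
)
print(summarize_overflows(sample))
```

If every overflow points at the same pcb address, a single listening service is falling behind, which is a different problem from a system-wide resource shortage.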

        Read this and pay attention to the section on igb(4) cards. Try what is recommended re: setting kern.ipc.nmbclusters.

        https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html

        • S
          scottys last edited by

          Thank you. I read that, but the system has been running smoothly for over a year, so I thought that couldn't be it and stopped reading before getting to the section on the cards. All my WANs are on the "bce" card (4 ports, 4 WANs) and my LAN is on the "igb" card (4 ports, 1 used for LAN).

          So basically it looks like there was an mbuf overflow on the NIC(s), from what you can tell? Obviously something was happening, as this is repeated 50 times in the dump:

          sonewconn: pcb 0xfffff804073d21d0: Listen queue overflow: 193 already in queue awaiting acceptance (27 occurrences)
          

          So basically I just need to increase the memory allocation size for my NICs? The reason I find it hard to believe is that the backup pfSense currently running is at about peak traffic right now, so it is under the most load, and it shows MBUF Usage: 3% (29136/1000000).
          And it never really moves from that 3% (I have yet to see it above 3%).
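
For reference, the exact ratio behind that rounded 3% can be worked out from the used/limit pair. A small Python sketch, assuming the dashboard string has the form shown above (the function name `mbuf_usage` is made up for illustration):

```python
import re


def mbuf_usage(line):
    """Extract (used, limit, percent) from 'MBUF Usage: N% (used/limit)'."""
    match = re.search(r"\((\d+)/(\d+)\)", line)
    if match is None:
        raise ValueError("no used/limit pair found")
    used, limit = int(match.group(1)), int(match.group(2))
    return used, limit, 100.0 * used / limit


used, limit, percent = mbuf_usage("MBUF Usage: 3% (29136/1000000)")
print(f"{used} of {limit} in use ({percent:.2f}%)")
```

This is a reading from the backup unit at one moment; it does not show what the crashed unit looked like when it went down.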

          • KOM
            KOM last edited by KOM

            The crash happened while the system was talking to the igb NIC driver. What it was doing I can't tell you. Those sonewconn errors might have nothing to do with it, or everything. I don't know that either. I'm just trying to give you suggestions and options. What you do is up to you.

            I also noticed snort in your process list. While debugging this, you might want to temporarily disable any heavy packages like snort, suricata, or pfblocker just to rule them out. For example, there was an issue several months ago where a pfB list exceeded some threshold which started causing problems for people until they bumped a system tunable.

            • S
              scottys @KOM last edited by

              @KOM said in PfSense Crash, cannot find root cause. Help!!:

              I don't know that either. I'm just trying to give you suggestions and options. What you do is up to you.

              I understand completely, just trying to understand

              For example, there was an issue several months ago where a pfB list exceeded some threshold which started causing problems for people until they bumped a system tunable.

              Do you happen to know what this is? (the tunable).

              I was running Snort, pfBlockerNG, SquidProxy and SquidGuard at the time of the crash. Since the crash all services have been disabled. The only thing I can think of that would cause this is the OpenVAS vulnerability scan running on our networks, but we have been hit with such scans from the outside, and this isn't the first time I have run the scan; it is run about once every 3 months or so. So this pfSense has gone through at least 4 internal scans, and I know our servers have been hit with the same scanners, as I see them in Snort.

              • KOM
                KOM last edited by

                @scottys said in PfSense Crash, cannot find root cause. Help!!:

                Do you happen to know what this is? (the tuneable).

                It was actually the firewall table size, which is controlled via System - Advanced - Firewall & NAT - Firewall Maximum Table Entries. Default is 200000 and they recommend bumping it to 400000.

                • S
                  scottys @KOM last edited by

                  @KOM Looking at the description, I think this could be the culprit
                  "Maximum number of table entries for systems such as aliases, sshguard, snort, etc, combined"

                  Since I did see some stuff with sshguard (OpenVAS scanning) and tens of thousands of Snort alerts, add pfBlockerNG country blocking and SquidGuard's list blocking, and I think it could easily hit 400,000 entries.

                  Besides bumping it up, do you know of some kind of maintenance I can do to ensure that table stays under 400k? (if that was the culprit of the crash)
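
One way to keep an eye on this is to add up the entries across all tables and compare the sum with the configured maximum. A minimal Python sketch of that arithmetic; the table names and counts below are hypothetical, invented purely for illustration, and on a live system the real counts would have to come from pf itself (for example, by counting the addresses that `pfctl -t <table> -T show` lists for each table):

```python
def table_budget(entry_counts, max_entries):
    """Summarize combined table usage against the configured maximum."""
    total = sum(entry_counts.values())
    return {
        "total": total,
        "max": max_entries,
        "percent": 100.0 * total / max_entries,
        "over_limit": total > max_entries,
    }


# Hypothetical per-table counts, not taken from the crashed system.
example_counts = {
    "pfB_country_block": 250_000,
    "snort_blocked": 40_000,
    "sshguard": 1_500,
}

report = table_budget(example_counts, max_entries=400_000)
print(report)
```

Run periodically, a check like this would flag growth toward the limit before a list update pushes the total over it.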

                  • KOM
                    KOM last edited by

                    No, not really. There are several Zabbix packages, but I don't know if that metric is being tracked or not with the FreeBSD OS template.

                    • S
                      scottys last edited by

                      bump just in case that isn't the issue and it is something else

                      @KOM Thank you for your help. I am in no way disregarding what you have told me. I am currently testing with our backup to ensure stability with the new tunables. You did say

                      I'm not a FreeBSD tech but nobody else has replied yet. I may be totally off-base here

                      I just need to ensure that you are right on target

                      Thank you for all your help

                      • S
                        Stewart @KOM last edited by

                        @KOM said in PfSense Crash, cannot find root cause. Help!!:

                        Kernel panics are usually caused by misbehaving hardware. I'm not a FreeBSD tech but nobody else has replied yet. I may be totally off-base here.

                        When your crash happens, it seems to be servicing the NIC:

                        curthread    = 0xfffff8000b9ec620: pid 12 "irq296: igb4:que 0"
                        current process		= 12 (irq296: igb4:que 0)
                        

                        Also:

                        <7>sonewconn: pcb 0xfffff804073d21d0: Listen queue overflow: 193 already in queue awaiting acceptance (27 occurrences)
                        

                        which might be fixed by adding kern.ipc.somaxconn=4096 in System - Advanced - System Tunables.

                        Read this and pay attention to the section on igb(4) cards. Try what is recommended re: setting kern.ipc.nmbclusters.

                        https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html

                        Nothing really to add but I find it ironic that you say "I'm not a FreeBSD tech..." and then go on to troubleshoot the crash dump, suggest what appears to be a kernel change in the System Tunables, and give references. Then start talking about adjusting the Firewall State sizes. I kinda think that makes you "...a FreeBSD tech...", at least more than you think you are. :)

                        • KOM
                          KOM last edited by

                          I try to help out where I can. Even though I've been here five years or so, I still remember the feeling of being new and posing a question into the void and getting no response. If I think I can even point them in the right direction, I'll reply. You might notice that this forum has very few unanswered posts. Not all issues can be resolved via the community forums, but I think we have a pretty high success rate and that helps the project's reputation & success.
