Netgate Discussion Forum

Help Understanding a Crash [kernel panic]

General pfSense Questions
crash kernel panic pfsense help log
  • N
    None 0
    last edited by Aug 3, 2021, 2:47 PM

    Hello,

    My pfSense box recently crashed, and I don't know how to analyse the reports to track down what caused it. Any help would be appreciated.
    ddb.txt

    On the msgbuf.txt, there are 96855 lines like these:
    <6>arp: 192.168.33.22 moved from [macaddress1] to [macaddress2] on igb1
    <6>arp: 192.168.33.37 moved from [macaddress3] to [macaddress4] on igb1
    And then:
    <6>igb1: link state changed to DOWN
    441.698420 [1071] netmap_obj_free ouch, double free on buffer 3512
    <6>igb1: link state changed to UP
    <6>igb1: link state changed to DOWN
    <6>igb1: link state changed to UP
    919.000342 [1684] nm_txsync_prologue igb1 TX0: fail 'head > kring->rtail && head < kring->rhead' h 571 c 571 t 512 rh 572 rc 572 rt 512 hc 572 ht 512
    919.000372 [1787] netmap_ring_reinit called for igb1 TX0
    167.534458 [1684] nm_txsync_prologue igb1 TX0: fail 'head > kring->rtail && head < kring->rhead' h 735 c 735 t 512 rh 736 rc 736 rt 512 hc 736 ht 512
    167.534487 [1787] netmap_ring_reinit called for igb1 TX0
    <6>igb1: link state changed to DOWN
    <6>igb1: link state changed to UP
    <6>igb1: link state changed to DOWN
    <6>igb1: link state changed to UP

    Fatal trap 9: general protection fault while in kernel mode
    cpuid = 0; apic id = 00
    instruction pointer = 0x20:0xffffffff812087db
    stack pointer = 0x28:0xfffffe00915e08d0
    frame pointer = 0x28:0xfffffe00915e08e0
    code segment = base 0x0, limit 0xfffff, type 0x1b
    = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags = interrupt enabled, resume, IOPL = 0
    current process = 28990 (FR#01)
    trap number = 9
    panic: general protection fault
    cpuid = 2
    time = 1627992019
    KDB: enter: panic

    ...

    Thanks!
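With tens of thousands of repeated lines, it can help to tally the message types before reading the buffer by hand. Below is a minimal Python sketch that assumes only the line formats quoted in the post above. The function name `summarize_msgbuf` and the regular expressions are made up for illustration and are not part of any pfSense tooling.

```python
import re
from collections import Counter

# Patterns are based only on the line formats quoted in this post (assumption).
ARP_MOVED = re.compile(r"arp: (\S+) moved from (\S+) to (\S+) on (\S+)")
LINK_STATE = re.compile(r"(\w+): link state changed to (UP|DOWN)")
NETMAP_MSG = re.compile(r"\[\s*\d+\]\s+(netmap_\w+|nm_\w+)")


def summarize_msgbuf(lines):
    """Count ARP moves per IP, link changes per (NIC, state) and netmap messages."""
    arp_moves = Counter()
    link_changes = Counter()
    netmap_msgs = Counter()
    for line in lines:
        m = ARP_MOVED.search(line)
        if m:
            arp_moves[m.group(1)] += 1
            continue
        m = LINK_STATE.search(line)
        if m:
            link_changes[(m.group(1), m.group(2))] += 1
            continue
        m = NETMAP_MSG.search(line)
        if m:
            netmap_msgs[m.group(1)] += 1
    return arp_moves, link_changes, netmap_msgs
```

To run it against a saved crash report, pass an open file object, for example `summarize_msgbuf(open("msgbuf.txt", errors="replace"))`, and then look at `arp_moves.most_common(10)` to see which addresses flap the most.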

    • V
      viragomann @None 0
      last edited by Aug 3, 2021, 3:15 PM

      @none-0 said in Help Understanding a Crash [kernel panic]:

      On the msgbuf.txt, there are 96855 lines like these:
      <6>arp: 192.168.33.22 moved from [macaddress1] to [macaddress2] on igb1
      <6>arp: 192.168.33.37 moved from [macaddress3] to [macaddress4] on igb1

      This might overrun the ARP cache at the end.

      You should figure out why these addresses move between two MACs. Were they possibly assigned to multiple devices or interfaces?

      • N
        None 0 @viragomann
        last edited by Aug 3, 2021, 5:45 PM

        Hi @viragomann,
        I fixed the address problems (and also checked the System Log for new entries), but could that alone cause a crash?

        • V
          viragomann @None 0
          last edited by Aug 3, 2021, 6:55 PM

          @none-0
          Seems so. I've seen several threads here complaining about crashes while IP/MAC flapping was going on.

          • S
            stephenw10 Netgate Administrator
            last edited by Aug 3, 2021, 10:57 PM

            I wouldn't expect that to cause a crash if it's the same MACs each time. It's quite common to see that on systems that don't crash:
            https://docs.netgate.com/pfsense/en/latest/troubleshooting/logs-arp-moved.html

            I would guess it's a netmap issue. You are running Suricata in in-line mode?
            Try running it in legacy mode or non-blocking as a test if so.

            Steve

            • B
              bmeeks
              last edited by bmeeks Aug 4, 2021, 2:54 AM Aug 4, 2021, 2:54 AM

              Those netmap errors are indicative of multiple threads fiddling with the netmap ring "head", "cur" and "tail" pointers. I've seen that often in Suricata when using netmap (which the Inline IPS Mode does). That's something I'm working on with the upstream Suricata team.
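The failed check is quoted inside each `nm_txsync_prologue` line, so the logged numbers can be tested against it directly. Below is a small Python sketch. It assumes that `h`, `rh` and `rt` in the log stand for `head`, `kring->rhead` and `kring->rtail`, which is a reading of the message format and is not confirmed anywhere in this thread. `parse_txsync_fail` and `check_fails` are hypothetical helper names.

```python
import re

# Short field labels as they appear after the quoted condition in the log line.
FIELD = re.compile(r"\b(h|c|t|rh|rc|rt|hc|ht) (\d+)")


def parse_txsync_fail(line):
    """Extract the numeric fields that follow the quoted condition."""
    # Only look at the text after the last quote so other digits are ignored.
    tail = line.rsplit("'", 1)[-1]
    return {name: int(value) for name, value in FIELD.findall(tail)}


def check_fails(fields):
    """Evaluate the condition quoted in the log:
    head > kring->rtail && head < kring->rhead
    (assumes h = head, rt = rtail, rh = rhead)."""
    return fields["h"] > fields["rt"] and fields["h"] < fields["rh"]
```

Applied to the lines posted in this thread, `h` is one less than `rh` every time (571/572, 735/736, 975/976, 642/643, 804/805). Under the assumption above, that would mean the submitted head pointer was one slot behind the value the kernel had already recorded, which seems consistent with more than one thread updating the same ring.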

              • N
                None 0
                last edited by Aug 4, 2021, 5:00 PM

                Thanks @stephenw10 and @bmeeks, I'm indeed using Suricata with Inline mode.
                Other than conflicting IPs, should I keep an eye out for other problems that might "overwhelm" netmap in this case?

                • B
                  bmeeks @None 0
                  last edited by bmeeks Aug 4, 2021, 5:09 PM Aug 4, 2021, 5:08 PM

                  @none-0 said in Help Understanding a Crash [kernel panic]:

                  Thanks @stephenw10 and @bmeeks, I'm indeed using Suricata with Inline mode.
                  Other than conflicting IPs, should I keep an eye out for other problems that might "overwhelm" netmap in this case?

                  It's not so much netmap being "overwhelmed" as it is an issue with the "non-thread safe nature" of the host stack ring pair exposed by netmap with FreeBSD. Currently the Suricata binary runs in autofp runmode. That means it uses multiple capture threads, but there is only a single pair of host stack netmap rings that have to be shared among those capture threads (the threads are capturing data going to and from the NIC and kernel host stack). The Suricata binary is not doing any kind of locking with those netmap threads accessing the single host stack ring pair. That is likely the root cause of the netmap errors. The threads are probably modifying some memory pointers concurrently, and that is confusing netmap. It detects the problem, and attempts to fix it by calling the netmap_ring_reinit() function. I'm working with an upstream developer to hopefully fix that in Suricata 6.0.4, due out later this year. That fix will include opening multiple host rings and including some mutex locking to keep things synchronized.
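The kind of synchronization described above can be illustrated with a toy model. The Python sketch below is not Suricata or netmap code. `SharedRing`, `claim_slot` and `run_capture_threads` are hypothetical names, and the "ring" is just a counter that wraps around. It only shows why the read-modify-write of a shared pointer needs a mutex when several capture threads share one ring.

```python
import threading


class SharedRing:
    """Toy stand-in for a single ring shared by several capture threads."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.head = 0      # next slot a thread will claim
        self.claimed = 0   # total number of slots handed out
        self._lock = threading.Lock()

    def claim_slot(self):
        # Reading head, advancing it and writing it back must happen as one
        # unit. Without the lock, two threads could read the same head value
        # and leave the pointer in a state the consumer does not expect.
        with self._lock:
            slot = self.head
            self.head = (self.head + 1) % self.num_slots
            self.claimed += 1
            return slot


def run_capture_threads(ring, num_threads, packets_per_thread):
    """Start several workers that all claim slots from the same ring."""
    def worker():
        for _ in range(packets_per_thread):
            ring.claim_slot()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ring.claimed
```

With the lock in place, four threads claiming 1000 slots each always leave `claimed` at 4000 and `head` at `4000 % num_slots`. Giving each thread its own ring, as the planned fix also mentions, would avoid the shared pointer altogether.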

                  • N
                    None 0 @bmeeks
                    last edited by Aug 4, 2021, 7:16 PM

                    Really appreciate that you explained it, @bmeeks!
                    I'll take that into consideration if another crash happens. Gonna switch to Legacy for a while, tho.

                    @bmeeks said in Help Understanding a Crash [kernel panic]:

                    I'm working with an upstream developer to hopefully fix that in Suricata 6.0.4, due out later this year. That fix will include opening multiple host rings and including some mutex locking to keep things synchronized.

                    Concurrency control must be difficult to implement. Keep up the good work!

                    • N
                      None 0
                      last edited by Aug 12, 2021, 2:12 PM

                      So, I turned Inline mode ON again, and after a day or so:

                      <118>pfSense 2.5.2-RELEASE amd64 Fri Jul 02 15:33:00 EDT 2021
                      <118>Bootup complete
                      <6>pid 88136 (suricata), jid 0, uid 0: exited on signal 11 (core dumped)
                      <6>arp: 192.168.0.84 moved from 00:15:5d:XX:XX:XX to 00:15:5d:XX:XX:XX on igb1
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP
                      787.976946 [1684] nm_txsync_prologue igb1 TX0: fail 'head > kring->rtail && head < kring->rhead' h 975 c 975 t 512 rh 976 rc 976 rt 512 hc 976 ht 512
                      787.976965 [1787] netmap_ring_reinit called for igb1 TX0
                      730.812187 [1684] nm_txsync_prologue igb1 TX0: fail 'head > kring->rtail && head < kring->rhead' h 642 c 642 t 512 rh 643 rc 643 rt 512 hc 643 ht 512
                      730.812205 [1787] netmap_ring_reinit called for igb1 TX0
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP
                      443.587062 [1684] nm_txsync_prologue igb1 TX0: fail 'head > kring->rtail && head < kring->rhead' h 804 c 804 t 512 rh 805 rc 805 rt 512 hc 805 ht 512
                      443.587089 [1787] netmap_ring_reinit called for igb1 TX0
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP
                      <6>arp: 192.168.0.43 moved from 00:1f:c3:XX:XX:XX to 24:f5:ab:XX:XX:XX on igb1
                      <6>arp: 192.168.0.43 moved from 24:f5:ab:XX:XX:XX to 00:1f:c3:XX:XX:XX on igb1
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP
                      <6>igb1: link state changed to DOWN
                      <6>igb1: link state changed to UP

                      Fatal trap 9: general protection fault while in kernel mode

                      Since inline + Suricata is causing crashes, would you guys suggest using Snort instead? Would single-threaded operation affect performance on a gigabit network with 100+ devices (generally speaking)?

                      • B
                        bmeeks
                        last edited by Aug 12, 2021, 2:19 PM

                        Performance would probably be about the same with Snort. That's because even though Suricata itself is multithreaded, the actual interface between the host OS stack and the NIC mediated by netmap is currently constrained by the fact only a single pair of host stack TX/RX rings is available. That in turn constrains Suricata to single-threaded operation for packet acquisition (same as Snort).

                        But this is something I am currently working on improving along with the Suricata development team. Perhaps this improvement will be ready by the release of Suricata 6.0.4. We will see.

                        In the meantime, why not just switch over to Legacy Blocking Mode for a while? Only if that really is a problem for you, would I switch over to Snort. Switching would mean reconfiguring a lot of stuff.

                        • N
                          None 0 @bmeeks
                          last edited by Aug 17, 2021, 3:41 PM

                          @bmeeks Right. Thanks.

                          In Legacy mode, do I need to enable and configure SID Mgmt for the ET rules to work, or just set IPS Policy + Ruleset?

                          • B
                            bmeeks @None 0
                            last edited by Aug 17, 2021, 3:58 PM

                            @none-0 said in Help Understanding a Crash [kernel panic]:

                            @bmeeks Right. Thanks.

                            In Legacy mode, do I need to enable and configure SID Mgmt for the ET rules to work, or just set IPS Policy + Ruleset?

                            IPS Policies are only available for Snort Subscriber Rules. That's because that feature depends on a special metadata tag that the Snort rules authors include with their rules package. The ET rules do not have that metadata tag. Each Snort rule is tagged by the authors with one or more IPS Policy tags, and the IPS Policy feature in the Snort and Suricata packages keys off that tag to select rules. Since the ET rules don't have the tag, the feature can't work for them.

                            So when using ET rules, you will need to manually enable the ET categories and/or the individual rules you want. You can do that either via SID MGMT (the best way, in my opinion), or on the CATEGORIES and RULES tabs by checking boxes and clicking icons.

                            You can use both techniques at the same time. So use IPS Policy to let it auto-select the Snort Subscriber rules, and then use either SID MGMT or the manual process to choose your ET rules.
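To make the tag-based selection concrete, here is a rough Python sketch of the idea. It assumes the policy tags appear inside a rule's `metadata:` option as comma-separated `policy <name> <action>` items. The two sample rules, their SIDs and the function names are invented for illustration, and this is not the code the Snort or Suricata packages actually run.

```python
import re

# Assumed layout: metadata:policy <name> <action>, policy <name> <action>, ...;
METADATA = re.compile(r"metadata:\s*([^;]*);")


def policies_in_rule(rule_text):
    """Return the set of policy names found in a rule's metadata option."""
    m = METADATA.search(rule_text)
    if not m:
        return set()
    policies = set()
    for item in m.group(1).split(","):
        words = item.split()
        if len(words) >= 2 and words[0] == "policy":
            policies.add(words[1])
    return policies


def select_by_policy(rules, policy):
    """Keep only the rules tagged with the requested policy name."""
    return [rule for rule in rules if policy in policies_in_rule(rule)]


# Hypothetical sample rules, not taken from any real rule set.
TAGGED_RULE = ('alert tcp any any -> any 80 (msg:"example tagged rule"; '
               'metadata:policy balanced-ips drop, policy security-ips drop; '
               'sid:1000001; rev:1;)')
UNTAGGED_RULE = ('alert tcp any any -> any 80 (msg:"example untagged rule"; '
                 'metadata:created_at 2021_01_01; sid:1000002; rev:1;)')
```

Calling `select_by_policy([TAGGED_RULE, UNTAGGED_RULE], "balanced-ips")` keeps only the tagged rule. A rule without any policy tag never matches, which is why rules lacking the tag have to be enabled through SID MGMT or by hand.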

                            • N
                              None 0
                              last edited by Aug 30, 2021, 7:54 PM

                              Hello guys,

                              I had another crash, and apparently for the same reason: IPs bouncing between two MACs, and then calling the netmap_ring_reinit. Strangely I was using the Legacy mode this time (so it may be another type of problem), and it's the first one since 13 days ago.

                              ddb.txt
                              msgbuf.txt

                              • B
                                bmeeks @None 0
                                last edited by Aug 30, 2021, 8:04 PM

                                @none-0 said in Help Understanding a Crash [kernel panic]:

                                Hello guys,

                                I had another crash, and apparently for the same reason: IPs bouncing between two MACs, and then calling the netmap_ring_reinit. Strangely I was using the Legacy mode this time (so it may be another type of problem), and it's the first one since 13 days ago.

                                ddb.txt
                                msgbuf.txt

                                Using Legacy Mode would take netmap 100% out of the equation, and thus the netmap_ring_reinit() function could not be called. That is a netmap-specific function, and it will only be called by netmap itself. So you had something running that was using netmap, and Snort and Suricata are the only two packages I am aware of on pfSense that use netmap. Perhaps you had a duplicate zombie process running on a Suricata interface (two instances of Suricata running on the same interface). One old instance might have been running netmap mode and a newer one Legacy Mode, but both on the same interface.

                                You are running 2.5.2 RELEASE, so you don't have access to the latest Suricata updates with the new 6.0.3 binary. It will be interesting to see how that update performs for you when it becomes available.

                                • N
                                  None 0 @bmeeks
                                  last edited by Aug 30, 2021, 8:31 PM

                                  oh man 😦
                                  I knew if this was the case, the other process shouldn't still be running after the restart, but anyway:
                                  /usr/local/bin/suricata -i igb1 -D -c /usr/local/etc/suricata/suricata_25400_igb1/suricata.yaml --pidfile /var/run/suricata_igb125400.pid
                                  Only this one now.
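A check for leftover duplicate instances can be scripted once the process list is captured. The Python sketch below assumes each Suricata interface instance is started with its own `-c <path to suricata.yaml>` argument, as in the command line above, so two processes sharing the same config path would point to a duplicate. `config_path` and `find_duplicate_instances` are hypothetical helper names, and the PIDs used with them are placeholders.

```python
import shlex
from collections import defaultdict


def config_path(cmdline):
    """Return the value that follows -c in a command line, or None."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args[:-1]):
        if arg == "-c":
            return args[i + 1]
    return None


def find_duplicate_instances(processes):
    """processes: iterable of (pid, cmdline) pairs.

    Returns {config_path: [pids]} for every config used by more than one
    Suricata process."""
    by_config = defaultdict(list)
    for pid, cmdline in processes:
        if "suricata" not in cmdline:
            continue
        cfg = config_path(cmdline)
        if cfg is not None:
            by_config[cfg].append(pid)
    return {cfg: pids for cfg, pids in by_config.items() if len(pids) > 1}
```

An empty result means every config path is used by at most one process, which matches the single instance listed above.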

                                  You are running 2.5.2 RELEASE, so you don't have access to the latest Suricata updates with the new 6.0.3 binary. It will be interesting to see how that update performs for you when it becomes available.

                                  I'm looking forward to the release! Unfortunately I can't move to devel and try it right now, but I will update this topic (if possible) when I do.

                                  Thanks again, @bmeeks!

                                  • N
                                    None 0
                                    last edited by Sep 8, 2021, 5:32 PM

                                    Had another crash today :/
                                    Same reason. Should I uninstall n install Suricata again?

                                    • B
                                      bmeeks @None 0
                                      last edited by bmeeks Sep 8, 2021, 7:14 PM Sep 8, 2021, 7:04 PM

                                      @none-0 said in Help Understanding a Crash [kernel panic]:

                                      Had another crash today :/
                                      Same reason. Should I uninstall n install Suricata again?

                                      No, uninstalling and then reinstalling the exact same binary will not likely have any effect on that error.

                                      It is likely a problem within the compiled binary code, and the exact same code will get installed again if you remove and reinstall the package. The only way to have a meaningful test would be to move to the 2.6.0 snapshot branch and install the newer Suricata from there. That package has a completely different binary in it.

                                      In my experience with that error, I have not seen it cause a kernel panic, but that does not mean it couldn't. Maybe it is tickling something in your system just right such that it triggers the crash. My suspicion is multithreaded access to the single pair of netmap RX/TX rings exposed by the host stack in the older netmap API used by Suricata in pfSense CE and pfSense+ RELEASE versions today. There is new code to address that in the 6.0.3 Suricata package currently available in the 2.6.0 Snapshots branch.

                                      • N
                                        None 0 @bmeeks
                                        last edited by Sep 8, 2021, 7:36 PM

                                        @bmeeks Just thought it could be a problem with my installation, but it doesn't seem to make sense...
                                        Alright, disabled blocking, and gonna try the DEV branch when I have time.

                                        Thanks!

                                        • B
                                          bmeeks @None 0
                                          last edited by bmeeks Sep 8, 2021, 7:41 PM Sep 8, 2021, 7:39 PM

                                          @none-0 said in Help Understanding a Crash [kernel panic]:

                                          @bmeeks Just thought it could be a problem with my installation, but it doesn't seem to make sense...
                                          Alright, disabled blocking, and gonna try the DEV branch when I have time.

                                          Thanks!

                                          If you are using Legacy Blocking Mode, then netmap is 100% completely and totally out of the picture. You should never see a netmap_ring_reinit() error in Legacy Mode. The only way to see that error is if something is still running with the netmap kernel device. If you are still getting kernel crashes and not using Inline IPS Mode on any interface, then Suricata is not the root cause of the problem.

                                          Everything I mentioned above about the new Suricata in the Snapshots branch only applies when using Inline IPS Mode.

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.