Netgate Discussion Forum

    pfSense Kernel panic even on new hardware

    General pfSense Questions
      fireix @NogBadTheBad

      @nogbadthebad That didn't have any effect. I made sure only one NIC was connected, and the error log doesn't report the disconnect/connect to the QNAP, but it still crashed again just now.

        NogBadTheBad @fireix

        @fireix ah it was worth a try :(

        Andy

        1 x Netgate SG-4860 - 3 x Linksys LGS308P - 1 x Aruba InstantOn AP22

          fireix @stephenw10

          @stephenw10 Any idea what it could be? It must be some kind of compatibility issue between the Supermicro hardware and the pfSense software?

            stephenw10 Netgate Administrator

            Nope, I don't know what would cause that. It's similar to some other crashes we have seen in the past related to IPSec but those are now fixed.

            Can you test a 2.7 snapshot to see if the issue still happens there?

            Steve

              fireix @stephenw10

              @stephenw10 It's a bit hard to test realistically with a snapshot version, since the firewall is live with clients :/ I do have a second, backup firewall server with the same config (which also failed like this under load), so I can first see whether that spare develops the problem after a few weeks without load, and then upgrade that one if I'm that lucky. But if the problem only appears under traffic load, I guess I'm out of options.

              I do have 4 IPSec tunnels active, with low load (5-6 Mbps at most).

                fireix @stephenw10

                @stephenw10 I did try the snapshot on the live server. Less than 15 hours later, the same thing happened on 2.7. If this happens to me, it should also happen to many others on standard Supermicro hardware...

                  fireix @stephenw10

                  @stephenw10

                  Here are the dumps, in case they help:

                  textdump.txt
                  info.txt

                  Since I have run one machine with traffic and one without (with the exact same config), I can say that the crash only occurs with some traffic. The machine without any traffic, on the same version, did not crash once.

                    fireix @stephenw10

                    @stephenw10 When searching for the kernel panic error code, I came across this post from a user with the same error code during traffic:

                    https://forums.freebsd.org/threads/fatal-trap-12-page-fault-while-in-kernel-mode-during-network-operations.80474/

                      stephenw10 Netgate Administrator

                      That isn't really similar. The only thing that's the same is the page fault; that backtrace is completely different. You are still seeing:

                      db:0:kdb.enter.default>  bt
                      Tracing pid 0 tid 100041 td 0xfffff800055b8000
                      kdb_enter() at kdb_enter+0x37/frame 0xfffffe00004db230
                      vpanic() at vpanic+0x194/frame 0xfffffe00004db280
                      panic() at panic+0x43/frame 0xfffffe00004db2e0
                      trap_fatal() at trap_fatal+0x38f/frame 0xfffffe00004db340
                      trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004db3a0
                      calltrap() at calltrap+0x8/frame 0xfffffe00004db3a0
                      --- trap 0xc, rip = 0xffffffff80de5585, rsp = 0xfffffe00004db470, rbp = 0xfffffe00004db480 ---
                      turnstile_broadcast() at turnstile_broadcast+0x45/frame 0xfffffe00004db480
                      __mtx_unlock_sleep() at __mtx_unlock_sleep+0x7f/frame 0xfffffe00004db4b0
                      pf_test() at pf_test+0x9af/frame 0xfffffe00004db620
                      pf_check_out() at pf_check_out+0x1d/frame 0xfffffe00004db640
                      pfil_run_hooks() at pfil_run_hooks+0xa1/frame 0xfffffe00004db6e0
                      ip_output() at ip_output+0xa74/frame 0xfffffe00004db830
                      ip_forward() at ip_forward+0x3aa/frame 0xfffffe00004db900
                      ip_input() at ip_input+0x854/frame 0xfffffe00004db9b0
                      netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00004dba00
                      ether_demux() at ether_demux+0x16a/frame 0xfffffe00004dba30
                      ether_nh_input() at ether_nh_input+0x33b/frame 0xfffffe00004dba90
                      netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00004dbae0
                      ether_input() at ether_input+0x89/frame 0xfffffe00004dbb40
                      iflib_rxeof() at iflib_rxeof+0xaa6/frame 0xfffffe00004dbc20
                      _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00004dbc60
                      gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe00004dbcc0
                      gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd2/frame 0xfffffe00004dbcf0
                      fork_exit() at fork_exit+0x7e/frame 0xfffffe00004dbd30
                      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004dbd30
                      --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
                      

                      Which is almost identical to the previous panics.
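To compare panics like this without reading every frame by eye, the function names can be pulled out of each backtrace and compared as a sequence. The following is only a rough sketch: `frame_names` and `same_call_path` are hypothetical helper names, and the regular expression assumes frames are formatted exactly like the ddb output quoted above.

```python
import re

# Matches ddb backtrace frames such as:
#   pf_test() at pf_test+0x9af/frame 0xfffffe00004db620
# Trap separator lines ("--- trap 0xc, ... ---") do not match and are skipped.
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$")


def frame_names(backtrace):
    """Return the function names of a ddb backtrace, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def same_call_path(trace_a, trace_b):
    """True when both backtraces go through the same functions in the same order.

    Offsets and frame addresses are ignored on purpose: they can differ
    between kernel builds even when the call path is the same.
    """
    return frame_names(trace_a) == frame_names(trace_b)


# Three frames copied from the backtrace in this thread
sample = """\
--- trap 0xc, rip = 0xffffffff80de5585, rsp = 0xfffffe00004db470, rbp = 0xfffffe00004db480 ---
turnstile_broadcast() at turnstile_broadcast+0x45/frame 0xfffffe00004db480
__mtx_unlock_sleep() at __mtx_unlock_sleep+0x7f/frame 0xfffffe00004db4b0
pf_test() at pf_test+0x9af/frame 0xfffffe00004db620
"""

print(frame_names(sample))
```

Running `same_call_path` on the `bt` sections of two textdumps gives a quick yes/no on whether they are the same panic, which is the comparison being made here.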

                      That shouldn't happen but there has to be something unusual in your config there that's triggering it. Otherwise we would see numerous reports of it.

                      Steve

                        fireix @stephenw10

                        @stephenw10 I have a pretty simple setup now, I think. No VLANs, no segmentation, no custom routing, no DHCP/DNS or special features... Besides the IPSec VPN, I'm simply using it as a firewall with at most 50 Mbps of traffic. The problem occurs with very little traffic, at random times.

                        I guess my only choice is to set it up from the ground up and see if that helps. I have a ton of aliases and firewall rules, so that would be the boring part... but with a week of work it should be possible. Hopefully I can at least export/import the IPSec VPN separately. The current config was simply imported from the config backup file onto this new machine (which also fails), so the problem must be somewhere in it. And the system apparently accepts the config as valid.

                          stephenw10 Netgate Administrator

                          How is the IPSec configured? Using an unusual cipher perhaps?

                            fireix @stephenw10

                            @stephenw10

                            Seems like AES.

                            AES_CBC (256)
                            HMAC_SHA2_256_128
                            PRF_HMAC_SHA2_256
                            MODP_2048

                            On the status-page under SAD, I found these 4 variants active:

                            aes-cbc (enc algo)

                            And these auth algos:
                            hmac-sha1
                            hmac-sha2-512
                            hmac-sha2-256

                            Should I standardize on one of these, just to be sure?

                              stephenw10 Netgate Administrator

                              Well, I would avoid SHA1 if you can, but all of that should work fine. Nothing there is particularly obscure.
                              AES-GCM will be significantly faster at P2 if you can use it.
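As a small illustration of the "avoid SHA1" advice, the auth algorithms listed from the SAD status page can be checked with a few lines of Python. This is just a sketch: `flag_sha1` is a hypothetical helper, the list is typed in by hand from this thread, and it only looks at algorithm names. It does not query pfSense.

```python
def flag_sha1(algorithms):
    """Return the entries that name a SHA-1 based auth algorithm.

    Names are normalized so that both the status-page style ("hmac-sha1")
    and the proposal style ("HMAC_SHA1_96") are recognized.
    """
    flagged = []
    for name in algorithms:
        normalized = name.lower().replace("_", "-")
        if normalized.startswith("hmac-"):
            normalized = normalized[len("hmac-"):]
        # "sha1" or "sha1-96" is SHA-1; "sha2-256" and friends are not
        if normalized == "sha1" or normalized.startswith("sha1-"):
            flagged.append(name)
    return flagged


# Auth algorithms reported as active in this thread
active_auth = ["hmac-sha1", "hmac-sha2-512", "hmac-sha2-256"]

print(flag_sha1(active_auth))
```

Any tunnel that shows up in the result is a candidate for moving to a SHA-2 based algorithm, or to AES-GCM where both ends support it.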

                              Steve

                                fireix @stephenw10

                                @stephenw10 Instead of just restoring a full backup onto new hardware as I have done before, I have now prepared it like this:

                                • Configured pfSense from scratch on a different machine (basically only set up WAN and a LAG interface consisting of LAN1/LAN2)

                                • Imported NAT, Firewall Rules, Aliases and IPSec (only those 4 sections, nothing more)

                                These are the only packages/things I need/use today. I also have the pfBlocker package, but can do without it.
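For anyone wanting to check what such a partial import actually carries over, the four sections can be pulled out of a config backup and inspected on their own. The sketch below makes assumptions: it treats `nat`, `filter`, `aliases` and `ipsec` as the top-level element names in the backup XML, `extract_sections` is a hypothetical helper, and the sample document is made up for illustration rather than taken from a real backup.

```python
import xml.etree.ElementTree as ET

# Assumed top-level element names for NAT, firewall rules, aliases and IPsec
SECTIONS = ("nat", "filter", "aliases", "ipsec")


def extract_sections(config_xml, wanted=SECTIONS):
    """Return {section name: XML text} for each wanted top-level section present."""
    root = ET.fromstring(config_xml)
    found = {}
    for tag in wanted:
        node = root.find(tag)
        # Compare against None: an element without children is falsy
        if node is not None:
            found[tag] = ET.tostring(node, encoding="unicode")
    return found


# Made-up miniature config, for illustration only
sample = """<pfsense>
  <system><hostname>fw</hostname></system>
  <aliases><alias><name>web_servers</name></alias></aliases>
  <nat></nat>
  <filter><rule><type>pass</type></rule></filter>
  <ipsec></ipsec>
</pfsense>"""

parts = extract_sections(sample)
print(sorted(parts))
```

Diffing the extracted sections between the old and the rebuilt firewall is one way to confirm that nothing outside those four areas came along.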

                                Can mistakes (by me) in just those four config sections really crash pfSense/BSD?

                                At least this would rule out any advanced settings made in the past outside the exported/imported config sections. Of course, if the mistake is in a NAT or firewall rule, an alias, or IPSec, it won't help.

                                Anything else I can do to track this down?

                                PS: This forum is so annoying. I had to switch IP in order to post. It gave just "Error" when tried to post the content above. No reason, nothing..

                                  Patch @fireix

                                  @fireix said in pfSense Kernel panic even on new hardware:

                                  PS: This forum is so annoying. I had to switch IP in order to post. It gave just "Error" when tried to post the content above. No reason,

                                  Happens frequently for me. The solution I use is:

                                  • Click on the web browser refresh button. (Doing so retains the post I'm creating in my experience but I sometimes copy it first just in case).

                                  • Click on the forum post "Submit" button (again)

                                  • The post is accepted without error

                                    stephenw10 Netgate Administrator

                                    That will be a good test.

                                    It's either something in the config triggering some unusual code path or something in the hardware. The hardware seems very unlikely since that's basically the same device we shipped for years without issue.

                                    Steve

                                      fireix @stephenw10

                                      @stephenw10 Yeah, I have basically used this server since 2017, plus a second one of the same model bought in 2018 (just as a spare), so it ran trouble-free for 4 years before this started happening. A year before the problems started, I converted it from transparent mode to a more normal setup (different networks on WAN/LAN) with 1:1 NAT, and added 2 IPSec tunnels instead of using OpenVPN. I didn't notice any issues for months. Those are the only changes in 5 years, and it has been pretty stable, so it's very weird that this should start now. I have also deactivated one IPSec tunnel I'm no longer using, so that can serve as a test too until I can schedule downtime to switch over to the machine I have prepared now.

                                        fireix @stephenw10

                                        @stephenw10 After I removed the LACP lagg I have had for ages (two ports in LACP to two stacked switches, configured for LACP on both sides and reporting good status), the problem stopped. No more kernel panics or other issues since.

                                          stephenw10 Netgate Administrator

                                          Ah, well that's a good catch! Hmm, interesting. Nothing there really indicates lagg or lacp directly so I guess enabling that is somehow touching some other code... 🤔

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.