Netgate Discussion Forum

    pfSense Kernel panic even on new hardware

    General pfSense Questions
    28 Posts 4 Posters 4.4k Views
      fireix

      For a year now, pfSense has crashed about once a month or less on host fw1, with the same type of error shown after logging in to the dashboard. I reported it here on the forum last year, as you can see at the bottom. I have tried uninstalling all packages (I don't need many), but this time I did a completely new install (let's call it fw2) on a "new" server, and it continues to happen even there!

      I have two boxes. One of them, fw2, is the box I started with 4+ years ago; it has been offline for the past 1.5 years and never had the problem. After a year of these kernel panics (luckily, by chance, it has usually gone down at night, when there is less load, fewer users and less traffic), I finally found time to install a fresh pfSense on the backup machine fw2 and import the config backup. After just one week, the kernel panic appeared on fw2 as well!

      I suspected a hardware issue before this, but since this is a completely separate system and only the software/config is the same (on a new install), we can rule out hardware errors. The problems all started after a pfSense upgrade a long time ago.

      How can I go about finding out what this is? It is super annoying. The fw goes down for a reboot and works fine when it comes back up. Usually it lasts about 30 days and then it does it again.

      It is running on a Supermicro SYS-5018A-FTN4; the latest install has ZFS RAID 1 and 16 GB RAM.
      https://www.supermicro.com/en/products/system/1U/5018/SYS-5018A-FTN4.cfm

      Running the latest stable version of pfSense.

      Re: Crash report

      Update: I tried to post the content of the log in the forum, but Akismet decided that I was a spam robot, so I had to switch to a new computer/IP and lost what I had prepared ;) This is a bad Monday... It's a recurring problem I have had on this forum, though.

      Ok, here is the current log:

      I get "Akismet spam post" when I try to update the post with the log below, so I'm trying to post it as a reply to see if that goes better... [error.txt]

      Debug log:
      (/assets/uploads/files/1651644825072-error.txt)

        stephenw10 Netgate Administrator

        I upvoted some of your posts so your rep is now >5.
        See if you can now post.

          fireix @stephenw10

          @stephenw10 Thanks. Uploading the debug file worked now. Maybe someone can find out what this is. The actual debug file was empty (0 bytes), but the viewer window showed this debug message:

          error.txt

            stephenw10 Netgate Administrator

            Hmm, so identical backtrace again then.

            No console buffer again. Are you able to send that to me separately? Or at least check it for errors yourself?

            It's the same config you're running on different hardware that does this? Have you ever run this config and had it not crash?

            I see you're running Zeek which is unusual. Can you disable that as a test?

            Steve

              fireix @stephenw10

              @stephenw10 The debug file size was zero in this last case; I probably clicked it away earlier.

              Yes, I have run this same config before with no crashes. I previously disabled Zeek because I saw it in the debug logs, so it was my first suspect, but that didn't help, so I enabled it again. I have now disabled it once more. It can take up to 30 days before the error occurs.

              One change I made during the last year was to enable crypto acceleration for VPN speed. I guess it is this one: "AES-NI CPU Crypto: Yes (active)". Maybe I should disable it for now?

                stephenw10 Netgate Administrator

                It doesn't look like anything in crypto. I doubt it's anything to do with AES-NI, but it's an easy test.

                  fireix @stephenw10

                  @stephenw10 Nothing to do with Zeek at least. I ran with it disabled and it crashed again now. Machine rebooted by itself after the crash.

                  Attached are the debug logs. I have anonymized the IPs that appear in them.

                  textdump.tar.anon.txt

                  info.anon.txt

                    NogBadTheBad @fireix

                    @fireix Have you tried disconnecting one of the Lagg0 interfaces that goes to the QNAP?

                    There are loads of ARP moves.

                    Andy

                    1 x Netgate SG-4860 - 3 x Linksys LGS308P - 1 x Aruba InstantOn AP22

                      fireix @NogBadTheBad

                      @nogbadthebad I'm trying that now to see if it makes any difference.

                        fireix @NogBadTheBad

                        @nogbadthebad That didn't have any effect. I made sure only one NIC was connected, and the error log no longer reports the disconnect/connect to the QNAP, but it still crashed again just now.

                          NogBadTheBad @fireix

                          @fireix ah it was worth a try :(

                          Andy

                          1 x Netgate SG-4860 - 3 x Linksys LGS308P - 1 x Aruba InstantOn AP22

                            fireix @stephenw10

                            @stephenw10 Any idea what it can be? It must be some kind of compatibility issue between the Supermicro hardware and the pfSense software?

                              stephenw10 Netgate Administrator

                              Nope, I don't know what would cause that. It's similar to some other crashes we have seen in the past related to IPSec but those are now fixed.

                              Can you test a 2.7 snapshot to see if the issue still happens there?

                              Steve

                                fireix @stephenw10

                                @stephenw10 It's a bit hard to test realistically with a snapshot version, since the fw is live with clients :/ I do have a second, backup fw server with the same config (which also failed like this under load), so I can first see whether that spare one runs into the problem after a few weeks without load, and then upgrade it if I'm that lucky. But if the problem only appears under traffic load, I guess I'm out of options.

                                I do have 4 IPSec tunnels active, with low load (5-6 Mbps at most).

                                  fireix @stephenw10

                                  @stephenw10 I did try the snapshot on the live server. Less than 15 hours later, the same thing happened on 2.7. If this happens to me, it should also be happening to many others on standard Supermicro hardware...

                                    fireix @stephenw10

                                    @stephenw10

                                    Here are the dumps if it can help:

                                    textdump.txt
                                    info.txt

                                    Since I have run one machine with traffic and one without (with the exact same config), I can say that the crash only occurs when there is some traffic. The machine without any traffic, on the same version, did not crash once.

                                      fireix @stephenw10

                                      @stephenw10 When searching for the kernel panic error code, I came across this post from a user with the same error code during network traffic:

                                      https://forums.freebsd.org/threads/fatal-trap-12-page-fault-while-in-kernel-mode-during-network-operations.80474/

                                        stephenw10 Netgate Administrator

                                        That isn't really similar, the only thing that's the same is the page fault but that backtrace is completely different. You are still seeing:

                                        db:0:kdb.enter.default>  bt
                                        Tracing pid 0 tid 100041 td 0xfffff800055b8000
                                        kdb_enter() at kdb_enter+0x37/frame 0xfffffe00004db230
                                        vpanic() at vpanic+0x194/frame 0xfffffe00004db280
                                        panic() at panic+0x43/frame 0xfffffe00004db2e0
                                        trap_fatal() at trap_fatal+0x38f/frame 0xfffffe00004db340
                                        trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004db3a0
                                        calltrap() at calltrap+0x8/frame 0xfffffe00004db3a0
                                        --- trap 0xc, rip = 0xffffffff80de5585, rsp = 0xfffffe00004db470, rbp = 0xfffffe00004db480 ---
                                        turnstile_broadcast() at turnstile_broadcast+0x45/frame 0xfffffe00004db480
                                        __mtx_unlock_sleep() at __mtx_unlock_sleep+0x7f/frame 0xfffffe00004db4b0
                                        pf_test() at pf_test+0x9af/frame 0xfffffe00004db620
                                        pf_check_out() at pf_check_out+0x1d/frame 0xfffffe00004db640
                                        pfil_run_hooks() at pfil_run_hooks+0xa1/frame 0xfffffe00004db6e0
                                        ip_output() at ip_output+0xa74/frame 0xfffffe00004db830
                                        ip_forward() at ip_forward+0x3aa/frame 0xfffffe00004db900
                                        ip_input() at ip_input+0x854/frame 0xfffffe00004db9b0
                                        netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00004dba00
                                        ether_demux() at ether_demux+0x16a/frame 0xfffffe00004dba30
                                        ether_nh_input() at ether_nh_input+0x33b/frame 0xfffffe00004dba90
                                        netisr_dispatch_src() at netisr_dispatch_src+0xb9/frame 0xfffffe00004dbae0
                                        ether_input() at ether_input+0x89/frame 0xfffffe00004dbb40
                                        iflib_rxeof() at iflib_rxeof+0xaa6/frame 0xfffffe00004dbc20
                                        _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe00004dbc60
                                        gtaskqueue_run_locked() at gtaskqueue_run_locked+0x121/frame 0xfffffe00004dbcc0
                                        gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xd2/frame 0xfffffe00004dbcf0
                                        fork_exit() at fork_exit+0x7e/frame 0xfffffe00004dbd30
                                        fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004dbd30
                                        --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
                                        

                                        Which is almost identical to the previous panics.
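For anyone wanting to compare panics like this across several crash reports, here is a minimal sketch (not from the thread, and not a Netgate tool) that pulls the function names out of a DDB backtrace and returns the frames that were executing when the trap hit. The names `frame_names`, `faulting_path` and `SAMPLE` are made up for illustration; the sample lines are copied from the backtrace above.

```python
import re

# Matches DDB backtrace frames such as:
#   pf_test() at pf_test+0x9af/frame 0xfffffe00004db620
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$")

# A few lines copied from the backtrace quoted in this thread.
SAMPLE = """\
trap_fatal() at trap_fatal+0x38f/frame 0xfffffe00004db340
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004db3a0
calltrap() at calltrap+0x8/frame 0xfffffe00004db3a0
--- trap 0xc, rip = 0xffffffff80de5585, rsp = 0xfffffe00004db470, rbp = 0xfffffe00004db480 ---
turnstile_broadcast() at turnstile_broadcast+0x45/frame 0xfffffe00004db480
__mtx_unlock_sleep() at __mtx_unlock_sleep+0x7f/frame 0xfffffe00004db4b0
pf_test() at pf_test+0x9af/frame 0xfffffe00004db620
pf_check_out() at pf_check_out+0x1d/frame 0xfffffe00004db640
"""


def frame_names(backtrace):
    """Return the function name of every frame line, innermost first."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def faulting_path(backtrace, depth=3):
    """Return the first `depth` frames below the trap handler (calltrap)."""
    names = frame_names(backtrace)
    if "calltrap" in names:
        names = names[names.index("calltrap") + 1:]
    return names[:depth]


if __name__ == "__main__":
    print(faulting_path(SAMPLE))
```

Two crash reports whose `faulting_path` results match are a reasonable hint that it is the same panic, even when the frame addresses differ between boots or builds.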

                                        That shouldn't happen but there has to be something unusual in your config there that's triggering it. Otherwise we would see numerous reports of it.

                                        Steve

                                          fireix @stephenw10

                                          @stephenw10 I have a pretty simple setup now, I think. No VLANs, no segmentation, no custom routing, no DHCP/DNS or special features... Besides the IPSec VPN, I'm simply using it as a firewall with at most 50 Mbps of traffic. The problem occurs at random times, even with very little traffic.

                                          I guess my only choice is to set it up from the ground up and see if that helps. I have a ton of aliases/firewall rules, so that would be the boring part... But with a week of work it should be possible. Hopefully I can at least export/import the IPSec VPN separately. The current config was simply imported from the config backup file onto this new machine (which also fails), so the problem must be in it somewhere. And the system apparently accepts the config as valid.

                                            stephenw10 Netgate Administrator

                                            How is the IPSec configured? Using an unusual cipher perhaps?

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.