Netgate Discussion Forum
    pfsense 2.7.2-RELEASE crashes several times a day

    General pfSense Questions
    11 Posts 3 Posters 1.3k Views
    • wesselloff

      Hi everyone!
      Recently, my pfSense started crashing several times a day.
      Some background:
      I had version 2.7.0 installed. It was very stable, with an uptime of more than six months. I then updated it to version 2.7.2, and after the update the system did not boot.
      I installed version 2.7.2 from scratch and restored the config from the previous version. After that, the stability problems began.
      Unfortunately, I do not have enough knowledge to work out the cause of the failures from the dump.
      Please help me understand what is causing them.
      Thanks!
      textdump.tar.0
      info.0

      • stephenw10 Netgate Administrator

        Backtrace:

        db:0:kdb.enter.default>  bt
        Tracing pid 43223 tid 100264 td 0xfffffe00af97f740
        kdb_enter() at kdb_enter+0x32/frame 0xfffffe00ae1854d0
        vpanic() at vpanic+0x163/frame 0xfffffe00ae185600
        panic() at panic+0x43/frame 0xfffffe00ae185660
        vm_fault() at vm_fault+0x15c5/frame 0xfffffe00ae185770
        vm_fault_trap() at vm_fault_trap+0xb0/frame 0xfffffe00ae1857c0
        trap_pfault() at trap_pfault+0x1d9/frame 0xfffffe00ae185820
        calltrap() at calltrap+0x8/frame 0xfffffe00ae185820
        --- trap 0xc, rip = 0xfffffe00ae185b60, rsp = 0xfffffe00ae1858f8, rbp = 0xfffff80005c7d740 ---
        ??() at 0xfffffe00ae185b60/frame 0xfffff80005c7d740
        .L.str.22() at .L.str.22+0x1/frame 0xb7f2
        

        Panic1:

        Fatal trap 12: page fault while in kernel mode
        cpuid = 2; apic id = 02
        fault virtual address	= 0x0
        fault code		= supervisor read data, page not present
        instruction pointer	= 0x20:0xffffffff80dcd2df
        stack pointer	        = 0x28:0xfffffe00ae171b60
        frame pointer	        = 0x28:0xfffffe00ae171b60
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 843 (sh)
        rdi: fffff8000a6c5b58 rsi: 0000000000000001 rdx: 000000000000000d
        rcx: fffff8000a5e3a80  r8: 0000000000000000  r9: 0000000307d96cb8
        rax: 000000000000007f rbx: fffff8000a5e3a80 rbp: fffffe00ae171b60
        r10: 0000000000000000 r11: 0000000000000000 r12: fffffe00af93b900
        r13: 0000000000000001 r14: fffff8000596d000 r15: fffff8000a722cb0
        trap number		= 12
        panic: page fault
        cpuid = 2
        time = 1718885881
        KDB: enter: panic
        

        Panic2:

        <6>pid 86703 (sh), jid 0, uid 0: exited on signal 11 (core dumped)
        <6>pid 87613 (sh), jid 0, uid 0: exited on signal 11 (core dumped)
        
        
        Fatal trap 9: general protection fault while in kernel mode
        cpuid = 2; apic id = 02
        instruction pointer	= 0x20:0xffffffff8128487c
        stack pointer	        = 0x28:0xfffffe00ae1b3770
        frame pointer	        = 0x28:0xfffffe00ae1b38c0
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 8787 (sh)
        rdi: fffff80001711600 rsi: 0010000000000008 rdx: fffffe00033f20c8
        rcx: fffff801ee5718d0  r8: fffff801ee22f000  r9: 0000000000000111
        rax: 0000000000000111 rbx: fffffffff0000000 rbp: fffffe00ae1b38c0
        r10: fffff801ee2d35e8 r11: fffff801ee22f000 r12: 00000001ee2d3067
        r13: fffffe001000ff80 r14: fffffe00033f2090 r15: 000000007fdea405
        trap number		= 9
        panic: general protection fault
        cpuid = 2
        time = 1718900423
        KDB: enter: panic
        

        Panic3:

        <6>pid 28342 (grep), jid 0, uid 0: exited on signal 6 (core dumped)
        
        
        Fatal trap 12: page fault while in kernel mode
        cpuid = 2; apic id = 02
        fault virtual address	= 0xa0
        fault code		= supervisor read data, page not present
        instruction pointer	= 0x20:0xffffffff81150557
        stack pointer	        = 0x28:0xfffffe00b0e79c28
        frame pointer	        = 0x28:0xfffffe00b0b92020
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 25082 (grep)
        rdi: fffff80220d80c60 rsi: 0000000000000000 rdx: 0000000000000000
        rcx: 0000000000003210  r8: fffff8023f762300  r9: fffff8023f762300
        rax: 0000000000000001 rbx: fffffe00b0b92020 rbp: fffffe00b0b92020
        r10: 0000000007d74000 r11: 0000000007d73fff r12: fffffe00b02c80c0
        r13: 0000000000000001 r14: fffff80220d80c60 r15: fffffe0086260000
        trap number		= 12
        panic: page fault
        cpuid = 2
        time = 1718917883
        KDB: enter: panic
        

        Panic4:

        Fatal trap 9: general protection fault while in kernel mode
        cpuid = 3; apic id = 03
        instruction pointer	= 0x20:0xffffffff80cc8299
        stack pointer	        = 0x28:0xfffffe00ae18eea0
        frame pointer	        = 0x28:0xfffffe00ae18eee0
        code segment		= base 0x0, limit 0xfffff, type 0x1b
        			= DPL 0, pres 1, long 1, def32 0, gran 1
        processor eflags	= interrupt enabled, resume, IOPL = 0
        current process		= 1014 (pgrep)
        rdi: ffffffff83020980 rsi: 0000000000000000 rdx: 0000000000000004
        rcx: 0010000000000000  r8: 0000000000000000  r9: fffffe00ae18f200
        rax: 0000000000000010 rbx: fffffe00ae18f200 rbp: fffffe00ae18eee0
        r10: 0000000000000000 r11: fffffe00afbfac40 r12: fffffe00afbfa720
        r13: 0000000000000000 r14: fffffe0011955ae0 r15: ffffffff83020980
        trap number		= 9
        panic: general protection fault
        cpuid = 3
        time = 1718917930
        KDB: enter: panic
        ---<<BOOT>>---
        

        Panic5:

        panic: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe00b0739000
        cpuid = 3
        time = 1718920720
        KDB: enter: panic
        

        Panic6:

        panic: vm_fault_lookup: fault on nofault entry, addr: 0xfffffe00ae185000
        cpuid = 3
        time = 1718920799
        KDB: enter: panic
        

        Unfortunately we only have the backtrace from the most recent panic and it's not very helpful.
        Do you have other crash reports to compare?
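
The `time = ...` field in each panic message is a Unix epoch timestamp, so the spacing between the six panics can be worked out directly. A minimal Python sketch, assuming the values are copied by hand from the panics above (the helper names `to_utc` and `gaps_seconds` are made up for illustration):

```python
from datetime import datetime, timezone

# "time = ..." values copied from Panic1..Panic6 above (Unix epoch seconds).
PANIC_TIMES = [
    1718885881,
    1718900423,
    1718917883,
    1718917930,
    1718920720,
    1718920799,
]


def to_utc(epoch):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


def gaps_seconds(times):
    """Return the seconds elapsed between consecutive panics."""
    ordered = sorted(times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    for n, t in enumerate(PANIC_TIMES, start=1):
        print(f"Panic{n}: {to_utc(t).isoformat()}")
    print("gaps (s):", gaps_seconds(PANIC_TIMES))
```

With these values, two pairs of panics are only 47 and 79 seconds apart, which suggests the box sometimes panics again almost immediately after rebooting.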

        Potentially it could be bad RAM, but that's not clear.

        Do you have anything custom running?

        Steve

        • wesselloff @stephenw10

          @stephenw10
          Hi, Steve!
          Thanks a lot for the reply.
          I have attached a fresh dump.
          textdump.tar.0
          info.0

          When the frequent reboots started, the first thing I did was replace the RAM module. The problem did not go away; the system kept rebooting in exactly the same way.
          The following packages are installed in the system:
          Screenshot_20240621_095633.png
          Packages only, there is nothing homemade.
          The Cron, acme and iperf packages have just been installed, but are not in use yet. WireGuard is actively used.

          • Gertjan @wesselloff

            @wesselloff said in pfsense 2.7.2-RELEASE crashes several times a day:

            Packages only, there is nothing homemade.

            Except for one (arpwatch), none of them are up to date, not just 'acme'.
            What pfSense version are you using?

            Edit: ah, OK, 2.7.2.

            If possible, install pfSense on another device, or in a VM, for a while.
            That will show straight away whether or not this is a hardware issue.

            No "help me" PM's please. Use the forum, the community will thank you.
            Edit: and where are the logs?

            • wesselloff @Gertjan

              @Gertjan
              Thanks for the reply.
              I plan to move the SSD to another, similar computer next weekend and see how it behaves.
              In my opinion, installing on a virtual machine is impractical, because I would not be able to reproduce the working conditions of my current home network.

              • stephenw10 Netgate Administrator

                Hmm, unfortunately that's still pretty generic:

                db:0:kdb.enter.default>  bt
                Tracing pid 42874 tid 101438 td 0xfffffe00b04b3560
                kdb_enter() at kdb_enter+0x32/frame 0xfffffe00b070c4d0
                vpanic() at vpanic+0x163/frame 0xfffffe00b070c600
                panic() at panic+0x43/frame 0xfffffe00b070c660
                trap_fatal() at trap_fatal+0x40c/frame 0xfffffe00b070c6c0
                trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00b070c720
                calltrap() at calltrap+0x8/frame 0xfffffe00b070c720
                --- trap 0xc, rip = 0x7ffffffff, rsp = 0xfffffe00b070c7f0, rbp = 0xfffff80005dbb458 ---
                ??() at 0x7ffffffff/frame 0xfffff80005dbb458
                ??() at 0xfffff80001db0800
                

                Some of those earlier panics looked more interesting so you might get lucky(er).

                The panics are all in different processes which makes it much more difficult to diagnose.
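
One way to see that spread at a glance is to tally the trap type and the `current process` name across the panic messages. The following is a rough Python sketch, assuming the message buffer text has been saved to a string; the function name `summarize` and the sample text are illustrative only, and panics without a "Fatal trap" header (such as the `vm_fault_lookup` ones) are not counted.

```python
import re
from collections import Counter

# Patterns for the two header lines that appear in a "Fatal trap" panic message.
TRAP_RE = re.compile(r"^\s*Fatal trap (\d+): (.+?) while in kernel mode", re.MULTILINE)
PROC_RE = re.compile(r"^\s*current process\s*=\s*(\d+) \((.+?)\)", re.MULTILINE)


def summarize(msgbuf):
    """Count panics by trap type and by the process that was running."""
    traps = Counter(f"trap {num} ({desc})" for num, desc in TRAP_RE.findall(msgbuf))
    procs = Counter(name for _pid, name in PROC_RE.findall(msgbuf))
    return traps, procs


# Shortened sample in the same layout as the panics quoted earlier in the thread.
SAMPLE = """\
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
current process\t\t= 843 (sh)
panic: page fault

Fatal trap 9: general protection fault while in kernel mode
cpuid = 3; apic id = 03
current process\t\t= 1014 (pgrep)
panic: general protection fault
"""

if __name__ == "__main__":
    traps, procs = summarize(SAMPLE)
    print(dict(traps))
    print(dict(procs))
```

If the same process or trap type dominated the counts, that would point at a specific code path; an even spread like the one in this thread is more consistent with a generic fault.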

                • wesselloff

                  Hello everyone.
                  The router rebooted 5 times last night. Right now I can't connect to it via either ssh or the web interface: ssh just hangs, and the web interface returns "504 Gateway Time-out".
                  At the same time, the network is working fine; there is access to both the Internet and resources inside the home network.
                  I'll have to power-cycle it. After the reboot, I will attach a dump.

                  Update:
                  Hmm, there are no dumps.
                  So what happened during the night, then? There was definitely no power outage.

                  Update 2:
                  Oh, there's just been a crash and a reboot.
                  A dump has appeared.
                  textdump.tar.0
                  info.0

                  • stephenw10 Netgate Administrator @wesselloff

                    Hmm, backtrace still very generic:

                    db:0:kdb.enter.default>  bt
                    Tracing pid 47703 tid 100289 td 0xfffffe00b00c7ac0
                    kdb_enter() at kdb_enter+0x32/frame 0xfffffe00ae1ea830
                    vpanic() at vpanic+0x163/frame 0xfffffe00ae1ea960
                    panic() at panic+0x43/frame 0xfffffe00ae1ea9c0
                    vm_fault() at vm_fault+0x15c5/frame 0xfffffe00ae1eaad0
                    vm_fault_trap() at vm_fault_trap+0xb0/frame 0xfffffe00ae1eab20
                    trap_pfault() at trap_pfault+0x1d9/frame 0xfffffe00ae1eab80
                    calltrap() at calltrap+0x8/frame 0xfffffe00ae1eab80
                    --- trap 0xc, rip = 0xfffffe00ae1ead60, rsp = 0xfffffe00ae1eac58, rbp = 0x1 ---
                    ??() at 0xfffffe00ae1ead60/frame 0x1
                    

                    We do see the igb1 NIC going up and down repeatedly there but that shouldn't be an issue.

                    There are no timestamps in the message buffer, so what is the timing here? How long after it boots does it panic?

                    Or is the NIC changing link immediately before the panic?
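
Since the message buffer has no timestamps, one rough way to check this is by line proximity: look at the last few lines before the panic and see whether any of them are link state changes for igb1. A Python sketch, assuming the usual FreeBSD kernel message `igb1: link state changed to UP` / `DOWN` (optionally with a `<6>` priority prefix) and a hypothetical helper name and window size:

```python
import re

# Assumed kernel message layout: "<6>igb1: link state changed to DOWN"
LINK_RE = re.compile(r"^(?:<\d+>)?(\w+): link state changed to (UP|DOWN)$")


def link_flaps_before_panic(msgbuf, iface="igb1", window=20):
    """Return UP/DOWN events for iface in the `window` lines before the first panic."""
    lines = msgbuf.splitlines()
    panic_idx = next(
        (i for i, line in enumerate(lines)
         if line.startswith(("panic:", "Fatal trap"))),
        None,
    )
    if panic_idx is None:
        return []  # no panic found in this buffer
    events = []
    for line in lines[max(0, panic_idx - window):panic_idx]:
        match = LINK_RE.match(line.strip())
        if match and match.group(1) == iface:
            events.append(match.group(2))
    return events


if __name__ == "__main__":
    sample = (
        "<6>igb1: link state changed to DOWN\n"
        "<6>igb1: link state changed to UP\n"
        "<6>igb0: link state changed to UP\n"
        "\n"
        "Fatal trap 12: page fault while in kernel mode\n"
    )
    print(link_flaps_before_panic(sample))
```

This is only a heuristic: line distance is not time, so a link change a few lines before the panic could still have happened minutes earlier.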

                    I would probably try disabling one or more services as a test at this point. It's likely some package is triggering this.

                    • wesselloff @stephenw10

                      @stephenw10
                      I unplugged the cable from igb1, which is the backup Internet provider. Nothing has changed; it still reboots.
                      What can I turn off for testing?
                      textdump.tar.0
                      info.0

                      • stephenw10 Netgate Administrator

                        Try disabling WireGuard if you can. It appears close to last in all the logs, and the processes in the panics could well be from the WireGuard scripts. But that's a guess!

                        • wesselloff

                          Hello everyone,
                          In the end, the cause of the problems turned out to be a physical fault in the computer.
                          I had a second computer that was completely identical to the problematic one. I installed pfSense on it from scratch and transferred all the settings to it manually. I haven't installed any additional packages yet. Since then, there has not been a single unplanned reboot; the system is completely stable. It's been over two months. I plan to reinstall the necessary packages in the near future and continue monitoring.

                          After moving the system to the new computer, I decided to experiment with the old one.
                          To begin with, I decided to completely reinstall pfSense, formatting the SSD. I booted from the LiveCD and started the installation. I hadn't even finished setting up the disk when I got an error and a reboot. I thought the SSD was faulty (although its SMART data is fine), so I replaced it with another one. The error repeated. So the problem is not the disk, and not the RAM either, because I had replaced that earlier. Eventually, after 3-4 attempts, pfSense did install. But after being powered on for a while, the computer spontaneously rebooted, then again and again, with no settings configured yet.
                          Next, I tried installing Windows 10 on the computer as a test. The installation froze completely after the first step.
                          At that point I ran out of ideas, so the computer was turned off and put away. Maybe I'll throw it away later.

                          Thank you all so much for your help!

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.