Netgate Discussion Forum

23.01 crashing and won't reboot without console connection

General pfSense Questions
21 Posts 2 Posters 1.8k Views
  • C
    Cloudless Smart Home
    last edited by Feb 16, 2023, 7:33 PM

    23.01 upgrade has been a real dumpster fire. this system has been so stable on 22.05. not one crash in over a year running, and now, not only does it kernel panic when trying to rename the previous boot environment, but I can't even run 22.05 anymore because packages are all out of date, and there seems to be no way to install or use any packages because they are all for 23.01 now. this is totally unbelievable. after it crashes, the only way to get it back up is connecting a console cable, because it seems to be getting stuck rebooting too. at least I'm not running this mess on anything other than a home lab, because if this was on a customer's network, I would be totally screwed right now.

    my hardware: Protectli Vault FW2B - 2 Port, Firewall Micro Appliance/Mini PC - Intel Dual Core, AES-NI, 8GB RAM, 120GB mSATA SSD

    I have asked this question before and not gotten an answer. does anyone at Netgate want my crash log?

    • S
      stephenw10 Netgate Administrator
      last edited by Feb 16, 2023, 11:57 PM

      Yes, let's see the log.

      What does it show on the console when you connect? What do you do to reboot it?

      The 22.05 packages are all still present. Just set the upgrade repo to 'Previous stable version (22.05)'.

      Steve

      • C
        Cloudless Smart Home @stephenw10
        last edited by Cloudless Smart Home Feb 17, 2023, 12:11 AM Feb 17, 2023, 12:05 AM

        @stephenw10 I don't really want to post it because I don't know how much private info that it contains. can I send it to you?

        I am running zfs, so I can just reboot to the 22.05 boot environment. I have the console connected through a terminal emulator that doesn't initialize its display until I hit the enter key, so it is always progressing through the boot when I connect, but there is no way to know where it was stuck when I connected the terminal. right now, I am still sticking with 23.01 but keeping the terminal connected so I can keep watch on it.

        • S
          stephenw10 Netgate Administrator
          last edited by rcoleman-netgate Feb 19, 2023, 4:07 PM Feb 17, 2023, 12:08 AM

          Sure, upload it here: [URL Removed]

          • C
            Cloudless Smart Home @stephenw10
            last edited by Feb 17, 2023, 12:14 AM

            @stephenw10 thanks, I added some notes above. was editing when you replied. I have also uninstalled pfblockerng, because it seemed likely that it's causing problems and it's always causing me trouble.

            • S
              stephenw10 Netgate Administrator
              last edited by Feb 17, 2023, 12:26 AM

              That's only the header, do you have the full crash report?

              The panic is interesting though:

                Panic String: spin lock held too long
              

              That was when you attempted to rename a boot environment snap?

              Steve

              • C
                Cloudless Smart Home @stephenw10
                last edited by Cloudless Smart Home Feb 17, 2023, 12:28 AM Feb 17, 2023, 12:28 AM

                @stephenw10 yes last crash was cleaning up the boot env name after upgrade. how can I get the rest?

                • S
                  stephenw10 Netgate Administrator
                  last edited by Feb 17, 2023, 12:34 AM

                  I would expect to see a much larger tar file presented in the gui. Unless it was removed it should be in /var/crash.
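
A quick way to check whether anything is still in the crash directory is to list its files with their sizes. The following is a minimal sketch, not a pfSense tool: the `list_crash_files` helper is hypothetical, and the only detail taken from the thread is the `/var/crash` path.

```python
from pathlib import Path


def list_crash_files(crash_dir="/var/crash"):
    """Return (name, size_in_bytes) pairs for regular files in crash_dir, sorted by name.

    Hypothetical helper for illustration; returns an empty list if the
    directory does not exist.
    """
    root = Path(crash_dir)
    if not root.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


if __name__ == "__main__":
    # Print one file per line, size first, so a small header-only file
    # is easy to tell apart from a full dump archive.
    for name, size in list_crash_files():
        print(f"{size:>12}  {name}")
```

An empty result would match the "nothing in /var/crash" situation described in the next reply.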

                  • C
                    Cloudless Smart Home @stephenw10
                    last edited by Cloudless Smart Home Feb 17, 2023, 12:47 AM Feb 17, 2023, 12:46 AM

                    @stephenw10 I deleted it then. I was given 2 options, download tar file or file, and I downloaded the file. didn't realize that might be less info. nothing in /var/crash. it seems you have to delete it to get rid of the banner message.

                    • S
                      stephenw10 Netgate Administrator
                      last edited by Feb 17, 2023, 12:51 AM

                      Ah, unfortunate. Well you could try to trigger it again. That would prove it's not something difficult to repeat. I'll try to replicate it here.

                      Steve

                      • S
                        stephenw10 Netgate Administrator
                        last edited by Feb 17, 2023, 12:53 AM

                        What were you renaming it to?

                        It works as expected for simply renaming the BE. On my test box at least.

                        • C
                          Cloudless Smart Home @stephenw10
                          last edited by Cloudless Smart Home Feb 17, 2023, 12:58 AM Feb 17, 2023, 12:57 AM

                          @stephenw10 I noticed that the name actually stuck after the crash / reboot. renamed to 23_01-working. have since renamed it to 23_01-crashing, and no crash this time.

                          • S
                            stephenw10 Netgate Administrator
                            last edited by Feb 17, 2023, 1:04 AM

                            It could have been coincidental. Looking at previous similar panics they are all in pf and nothing to do with ZFS or the filesystem at all.
                            If you see it again we'd love to review it.

                            • C
                              Cloudless Smart Home @stephenw10
                              last edited by Cloudless Smart Home Feb 25, 2023, 2:49 PM Feb 25, 2023, 2:38 PM

                              @stephenw10 woke up to crashed pfsense again, and it still won't boot after a crash without console connected. I have a much larger textdump file if I can send it to you again, would appreciate you looking at it. I see openvpn going up and down. is there any chance this could be affecting it...

                              https://forum.netgate.com/topic/177491/automatically-start-openvpn-server-when-my-phone-is-not-on-home-wifi-project-writeup?_=1677335639800

                              • S
                                stephenw10 Netgate Administrator
                                last edited by Feb 25, 2023, 3:07 PM

                                Sure if you can upload the full crashdump here I can review it:
                                https://nc.netgate.com/nextcloud/s/WEJQTXincHFo884

                                Steve

                                • C
                                  Cloudless Smart Home @stephenw10
                                  last edited by Feb 25, 2023, 3:13 PM

                                  @stephenw10 thanks. ok, uploaded. thanks. maybe I need to contact protectli about bios settings or something preventing it from rebooting unattended after a crash. I suspect it's doing something like windows does after a major crash and is asking whether to boot normally or not, but since it's a major pain when the internet is out, I just plug in the console cable and hit enter on the blank screen, and it boots.

                                  • S
                                    stephenw10 Netgate Administrator
                                    last edited by stephenw10 Feb 25, 2023, 3:32 PM Feb 25, 2023, 3:32 PM

                                    Ah the backtrace may be telling here:

                                    db:1:pfs> bt
                                    Tracing pid 63462 tid 100268 td 0xfffffe00d2cbb720
                                    kdb_enter() at kdb_enter+0x32/frame 0xfffffe00d2a704e0
                                    vpanic() at vpanic+0x182/frame 0xfffffe00d2a70530
                                    panic() at panic+0x43/frame 0xfffffe00d2a70590
                                    _mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x68/frame 0xfffffe00d2a705a0
                                    _mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe00d2a70610
                                    uart_cnputc() at uart_cnputc+0xaf/frame 0xfffffe00d2a70640
                                    cnputc() at cnputc+0x4c/frame 0xfffffe00d2a70670
                                    cnputsn() at cnputsn+0x6a/frame 0xfffffe00d2a706b0
                                    putchar() at putchar+0x14a/frame 0xfffffe00d2a70740
                                    kvprintf() at kvprintf+0xf5/frame 0xfffffe00d2a70860
                                    _vprintf() at _vprintf+0x8c/frame 0xfffffe00d2a70950
                                    printf() at printf+0x53/frame 0xfffffe00d2a709b0
                                    trap_fatal() at trap_fatal+0x280/frame 0xfffffe00d2a70a10
                                    trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00d2a70a70
                                    calltrap() at calltrap+0x8/frame 0xfffffe00d2a70a70
                                    --- trap 0xc, rip = 0xffffffff80f9352c, rsp = 0xfffffe00d2a70b40, rbp = 0xfffffe00d2a70b70 ---
                                    X_ip_mrouter_done() at X_ip_mrouter_done+0x31c/frame 0xfffffe00d2a70b70
                                    rip_detach() at rip_detach+0x3f/frame 0xfffffe00d2a70ba0
                                    sorele_locked() at sorele_locked+0x89/frame 0xfffffe00d2a70bc0
                                    soclose() at soclose+0xeb/frame 0xfffffe00d2a70c20
                                    _fdrop() at _fdrop+0x11/frame 0xfffffe00d2a70c40
                                    closef() at closef+0x24b/frame 0xfffffe00d2a70cd0
                                    fdescfree() at fdescfree+0x4b3/frame 0xfffffe00d2a70d90
                                    
                                    --Exceeded input buffer--
                                    exit1() at exit1+0x4c7/frame 0xfffffe00d2a70df0
                                    sys_exit() at sys_exit+0xd/frame 0xfffffe00d2a70e00
                                    amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00d2a70f30
                                    fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00d2a70f30
                                    --- syscall (1, FreeBSD ELF64, sys_exit), rip = 0x8233c886a, rsp = 0x820ca3bd8, rbp = 0x820ca3bf0 ---
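
In a ddb backtrace like the one above, the frame listed directly after the `--- trap` marker is the function that was executing when the fault occurred (here `X_ip_mrouter_done`); the frames above the marker are the panic-handling path. A minimal sketch of pulling that out of a saved backtrace follows. The `faulting_function` helper and its regular expressions are hypothetical and were written only against the line format shown in this post.

```python
import re

# Matches frame lines such as:
#   rip_detach() at rip_detach+0x3f/frame 0xfffffe00d2a70ba0
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$")
# Matches the trap marker line, capturing the trap number in hex.
TRAP_RE = re.compile(r"^\s*--- trap (0x[0-9a-f]+),")


def faulting_function(backtrace):
    """Return (trap_number, function_name) for the first frame after a
    '--- trap' marker, or None if no such marker or frame is found."""
    trap = None
    for line in backtrace.splitlines():
        if trap is None:
            m = TRAP_RE.match(line)
            if m:
                trap = int(m.group(1), 16)
            continue
        m = FRAME_RE.match(line)
        if m:
            return trap, m.group(1)
    return None


if __name__ == "__main__":
    sample = """\
calltrap() at calltrap+0x8/frame 0xfffffe00d2a70a70
--- trap 0xc, rip = 0xffffffff80f9352c, rsp = 0xfffffe00d2a70b40, rbp = 0xfffffe00d2a70b70 ---
X_ip_mrouter_done() at X_ip_mrouter_done+0x31c/frame 0xfffffe00d2a70b70
rip_detach() at rip_detach+0x3f/frame 0xfffffe00d2a70ba0
"""
    print(faulting_function(sample))  # (12, 'X_ip_mrouter_done')
```

Trap `0xc` is decimal 12, which lines up with the "Fatal trap 12" page fault message in the same post.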
                                    

                                    And the panic:

                                    Fatal trap 12: page fault while in kernel mode
                                    cpuid = 0; apic id = 00
                                    fault virtual address	= 0x0
                                    fault code		= supervisor read data, page not present
                                    instruction pointer	= 0x20:0xffffffff80f9352c
                                    stack pointer	        = 0x28:0xfffffe00d2a70b40
                                    frame pointer	        = 0x28:0xfffffe00d2a70b70
                                    code segment		= base 0x0, limit 0xfffff, type 0x1b
                                    			= DPL 0, pres 1, long 1, def32 0, gran 1
                                    processor eflags	= interrupt enabled, resume, IOPL = 0
                                    ns8250: UART FCR is broken
                                    spin lock 0xfffff80001a9ec40 (uart_hwmtx) held by 0xfffffe0010cf6e40 (tid 100006) too long
                                    panic: spin lock held too long
                                    cpuid = 0
                                    time = 1677315261
                                    KDB: enter: panic
                                    

                                    And this could certainly be why it doesn't reboot. It looks like something is continually sending data to the UART console perhaps. That would interrupt the boot. What consoles do you have connected there and what are they connected to?
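
The "spin lock ... held by ... too long" line names both the lock and the thread holding it, which is what points at the UART here. As a minimal sketch, assuming the message format shown in the panic output above, the fields can be extracted like this. The `parse_spin_lock` helper and its field names are hypothetical.

```python
import re

# Matches lines such as:
#   spin lock 0xfffff80001a9ec40 (uart_hwmtx) held by 0xfffffe0010cf6e40 (tid 100006) too long
SPIN_RE = re.compile(
    r"spin lock (0x[0-9a-f]+) \((\w+)\) held by (0x[0-9a-f]+) \(tid (\d+)\) too long"
)


def parse_spin_lock(text):
    """Return a dict describing the first 'spin lock held too long' line
    found in text, or None if there is no such line."""
    m = SPIN_RE.search(text)
    if m is None:
        return None
    return {
        "lock_addr": m.group(1),
        "lock_name": m.group(2),
        "holder_addr": m.group(3),
        "holder_tid": int(m.group(4)),
    }


if __name__ == "__main__":
    line = (
        "spin lock 0xfffff80001a9ec40 (uart_hwmtx) held by "
        "0xfffffe0010cf6e40 (tid 100006) too long"
    )
    info = parse_spin_lock(line)
    print(info["lock_name"], info["holder_tid"])  # uart_hwmtx 100006
```

Note that the holder (tid 100006) is a different thread from the one being traced in the backtrace (tid 100268): the traced thread was trying to print the page-fault message to the console when it found the UART lock already held.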

                                    • C
                                      Cloudless Smart Home @stephenw10
                                      last edited by Feb 25, 2023, 4:11 PM

                                      @stephenw10 this router is in my basement, so I have a console cable plugged in with an ethernet extension on the other end and nothing plugged in to the upstairs side, unless it goes down. don't know if that sounds clear but basically an ethernet cable plugged into the console port but open on the other side, like an extension cord. then, after a crash, I plug into that cable with a usb console cable and plug into my mac and open my terminal emulator program. do I need to go to the router and unplug the extension cable from the router side? do you think having a console cable plugged in to the protectli and open on the other end could cause this?

                                      btw, the script that I referenced above has a php echo command in it, but that works without errors, so I don't think it should cause issues. I do have that script set up to run on a cron job every 5 minutes, but pfsense sends lots of info to the terminal in normal operation, so that php echo shouldn't hurt anything, right? did you see in the dump file that openvpn was going up down up down? that concerns me too since my custom script does exactly that.

                                      • S
                                        stephenw10 Netgate Administrator
                                        last edited by Feb 25, 2023, 4:22 PM

                                        Yes, I could imagine an unterminated very long console cable could generate enough random noise to send some characters.
                                        Does it reboot normally from the gui with the console in that state? That would be no different after a panic.
                                        It may also be that something is generating the interference triggering the panic and then interrupting the reboot and doesn't apply at other times.

                                        • C
                                          Cloudless Smart Home @stephenw10
                                          last edited by Feb 25, 2023, 4:28 PM

                                          @stephenw10 if you are asking if I can reboot from the gui with that cable plugged in, the answer is yes I can. but for now, I will unplug it on the router side too, just in case. then plug in both sides next time. I also need to just drag a monitor and keyboard down to the basement next time, so I can see what the message on the screen says and hopefully figure out why it hangs after a crash. I saw this post this morning that fed my suspicion that there's a message on the screen after a crash, that just needs to be acknowledged for booting to continue...

                                          https://www.thegeekpub.com/14848/pfsense-hangs-at-booting/

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.