Netgate Discussion Forum

Need some help. Random pfSense crashes.

General pfSense Questions
25 Posts 5 Posters 2.5k Views
This topic has been deleted. Only users with topic management privileges can see it.
    aaronouthier
    last edited by Oct 7, 2022, 1:39 AM

    So, I have set up my pfSense box on a shiny new ZimaBoard, because my aging APU 1D4 was, well, aging.

    With the APU, I kept losing all connectivity to/from the router until someone came along, physically pulled the plug and reseated it.

    Now, with this nice, new system, I seem to be having the same symptoms.

    I'm running pfSense 2.6.0 on a ZimaBoard x64 SBC with an Apollo Lake CPU, 4 GB RAM and 32 GB eMMC. I have 4 VLANs and a D-Link DGS-1100-108V2 managed switch.

    I saved the crash report this time. Is there somewhere I can send it?

    The problem occurs anywhere from 1 to 2000 minutes after a cold reboot, often during peak hours but occasionally in the wee hours of the night.

      aaronouthier
      last edited by Oct 7, 2022, 3:53 AM

      This post is deleted!
        aaronouthier
        last edited by Oct 7, 2022, 3:55 AM

        Crash report.txt

          Gertjan @aaronouthier
          last edited by Gertjan Oct 7, 2022, 8:07 AM (posted Oct 7, 2022, 8:05 AM)

          @aaronouthier

          At boot, the file system was already in a dirty state,
          and each crash can make that worse.

          The system crashed while handling a file-system-related task.

          How to Run a pfSense Software File System Check (5/2020)

          Edit: this can happen when power is removed without an admin-initiated shutdown from the GUI or SSH/console.
          It's like your Windows/Mac PC: rip out the power (or the battery) several times, and I'll bet it won't boot any more.

          No "help me" PM's please. Use the forum, the community will thank you.
          Edit: and where are the logs?

            aaronouthier
            last edited by Oct 7, 2022, 12:10 PM

            I was reading the logs. It looked like it was doing an fsck on bootup. I have just scheduled a reboot with filesystem check. I’m crossing my fingers.

            I’ll also make a config file backup in case I need to wipe & reinstall.

              stephenw10 Netgate Administrator
              last edited by Oct 7, 2022, 4:57 PM

              Mmm, that's not the normal bad UFS filesystem panic:

              dev = ufsid/4effa82c0d8b10f4, block = 6903583, fs = /
              panic: ffs_blkfree_cg: freeing free frag
              cpuid = 3
              time = 1665104129
              KDB: enter: panic
              panic.txt: ffs_blkfree_cg: freeing free frag
              version.txt: FreeBSD 12.3-STABLE RELENG_2_6_0-n226742-1285d6d205f pfSense
              

              But yeah, first reinstall clean. Are you running from eMMC?
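The `time =` field in a FreeBSD panic message is a Unix timestamp in seconds. Converting it to a readable time makes it easier to line a panic up with log entries or with when the outage was noticed. A minimal Python sketch; the fixed UTC-7 offset is an assumption taken from the poster's "PDT-7" times later in the thread, and `panic_time` is a hypothetical helper name:

```python
from datetime import datetime, timedelta, timezone

# Assumed local zone: the poster reports times as "PDT-7", so a fixed UTC-7
# offset is used instead of a full time-zone database.
PDT = timezone(timedelta(hours=-7), "PDT")


def panic_time(epoch_seconds: int) -> tuple[datetime, datetime]:
    """Convert a panic's 'time = N' value (Unix epoch seconds) to UTC and PDT."""
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc, utc.astimezone(PDT)


if __name__ == "__main__":
    utc, local = panic_time(1665104129)  # value from the panic message above
    print(utc.isoformat())    # 2022-10-07T00:55:29+00:00
    print(local.isoformat())  # 2022-10-06T17:55:29-07:00
```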

                aaronouthier @stephenw10
                last edited by Oct 7, 2022, 5:20 PM

                @stephenw10

                Correct, running from eMMC.
                System hasn’t crashed since running the fs check, but it’s early yet. Now, I need to find a piece of wood to go knock on…

                  aaronouthier @aaronouthier
                  last edited by Oct 10, 2022, 4:12 AM

                  @aaronouthier

                  Ok now, same system, same symptoms, but no relief in sight!

                  I have now wiped the eMMC and reinstalled the OS. An hour later, network access came to a grinding halt.

                  I ran Wireshark traces on LAN and on a USB Ethernet adapter with a LAN tap (Hak5 PlunderBug) connected between my OPT1 port and my managed switch. For reference, OPT1 carries my VLANs, and I normally keep LAN as a dangling Ethernet cable for a “failsafe”. You know, in case I do something stupid and mess up my config so badly I can’t access my system via any of my VLANs.

                  A regular Nmap scan of my LAN port (nmap <ip addr>) reported ports 80 and 443 open. However, nothing seems to be actively listening on port 443, and connecting to port 80 returns an nginx HTTP 502 error page with the words “Bad Gateway” in large letters. I am able to ping the LAN port, and I get a DHCP lease when I disconnect and reconnect Ethernet.
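To separate "the port accepts a TCP connection" from "the service behind it is healthy", a plain connect test is enough. A minimal Python sketch; `tcp_port_accepts` is a hypothetical helper, and the address you pass it is whatever the pfSense LAN IP is on your network:

```python
import socket


def tcp_port_accepts(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout.

    A completed handshake only shows that something accepted the connection;
    it says nothing about whether the backend that nginx forwards to is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False
```

For example, `tcp_port_accepts(lan_ip, 443)` returning True while the browser still gets a Bad Gateway page would suggest nginx itself is up and the failure is behind it.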

                  Also of note: the WAN port's link light cycles off and on every few seconds.

                  Also, no crash report is being generated any more.

                  For privacy reasons, I do not wish to post the wireshark dumps in public. If a mod wishes to take a look I can DM them a link to my Nextcloud, or if there is a support email I should contact, let me know.

                  Is there a way to capture the live log files to, say, a flash drive or something? I think if we had the logs at the time of “death”, it would be very helpful!
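One way to keep logs from the moment of failure is to have pfSense send them to another machine: pfSense can forward its logs to a remote syslog server (the Remote Logging options in the System Logs settings). The receiving end can be very small. A minimal Python sketch, assuming a laptop on the LAN, an arbitrary unprivileged UDP port 5514 (the standard syslog port 514 usually needs root), and a hypothetical output file name:

```python
import socketserver
from pathlib import Path


class SyslogToFile(socketserver.BaseRequestHandler):
    """Append each received syslog datagram to log_file, one line per message."""

    # Hypothetical output path; point it at a USB stick mounted on the laptop if desired.
    log_file = Path("pfsense-remote.log")

    def handle(self) -> None:
        data, _sock = self.request  # for UDP servers, request is (bytes, socket)
        line = data.decode("utf-8", errors="replace").rstrip("\r\n")
        with self.log_file.open("a", encoding="utf-8") as f:
            # Prefix the sender's address so several sources can share one file.
            f.write(f"{self.client_address[0]} {line}\n")


def serve(host: str = "0.0.0.0", port: int = 5514) -> None:
    """Listen for syslog over UDP until interrupted (Ctrl+C)."""
    # 5514 is an assumed unprivileged port; 514 is standard but usually needs root.
    with socketserver.UDPServer((host, port), SyslogToFile) as server:
        server.serve_forever()
```

Call `serve()` on the laptop and point pfSense's remote log server at the laptop's address and port; the file then keeps growing until the router stops sending, which marks roughly when it died.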

                    bingo600 @aaronouthier
                    last edited by Oct 10, 2022, 5:22 AM

                    @aaronouthier
                    According to this:
                    https://www.zimaboard.com/zimaboard/product

                    You have 2 x Realtek 8111H LAN adapters.
                    And you have 2 x 6 Gb/s SATA ports.

                    I'd try to :
                    1:
                    Run a serious Memory check - Can be found on a Linux Live BootStick.

                    2:
                    Find/load the Alternate Realtek Driver

                    3:
                    Switch to a SATA Disk.

                    /Bingo

                    If you find my answer useful - Please give the post a 👍 - "thumbs up"

                    pfSense+ 23.05.1 (ZFS)

                    QOTOM-Q355G4 Quad Lan.
                    CPU  : Core i5 5250U, Ram : 8GB Kingston DDR3LV 1600
                    LAN  : 4 x Intel 211, Disk  : 240G SAMSUNG MZ7L3240HCHQ SSD

                      aaronouthier @bingo600
                      last edited by Oct 10, 2022, 5:49 AM

                      @bingo600 said in Need some help. Random pfSense crashes.:

                      I'd try to :
                      1:
                      Run a serious Memory check - Can be found on a Linux Live BootStick.

                      2:
                      Find/load the Alternate Realtek Driver

                      3:
                      Switch to a SATA Disk.

                      1. I can do that.

                      2. Never heard of such a thing. To the best of my knowledge, you can’t load/run custom software on a pfSense box.

                      3. I am fresh out of spare SATA disks, and out of money. Also, I don’t need much storage space. Finally, it was quite tricky to wall-mount my ZimaBoard. If I were to install a SATA Drive, it would be hanging/dangling from the bottom of the Zimaboard. I don’t think that would be good. Both of my USB ports are in use, so a flash drive is not possible either.

                        stephenw10 Netgate Administrator
                        last edited by Oct 10, 2022, 12:18 PM

                        It's likely that it isn't actually crashing the first time it fails. The filesystem panics are probably the result of resetting it after whatever the initial issue is. What you need to do is find out what that is.

                        Is it still responsive at the console when the network fails?

                        Can it still connect out from there?

                        Look at the message buffer and system logs at the console and see what it's showing.

                        If the WAN is link-flapping that will create a lot of logs by itself which is unhelpful.
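On FreeBSD the kernel logs each link change as a line like `re0: link state changed to DOWN`, so a flapping WAN shows up as many such lines. A minimal Python sketch that counts them per interface in a plain-text copy of the system log (for example /var/log/system.log); `count_link_changes` is a hypothetical helper name:

```python
import re
from collections import Counter
from collections.abc import Iterable

# "<ifname>: link state changed to UP|DOWN"; [\w.] also covers VLAN names like re1.10.
LINK_RE = re.compile(r"([\w.]+): link state changed to (UP|DOWN)\b")


def count_link_changes(lines: Iterable[str]) -> Counter:
    """Count link-state transitions per interface in system log lines."""
    changes = Counter()
    for line in lines:
        m = LINK_RE.search(line)
        if m:
            changes[m.group(1)] += 1
    return changes
```

A WAN interface with hundreds of transitions in an hour, against a handful for the others, would confirm the flapping seen on the link light.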

                        Some Realtek NICs behave badly with the default FreeBSD driver that ships in pfSense. There is an alternative driver you can try by loading it as a kernel module. See for example:
                        https://forum.netgate.com/post/1064399

                        Normally I would not recommend doing that unless you can definitely see the re NIC(s) failing with the default driver. Typically they might throw some watchdog errors and just stop passing traffic.
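Checking the system log for those watchdog errors is a simple text search. The exact wording is an assumption here (re(4) is taken to print something like `re0: watchdog timeout`), so the pattern is kept loose and case-insensitive; `re_watchdog_events` is a hypothetical helper name:

```python
import re
from collections.abc import Iterable, Iterator

# Assumed message shape, e.g. "re0: watchdog timeout"; extra trailing text still matches.
WATCHDOG_RE = re.compile(r"\b(re\d+): watchdog timeout", re.IGNORECASE)


def re_watchdog_events(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (interface, line) for every re(4) watchdog timeout in the log."""
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            yield m.group(1), line.rstrip("\n")
```

If these lines cluster just before each outage, that is the kind of evidence that would justify trying the alternative driver.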

                        Steve

                          aaronouthier @stephenw10
                          last edited by Oct 10, 2022, 1:57 PM

                          @stephenw10
                          Man!

                          Ok. The console is a great idea. I should’ve thought of it!
                          Alas, where the internet comes into the house is a good 30 feet from the nearest HDMI display, and I don’t have any HDMI cables that long. My 2 USB ports are also occupied, so no keyboard access either.

                          I’m not trying to be difficult, I swear.

                          I do have a generic USB HDMI capture card and a USB hub, so I can connect my laptop to the HDMI output and see what happens.

                            stephenw10 Netgate Administrator
                            last edited by Oct 10, 2022, 2:44 PM

                            Can you still SSH into it? Or ping it even when this happens?

                              aaronouthier @stephenw10
                              last edited by Oct 10, 2022, 3:31 PM

                              @stephenw10
                              I can ping it from the LAN port, yes. I tried to SSH into it, but realized later that I had changed the SSH port and forgotten about it, so I’ll need to wait until the issue surfaces again to retest.

                                stephenw10 Netgate Administrator
                                last edited by Oct 10, 2022, 4:11 PM

                                Yes, if you can still SSH into it that makes it much easier to find out what's happening.

                                  aaronouthier
                                  last edited by Oct 11, 2022, 5:20 AM

                                  Ok, so. Some progress, I hope.

                                  The first time it happened (earlier today), I connected my laptop, and I couldn’t access the system at all. Couldn’t ping, and couldn’t ssh, nothing.

                                  It happened again just now, at about 21:50 PDT (UTC-7). This time, I could SSH in. Alas, I don’t know enough about the inner workings to know what I should be looking for.

                                  Restarting the Web Configurator from the main menu did bring the network back up, but only for about 3 seconds, and then it was down again. Selecting the last option (16?) gave the same result.

                                  I checked the nginx logs, but there was almost nothing in them: nginx.log showed the server starting up and nothing else, and error.log was 0 bytes in size.

                                  I ultimately did a restart with FS check.

                                  Aaaaand, it crashed again after less than 15 minutes! Grrrr!

                                    aaronouthier @aaronouthier
                                    last edited by Oct 11, 2022, 5:28 AM

                                    Forgot to mention. When I do get a crash report now, the reason mentions a “page fault in kernel mode”, or some such thing.

                                      stephenw10 Netgate Administrator
                                      last edited by Oct 11, 2022, 1:00 PM

                                      That's potentially more useful than the filesystem fault. The console message buffer contents and backtrace may show us something if you have that crash report.

                                      The main system log is where I would start looking if you're able to SSH in.

                                      Also try to check what is or isn't working. Can you ping out from the console? To LAN clients? To external hosts? By IP and by FQDN?
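The point of those checks is that each one narrows down where the failure sits. A rough, hypothetical triage helper in Python that encodes one reasonable reading of them (the field names and messages are illustrative, not a pfSense tool):

```python
from dataclasses import dataclass


@dataclass
class PingResults:
    """Outcomes of the console checks (True = replies received)."""

    lan_client: bool     # ping a LAN host by IP
    external_ip: bool    # ping an external host by IP
    external_fqdn: bool  # ping an external host by name (FQDN)


def triage(r: PingResults) -> str:
    """Map ping results to the area most worth investigating next (heuristic)."""
    if not r.lan_client:
        return "LAN side: check the NIC/driver and the message buffer"
    if not r.external_ip:
        return "WAN side: check link state, gateway status and routing"
    if not r.external_fqdn:
        return "DNS: IP connectivity works, so check the resolver"
    return "pfSense itself can reach everything; look at firewall/NAT or the clients"
```

For example, `triage(PingResults(True, True, False))` points at DNS, because the router can reach an external host by IP but not by name.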

                                      Steve

                                        aaronouthier @stephenw10
                                        last edited by Oct 11, 2022, 1:26 PM

                                        @stephenw10
                                        Most recent crash around 1 AM (01:00 PDT-7). I was asleep. By 6 AM (06:00), I had no access.

                                        Crash report from last night attached: textdump.txt

                                          stephenw10 Netgate Administrator
                                          last edited by Oct 11, 2022, 1:38 PM

                                          Backtrace:

                                          db:0:kdb.enter.default>  bt
                                          Tracing pid 16 tid 100070 td 0xfffff80005951740
                                          kdb_enter() at kdb_enter+0x37/frame 0xfffffe000059c500
                                          vpanic() at vpanic+0x197/frame 0xfffffe000059c550
                                          panic() at panic+0x43/frame 0xfffffe000059c5b0
                                          trap_fatal() at trap_fatal+0x391/frame 0xfffffe000059c610
                                          trap_pfault() at trap_pfault+0x4f/frame 0xfffffe000059c660
                                          trap() at trap+0x286/frame 0xfffffe000059c770
                                          calltrap() at calltrap+0x8/frame 0xfffffe000059c770
                                          --- trap 0xc, rip = 0xffffffff80d6f3f7, rsp = 0xfffffe000059c840, rbp = 0xfffffe000059c8c0 ---
                                          __mtx_lock_sleep() at __mtx_lock_sleep+0xd7/frame 0xfffffe000059c8c0
                                          ieee80211_node_psq_drain() at ieee80211_node_psq_drain+0x108/frame 0xfffffe000059c910
                                          node_cleanup() at node_cleanup+0x65/frame 0xfffffe000059c940
                                          node_free() at node_free+0x25/frame 0xfffffe000059c960
                                          ieee80211_tx_complete() at ieee80211_tx_complete+0x8c/frame 0xfffffe000059c990
                                          rtwn_bulk_tx_callback() at rtwn_bulk_tx_callback+0x78/frame 0xfffffe000059c9d0
                                          usbd_callback_wrapper() at usbd_callback_wrapper+0x7c6/frame 0xfffffe000059ca30
                                          usb_command_wrapper() at usb_command_wrapper+0xb5/frame 0xfffffe000059ca50
                                          usb_callback_proc() at usb_callback_proc+0xc8/frame 0xfffffe000059ca70
                                          usb_process() at usb_process+0x116/frame 0xfffffe000059cab0
                                          fork_exit() at fork_exit+0x7e/frame 0xfffffe000059caf0
                                          fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000059caf0
                                          --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
                                          

                                          Panic:

                                          Fatal trap 12: page fault while in kernel mode
                                          cpuid = 1; apic id = 02
                                          fault virtual address	= 0x410
                                          fault code		= supervisor read data, page not present
                                          instruction pointer	= 0x20:0xffffffff80d6f3f7
                                          stack pointer	        = 0x28:0xfffffe000059c840
                                          frame pointer	        = 0x28:0xfffffe000059c8c0
                                          code segment		= base 0x0, limit 0xfffff, type 0x1b
                                          			= DPL 0, pres 1, long 1, def32 0, gran 1
                                          processor eflags	= interrupt enabled, resume, IOPL = 0
                                          current process		= 16 (usbus0)
                                          trap number		= 12
                                          panic: page fault
                                          cpuid = 1
                                          time = 1665380436
                                          KDB: enter: panic
                                          

                                          That is in the rtwn(4) driver. You have a Realtek USB Wi-Fi device attached. Try removing it.
                                          You also have a USB Ethernet device attached. You should remove that too, at least until you have proven the system is stable with only the onboard NICs.
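Reading a backtrace this way means skipping the panic/trap-handling frames above the `--- trap` line and looking at the frames below it, innermost first, for the first one that belongs to a device driver. A minimal Python sketch of that reading; the driver-prefix tuple is a hypothetical example, not an exhaustive mapping:

```python
from __future__ import annotations

import re

# Matches ddb frames such as "rtwn_bulk_tx_callback() at rtwn_bulk_tx_callback+0x78/frame 0x..."
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \w+\+0x[0-9a-f]+/frame", re.MULTILINE)


def frames_after_trap(backtrace: str) -> list[str]:
    """Return function names below the first '--- trap' marker, innermost first."""
    _, sep, rest = backtrace.partition("--- trap")
    return FRAME_RE.findall(rest) if sep else []


def first_driver_frame(backtrace: str, prefixes=("rtwn_", "re_", "ure_")) -> str | None:
    """Return the first frame below the trap whose name starts with a driver prefix."""
    for name in frames_after_trap(backtrace):
        if name.startswith(prefixes):
            return name
    return None
```

Applied to the backtrace above, the first driver frame is `rtwn_bulk_tx_callback`, which is why the rtwn(4) device is the suspect.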

                                          Steve

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.