Netgate Discussion Forum

    Weird Issue: pfSense hangs/freezes silently with no log messages - not hardware?

    Problems Installing or Upgrading pfSense Software
    7 Posts 3 Posters 6.5k Views
      Mech

      Hi all,

      After Storm Doris destroyed the HDD in my Watchguard, I got myself a new box to pfSense-ify: a Celestix Scorpio II RSA SecurID Appliance. Features include:

      • Very nice 40x2 LCD & Jog Wheel

      • 2x em(4) NICs

      • 2x fxp(4) NICs

      • 2x Serial port

      • VGA port

      • 4GB RAM

      I've run into a problem where pfSense installs fine and runs great, but after a seemingly random length of time (anywhere between 1 minute and an hour) the LCD displays "OFF" and the box stops responding on both the network and the serial console. A monitor connected to VGA goes black, yet the fans keep running at the same speed, and holding down the power button for 5 seconds still turns it off. Nothing is logged; the entries just stop.

      This can even happen during the boot process, before pfSense has fully loaded, in which case holding down the power button for 5 seconds doesn't work.

      I'm 99.9% sure this isn't a hardware issue, and I'll explain what brings me to that conclusion. (Or at least the underlying hardware isn't broken; there may be an insidious incompatibility.)

      So I've followed https://doc.pfsense.org/index.php/Boot_Troubleshooting and https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards to the letter.

      The first suggestion was to run a memory test, which did find a bad byte in one of the 4 sticks of RAM (Kingston Value RAM, unsurprisingly), so I removed that stick until I can order a new one. That takes the total down to 3GB (2 sticks in dual channel, one in single).

      I did get one kernel panic (see attached), but that was fixed after removing the faulty memory module. Interestingly, the mangled entry that caused it was at the EXACT same inode as in another user's report: https://technologyand.me/2015/08/25/pfsense-boot-loop/

      Still not working - Memory not the cause.

      Next I tried modifying the BIOS to ensure the drive was set to LBA and not Auto. The BIOS (Award, "04/27/2005-Springdale-G-6A79AWD9C-00") has extensive Power Management settings, including global timers for S1 and S3 standby. I wondered if the board was somehow going into standby, so I tried disabling standby, resetting the timers on HDD, USB and network activity, and allowing wake from the USB keyboard. None of it helped.

      Still not working - BIOS not the cause. (If you know of an updated BIOS I will try installing it, but I couldn't find one.)

      I've tried running a pfSense live environment off a USB stick. It seems to last a bit longer before it hangs (30 mins as opposed to 10-20 mins), but it still dies like the rest. I don't know whether that is related to the "using multiple small partitions" issue or not.

      Still not working - HDD not the cause.

      I've also tried running a Debian live system on the hardware, and even with stress-ng maxing everything out it doesn't hang or crash at all. In case it mattered whether the OS was installed rather than live, I also installed Debian to disk and ran another stress test for several hours: no crash. So the Linux kernel works faultlessly on this hardware.

      Still not working - Hardware not the cause.

      Originally I was restoring the XML config from my old router to this one, but I stopped doing that in case there was an incompatibility. Nope, it still crashes.

      I've tried pfSense versions 2.2.3, 2.2.6 and 2.3.4 (the very latest i386 release out there).

      There are other steps I tried too, but I've forgotten them.  ::)  (It's been 3 days so far…)

      So I need your help. Is there a way to get more verbose logging so I can diagnose the issue more thoroughly? At the moment I am completely in the dark as to the underlying cause.
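One low-tech way to squeeze a little more information out of a silent hang, offered only as a hypothetical sketch, is a heartbeat log: a shell loop that writes a timestamped line at a fixed interval and calls `sync` so each line actually reaches the disk. After the next freeze, the last line shows roughly when the box stopped. The function name, log path and interval are made up for illustration; it relies only on `date`, `uptime`, `sync` and `sleep`, which are standard on FreeBSD.

```sh
#!/bin/sh
# Hypothetical heartbeat logger: append a timestamped line at a fixed interval
# and force it to disk with sync(8), so after a silent hang the last line shows
# roughly when the box stopped responding.

heartbeat() {
    log="$1"          # log file, e.g. /root/heartbeat.log (keep it on local disk)
    interval="$2"     # seconds between entries
    count="${3:-0}"   # entries to write; 0 means run until killed

    n=0
    while :; do
        # Date plus uptime(1) output (load averages); uptime errors are ignored.
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(uptime 2>/dev/null)" >>"$log"
        sync
        n=$((n + 1))
        if [ "$count" -ne 0 ] && [ "$n" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

For example, after loading the function into `/bin/sh`, `heartbeat /root/heartbeat.log 10 &` would write a line every 10 seconds until the box dies or the job is killed.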

      Or perhaps you have experience with something similar?

      Tomorrow I'm going to try running it on a hypervisor on the same hardware to see whether that abstraction layer has any effect.

      Thanks.

      crash.txt

        Mech

        Interesting development:

        It doesn't crash in Debian (I have another stress-ng run going right now at 100% CPU and 90% RAM with 4 HDD workers), but it does crash in GRUB!
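For anyone wanting to reproduce this kind of soak test, a stress-ng run along the lines described above might look like the sketch below. The function name, the `DRY_RUN` switch and the 6-hour default are illustrative assumptions; the flags are those of recent stress-ng releases (very old builds may not accept a percentage for `--vm-bytes`), so check `stress-ng --help` on the version you have.

```sh
#!/bin/sh
# Hypothetical helper: build a stress-ng soak test roughly matching the run
# described above (all CPUs at 100%, ~90% of RAM, 4 HDD workers).
# Set DRY_RUN=1 to print the command instead of running it.

soak_test() {
    duration="${1:-6h}"   # run time; stress-ng accepts s/m/h/d suffixes
    # --cpu 0               : one CPU stressor per CPU
    # --cpu-load 100        : keep each CPU stressor at 100% load
    # --vm 1 --vm-bytes 90% : one VM stressor using ~90% of available memory
    # --hdd 4               : four disk I/O stressors
    # --metrics-brief       : short per-stressor summary at the end
    set -- --cpu 0 --cpu-load 100 --vm 1 --vm-bytes 90% --hdd 4 \
           --timeout "$duration" --metrics-brief
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "stress-ng $*"
    else
        stress-ng "$@"
    fi
}
```

`DRY_RUN=1 soak_test 2h` shows the exact command line; dropping `DRY_RUN` starts the test. Keeping the flags in one place makes it easy to run an identical load on Debian and, where stress-ng is available, on the BSD side.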

        I tried to start Debian (from the HDD) this morning; it got as far as GRUB and then crashed 1 second after GRUB appeared on the screen. It did this consistently 4 times in a row, until I removed the USB mouse and keyboard. I don't know whether that was the cause or just a coincidence.

        Thoughts?

          Mech

          At the risk of being lynched: OPNsense has been running on the box for 12 hours now with no crashes…  :-X

          As this router does a lot of business-critical things for me, I need a solution or a workaround of any kind, and this will do.

          I am still happy to spend some time with someone knowledgeable about pfSense's internal workings to figure out the root cause; if you have any suggestions, I will try them. I want to support the continued development of pfSense.

            chpalmer

            Take this for what it is worth but that is actually a pretty old box to be doing the job of "business critical".  ;)

            My guess is a driver error of some kind. I'd be curious whether pfSense 2.1.5 runs fine for you, but only as a test, as that version is not supported and comes with everything that "not supported" implies.

            Another thought: have you tried with powerd disabled?

            Triggering snowflakes one by one..
            Intel(R) Core(TM) i5-4590T CPU @ 2.00GHz on an M400 WG box.

              Mech

              @chpalmer:

              Take this for what it is worth but that is actually a pretty old box to be doing the job of "business critical".  ;)

              Fair point! I am cheap.  ::)  To paraphrase Sam Vimes: "there's nothing quite as expensive as being poor."

              To stop powerd, would it be as simple as "service powerd stop"? I'll try that.

              Thanks for the tip.

                chpalmer

                Simpler..

                System > Advanced > Miscellaneous > Power Savings: uncheck the box.  :)

                I can't remember whether it is checked by default or not.
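Whichever way powerd is stopped, a quick check from a shell (SSH or Diagnostics > Command Prompt) is whether the process is still running and what CPU clock the kernel reports. This is a minimal sketch with a hypothetical function name: `pgrep` and the `dev.cpu.0.freq` sysctl are standard on FreeBSD, but that sysctl only exists when a cpufreq(4) driver has attached, so it may well be missing on this hardware.

```sh
#!/bin/sh
# Hypothetical check: report whether powerd(8) is running and, if available,
# the current CPU frequency. dev.cpu.0.freq only exists when cpufreq(4) attached.

powerd_status() {
    if pgrep -x powerd >/dev/null 2>&1; then
        echo "powerd: running"
    else
        echo "powerd: not running"
    fi
    # Empty if the sysctl (or the sysctl binary) is unavailable.
    freq=$(sysctl -n dev.cpu.0.freq 2>/dev/null) || freq=""
    if [ -n "$freq" ]; then
        echo "cpu0 frequency: ${freq} MHz"
    else
        echo "cpu0 frequency: unavailable"
    fi
}
```

Running it before and after unchecking the box should show powerd going from running to not running; if the frequency is reported, it should also stop changing between idle and load.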

                  w0w

                  I think it could be a memory issue: the memory may fail only when power-saving modes kick in, yet be stable under load, when full voltage is applied. To check this, you need to install the memtester package.
                  For version 2.4 it would be:

                  fetch http://pkg.freebsd.org/freebsd:11:x86:64/latest/All/memtester-4.3.0.txz

                  pkg add memtester-4.3.0.txz

                  To run it, use:
                  memtester (size to test in MB) (loops)
                  memtester 512 10

                  Memtester packages for other FreeBSD versions can be found here: http://portsmon.freebsd.org/portoverview.py?category=sysutils&portname=memtest
                  For example, pfSense 2.3, which is based on FreeBSD 10, needs this package: http://pkg.freebsd.org/freebsd:10:x86:64/latest/All/memtester-4.3.0.txz

                  If pfSense does not hang while memtester is running either, then it is definitely a memory/power-saving incompatibility issue.
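Building on that idea, memtester could also be run in short passes separated by idle periods, so the box gets a chance to drop into its power-saving states between passes. The sketch below is hypothetical (the function name, log path and timings are made up); it relies only on memtester exiting 0 on success and non-zero when allocation or a test fails, and it appends a timestamped line per pass followed by `sync`, so the log should survive a hard hang.

```sh
#!/bin/sh
# Hypothetical wrapper: run memtester in short passes with idle gaps between
# them, so the CPU/memory can drop into power-saving states in the meantime.
# memtester exits 0 on success and non-zero if allocation or any test fails.

memtest_with_idle() {
    size_mb="$1"   # memory to test per pass, in MB (e.g. 512)
    passes="$2"    # number of memtester invocations
    idle_s="$3"    # seconds to stay idle between passes
    log="$4"       # file the results are appended to

    i=1
    while [ "$i" -le "$passes" ]; do
        rc=0
        memtester "$size_mb" 1 >/dev/null 2>&1 || rc=$?   # one loop per pass
        if [ "$rc" -eq 0 ]; then
            result="ok"
        else
            result="FAILED (exit $rc)"
        fi
        # Timestamp each pass and flush to disk so the log survives a hard hang.
        printf '%s pass %d/%d: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$i" "$passes" "$result" >>"$log"
        sync
        # Sit idle before the next pass (skipped after the last one).
        if [ "$i" -lt "$passes" ]; then
            sleep "$idle_s"
        fi
        i=$((i + 1))
    done
}
```

For example, `memtest_with_idle 512 12 300 /root/memtest.log` would run 12 passes of 512 MB with 5 minutes of idle time between them. If the box freezes, the last logged pass narrows down whether it died during a test or while idle.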

                  Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.