System freezing?



  • Hi guys,

    Let me first wish you all a VERY good new year with a bit of all good things.

    We have been using pfSense for almost a year now, and it has been running so well, without any problems, that I almost forgot we had it.
    The system is installed on an older Dell desktop (P4 1.8 GHz, 512 MB RAM) with 2 NICs, one of them connected to a DSL modem.
    The modem is configured in bridge mode, and the PPPoE connection is set to reconnect every night so that our provider's forced reconnect doesn't happen at inconvenient moments.
    The system and the installed packages are up to date.

    Recently, internet traffic to the clients has stopped getting through at seemingly random times.
    Yesterday I replaced the modem since it had a problem attaining WAN bandwidth.

    When this problem occurs:

    • What I noticed now is that the console doesn't react to any keyboard input, and a short press of the power button doesn't trigger a shutdown. At that moment no internet traffic is getting through.
    • Pinging from the LAN side also gets no reply.

    Because at first I thought this was a PPPoE problem, I looked at that log file but couldn't find anything suspicious there, or in any of the other logs. Maybe I didn't look for the right thing or in the right place.

    Any help/advice on approaching this problem is highly appreciated!
    TIA
    Peter


  • Netgate Administrator

    In a system of that age with those symptoms I would suspect hardware. Possibly you shifted something slightly when you disconnected the modem.
    Check the capacitors on the board. Test the RAM, though it would normally panic with bad RAM. Make sure all the components are seated properly.

    Steve



  • Hi Steve, thank you for taking the time to read & reply!

    Because the modem is an external box connected to one of the NICs, no hardware has been touched.
    Also, the problem started appearing before I switched the modem. Since I didn't know how to approach this, I checked the modem first, and together with the provider we saw that there was an issue with it, so it was replaced.
    Of course there can still be a hardware problem with the PC itself, and I was hoping to get some more experienced advice on troubleshooting this, based on logs/behaviour/…

    Do you think it can be ruled out that it's a software issue?

    Peter


  • Netgate Administrator

    How frozen is it? Does the numlock/capslock button still work on the keyboard?
    It may just be a rogue process using all the CPU cycles causing it to be unresponsive.
    Try running 'top -SH' at the console and then, while it's running, trigger the 'freeze'. You should see if a process goes rogue.

    Steve



  • When I wrote "no response to any input from keyboard" I was referring to numlock/capslock.
    Isn't the fact that the power button doesn't trigger a shutdown at that moment also a sign that the software is not responding? I checked this, and under normal circumstances the system shuts down cleanly when the power button is pressed.

    Right now I'm not in the office, but running the top -SH command shows that the system is about 98% idle most of the time (just my VPN traffic, nobody present currently).
    Is there a way to keep track of this output so I can look at it afterwards, or is there some other way?

    Peter


  • Netgate Administrator

    The numlock LED not responding is an indication that nothing is working. The power button not responding is an indication that nothing is watching it, but that could just be because there aren't enough free cycles (an extreme case).
    That type of hard lockup is almost always a hardware fault. Is it overheating? Unfortunately, bad capacitors are pretty common on boards of that age.

    If it's locking up hard like that it almost certainly won't be able to log anything useful.  :(
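    That said, if you want to try anyway, one option is to snapshot the 'top -SH' output to a file at a fixed interval and read the last entries after the reboot. A minimal sketch in Python, not a tested pfSense tool: it assumes Python 3 is available on the box (not a given on pfSense) and that the FreeBSD top accepts -b (batch mode) and -d 1 (one display, then exit). The names snapshot and watch and the log path are made up for illustration.

```python
import os
import subprocess
import time
from datetime import datetime


def snapshot(command, log_path):
    """Run `command` once and append its timestamped output to `log_path`."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    with open(log_path, "a") as log:
        log.write(f"=== {datetime.now().isoformat()} ===\n")
        log.write(result.stdout)
        log.flush()
        # Ask the OS to write the data to disk now, since a hard
        # lockup would lose anything still sitting in a buffer.
        os.fsync(log.fileno())
    return result.returncode


def watch(command, log_path, interval_s=10, count=None):
    """Take a snapshot every `interval_s` seconds.

    With count=None this loops until interrupted; otherwise it stops
    after `count` snapshots and returns how many were taken.
    """
    taken = 0
    while count is None or taken < count:
        snapshot(command, log_path)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval_s)
    return taken
```

    It could be started with something like watch(["top", "-SH", "-b", "-d", "1"], "/root/top-snapshots.log"), where the path is only a placeholder. With a true hardware lockup the last snapshot will most likely just show a normal, idle system, which is itself a hint that no process went rogue.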

    Steve



  • I have almost exactly the same issue.
    Been running pfSense fine for 2+ years without any issues.

    Earlier today the (old) Dell desktop that pfSense is installed on suddenly became totally unresponsive and all network traffic stopped (i.e. I couldn't access the webGUI).
    Noticed that the desktop was showing a YELLOW power light, wouldn't shut down when power was pressed, and had to be turned off by pressing and holding power.
    pfSense then reboots without any issues and works fine for 1-2 hours before repeating the problem.
    WTF? This worked fine for 2+ years and the fact that it boots without issues makes me doubt that it is a hardware problem.



  • Steve, you were right: it was hardware related!  ;)