2.3.1 Hard Lock Up



  • I moved up from 2.2.x to 2.3.1 last weekend and it's been running fine for a few days. I had the issue with the fragments limit being reached and saw that it was fixed in 2.3.2. I hadn't upgraded yet, and after a couple more days pfSense appeared to lock up. Nothing worked, including the attached keyboard, and I had to do a hard restart. However, this left the HDD dirty and it stopped working (corrupted file system - I saw some recovery blocks), so I had to reinstall.

    Is there any procedure to work around this (I guess not, as it's a hard lock)? How would one investigate and resolve the cause on the next reboot?

    There seems to be something wrong in my setup causing the lock, as reinstating the old config makes it lock up quite quickly (minutes after a restart). My setup isn't too funky - summary:

    • WAN via PPPoE
    • LAN with a single additional VLAN for guest.
    • Ubiquiti AP on LAN.
    • GUEST network on VLAN.
    • DHCP on LAN and GUEST
    • Unbound DNS
    • 2 VPN clients to AirVPN with gateway groups.
    • OpenVPN server
    • Squid and SquidGuard - with transparent proxy - no MITM SSL
    • pfBlockerNG
    • Lightsquid reports
    • NUT package for UPS

    Any help appreciated.



  • Downloaded 2.3.2, used it for a fresh install, and uploaded my backup config.xml.

    This still locked up after the packages were reinstalled - it happens some time after all packages have reinstalled successfully. I cold restarted the box and it came up successfully, but shortly afterwards it locked up again.

    I tried disabling pfBlockerNG as the start of a process of elimination but after a reboot with this, it again locks up hard.



  • Nothing useful in the logs? No crash dump after reboot?



  • You have to disable pfBlockerNG and DNSBL before restoring a config.
    Unbound will not start until you remove the "server:include: /var/unbound/pfb_dnsbl.conf" line from Services/DNS Resolver/General Settings/Custom Options

    or edit the config file and change
    <unbound><custom_options>c2VydmVyOmluY2x1ZGU6IC92YXIvdW5ib3VuZC9wZmJfZG5zYmwuY29uZg==</custom_options></unbound>

    to
    <unbound><custom_options></custom_options></unbound>
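
    For anyone wondering what that long string is: the custom_options value appears to be the Custom Options text stored base64-encoded. A minimal Python sketch (standard library only, nothing pfSense-specific; the variable names are just for illustration) to check what it decodes to:

```python
import base64

# The <custom_options> value quoted above, copied verbatim.
encoded = "c2VydmVyOmluY2x1ZGU6IC92YXIvdW5ib3VuZC9wZmJfZG5zYmwuY29uZg=="

# Decode it to see the Unbound custom option it carries.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # server:include: /var/unbound/pfb_dnsbl.conf
```

    So it is the same include line as in the GUI; removing it in either place should have the same effect.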



  • No crash dumps on restart - just dirty filesystem.

    I didn't get a chance to check syslog as it locks up soon after.

    Thanks for the Unbound tip - that's exactly what I did (see the other thread in the same forum). That gets it going - it doesn't seem to be pfBlockerNG, as on the next reboot it locks up. Hopefully I'll have time for a trial with a process of elimination.

    I tried to mount the installation USB in Linux so I could copy a version of the backup XML onto it, but Linux cannot read the filesystem. Any ideas how to mount the installation USB? This would speed up the reinstall and restart times.



  • Looks like a hardware issue, for example faulty RAM or something else.



  • Hard locks typically mean faulty hardware.
    A black screen followed by a reboot typically means overheating or faulty hardware.
    A kernel fault typically means corrupt data, faulty hardware, or buggy drivers/kernel (the kernel much less often).



  • Found the issue now. It's not hardware.

    Setup:
    pfSense with a LAN and a WAN.
    LAN plugged into a Ubiquiti Switch. On the switch I have a couple of dumb switches to the rest of the house, and one Ubiquiti AP Pro.

    The Ubiquiti has a default network configured for my internal network with an address of 192.168.1.0/24 (it looks like you have to set up a 'corporate' network). I then set up an additional VLAN network with 192.168.10.0/24 as a GUEST network with VLAN tag 90.

    The switch is configured to allow LAN to the dumb switches and ALL for the AP and pfSense. pfSense has the LAN on my 192.168.1.0/24 and a VLAN to match the Ubiquiti setup. The AP has two SSIDs to match the LAN and the VLAN.

    Now whenever I try to attach a client to the GUEST network, the pfSense box locks up - I have a keyboard plugged in and it is unresponsive. I cannot ping the box from the wired network either.

    Any ideas why a GUEST VLAN fails and locks up? There is nothing I can see in the logs just before a hard restart. Nothing in /var/crash apart from a file called minfree with the value 2048.

    Attached is a quick hand sketch of the network topology. This is the first time I have used VLANs, so I'm not sure if I am doing anything wrong.




  • The subnets must not overlap (or be the same):
    LAN 192.168.1.0/24 - the system should not have let you use the bottom IP address of the subnet (.0) - the LAN interface IP should be in the range 1 to 254

    VLAN 192.168.1.10/24 - the subnet is 192.168.1.0 through 192.168.1.255 - the same as 192.168.1.0/24 - you must not do that.

    Certainly the routing will get confused. But the box itself should not "lock up" - i.e. the menu should still work from the console.
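
    If you want to double-check this kind of thing yourself, here is a small Python sketch using the standard ipaddress module. The variable names are only illustrative; the addresses are the ones from the sketch and from the earlier post.

```python
import ipaddress

lan = ipaddress.ip_network("192.168.1.0/24")

# strict=False accepts a host address and returns the network containing it.
vlan_as_sketched = ipaddress.ip_network("192.168.1.10/24", strict=False)

# The GUEST subnet as described in the earlier post.
vlan_as_described = ipaddress.ip_network("192.168.10.0/24")

print(vlan_as_sketched)                 # 192.168.1.0/24
print(lan.overlaps(vlan_as_sketched))   # True
print(lan.overlaps(vlan_as_described))  # False

# Usable interface addresses on the LAN exclude .0 (network) and .255 (broadcast).
hosts = list(lan.hosts())
print(hosts[0], hosts[-1])              # 192.168.1.1 192.168.1.254
```

    192.168.1.10/24 collapses to the same network as the LAN, while 192.168.10.0/24 is a separate subnet and would be fine.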



  • My bad - in my haste in sketching I wrote the wrong thing (corrected image now uploaded).

    LAN: 192.168.1.0/24
    VLAN: 192.168.10.0/24



  • This:
    @ak:

    Found the issue now. It's not hardware.

    isn't proved by this:
    @ak:

    Setup:
    pfSense with a LAN and a WAN.
    LAN plugged into a Ubiquiti Switch. On the switch I have a couple of dumb switches to the rest of the house, and one Ubiquiti AP Pro.

    The Ubiquiti has a default network configured for my internal network with an address of 192.168.1.0/24 (it looks like you have to set up a 'corporate' network). I then set up an additional VLAN network with 192.168.10.0/24 as a GUEST network with VLAN tag 90.

    The switch is configured to allow LAN to the dumb switches and ALL for the AP and pfSense. pfSense has the LAN on my 192.168.1.0/24 and a VLAN to match the Ubiquiti setup. The AP has two SSIDs to match the LAN and the VLAN.

    Now whenever I try to attach a client to the GUEST network, the pfSense box locks up - I have a keyboard plugged in and it is unresponsive. I cannot ping the box from the wired network either.

    Any ideas why a GUEST VLAN fails and locks up? There is nothing I can see in the logs just before a hard restart. Nothing in /var/crash apart from a file called minfree with the value 2048.

    Attached is a quick hand sketch of the network topology. This is the first time I have used VLANs, so I'm not sure if I am doing anything wrong.

    Even a messy setup can't 'dirty' your hard drive.
    Drives get dirty when sectors are badly written - or when important file structures are filled up with nonsense.

    I tend to say: you DO HAVE hardware problems.



  • All I have to go on are my observations.

    The HDD is dirty because when the box hard locks, I get no response on the network or from the plugged-in keyboard, which forces a hard reboot to get going again. On startup, the screen shows that the HDD was not safely unmounted and so is 'dirty' - it then runs a disk check and mentions how many inodes it has recovered, lost, or marked (I can't remember the terminology).

    That the VLAN is the cause is, again, based on my cause-and-effect observations. As soon as I try to connect a wireless client to the GUEST network (which has a VLAN configured), the machine locks up. To be honest, I cannot guarantee this, as I have only had time to attempt it twice. I will need to try a couple more times to prove it.

    I have been running successfully for the past 24 hours and it has been fine - however, this is without the GUEST network available on the AP (I have turned this on as we need internet connectivity).



  • Why not just let memtest run overnight, just to be sure?
    http://www.memtest86.com/
    The VLAN-related data may end up at a bad memory address, which would cause the system to lock up immediately or soon after.
    Many others have VLANs configured without any issue.

