2.3.1 Hard Lock Up
-
I moved up from 2.2.x to 2.3.1 last weekend and it's been running fine for a few days. I had the "fragments limit reached" issue and saw that it was fixed in 2.3.2. I hadn't upgraded yet, and after a couple more days pfSense looked like it had locked up. Nothing worked, including the attached keyboard, and I had to do a hard restart. However, this left the HDD dirty and the system stopped working (corrupted file system; I saw some recovery blocks), so I had to reinstall.
Is there any workaround procedure (I guess not, as it's a hard lock)? How would one investigate and resolve the cause after the next reboot?
There seems to be something in my setup causing the lock, as restoring the old config makes it lock up quite quickly (minutes after a restart). My setup isn't too funky; in summary:
- WAN via PPPoE
- LAN with a single additional VLAN for guest.
- Ubiquiti AP on LAN.
- GUEST network on VLAN.
- DHCP on LAN and GUEST
- Unbound DNS
- 2 VPN clients to AirVPN with gateway groups.
- OpenVPN server
- Squid and SquidGuard with transparent proxy (no SSL MITM)
- pfBlockerNG
- Lightsquid reports
- NUT package for UPS
Any help appreciated.
-
I downloaded 2.3.2, did a fresh install and restored my backup config.xml.
It still locked up: this happens some time after all packages have been successfully reinstalled. I cold-restarted the box and it came up successfully, but shortly afterwards it locked up again.
I tried disabling pfBlockerNG as the start of a process of elimination, but after a reboot with it disabled, it again locked up hard.
-
Nothing useful in the logs? No crash dump after reboot?
-
You have to disable pfBlockerNG and DNSBL before restoring a config.
Unbound will not start until you remove "server:include: /var/unbound/pfb_dnsbl.conf" from Services/DNS Resolver/General Settings/Custom Options, or change it in the config file, where it is stored base64-encoded:
<unbound><custom_options>c2VydmVyOmluY2x1ZGU6IC92YXIvdW5ib3VuZC9wZmJfZG5zYmwuY29uZg==</custom_options></unbound>
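The config-file route can be scripted. The blob above decodes to exactly the include line, so a minimal Python sketch can decode `<custom_options>`, drop that line and re-encode the rest. It assumes `<unbound>` is a direct child of the `<pfsense>` root element of config.xml; `strip_dnsbl_include` is a hypothetical helper, not part of pfSense. ElementTree discards the `<?xml ...?>` declaration and any comments when it rewrites the file, so keep a copy of the original.

```python
import base64
import xml.etree.ElementTree as ET

# The include line from the post above; pfSense stores Unbound custom options base64-encoded.
DNSBL_INCLUDE = "server:include: /var/unbound/pfb_dnsbl.conf"

def strip_dnsbl_include(config_xml: str) -> str:
    """Hypothetical helper: remove the DNSBL include from <unbound><custom_options>."""
    root = ET.fromstring(config_xml)
    # Assumption: full config has a <pfsense> root with <unbound> as a direct child;
    # a bare <unbound> fragment is accepted too.
    unbound = root if root.tag == "unbound" else root.find("unbound")
    opts = unbound.find("custom_options") if unbound is not None else None
    if opts is None or not opts.text:
        return config_xml  # nothing to change
    decoded = base64.b64decode(opts.text.strip()).decode("utf-8")
    # Keep every custom option line except the pfBlockerNG DNSBL include.
    kept = [line for line in decoded.splitlines() if line.strip() != DNSBL_INCLUDE]
    opts.text = base64.b64encode("\n".join(kept).encode("utf-8")).decode("ascii")
    # Note: the XML declaration and comments are not preserved in the output.
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    blob = "c2VydmVyOmluY2x1ZGU6IC92YXIvdW5ib3VuZC9wZmJfZG5zYmwuY29uZg=="
    print(base64.b64decode(blob).decode("utf-8"))  # server:include: /var/unbound/pfb_dnsbl.conf
```

If the include was the only custom option, the element ends up empty, which matches clearing the Custom Options box in the GUI.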
-
No crash dumps on restart - just dirty filesystem.
I didn't get a chance to check syslog as it locks up soon after.
Thanks for the Unbound tip; that's exactly what I did (see the other thread in the same forum). It gets going, but it doesn't seem to be pfBlockerNG, as on the next reboot it locks up. Hopefully I'll have time to work through a process of elimination.
I tried to mount the installation USB in Linux so I could copy a version of the backup XML onto it, but Linux cannot read the filesystem. Any ideas how to mount the installation USB? This would speed up the reinstall and restart times.
-
Looks like a hardware issue, for example faulty RAM or something else.
-
Hard locks typically mean faulty hardware.
Black screen then reboot typically means overheating or faulty hardware.
Kernel fault typically means corrupt data, faulty hardware, or buggy drivers/kernel (the kernel much less often).
-
Found the issue now. It's not hardware.
Setup:
pfSense with a LAN and a WAN.
LAN plugged into a Ubiquiti switch. On the switch I have a couple of dumb switches to the rest of the house, and one Ubiquiti AP Pro. The Ubiquiti has a default network configured for my internal network with address 192.168.1.0/24 (it looks like you have to set it up as a 'corporate' network). I then set up an additional VLAN network, 192.168.10.0/24, as a GUEST network with VLAN tag 90.
The switch is configured to allow LAN to the dumb switches and ALL for the AP and pfSense. pfSense has the LAN on my 192.168.1.0/24 and a VLAN to match the Ubiquiti setup. The AP has two SSIDs to match the LAN and the VLAN.
Now whenever I try to attach a client to the GUEST network, the pfSense box locks up: I have a keyboard plugged in and it is unresponsive. I cannot ping the box from the wired network either.
Any ideas why the GUEST VLAN fails and locks up the box? There is nothing I can see in the logs just before a hard restart. Nothing in /var/crash apart from a file called minfree with the value 2048.
Attached is a quick hand sketch of the network topology. This is the first time I have used VLANs, so I'm not sure if I am doing anything wrong.
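On the links set to ALL (switch to AP, switch to pfSense), GUEST frames are told apart from LAN frames by an IEEE 802.1Q tag carrying VLAN ID 90; this assumes, as is usual for this kind of setup, that the LAN itself travels untagged. The Python sketch below builds and parses that 4-byte tag: the 0x8100 TPID, then 3 bits of priority (PCP), 1 DEI bit and the 12-bit VLAN ID. `dot1q_tag` and `vlan_id` are illustrative names, not part of pfSense or the Ubiquiti software.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks a frame as 802.1Q-tagged

def dot1q_tag(vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Illustrative helper: build the 4-byte 802.1Q tag (TPID + TCI)."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be in 1..4094 (0 and 4095 are reserved)")
    if not 0 <= pcp <= 7 or dei not in (0, 1):
        raise ValueError("PCP must be 0..7 and DEI 0 or 1")
    tci = (pcp << 13) | (dei << 12) | vid  # 3-bit priority, 1-bit DEI, 12-bit VLAN ID
    return struct.pack("!HH", TPID_8021Q, tci)  # network (big-endian) byte order

def vlan_id(tag: bytes) -> int:
    """Illustrative helper: extract the VLAN ID from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci & 0x0FFF  # low 12 bits hold the VLAN ID

if __name__ == "__main__":
    guest = dot1q_tag(90)  # the GUEST VLAN tag from the setup above
    print(guest.hex(), vlan_id(guest))  # 8100005a 90
```

Both ends of each tagged link (switch port and pfSense VLAN interface, switch port and AP SSID) have to agree on the same ID for GUEST traffic to get through.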
-
The subnets must not overlap (or be the same):
LAN 192.168.1.0/24: the system should not have let you use the bottom IP address of the subnet (.0); the LAN interface IP should be in the range .1 to .254.
VLAN 192.168.1.10/24: the subnet is 192.168.1.0 through .255, the same as 192.168.1.0/24. You must not do that.
Certainly the routing will get confused. But the box itself should not "lock up", i.e. the menu should still work from the console.
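Both checks described here (no interface address on the subnet's .0 network address, no two interfaces in the same or overlapping subnets) can be done with Python's standard `ipaddress` module. This is a minimal sketch: `check_interfaces` and the interface names are hypothetical, the .1 addresses in the corrected example are assumed, and it only handles IPv4 prefixes shorter than /31.

```python
import ipaddress

def check_interfaces(ifaces: dict[str, str]) -> list[str]:
    """Hypothetical helper: report address problems in {interface name: "address/prefix"}."""
    parsed = {name: ipaddress.IPv4Interface(cidr) for name, cidr in ifaces.items()}
    problems = []
    # An interface address must not be the network (.0) or broadcast (.255 for a /24) address.
    for name, iface in parsed.items():
        net = iface.network
        if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
            problems.append(f"{name}: {iface.ip} is not a usable host address in {net}")
    # No two interfaces may sit in the same or overlapping subnets.
    names = list(parsed)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if parsed[a].network.overlaps(parsed[b].network):
                problems.append(f"{a} and {b} overlap: {parsed[a].network} vs {parsed[b].network}")
    return problems

if __name__ == "__main__":
    # As first sketched: .0 as the LAN address, and 192.168.1.10/24, which is 192.168.1.0/24.
    print(check_interfaces({"LAN": "192.168.1.0/24", "GUEST": "192.168.1.10/24"}))
    # Corrected plan (LAN 192.168.1.0/24, VLAN 192.168.10.0/24), assuming .1 on each interface.
    print(check_interfaces({"LAN": "192.168.1.1/24", "GUEST": "192.168.10.1/24"}))
```

The first call reports two problems (the .0 address and the overlap); the second reports none.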
-
My bad: in my haste while sketching I wrote the wrong thing (corrected image now uploaded).
LAN: 192.168.1.0/24
VLAN: 192.168.10.0/24
-
This:
@ak: Found the issue now. It's not hardware.
isn't proved by this:
@ak: Setup: pfSense with a LAN and a WAN. [...] Now whenever I try and attach a client to the GUEST network the PfSense box locks up [...]
Even a messy setup can't 'dirty' your hard drive.
Drives get dirty when sectors are badly written, or when important file structures are filled with nonsense. I tend to say: you DO HAVE hardware problems.
-
All I got to go on are the observations.
The HDD is dirty because when it hard-locks, I get no response on the network or from the plugged-in keyboard, which forces a hard reboot to get going again. On startup, the screen shows that the HDD was not safely unmounted and is therefore 'dirty'; it then runs a disk check and reports how many inodes it has recovered, lost or marked (I can't remember the exact terminology).
I believe the VLAN is the cause, again based on cause-and-effect observation: as soon as I try to connect a wireless client to the GUEST network (which has a VLAN configured), the machine locks up. To be honest I cannot guarantee this, as I have only had time to try it twice. I will need to try a couple more times to prove it.
I have been running for the past 24 hours and it has been fine; however, this is without the GUEST network available on the AP (I have turned this on as we need internet connectivity).
-
Why not just let memtest run overnight, just to be sure?
http://www.memtest86.com/
The VLAN connection's data may be placed at a bad memory address, and this causes the system to lock up immediately or soon after.
Many others have VLAN configured without any issue.