Diagnosing a "Dead" box
-
I have an APU2 unit that appears to have died but I'm not sure what happened. I know it was working until about 10:00am the other day and then it just stopped passing traffic. I've tried getting into the GUI but it doesn't respond. Sniffing with Wireshark shows no packets. Using a console cable didn't give me anything until a reboot.
On reboot it boots pfSense and everything seems to load fine until it gets to the line "Bootup Complete", where it just hangs. Normally when it hangs at that point it's a 9600 vs 115200 speed issue, but that isn't the case here. This unit is on 2.4.4-p3 and the bootup shows "/boot/config: -S115200 -h". I reinstalled from a config backup and it lets me in on 115200. On bootup it checks the filesystem each time and says it is clean. Booting to Single User Mode gives similar results: it appears to boot normally until "Bootup Complete", where it just hangs.
We had a second client that appears to have had something similar happen at about the same time. Their unit is running but you can't get into the GUI or SSH. Console doesn't respond. I'm afraid that if I reboot it'll just do the same thing as the unit from the first client. There is no connection between the two devices. Different companies. Different industries. Different cities. The only thing they share is the same ISP. I've built up another unit to replace the second unit if need be but I'd like to learn how to troubleshoot the issue. Any advice would be great. Thanks.
-
If it gets to 'Bootup complete', that's beyond where it would fail because of a baud rate mismatch, which would be after the bootloader. That usually indicates the serial console is not enabled: you see everything but no console menu. Check the config file and make sure the <system> section contains:
<enableserial></enableserial>
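If the config file is reachable (for example on a drive mounted from another machine), that check can be scripted. This is only a rough sketch: the function name serial_enabled is made up for illustration, and it assumes the element sits on a line between the <system> and </system> tags rather than parsing the XML properly.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the <system> section of the given
# config.xml contains an <enableserial> element, exit 1 otherwise.
# It is a line-based scan, not a real XML parser.
serial_enabled() {
    awk '
        /<system>/                   { in_system = 1 }
        in_system && /<enableserial/ { found = 1 }
        /<\/system>/                 { in_system = 0 }
        END                          { exit(found ? 0 : 1) }
    ' "$1"
}
```

Usage would look something like `serial_enabled /mnt/drive/cf/conf/config.xml && echo "serial console enabled"`, where the path is an assumption about where the config lives on the mounted drive.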
What image was used to install to this APU?
Steve
-
FWIW, I had a problem last year where the performance dropped. I rebooted and it wouldn't come up. It would partially load, but I couldn't get anything to work. I wound up replacing the computer with the one in my sig. Incidentally, I had previously used the old computer for a Linux firewall and I couldn't even install Linux on it again, so that was a definite clue the hardware had failed.
-
@stephenw10 It was made from the usb installer something like 3 years ago. It's worked fine up until this point.
How do I get to the config file? I guess I could pull the drive out and plug it into the USB port of another unit. I have an internal mSata adapter that I can externally power and plug it into the SATA port. I'm a bit rusty on my mount commands, though. I'll give it a shot.
-
@jknott The unit seems fine. Could be the drive, though. Usually I can get in to run fsck but that doesn't seem to be an option so far.
-
You said in the first post that you installed a config backup, so I assumed you had access to backups to check?
-
@stephenw10 Oh, yes it's enabled in the old configs. I got the old drive mounted
mkdir /mnt/drive
mount /dev/ada1s1 /mnt/drive
Looks like it ran out of space. That old Suricata log bug where it doesn't rotate and keeps writing. I thought I had fixed all of those. Once that happens then the config file gets corrupted with whole sections missing. That's why it seems to boot but doesn't.
I'll be looking at the second unit today. Maybe the same issue? Guess we'll see.
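For anyone wanting to confirm an out-of-space condition on a drive mounted like this, something along these lines should work. It is a sketch, not the exact commands used here: the function names are invented for illustration, and /mnt/drive/var/log/suricata in the usage note is an assumption about where the Suricata logs end up on the mounted filesystem.

```shell
#!/bin/sh
# Hypothetical helper: print how full the filesystem holding path $1 is,
# as a bare number (the "%" sign is stripped). df -P keeps each
# filesystem on a single line so column 5 is always the capacity.
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: list the largest files under directory $1,
# biggest first, with sizes in KB. $2 is how many to show (default 10).
largest_files() {
    find "$1" -type f -exec du -k {} + | sort -rn | head -n "${2:-10}"
}
```

Running `fs_use_percent /mnt/drive` followed by `largest_files /mnt/drive/var/log/suricata 5` would show whether the disk is full and which log files are taking the space.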
-
Hmm, interesting. If the config is corrupted it should try to load the last good config.
-
@stephenw10 Back when we got bit by this bug a few times, it never seemed to load a good config. However, if you look at the config backups, there are generally several that are over 100KB (can't remember exactly), but the last few are just a few KB in size. Perhaps it does load the last config, but that one is also corrupted, and it can't really do anything about it since there is no disk space to swap through. That's just been my assumption.
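A quick way to spot those truncated backups is to list any that are far smaller than a healthy config. The sketch below is illustrative only: small_backups is a made-up name, the 10 KB default threshold is arbitrary, and it assumes the backups are named config-<timestamp>.xml in a single directory.

```shell
#!/bin/sh
# Hypothetical helper: print "size<TAB>path" for every config-*.xml in
# directory $1 that is smaller than $2 KB (default 10 KB, an arbitrary
# cutoff chosen for illustration).
small_backups() {
    for f in "$1"/config-*.xml; do
        [ -f "$f" ] || continue
        # Arithmetic expansion strips the padding some wc versions emit.
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -lt $(( ${2:-10} * 1024 )) ]; then
            printf '%s\t%s\n' "$size" "$f"
        fi
    done
}
```

On a mounted drive this might be called as `small_backups /mnt/drive/cf/conf/backup`, with the backup path being an assumption about the install layout. Any backup it prints is a candidate for having been written after the disk filled up.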
As for the second unit, everything was fine after a reboot. Disk space is fine, no errors in the logs. Just couldn't access the box via GUI or SSH or console. Strange. It was on 2.4.5-p1 so I've upgraded it to 2.5.2 and we'll see how it goes.