Occasional boot hang at late stage



  • Booting pfSense stops at the position indicated by the arrow:

    Feb 17 20:13:54 kernel: ppc0: parallel port not found.
    Feb 17 20:13:54 kernel: sc0: <System console> at flags 0x100 on isa0
    Feb 17 20:13:54 kernel: sc0: VGA <16 virtual consoles, flags=0x300>
    Feb 17 20:13:54 kernel: sio1 at port 0x2f8-0x2ff irq 3 on isa0
    Feb 17 20:13:54 kernel: sio1: type 16550A
    Feb 17 20:13:54 kernel: sio1: [FILTER]
    Feb 17 20:13:54 kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
    Feb 17 20:13:54 kernel: ukbd0: <HP Virtual Keyboard, class 0/0, rev 1.10/0.02, addr 2> on uhub5
    Feb 17 20:13:54 kernel: kbd2 at ukbd0
    Feb 17 20:13:54 kernel: ums0: <HP Virtual Keyboard, class 0/0, rev 1.10/0.02, addr 2> on uhub5
    Feb 17 20:13:54 kernel: ums0: 3 buttons.
    Feb 17 20:13:54 kernel: uhub6: <HP Virtual Hub, class 9/0, rev 1.10/0.01, addr 3> on uhub5
    Feb 17 20:13:54 kernel: uhub6: 7 ports with 7 removable, self powered
    Feb 17 20:13:54 kernel: Timecounters tick every 1.000 msec
    Feb 17 20:13:54 kernel: Fast IPsec: Initialized Security Association Processing.
    Feb 17 20:13:54 kernel: hptrr: no controller detected.
    Feb 17 20:13:54 kernel: da0 atSMP: AP CPU #1 Launched!
    Feb 17 20:13:54 kernel: ciss0 bus 0 target 0 lun 0
    Feb 17 20:13:54 kernel: da0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-5 device
    Feb 17 20:13:54 kernel: da0: 135.168MB/s transfers
    Feb 17 20:13:54 kernel: da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
    ---------------------------------------------->>>
    Feb 17 20:13:54 kernel: Trying to mount root from ufs:/dev/da0s1a

    At the arrow, the boot process hangs indefinitely. I waited at least one and a half hours without seeing any progress. The last line, "Trying to mount…", is shown only for reference: it is what would normally come next, but it is never actually displayed.

    There is a somewhat strange way to get the machine to boot: wait 20 minutes, then power cycle it, and it boots without any issues. pfSense then runs flawlessly for as long as needed, until the next manual reboot.

    Observation #1: If I power cycle the machine less than 20 minutes after the first (!) boot attempt, the boot hangs at the same point. If I power cycle it 20 or more minutes after that first attempt, it boots fine. The number of power-cycled boot attempts within those 20 minutes makes no difference. To spare the hard disks, I usually just wait out the 20 minutes…

    Observation #2: During these 20 minutes the hard disk LEDs indicate heavy activity. It looks like a RAID consistency check, but that is only a guess. The LEDs start flashing a few seconds before the boot process hangs, so the activity seems to be triggered by pfSense rather than by the RAID controller BIOS or similar. Once the LEDs stop flashing, a power cycle leads to a perfect boot. What speaks against the consistency-check theory is that the LEDs flash in a similar way at this same step during a successful boot.

    Observation #3: The issue does not occur on every reboot. Sometimes the machine reboots without any issues, sometimes not. I was not able to find any pattern (making the observations above was time-consuming enough…), except the following: after a reinstall of pfSense, the first reboot after uploading the configuration works. A second reboot from the web GUI about 10 minutes later always triggered the issue.

    Observation #4: The server is not completely dead. In my last attempt to power cycle, I did not press the power button long enough, and a message appeared saying that the machine was not ready for this command. Apparently pfSense interpreted the short button press as a shutdown request, which it then could not execute.

    Hardware: HP DL320 G5p server, less than a year old. I replaced the original 2 x 250 GB SATA drives, which were driven by the onboard RAID controller, because that controller is not on the hardware compatibility list (HCL). In fact, the drives could not be run as RAID-1, so no redundancy was possible. So I decided to buy hardware that is on the HCL: an HP P400 SAS controller and 2 x 146 GB SAS drives, which run perfectly as RAID-1, provided they boot… The issue appeared immediately after installing the new hardware; before that I never had any trouble.

    Software: pfSense version 1.2.2

    Googling and many experiments on my side did not turn up anything beyond the above. Any help is highly appreciated. Does anyone have an idea?

    Thanks to the pfSense team and the community for this great software! Despite the issue above, pfSense is still the best firewall option I have seen so far, and it is a pleasure to use. Thanks!

    Christian



  • Sounds like either flaky hardware or a FreeBSD bug. 1.2.3 is based on a newer FreeBSD version, so if it is a FreeBSD bug, 1.2.3 might not suffer from it. Given your description, though, it really sounds like flaky hardware.



  • Thanks for the quick response and the info.

    I installed version 1.2.3 before the office filled up with people. It has now been running for 2 hours without any issue, and the boot process was normal.

    I also suspected a hardware problem. Normally I would assume that timing or temperature is involved in such a case. I just have no explanation for why it happens at the same point in the boot process and never during normal operation.

    When the office is empty, I will run more boot tests to see whether the newer FreeBSD version solves the issue.

    Christian

