[solved?] 2.0.1 boot error "run_interrupt_driven_hooks - waiting for xpt_config"
-
So I was attempting a 1.2.3 -> 2.0.1 auto-upgrade on a pair of HP DL380 G6 (w/CARP) and it completely blew up. After a little sifting through the wreckage I decided it was quicker and easier to just wipe them and do clean installs, whereupon I found that it wasn't the upgrade process that killed them, but rather 2.0.1 simply can't boot properly on these boxes! They both stop at, "run_interrupt_driven_hooks - waiting for xpt_config", (kindly informing me of this every 60sec until 300sec), then just hang completely.
Googling this shows it to be a FreeBSD problem from some years ago, long since fixed(?). Work-arounds at the time included disabling FireWire in the BIOS and recompiling the kernel w/out sbp/cam. (Also a reference or two to scsi, usb and ciss drivers, which would certainly be relevant running on a DL380, but no real info on what to do about them.) But this is all really old info so it may all be a red herring. Just for the heck of it I did try disabling the various HPET, virtualization support etc in the BIOS since many of the problem reports seemed to be timing/bus related but no joy that route. (Speaking of BIOS, there's a slightly newer firmware DVD on hp.com so I'll apply any updates from there once all 1.2GB :o finish downloading and see if that helps any, but not holding my breath…)
So I'm kind of wondering how best to approach this. I realize this is a FreeBSD issue, not pfSense, but the boxes have no FireWire, the only USB device is a keyboard, and I'm not such a BSD guru that I would be comfortable compiling a new kernel for this kind of customised run-time environment. I'd be paranoid they'd crash under load or something as soon as I deployed them! :)
Paranoia aside, is it wise to muck about with the loader.conf?
I did find one reference to trying:
hw.ciss.force_transport=1
hw.ciss.force_interrupt=1
Kinda casting about here. :-\ Anybody else already solved this…? ::)
-
You already tried this?:
http://doc.pfsense.org/index.php/Boot_Troubleshooting#Conflicting_Hardware
Steve
-
Don't know how I missed that link! :o
Looks like I was on the right track at least, but there's a couple more things for me to try. I'll report back here with whatever I find out.
One other note, which I forgot to mention above. These are 64-bit boxes but I'm using the i386 version of 2.0.1 as they are Intel CPUs, not AMD and the 64-bit distro seems to be tuned more for AMD. Would it be appropriate/proper to use the AMD64 version instead? Would that possibly change the boot environment enough to have a positive effect on this issue?
-
Would it be appropriate/proper to use the AMD64 version instead?
It doesn't matter which version you run. Although it's called amd64 that's just because amd64 was the first 64bit architecture to be supported by FreeBSD. There is no reason not to run it on Intel 64bit capable hardware.
There are probably more people, in total, running the 32bit version of pfSense and so more people to find bugs etc. 64bit can handle more memory and may be slightly faster. Your choice.
Would that possibly change the boot environment enough to have a positive effect on this issue?
I don't know, try it. ;)
Steve
-
So I think we've got it sorted, but it took all of these steps, (it was an iterative process, took all freakin' day!):
- apply latest firmware (I used the v10.0.0 firmware DVD image)
- disable everything "fancy" CPU-wise in the BIOS, including all VT, hyper-threading, Turbo mode etc etc
- disable all power controls, (or force them high), including Intel's bus power mgmt
- disable all the serial ports (including the virtual ones for iLO, remote console etc)
- disable all the USB ports (Interestingly, I did find reference to an onboard SD card slot under the USB settings. And thank goodness HP still ships servers with PS/2 ports!)
Now you can at least boot pfSense
…do Installation mode, and a Quick install (but wait until the countdown has gotten to 3, or even 2)...
...let it reboot, go through the NIC config, watch it hang right after at the WAN config - doh!
Do a hard reset and boot to single user, shell, whatever, and add these lines:
kern.ipc.nmbclusters="131072"
hw.bce.tso_enable=0
hw.pci.enable_msix=0
…to /boot/loader.conf.local, as per this article, (note that I misread the article and added them directly to /boot/loader.conf with no ill effects, but in case you make the same mistake see wallabybob's note below):
http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards
(Also should do the UDP stuff in the tunables, esp. if running TinyDNS which is UDP-only).
Also edit /conf/config.xml and add promiscuity settings for any internal interfaces, (LAN, CARP etc):
<shellcmd>ifconfig bce0 promisc</shellcmd>
…above the line as per this thread:
http://www.mail-archive.com/support@pfsense.com/msg21826.html
Reboot again and you'll probably be okay.
However, the promiscuity lines in the config.xml do get lost on a semi-random basis depending on what you're doing in the GUI, (e.g. changing DNS servers will do it).
These might not all be necessary, I just worked my way through problems as I came to them, then tried to shorten the process a wee bit here. (Could probably be shortened a lot by doing some config editing ahead of time in the ISO.) It's possible some steps are unnecessary, and it's possible I may not be done yet. (I'll edit this post if I come across anything terribly significant).
Note that these are all FreeBSD issues, not pfSense!
Hopefully this will be a functional starting point for some other poor bastard, er, dedicated sysadmin trying to get things going on HP hardware, (though I culled quite a bit of this from Dell threads as well, so YMMV, IANAL, etc).
If anyone has any additional pointers/tips/simplifications, please reply with details…
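Since the promisc lines can vanish from config.xml after GUI changes, a quick check along these lines might tell you when they need re-adding. This is only a sketch: check_promisc is a made-up helper name, and the interface names in the usage comment are examples, so substitute your own internal interfaces.

```sh
#!/bin/sh
# check_promisc CONFIG IFNAME...
# Hypothetical helper: reports whether each interface still has its
# "<shellcmd>ifconfig IFNAME promisc</shellcmd>" line in CONFIG.
# Returns non-zero if any line is missing. It only reads the file.
check_promisc() {
    config=$1
    shift
    missing=0
    for ifname in "$@"; do
        # -F: match the string literally, -q: no output, just exit status
        if grep -Fq "<shellcmd>ifconfig ${ifname} promisc</shellcmd>" "$config"; then
            echo "ok: ${ifname}"
        else
            echo "MISSING: ${ifname}"
            missing=1
        fi
    done
    return "$missing"
}

# Example (interface names are assumptions):
#   check_promisc /conf/config.xml bce0 bce1
```

Running it after any GUI change is cheap, and it never edits config.xml itself, so it can't make things worse.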
-
Do a hard reset and boot to single user, shell, whatever, and add these lines:
kern.ipc.nmbclusters="131072"
hw.bce.tso_enable=0
hw.pci.enable_msix=0
…to /boot/loader.conf.local, as per this article, (note that I misread the article and added them directly to /boot/loader.conf with no ill effects - and I'm sure as hell not changing them now!):
The boot loader processes /boot/loader.conf then /boot/loader.conf.local. The first file COULD be overwritten by a firmware upgrade, the second shouldn't. Therefore to "future proof" your system you should add those lines to /boot/loader.conf.local. There should be no harm in having duplicate entries.
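If you want to script that, something like the following should do it. It is a rough sketch, not an official pfSense tool: add_tunable is a made-up helper name, and it only appends a setting when no line for that tunable exists yet, so running it twice leaves no duplicates. It assumes the file, if it exists, ends with a newline.

```sh
#!/bin/sh
# add_tunable FILE NAME VALUE
# Hypothetical helper: appends NAME=VALUE to FILE unless a line
# starting with "NAME=" is already there.
add_tunable() {
    file=$1
    name=$2
    value=$3
    # Create the file if it does not exist yet.
    touch "$file" || return 1
    # Escape the dots so "hw.bce" cannot match e.g. "hwxbce".
    pattern="^$(printf '%s' "$name" | sed 's/\./\\./g')="
    if grep -q "$pattern" "$file"; then
        echo "already set: ${name}"
    else
        printf '%s=%s\n' "$name" "$value" >> "$file"
        echo "added: ${name}=${value}"
    fi
}

# Example, using the three settings from the post above:
#   add_tunable /boot/loader.conf.local kern.ipc.nmbclusters '"131072"'
#   add_tunable /boot/loader.conf.local hw.bce.tso_enable 0
#   add_tunable /boot/loader.conf.local hw.pci.enable_msix 0
```

Note it does not change a tunable that is already present with a different value; it just reports it and leaves the file alone.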