pfSense 2.4.2 (and 2.4.3) crashes at bootloader
I've spent all day on the forums trying things. I have probably missed something, but I'm hoping someone else might have experienced my problem.
So I'm currently running a Watchguard Firebox x750e (upgraded with industrial CF, max RAM and upgraded CPU). With the LCD and fan mods I've learnt from here it's working flawlessly.
Alas it's becoming EOL so I've been looking to replace it.
After much research on the best value-for-money solution I've bought a Dell R210-II with a Dell (Intel) I350-T4 4-port gigabit card.
The I350-T4 is a used server pull and I've checked screenshots to confirm it looks legit.
First thing I did was update all the firmware I could - so BIOS 2.9.0 with the latest microcode, BCM 1.95 for quiet fans etc
I removed everything I didn't feel I needed - ODD, iDRAC cards.
After installing 2.4.2 (amd64) on the internal HDD (configured as AHCI) I proceeded to configure it offline to match my Watchguard, using all 4 interfaces on the I350.
However the troubles came when I went to swap the two units over - with more than 1 cable plugged into the I350 the system refused to boot past the BSD bootloader and instead just rebooted at that point or hung. If I unplugged the cables it appeared to boot OK.
Several rounds of checking the sockets, reseating and cleaning with isopropanol etc. occurred, but the problem didn't go away.
So I pored over the forums and tried several things, notably ending up with:
- turning most interfaces off in the BIOS including the two built in adaptors
- setting the IRQs to "DEFAULT"
- setting /boot/loader.conf.local to the following:
cat /boot/loader.conf.local
kern.cam.boot_delay="10000"
kern.ipc.nmbclusters="1000000"
hw.ix.num_queues="1"
hw.igb.num_queues="1"
hw.igb.rxd=4096
hw.igb.txd=4096
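As a side check on the file above: loader.conf(5) documents settings in the form variable="value", and the last two lines as posted have unquoted values. The sketch below is only a syntax lint, assuming a hypothetical helper named check_loader_conf and a temporary copy of the settings. It lists lines that do not follow the quoted form. It does not show that the unquoted values are related to the crash.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense or FreeBSD): print, with line
# numbers, every line of a loader.conf-style file that is neither blank,
# a comment, nor of the form name="value".
check_loader_conf() {
    grep -nEv '^[[:space:]]*(#.*)?$|^[[:space:]]*[A-Za-z0-9_.]+="[^"]*"[[:space:]]*$' "$1"
}

# Temporary sample file mirroring the settings quoted in the post.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
kern.cam.boot_delay="10000"
kern.ipc.nmbclusters="1000000"
hw.ix.num_queues="1"
hw.igb.num_queues="1"
hw.igb.rxd=4096
hw.igb.txd=4096
EOF

# Lists the two unquoted hw.igb lines (lines 5 and 6 of the sample).
check_loader_conf "$sample"
```

On the real box the same function could be pointed at /boot/loader.conf.local. If it prints nothing, every setting is already in the quoted form.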
Now after these steps in the BIOS and bootloader config, I can boot the unit from cold with all cables in and it will boot successfully.
However if I perform a reboot it will again crash at the BSD bootloader. The crash manifests itself as either:
- halting at the spinner
- rebooting at the spinner
- spamming the screen with reg dumps and BTX HALTED.
The combo I have seems a reasonably popular setup so I'll be interested if anyone can shed any light.
Thanks in advance
EDIT: Turns out today it doesn't boot from cold or from reboot with the cables in, but works if the cables are unplugged at boot and then plugged in.
So must have had some lucky timings yesterday.
I've also tried setting kern.vty to "sc" and to "vt", with no change in behaviour.
More interesting: with the cables in, if I get to the bootloader and press Esc to get into the menu, the keyboard stops working after that. I can't even toggle Caps Lock or Num Lock.
I can get maybe 1 more keypress in, enough to reach a submenu, if I'm very quick.
An occasional graphics glitch also occurs.
RAM is tested and passed.
As noted above, BIOS 2.9.0 is the latest version with the new microcode (spectre/meltdown).
If anyone has a Dell R210 II with an I350-T4 card and can share their BIOS/I350 settings, that would be appreciated, especially with version numbers.
If the unit is rebooted/powered up with the cables removed I can then plug in the cables and it runs fine.
I've also tried the latest 2.4.3 as a fresh install and still the same issues.
This is rather disappointing, as I was hoping this would be an easier setup than the old Watchguard, but as it stands I've no upgrade path.
Is there a better section where I could get some answers on this, perhaps the hardware specific or installation section?
If so could the mods move it please?
Not sure who's reading this so for my own posterity….
Dell Server BIOS PowerEdge R210 II Version 2.8.0
Fixes & Enhancements
Enhancement: Updated Intel Xeon Processor E3-1200 V2 Series E1 stepping microcode (Patch ID=0x1B).

Dell Server BIOS PowerEdge R210 II Version 2.9.0
Fixes & Enhancements
Fixes
- None
Enhancements
- Updated the Intel Xeon Processor Microcode to address CVE-2017-5715 (http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=2017-5715).
- Updated Intel Xeon Processor E3-1200 V2 Series E1 stepping microcode (Patch ID=0x1F).
- Updated Intel Xeon Processor E3-1200 Series D2 stepping microcode (Patch ID=0x2D).
- CVE-2017-5753 and CVE-2017-5754 are addressed by Operating System & Hypervisor updates. For more information, visit http://www.dell.com/support/article/SLN308588.
I downgraded the BIOS from 2.9.0 to 2.8.0 but still the same issue (thought it was fixed but realised I hadn't moved all 4 cables over).
So it isn't the spectre/meltdown BIOS changes…
Since my last post I've tried:
- several cables
- several BIOS versions
- several combinations of BIOS settings (especially around the interrupts/serial console)
- almost all of the things on the boot problems page
- two hard disks
- 2 different versions of I350 firmware
- many combinations of loader.conf.local settings including disabling beastie_mode
- blanking out the SMBUS pins on the I350.
All my experiments still end up with "crash at the pfsense boot menu when there is traffic on the I350 during (or before) loading the kernel."
The same network configuration boots fine on the Watchguard X750e (but that is stuck on NanoBSD).
Is there really nobody that can help?
I did ask the mods several weeks back if they could move this thread to somewhere more appropriate. Was there nowhere better, or did they not look?
Is there any way to force the network ports to be disabled until the kernel boots up?