pfSense CE 2.8.0 upgrade stalls after reboot and gets stuck when loading
-
I've been using pfSense for about ten years and have never had an upgrade issue until today. My pfSense CE 2.8.0 upgrade stalls after reboot and gets stuck when loading.
- Hardware: Supermicro X12SDV-4C-SPT4F with latest firmware, Intel Xeon Processor D-1718T
- Using the 10GB (ix1) copper connections for both WAN and LAN
- Upgrading from 2.7.2-RELEASE with full system patches installed
- Removed all packages except for System_Patches
- Backed up configuration (as always)
- pfSense-CE-2.7.2-RELEASE-amd64.iso on hand just in case (thank god)
The upgrade via the web interface looked normal, no issues detected. After the reboot, pfSense CE 2.8.0 loads and starts to initialize the hardware, but then gets stuck at some point and won't continue. Resetting the system brings it back to the same place. See the screenshot of the console.
I had to revert back to 2.7.2 to get back up and running.
-
I have to wonder if the ASPEED AST2600 BMC is somehow causing this issue. It looks like the OS is waiting on it before it will mount the disk.
-
Found this in the Troubleshooting guide.
https://docs.netgate.com/pfsense/en/latest/troubleshooting/boot-issues.html#troubleshooting-boot-console
"Vendor-Specific Issues: Certain Dell Blade servers may hang at boot if the system’s virtual USB media is enabled. Disable the virtual media in the BIOS and then it should boot normally."
-
@InstanceExtension what did you disable in the BIOS on the Supermicro board?
Is it enough to disable it in the IPMI web interface?
-
@slu I have not disabled anything yet, was just using the Troubleshooting guide entry as a reference to a possible solution. That said, why would this be fine in 2.7.2 and not 2.8? Seems like it needs to be addressed at the OS or installer level.
-
It's not clear that the upgrade actually stalled. The fact that your board apparently has "Dual Console" enabled may be at play.
@stephenw10 assisted in a similar post within the past 24 hours, which I cannot find to link to at the moment.
EDIT: The similar post.
-
@tinfoilmatt This system has always used the Video console and as you can see in my screenshot that is set as the primary console. Plus, the web interface never comes up, so yes it is stuck at that point in the boot process.
-
Mmm it looks mostly like what you might see when not looking at the primary console. But, as you say, it's set to video as primary and it would normally show interfaces linking etc after that point. So it may actually be hanging there.
How does that compare with the 2.7.2 boot? The most likely issue is a driver regression, or a new driver that supports something in that box that was previously ignored.
Is it entirely unresponsive at that point? To Ctrl+T, for example?
-
@stephenw10 Thanks for the reply. The system is completely non-responsive at that point. Hard reset (via IPMI console or Reset switch) is the only thing that gets it back.
I don't see any of this behavior in 2.7.2, the boot process is just quick and clean.
-
Hmm can you enable a serial console and check that?
I would definitely compare the boot log with 2.7.2 if you can though. I'd bet there is some new or changed driver at play here.
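One rough way to do that comparison is to strip the syslog timestamp prefix from each line, put both logs in boot order, and list only the lines that differ. This is a minimal sketch, not an official tool: `normalize` and `boot_diff` are hypothetical helper names, the prefix pattern is an assumption based on the log format posted in this thread, and the 2.8.0 sample line in the usage is invented purely for illustration.

```python
import difflib
import re

# Assumed syslog prefix, e.g. "May 29 12:50:11 kernel " (format taken from this thread).
_PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+kernel:?\s*")


def normalize(lines, newest_first=False):
    """Strip the timestamp/"kernel" prefix and return the lines in boot order."""
    cleaned = [_PREFIX.sub("", ln.rstrip("\n")) for ln in lines if ln.strip()]
    # The GUI system log can list newest entries first; reverse it if so.
    return cleaned[::-1] if newest_first else cleaned


def boot_diff(old, new):
    """Return lines only in the old log ("- ") or only in the new log ("+ ")."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            out += ["- " + line for line in old[i1:i2]]
        if tag in ("insert", "replace"):
            out += ["+ " + line for line in new[j1:j2]]
    return out


if __name__ == "__main__":
    # 2.7.2 lines as posted in the thread (newest first).
    old_log = [
        "May 29 12:50:11 kernel ue0: <USB Ethernet> on urndis0",
        "May 29 12:50:11 kernel Dual Console: Video Primary, Serial Secondary",
    ]
    # Invented 2.8.0 capture in boot order; the second line is a placeholder.
    new_log = [
        "Dual Console: Video Primary, Serial Secondary",
        "exampledev0: <Hypothetical device>",
    ]
    for line in boot_diff(normalize(old_log, newest_first=True), normalize(new_log)):
        print(line)
```

Lines that appear only on the 2.8.0 side, especially the last one printed before the hang, are the first candidates for a new or changed driver.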
-
@stephenw10 This is what occurs in 2.7.2 after the Dual Console log message. It is taken from the system logs, so the entries read bottom-up (newest first).
May 29 12:50:11 kernel coretemp0: <CPU On-Die Thermal Sensors> numa-domain 0 on cpu0
May 29 12:50:11 kernel aesni0: <AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256>
May 29 12:50:11 kernel ue0: Ethernet address: be:3a:f2:b6:05:9f
May 29 12:50:11 kernel ue0: <USB Ethernet> on urndis0
May 29 12:50:11 kernel urndis0: <RNDIS Communications Control> on usbus0
May 29 12:50:11 kernel urndis0 numa-domain 0 on uhub1
May 29 12:50:11 kernel TSC: P-state invariant, performance statistics
May 29 12:50:11 kernel VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID,VID,PostIntr
May 29 12:50:11 kernel AMD Extended Feature......CPU features that won't post here.......
May 29 12:50:11 kernel Origin="GenuineIntel" Id=0x606c1 Family=0x6 Model=0x6c Stepping=1
May 29 12:50:11 kernel CPU: Intel(R) Xeon(R) D-1718T CPU @ 2.60GHz (2600.00-MHz K8-class CPU)
May 29 12:50:11 kernel g_vfs_done():da0p1[READ(offset=65536, length=8192)]error = 5
May 29 12:50:11 kernel Dual Console: Video Primary, Serial Secondary
-
Hmm. Well, seeing ue0 is always a concern, but it shouldn't hang the boot entirely.
Nothing before that in the boot is different?
-
@stephenw10 I'll need to get some free time to get details on the boot process for 2.8.
It means I'll be down for some time when attempting the upgrade again. I don't have a redundant system to test on. It may not be until this weekend.
The first thing I will try is to turn off the "IPMI Host Interface" and let that fall back to KCS for communication between host and BMC.
-
@InstanceExtension said in pfSense CE 2.8.0 upgrade stalls after reboot and gets stuck when loading:
May 29 12:50:11 kernel g_vfs_done():da0p1[READ(offset=65536, length=8192)]error = 5
Interesting, especially since that's about where the upgrade hangs...
After doing some reading, a shot in the dark: did you have a USB drive plugged in while trying to perform the upgrade?
EDIT: Is da0 the same drive as nda0 (the Samsung 1 TB SSD)?
DOUBLE EDIT: Answering my own question: no. Different drivers.
So the upgrade hangs at about the point where a read error is detected on device da0p1 during a successful 2.7.2 boot, which may or may not be relevant to the 2.8 upgrade hang.
I think I agree that either this...
@InstanceExtension said in pfSense CE 2.8.0 upgrade stalls after reboot and gets stuck when loading:
I have to wonder if the ASPEED AST2600 BMC is somehow causing this issue. It looks like the OS is waiting on it before it will mount the disk.
...or some other driver/device (removable or not) is causing the upgrade hang.
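For reference, the `g_vfs_done()` line quoted above can be picked apart mechanically. The sketch below pulls out the provider, operation, offset, length, and error number, then looks up the symbolic errno name; error 5 is `EIO` (input/output error) on FreeBSD. The function name `parse_g_vfs_done` is hypothetical, the pattern is an assumption based on the single log line in this thread, and the name lookup uses the errno table of whatever machine runs the script, which agrees with FreeBSD for code 5.

```python
import errno
import re

# Pattern assumed from the one log line posted in this thread.
_GVFS = re.compile(
    r"g_vfs_done\(\):(?P<provider>[^\[]+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]\s*error = (?P<err>\d+)"
)


def parse_g_vfs_done(line):
    """Return the fields of a g_vfs_done() kernel message, or None if no match."""
    m = _GVFS.search(line)
    if m is None:
        return None
    code = int(m.group("err"))
    return {
        "provider": m.group("provider"),   # e.g. da0p1
        "op": m.group("op"),               # READ or WRITE
        "offset": int(m.group("offset")),
        "length": int(m.group("length")),
        "errno": code,
        # Symbolic name from the local errno table (EIO for 5).
        "name": errno.errorcode.get(code, "UNKNOWN"),
    }


if __name__ == "__main__":
    sample = ("May 29 12:50:11 kernel g_vfs_done():da0p1"
              "[READ(offset=65536, length=8192)]error = 5")
    print(parse_g_vfs_done(sample))
```

An I/O error on a `da` device means the read itself failed at the device level, which fits a removable or virtual (BMC-presented) disk with no usable media behind it, though that reading is only a guess here.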
-
I had a long pause at the same boot point as you (5-10 minutes) on both of my servers but they successfully completed the upgrade afterwards.
Both servers are bare metal Dell PowerEdge R220s.
-
Mmm, if it does eventually show the console menu that's a sign that it just isn't using the current console as primary. The menu is shown on all consoles.
-
I spent a couple of hours on this today. No resolution yet.
Tried:
- Disabled BMC "Host Interface"
- Verbose logging
- Safe Mode
- Disabled Serial ports and console in BIOS
- Letting it sit trying to load for an hour
- Manually set Console to Video and Serial at pfSense boot screen
- Reset BIOS to optimized defaults. I did have a slight change in there to support Intel Speed Shift (Hardware P-States: Native Mode). Setting that back to default did not help.
I just can't get it to load pfSense. This was the console output with verbose logging turned on, using the serial console output. (It was the same when using just the video console.)
I'm not sure what to do next.
-
@InstanceExtension This is a bare metal host, correct? What other storage is present aside from the Samsung NVMe? No other peripherals connected?
-
@tinfoilmatt said in pfSense CE 2.8.0 upgrade stalls after reboot and gets stuck when loading:
@InstanceExtension This is a bare metal host, correct? What other storage is present aside from the Samsung NVMe? No other peripherals connected?
Bare metal, correct. No other storage is present during the upgrade.
No other peripherals connected. Not even a keyboard, mouse or monitor although I did have them present to test and it made no difference.
Just the MB itself and its default components.
-
@InstanceExtension I wonder if you could get the upgrade to complete if you wiped 2.7.2 first, reinstalled, and performed the upgrade prior to restoring the config, just to see if the hardware will complete the 2.8 upgrade.