Upgrading from 25.07 to 25.07.1 causes a fatal trap 12 on boot.
-
EDIT:
Hardware is an ASRock Rack E3C226D2I board. I can boot in legacy mode, but not in UEFI, no matter what I do. Boot Environment selection works, but the bootfs option (3) stays the same; it never changes from 25.07.1, and it fails to boot every time.
-
I wanted to suppress the Intel driver spam, but that turned out to be impossible. Every time I add options to loader.conf.local, they are removed on reboot (another bug?), so I used /boot/loader.conf instead. However, the spam is still there, which suggests some other problem.
This is what I see, for comparison, in legacy mode:
ipw_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
ipw_bss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_bss_fw, 0xffffffff80760600, 0) error 1
ipw_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
ipw_ibss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_ibss_fw, 0xffffffff807606b0, 0) error 1
ipw_monitor: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
ipw_monitor: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_monitor_fw, 0xffffffff80760760, 0) error 1
iwi_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi.LICENSE.
iwi_bss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_bss_fw, 0xffffffff8077fdd0, 0) error 1
iwi_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi.LICENSE.
iwi_ibss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_ibss_fw, 0xffffffff8077fe80, 0) error 1
iwi_monitor: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi.LICENSE.
iwi_monitor: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_monitor_fw, 0xffffffff8077ff30, 0) error 1
random: entropy device external interface
wlan: mac acl policy registered
kbd1 at kbdmux0
WARNING: Device "spkr" is Giant locked and may be deleted before FreeBSD 15.0.
netgate0: <unknown hardware>
netgate0: version: 0.1
vtvga0: <VT VGA driver>
smbios0: <System Management BIOS> at iomem 0xf04c0-0xf04de
smbios0: Version: 2.8, BCD Revision: 2.7
acpi0: <ALASKA A M I>
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
ixl0: <Intel(R) Ethernet Controller X710 for 10GBASE-T - 2.3.3-k> mem 0xfbe000000-0xfbeffffff,0xfbf018000-0xfbf01ffff irq 16 at device 0.0 on pci1
ixl0: fw 9.820.73026 api 1.15 nvm 9.20 etid 8000d966 oem 22.5632.9
ixl0: PF-ID[0]: VFs 32, MSI-X 129, VF MSI-X 5, QPs 384, MDIO shared
-
Hmm, those should not be removed from loader.conf.local.
Are you able to get the full output from a UEFI boot leading up to the failure?
-
Are you able to try upgrading from 24.11 to 25.07.1 directly?
-
If you can replicate that, try running
bt
at the db> prompt after it crashes to get the backtrace.
Also try interrupting the loader to reach the OK> prompt and run:
memmap
That will show if there's something hardware specific happening.
-
@stephenw10
Thank you for the hints. I’ll test what I can and report back as soon as possible.
-
@stephenw10 said in Upgrading from 25.07 to 25.07.1 causes a fatal trap 12 on boot.:
If you can replicate that try running bt at the db> prompt after it crashes to get the backtrace.
It just hangs there, no input possible.
@stephenw10 said in Upgrading from 25.07 to 25.07.1 causes a fatal trap 12 on boot.:
and run: memmap
-
@stephenw10 said in Upgrading from 25.07 to 25.07.1 causes a fatal trap 12 on boot.:
Are you able to get the full output from a UEFI boot leading up to the failure?
Not yet. I think capturing is possible, but I haven't tried yet.
@stephenw10 said in Upgrading from 25.07 to 25.07.1 causes a fatal trap 12 on boot.:
Are you able to try upgrading from 24.11 to 25.07.1 directly?
I’m not sure here, maybe I still have an old boot environment left, then theoretically it’s possible, but it would require my presence, so it can only be done later after work.
-
Ok thanks. I'll relay that to the devs, make sure it looks rational.
-
If you're able to get the full boot output, try doing so whilst booting verbose. So
boot -v
at the loader prompt. That should give us more.
-
@stephenw10 said in Upgrading from 25.07 to 25.07.1 causes a fatal trap 12 on boot.:
If you're able to get the full boot output try doing so whilst booting verbose. So boot -v at the loader prompt. That should give us more.
pfsense_capture.zip Not very good, but I don't think I can do better.
I'll see if I can find 24.11...
-
No luck; after upgrading from 24.11, the symptoms remain the same.
-
I think we should be able to see something there. Let's see....
-
Hmm unfortunately it's obscuring some of the most useful output.
Are you able to boot with acpi disabled entirely?
set hint.acpi.0.disabled=1
at the loader prompt.
-
@stephenw10 said in Upgrading from 25.07 to 25.07.1 causes a fatal trap 12 on boot.:
set hint.acpi.0.disabled=1
Is it the same as the 'ACPI off' option in the loader prompt under boot options?
I think I can provide the full UEFI boot output using the Netgate installer boot or even the console output — I’ll try something over the weekend.
On another note, what changes were made in the bootloader between 25.07 and 25.07.1?
-
@w0w "ACPI off" is probably the same.
The difference between the 25.07 and 25.07.1 loader is the expansion of the memory area used to load the kernel and modules; the kernel has grown and some systems (such as those with many devices) rely on more memory to be reserved by the loader to be able to boot the kernel. We can't revert this expansion because it actually affects a substantial number of systems.
My guess with what's happening on your system is a BIOS firmware bug where the faulting address is not reported to be reserved for ACPI system use, and it coincidentally is where the kernel got loaded into. Because it's kernel code memory, the pages are marked as read-only, so a page-fault occurred when the ACPI driver tried to write to it.
If disabling ACPI doesn't work, then another thing to try is telling the loader to add slop space, which is a memory range to add on top of the expanded space. Go into the loader prompt and issue the command below to tell it to increase it further to 256MB, so that the kernel code doesn't overlap with what ACPI is trying to access:
staging_slop 268435456
boot -v
Unfortunately, staging_slop is a command, so it can't be added to loader.conf.
Once pfSense is booted (even if in 24.11), can you collect the ACPI tables to help us see what resources the BIOS accesses or owns?
acpidump -dt | gzip -c > acpi.asl.gz
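For reference, the number passed to staging_slop above is simply 256MB expressed in bytes. A minimal Python sketch of that arithmetic, assuming "MB" here means 2^20 bytes (the helper name mib_to_bytes is made up for illustration and is not part of any loader tooling):

```python
def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB (2**20 bytes each) to bytes."""
    return mib * 1024 * 1024


# 256 MiB, the slop size suggested in the post above.
slop = mib_to_bytes(256)
print(slop)       # 268435456
print(hex(slop))  # 0x10000000
```

The same helper can be used to work out a different slop size if 256MB turns out not to be enough.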
-
@ldangpfng
Thank you for the support!
8c2764e4-d3ec-4872-b293-5d8d26535d1a-acpi.asl.gz
I hope this helps. This is from a 25.07.1 system booted with the recommended slop.
-
There it is: a memory-mapped PCI config space, but the firmware has not marked the memory as reserved. Instead it's still in the Conventional Memory block, which means a loader or kernel could incorrectly allocate memory in that space.
Scope (_SB.PCI0.HEC2)
{
    Name (H2BR, 0xBFF01000)  <---
    Name (H2ST, 0x0B)
    OperationRegion (NMFS, PCI_Config, 0x40, 0x04)
    Field (NMFS, DWordAcc, NoLock, Preserve)
    {
        , 30,
        DMEN, 1,
        NMEN, 1
    }
This is a firmware bug; it should be reported to the vendor so they can fix the memory map. Maybe they have a BIOS update available.
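To illustrate the kind of check described above, here is a rough Python sketch that looks up which memory map region contains a given address. Only the address 0xBFF01000 comes from the ASL excerpt; the Region type, the find_region helper, and the two map entries are invented for illustration and are not this board's real memmap output.

```python
from typing import NamedTuple, Optional, Sequence


class Region(NamedTuple):
    kind: str   # memory type label, e.g. "ConventionalMemory" or "Reserved"
    start: int  # first physical address of the region
    size: int   # length of the region in bytes


def find_region(regions: Sequence[Region], addr: int) -> Optional[Region]:
    """Return the region whose [start, start + size) range contains addr."""
    for region in regions:
        if region.start <= addr < region.start + region.size:
            return region
    return None


# Hypothetical memory map, NOT taken from the E3C226D2I.
memmap = [
    Region("ConventionalMemory", 0x00100000, 0xBFF00000),
    Region("Reserved", 0xC0000000, 0x10000000),
]

H2BR = 0xBFF01000  # address from the ASL excerpt above

hit = find_region(memmap, H2BR)
if hit is not None and hit.kind == "ConventionalMemory":
    # Firmware uses this address, yet the map says it is free to allocate.
    print(f"{H2BR:#x} lies in {hit.kind}: loader or kernel may be placed there")
```

With a real memmap dump from the loader, the same lookup would show whether an address the firmware uses sits in a range the loader treats as free.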