pfSense crashes (panic - ffs_valloc: dup alloc) and reboots on pfBlockerNG updates
-
Hi All,
pfSense was crashing and rebooting every two hours. I've since changed the pfBlockerNG update to run overnight to reduce the impact. It was due to crash at 08:15 local time, but that has now been and gone, and the system remained up.
I have collected the dump headers and the textdump tar files and will attach them. Here's the first one I encountered:
Dump header from device: /dev/gptid/d2f172ea-5325-11ea-9dd5-00e0671c6be4
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 75776
  Blocksize: 512
  Compression: none
  Dumptime: Tue Jan 17 22:15:20 2023
  Hostname: Heimdallr.kn-mck
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 12.3-STABLE RELENG_2_6_0-n226742-1285d6d205f pfSense
  Panic String: ffs_valloc: dup alloc
  Dump Parity: 646810427
  Bounds: 0
  Dump Status: good
I have recently upgraded pfBlockerNG to version 3.1.0_9 but the issue only started occurring after a power outage last night, not immediately after the upgrade. I did also manually run a pfBlockerNG update after it was upgraded, and that ran fine.
It crashed multiple times overnight and came back up cleanly after each crash, so it doesn't look like filesystem corruption due to the power outage.
Looks like I may have run into a bug, but wanted that confirmed before I opened an issue on redmine.
Here's the next textdump and dump header; from a quick skim it does appear to be the same.
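A quick skim can be backed up by comparing the Panic String field of each dump header. The following is a minimal Python sketch, assuming each info file holds one "Key: value" field per line, as savecore normally writes it. The function names are made up for illustration, and the second header uses a placeholder Dumptime rather than the contents of the attached file.

```python
def parse_dump_header(text):
    """Return the fields of a dump header as a dict of strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ": " only, so a value such as
        # "ffs_valloc: dup alloc" keeps its own colon.
        key, sep, value = line.strip().partition(": ")
        if sep:
            fields[key] = value
    return fields


def panic_strings(headers):
    """Return the set of distinct panic strings across several headers."""
    return {parse_dump_header(h).get("Panic String") for h in headers}


# Trimmed copy of the first header posted above.
FIRST = """\
Dump header from device: /dev/gptid/d2f172ea-5325-11ea-9dd5-00e0671c6be4
  Architecture: amd64
  Dumptime: Tue Jan 17 22:15:20 2023
  Panic String: ffs_valloc: dup alloc
  Dump Status: good
"""

# Hypothetical later header: the Dumptime is a placeholder, not real data.
SECOND = """\
Dump header from device: /dev/gptid/d2f172ea-5325-11ea-9dd5-00e0671c6be4
  Architecture: amd64
  Dumptime: (placeholder)
  Panic String: ffs_valloc: dup alloc
  Dump Status: good
"""

if __name__ == "__main__":
    distinct = panic_strings([FIRST, SECOND])
    if len(distinct) == 1:
        print("All crashes share one panic string:", distinct.pop())
    else:
        print("Different panic strings seen:", sorted(map(str, distinct)))
```

If the set holds more than one entry, the crashes are not all the same panic and each textdump is worth reading separately.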
textdump-0015.tar info-0015.txt
-
That crash is a UFS filesystem fault, but it was probably caused by an initial crash where the firewall rebooted without unmounting the filesystem. That panic is not shown though.
Firstly do a manual filesystem check to make sure any existing issues are fixed:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-check.html#manual-filesystem-check
Steve
-
@stephenw10 thanks for that, it's just crashed again so it seems unlikely to be due to pfBlockerNG.
Here are the latest crash files: textdump-20230118.1500.tar info-20230118.1500.txt
I'll run the manual offline check later tonight, but I've just run it now on the running system and there are a few errors.
[2.6.0-RELEASE][root@Heimdallr.kn-mck]/root: fsck -fy /
** /dev/gptid/d2f012b5-5325-11ea-9dd5-00e0671c6be4 (NO WRITE)
SETTING DIRTY FLAG IN READ_ONLY MODE
UNEXPECTED SOFT UPDATE INCONSISTENCY
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
INODE 81046: FILE SIZE 948963 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 393216
ADJUST? no
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=80604 OWNER=unbound MODE=100644 SIZE=0 MTIME=Jan 18 15:18 2023
RECONNECT? no
CLEAR? no
UNREF FILE I=1284137 OWNER=root MODE=100666 SIZE=0 MTIME=Jan 18 15:01 2023
CLEAR? no
UNREF FILE I=1284200 OWNER=uucp MODE=100666 SIZE=0 MTIME=Jan 17 20:31 2023
CLEAR? no
** Phase 5 - Check Cyl groups
SUMMARY INFORMATION BAD
SALVAGE? no
BLK(S) MISSING IN BIT MAPS
SALVAGE? no
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? no
ALLOCATED FILE 1284200 MARKED FREE
44782 files, 998831 used, 2554189 free (48061 frags, 313266 blocks, 1.4% fragmentation)
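The (NO WRITE) marker and the run of "? no" answers indicate that this pass only reported problems on the mounted filesystem and repaired none of them, which is why the offline check is still needed. As a rough illustration, the Python sketch below counts the repairs that were declined in a saved copy of such output. The function names and the regular expression are assumptions made for this example, and the sample is a trimmed excerpt of the output above.

```python
import re
from collections import Counter

# Matches fsck prompts that were answered "no", e.g. "CLEAR? no".
DECLINED = re.compile(r"\b([A-Z]+)\? no\b")


def ran_read_only(fsck_output):
    """True if fsck reported that it opened the filesystem without write access."""
    return "(NO WRITE)" in fsck_output


def declined_repairs(fsck_output):
    """Count the repair prompts that fsck answered with "no", by prompt name."""
    return Counter(DECLINED.findall(fsck_output))


# Trimmed excerpt of the fsck output posted above.
SAMPLE = """\
** /dev/gptid/d2f012b5-5325-11ea-9dd5-00e0671c6be4 (NO WRITE)
INODE 81046: FILE SIZE 948963 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 393216
ADJUST? no
UNREF FILE I=1284137 OWNER=root MODE=100666 SIZE=0 MTIME=Jan 18 15:01 2023
CLEAR? no
SUMMARY INFORMATION BAD
SALVAGE? no
"""

if __name__ == "__main__":
    counts = declined_repairs(SAMPLE)
    if ran_read_only(SAMPLE) and counts:
        total = sum(counts.values())
        print(f"{total} repairs were declined; an offline fsck is still needed")
        for prompt, n in sorted(counts.items()):
            print(f"  {prompt}: {n}")
```

A non-empty count alongside (NO WRITE) suggests the problems are still on disk and that the check should be repeated with the filesystem unmounted or mounted read-only, as the linked documentation describes.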
-
@stephenw10 So I have run the filesystem check, it has fixed the few issues, and I've flicked pfBlockerNG back to two-hourly updates. No crashes since, so it looks all good.
Is it worthwhile creating a bug report on Redmine? This definitely seems like it shouldn't happen, but I don't know how reproducible it would be, or whether the logs I have would be enough to track down the issue, which is possibly upstream in FreeBSD anyway.
-
Not for the ffs_valloc: dup alloc crash. That's a known issue with UFS, part of the reason we switched to ZFS as the default install.
If the system is powered off without unmounting the filesystem, it can be damaged beyond what a single fsck pass can repair, which then requires a manual run.
If it wasn't manually powered down, that was probably caused by some initial crash, the cause of which is unknown. If that happens again we can review the crash report from it.
Steve
-
@stephenw10 Yeah ok, looking at the emails I got, it looks like the UPS ran out of power before the firewall could fully shut down; it was partway through shutting down when the UPS ran out.