pfsense crashes (panic - ffs_valloc: dup alloc) and reboots on pfBlockerNG updates

vertigo

Hi All,

pfSense was crashing and rebooting every two hours. I've since moved the pfBlockerNG update to run overnight to reduce the impact. It was due to crash at 08:15 local time, but that time has come and gone and the system has stayed up.

I've collected the dump headers and the textdump tar files and will attach them. Here's the first one I encountered:

textdump.tar.0 info.0

Dump header from device: /dev/gptid/d2f172ea-5325-11ea-9dd5-00e0671c6be4
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 75776
  Blocksize: 512
  Compression: none
  Dumptime: Tue Jan 17 22:15:20 2023
  Hostname: Heimdallr.kn-mck
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 12.3-STABLE RELENG_2_6_0-n226742-1285d6d205f pfSense
  Panic String: ffs_valloc: dup alloc
  Dump Parity: 646810427
  Bounds: 0
  Dump Status: good

I recently upgraded pfBlockerNG to version 3.1.0_9, but the issue only started after a power outage last night, not immediately after the upgrade. I also manually ran a pfBlockerNG update after upgrading, and that ran fine.

It crashed multiple times overnight and came back up cleanly after each crash, so it doesn't look like filesystem corruption due to the power outage.

It looks like I may have run into a bug, but I wanted that confirmed before opening an issue on Redmine.

Here's the next textdump and dump header. From a quick skim, it appears to be the same.
textdump-0015.tar info-0015.txt
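To check whether several crashes really share the same panic, you can pull the "Panic String" line out of each dump header and count the distinct values. The following is a minimal POSIX sh sketch, assuming the headers use the indented "Key: Value" layout shown above. `panic_summary` is a hypothetical helper name, not a pfSense or FreeBSD tool.

```shell
#!/bin/sh
# panic_summary (hypothetical helper): print each distinct panic string
# found in the given dump header files, prefixed by how often it occurs.
# Assumes the indented "Key: Value" layout of the info.N headers above.
panic_summary() {
    for f in "$@"; do
        # Print only the text following "Panic String:" on matching lines.
        sed -n 's/^[[:space:]]*Panic String:[[:space:]]*//p' "$f"
    done | sort | uniq -c
}

# Example (commented out): compare all headers saved by savecore(8).
# panic_summary /var/crash/info.*
```

On FreeBSD, savecore writes its files to /var/crash by default. If the command prints a single line, every header you passed it records the same panic.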

stephenw10

That crash is a UFS filesystem fault, but it was probably caused by an initial crash in which the firewall rebooted without unmounting the filesystem. That first panic isn't shown, though.

First, run a manual filesystem check to make sure any existing issues are fixed:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/filesystem-check.html#manual-filesystem-check

Steve

vertigo

@stephenw10 Thanks for that. It's just crashed again, so it seems unlikely to be due to pfBlockerNG.

Here are the latest crash files: textdump-20230118.1500.tar info-20230118.1500.txt

I'll run the manual offline check later tonight, but I've just run a read-only check now and it reports a few errors:

[2.6.0-RELEASE][root@Heimdallr.kn-mck]/root: fsck -fy /
** /dev/gptid/d2f012b5-5325-11ea-9dd5-00e0671c6be4 (NO WRITE)
SETTING DIRTY FLAG IN READ_ONLY MODE

UNEXPECTED SOFT UPDATE INCONSISTENCY
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
INODE 81046: FILE SIZE 948963 BEYOND END OF ALLOCATED FILE, SIZE SHOULD BE 393216
ADJUST? no

** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE  I=80604  OWNER=unbound MODE=100644
SIZE=0 MTIME=Jan 18 15:18 2023
RECONNECT? no


CLEAR? no

UNREF FILE I=1284137  OWNER=root MODE=100666
SIZE=0 MTIME=Jan 18 15:01 2023
CLEAR? no

UNREF FILE I=1284200  OWNER=uucp MODE=100666
SIZE=0 MTIME=Jan 17 20:31 2023
CLEAR? no

** Phase 5 - Check Cyl groups
SUMMARY INFORMATION BAD
SALVAGE? no

BLK(S) MISSING IN BIT MAPS
SALVAGE? no

FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? no

ALLOCATED FILE 1284200 MARKED FREE
44782 files, 998831 used, 2554189 free (48061 frags, 313266 blocks, 1.4% fragmentation)

vertigo

@stephenw10 I've run the filesystem check and it fixed the few issues, and I've switched pfBlockerNG back to two-hourly updates. No crashes since, so it all looks good.

Is it worth creating a bug report on Redmine? This definitely seems like something that shouldn't happen, but I don't know how reproducible it is, or whether the logs I have would be enough to track down the issue, which may be upstream in FreeBSD anyway.

stephenw10

Not for the ffs_valloc: dup alloc crash. That's a known issue with UFS, and part of the reason we switched to ZFS as the default install.
If the system is powered off without unmounting the filesystem, it can be damaged beyond what a single automatic fsck pass can repair, which then requires a manual run.
If it wasn't manually powered down, that was probably caused by some initial crash whose cause is unknown. If that happens again, we can review the crash report from it.

Steve

vertigo

@stephenw10 Yeah, OK. Looking at the emails I got, the UPS ran out of power before the firewall could fully shut down; it was already shutting down when the UPS ran out.