2.6.0 crashdump, possibly wireguard, new DMZ on a USB ethernet

pfrench

Great software, by the way
My first crash in ten years!!!!
textdump.tar.0
I've got a pretty complicated setup bridging two houses with wireguard, and I recently put my work computer behind a DMZ (I was out of ethernet ports, so I added a USB dongle).

The machine hung HARD and failed to boot, and the console showed it was very confused about which ports went to which interfaces.

I thought it was the USB ethernet dongle I was using for the DMZ, but I found it would only boot successfully if I put the dongle back in.

When it did finally reboot successfully, ALL four of the wireguard interfaces had been removed from the UI.

stephenw10

Backtrace:

db:0:kdb.enter.default>  bt
Tracing pid 85756 tid 100297 td 0xfffff80118864000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe002d0df6f0
vpanic() at vpanic+0x197/frame 0xfffffe002d0df740
panic() at panic+0x43/frame 0xfffffe002d0df7a0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe002d0df800
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe002d0df850
trap() at trap+0x286/frame 0xfffffe002d0df960
calltrap() at calltrap+0x8/frame 0xfffffe002d0df960
--- trap 0xc, rip = 0, rsp = 0xfffffe002d0dfa38, rbp = 0xfffffe002d0dfad0 ---
??() at 0/frame 0xfffffe002d0dfad0
devfs_write_f() at devfs_write_f+0xda/frame 0xfffffe002d0dfb40
dofilewrite() at dofilewrite+0xb0/frame 0xfffffe002d0dfb90
sys_write() at sys_write+0xc0/frame 0xfffffe002d0dfc00
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe002d0dfd30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe002d0dfd30
--- syscall (4, FreeBSD ELF64, sys_write), rip = 0x8014f4c6a, rsp = 0x7fffda9d2088, rbp = 0x7fffda9d20c0 ---

Panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address	= 0x0
fault code		= supervisor read instruction, page not present
instruction pointer	= 0x20:0x0
stack pointer	        = 0x28:0xfffffe002d0dfa38
frame pointer	        = 0x28:0xfffffe002d0dfad0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 85756 (ntopng)
trap number		= 12
panic: page fault
cpuid = 1
time = 1683788428
KDB: enter: panic

Hmm, not a particularly helpful backtrace, unfortunately. It does appear to have crashed while ntopng was trying to write out data, though, so check the SMART data on the drive. That seems unrelated to the new interface.
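For anyone reading the panic output above, here is a minimal Python sketch of how the key fields can be pulled out of it. The function name `summarize_panic` and the trimmed `PANIC_TEXT` sample are purely illustrative and not part of pfSense or FreeBSD; the sample only copies lines from the panic message in this thread.

```python
import re

# A few lines copied from the panic message in this thread.
PANIC_TEXT = """\
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
current process         = 85756 (ntopng)
"""


def summarize_panic(text):
    """Collect 'key = value' lines and decode the fields of interest."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that carry no '=' at all
            fields[key.strip()] = value.strip()

    # The instruction pointer is printed as 'segment:offset'; keep the offset.
    ip_offset = fields["instruction pointer"].split(":")[1]
    proc = re.match(r"(\d+) \((.+)\)", fields["current process"])
    return {
        "fault_address": int(fields["fault virtual address"], 16),
        "instruction_pointer": int(ip_offset, 16),
        "pid": int(proc.group(1)),
        "process": proc.group(2),
    }


summary = summarize_panic(PANIC_TEXT)
print(summary)
```

Both the fault address and the instruction pointer come out as 0 here, which matches the `??() at 0` frame in the backtrace: the kernel tried to fetch an instruction from address 0 while handling a write from ntopng. That pattern often points to a call through a null function pointer, but the backtrace alone does not say which driver was involved.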
USB Ethernet is generally not recommended, but at least you have an actual axgbe device here. Using VLANs instead would still be better.
If you removed the USB NIC while it was assigned, pfSense will have dropped you at the interface assignment prompt. Wireguard interfaces, like the VPNs, are not included in the interface check because they may not have been created at that point. However, if you reassigned the interfaces there, it would have created only those interfaces, hence no WG interfaces. I would roll back the config to a backup from before you added the USB NIC.

Steve

pfrench

@stephenw10
Thanks, this is really helpful.

The drive has 24,000 hours on it (yikes) but no errors....

Question: To reduce writes, I think I'm going to:

uninstall ntopng
move /tmp and /var to RAM (recommended sizes for a 4GB system running pfBlockerNG?)
I'll also likely put a new drive in it as this one looks EOL
other guidance?

I've got two of these devices bridging networks between two houses with wireguard, and the other one had a similar death spiral before I reinstalled on a new drive (I just wasn't there to monitor it).

Things I noticed during last night's issue:

The config history was completely overwritten dozens of times in its final ten minutes or so (package install/uninstall)
It's up and running again, but the pfBlockerNG and wireguard settings have been cleared/disabled
Firewall rules have remained intact, but the aliases from pfBlockerNG aren't loaded as its config was lost

SMART tests show it's OK:

SMART Extended Comprehensive Error Log Version: 1 (64 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     24590         -
# 2  Short offline       Completed without error       00%     24590         -
# 3  Short offline       Completed without error       00%     23581         -
# 4  Extended offline    Completed without error       00%         0         -

Device Statistics (GP Log 0x04)
Page  Offset Size        Value Flags Description
0x01  =====  =               =  ===  == General Statistics (rev 1) ==
0x01  0x008  4              33  ---  Lifetime Power-On Resets
0x01  0x010  4           24590  ---  Power-on Hours
0x01  0x018  6    205737116177  ---  Logical Sectors Written
0x01  0x028  6     48580279094  ---  Logical Sectors Read
0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1              33  ---  Current Temperature
0x05  0x020  1              33  ---  Highest Temperature
0x05  0x028  1              33  ---  Lowest Temperature
0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4             224  ---  Number of Hardware Resets
0x06  0x018  4               0  ---  Number of Interface CRC Errors
0x07  =====  =               =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1              84  ---  Percentage Used Endurance Indicator
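
To put the Device Statistics above in perspective, the arithmetic below is a rough sketch in Python. It assumes 512-byte logical sectors (the log does not state the sector size), and the linear wear extrapolation is a crude illustration only, not something the drive itself reports.

```python
# Values copied from the Device Statistics log in this post.
sectors_written = 205_737_116_177
power_on_hours = 24_590
percent_used = 84  # Percentage Used Endurance Indicator

# Assumption: 512-byte logical sectors. Check the drive's identify
# output before relying on this figure.
SECTOR_BYTES = 512

bytes_written = sectors_written * SECTOR_BYTES
tb_written = bytes_written / 1e12                    # decimal terabytes
gb_per_hour = bytes_written / power_on_hours / 1e9   # average write rate

# Crude assumption: wear keeps accumulating at the same average rate,
# so the remaining 16% lasts proportionally as long as the first 84% did.
hours_remaining = power_on_hours * (100 - percent_used) / percent_used

print(f"Total written:   {tb_written:.1f} TB")
print(f"Average rate:    {gb_per_hour:.2f} GB/hour")
print(f"Hours remaining: {hours_remaining:.0f} (linear estimate)")
```

Under those assumptions the drive has taken roughly 105 TB of writes, or a bit over 4 GB per hour averaged across its life. So even with no logged errors, the 84% endurance figure supports replacing it and cutting down on writes.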

pfrench

@stephenw10
FWIW: I restored my (manual) config save from yesterday afternoon and I'm golden.
Thanks for the assist

stephenw10

Hmm, the fact it saved a crashlog at all shows that the drive didn't fail entirely.

Using RAM disks can be problematic with larger packages like that.