2.6.0 crashdump, possibly wireguard, new DMZ on a USB ethernet
-
Great software, by the way
My first crash in ten years!!!!
textdump.tar.0
I've got a pretty complicated setup bridging two houses with wireguard, and I've recently put my work computer behind a DMZ (I was out of ethernet ports so I added a USB dongle).
Machine hung HARD, failed to boot, and the console showed it was very confused about which ports went to which interfaces.
I thought it was the USB ethernet dongle I was using for a DMZ, but I found it would only successfully boot if I put the dongle back in.
When it did finally reboot successfully, ALL four of the wireguard interfaces had been removed from the UI.
-
Backtrace:
db:0:kdb.enter.default> bt
Tracing pid 85756 tid 100297 td 0xfffff80118864000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe002d0df6f0
vpanic() at vpanic+0x197/frame 0xfffffe002d0df740
panic() at panic+0x43/frame 0xfffffe002d0df7a0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe002d0df800
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe002d0df850
trap() at trap+0x286/frame 0xfffffe002d0df960
calltrap() at calltrap+0x8/frame 0xfffffe002d0df960
--- trap 0xc, rip = 0, rsp = 0xfffffe002d0dfa38, rbp = 0xfffffe002d0dfad0 ---
??() at 0/frame 0xfffffe002d0dfad0
devfs_write_f() at devfs_write_f+0xda/frame 0xfffffe002d0dfb40
dofilewrite() at dofilewrite+0xb0/frame 0xfffffe002d0dfb90
sys_write() at sys_write+0xc0/frame 0xfffffe002d0dfc00
amd64_syscall() at amd64_syscall+0x387/frame 0xfffffe002d0dfd30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe002d0dfd30
--- syscall (4, FreeBSD ELF64, sys_write), rip = 0x8014f4c6a, rsp = 0x7fffda9d2088, rbp = 0x7fffda9d20c0 ---
Panic:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x0
fault code = supervisor read instruction, page not present
instruction pointer = 0x20:0x0
stack pointer = 0x28:0xfffffe002d0dfa38
frame pointer = 0x28:0xfffffe002d0dfad0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 85756 (ntopng)
trap number = 12
panic: page fault
cpuid = 1
time = 1683788428
KDB: enter: panic
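To line the panic up with the config history entries, the `time = 1683788428` field can be converted to a readable date. This is a minimal sketch that assumes the field is Unix epoch seconds; the helper name `panic_time_utc` is made up for illustration.

```python
from datetime import datetime, timezone

# Value copied from the "time = ..." field of the panic message.
PANIC_EPOCH_SECONDS = 1683788428


def panic_time_utc(epoch_seconds: int) -> datetime:
    """Interpret the panic 'time' field as Unix epoch seconds (assumption), in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Print an ISO 8601 timestamp that can be compared with config history times.
    print(panic_time_utc(PANIC_EPOCH_SECONDS).isoformat())
```

Under that assumption the panic happened at 2023-05-11 07:00:28 UTC; convert to the firewall's local timezone before comparing it with the timestamps shown in the config history.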
Hmm, not a particularly helpful backtrace, unfortunately, though it appears to have crashed trying to write out data from ntopng. Check the SMART data on the drive. The crash seems unrelated to the new interface.
USB Ethernet is generally not recommended but at least you have an actual axgbe device here. Using VLANs instead would still be better.
If you removed the USB NIC and it was assigned, pfSense will have dumped you at the interfaces assign prompt. Wireguard interfaces, like the VPNs, are not included in the interface check because they may not have been created at that point. However, if you reassigned the interfaces there, it would have created only those interfaces, hence no WG interfaces. I would roll back the config to a backup from before you added the USB NIC.
Steve
-
@stephenw10
Thanks, this is really helpful. The drive has 24,000 hours on it (yikes) but no errors....
Question: To reduce writes, I think I'm going to:
- uninstall ntopng
- move /tmp and /var to RAM (recommended sizes for a 4GB system running pfBlockerNG?)
- I'll also likely put a new drive in it as this one looks EOL
- other guidance?
I've got two of these devices bridging networks between two houses with wireguard and the other one had a similar death spiral before I reinstalled on a new drive (I just wasn't there to monitor it)
Things I noticed during last night's issue:
- The config history was completely overwritten dozens of times in its final ten minutes or so (package install/uninstall)
- It's up and running again, but has cleared/disabled the settings of pfBlockerNG and wireguard
- firewall rules have remained intact, but the aliases from pfBlockerNG aren't loaded as its config was lost
SMART tests show it's OK:
SMART Extended Comprehensive Error Log Version: 1 (64 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        24590            -
# 2  Short offline     Completed without error  00%        24590            -
# 3  Short offline     Completed without error  00%        23581            -
# 4  Extended offline  Completed without error  00%        0                -

Device Statistics (GP Log 0x04)
Page  Offset  Size  Value         Flags  Description
0x01  =====   =     =             ===    == General Statistics (rev 1) ==
0x01  0x008   4     33            ---    Lifetime Power-On Resets
0x01  0x010   4     24590         ---    Power-on Hours
0x01  0x018   6     205737116177  ---    Logical Sectors Written
0x01  0x028   6     48580279094   ---    Logical Sectors Read
0x04  =====   =     =             ===    == General Errors Statistics (rev 1) ==
0x04  0x008   4     0             ---    Number of Reported Uncorrectable Errors
0x05  =====   =     =             ===    == Temperature Statistics (rev 1) ==
0x05  0x008   1     33            ---    Current Temperature
0x05  0x020   1     33            ---    Highest Temperature
0x05  0x028   1     33            ---    Lowest Temperature
0x06  =====   =     =             ===    == Transport Statistics (rev 1) ==
0x06  0x008   4     224           ---    Number of Hardware Resets
0x06  0x018   4     0             ---    Number of Interface CRC Errors
0x07  =====   =     =             ===    == Solid State Device Statistics (rev 1) ==
0x07  0x008   1     84            ---    Percentage Used Endurance Indicator
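To put those Device Statistics numbers in perspective, here is a rough Python sketch of the arithmetic. It assumes 512-byte logical sectors (the drive's identity section would confirm the real size) and it extrapolates the endurance indicator linearly, which is only a crude guess; the function names are made up for illustration.

```python
# Figures copied from the Device Statistics output.
LOGICAL_SECTORS_WRITTEN = 205_737_116_177
POWER_ON_HOURS = 24_590
PERCENT_ENDURANCE_USED = 84

# Assumption: 512-byte logical sectors; check the drive's reported sector size.
ASSUMED_SECTOR_BYTES = 512


def total_bytes_written(sectors: int, sector_bytes: int = ASSUMED_SECTOR_BYTES) -> int:
    """Total host writes in bytes for the given logical sector count."""
    return sectors * sector_bytes


def average_write_rate_mb_s(total_bytes: int, hours: int) -> float:
    """Average write rate in MB/s (decimal megabytes) over the power-on time."""
    return total_bytes / (hours * 3600) / 1e6


def projected_total_hours(hours: int, percent_used: int) -> float:
    """Crude linear projection (assumption) of power-on hours at 100% endurance used."""
    return hours * 100 / percent_used


if __name__ == "__main__":
    written = total_bytes_written(LOGICAL_SECTORS_WRITTEN)
    print(f"written: {written / 1e12:.1f} TB")
    print(f"average: {average_write_rate_mb_s(written, POWER_ON_HOURS):.2f} MB/s")
    print(f"projected hours at 100% used: "
          f"{projected_total_hours(POWER_ON_HOURS, PERCENT_ENDURANCE_USED):.0f}")
```

Under those assumptions that is roughly 105 TB written, an average of about 1.2 MB/s around the clock, so even with a clean error log the 84% endurance figure is the number to watch.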
-
@stephenw10
FWIW: I restored my (manual) config save from yesterday afternoon and I'm golden.
Thanks for the assist
-
Hmm, the fact it saved a crashlog at all shows that the drive didn't fail entirely.
Using ram disks can be problematic with larger packages like that.