Firewall rebooted unexpectedly
-
Netgate 6100 rebooted unexpectedly.
I have some crash dump files that I can upload.
Crash report begins. Anonymous machine information:
amd64
15.0-CURRENT
FreeBSD 15.0-CURRENT #0 plus-RELENG_24_03-n256311-e71f834dd81: Fri Apr 19 00:28:14 UTC 2024 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/obj/amd64/Y4MAEJ2R/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/sources/FreeBS
Crash report details:
No PHP errors found.
Filename: /var/crash/info.0
Dump header from device: /dev/nda0p3
Architecture: amd64
Architecture Version: 4
Dump Length: 371712
Blocksize: 512
Compression: none
Dumptime: 2024-09-05 15:16:17 -0400
Hostname: GAFW
Magic: FreeBSD Text Dump
Version String: FreeBSD 15.0-CURRENT #0 plus-RELENG_24_03-n256311-e71f834dd81: Fri Apr 19 00:28:14 UTC 2024 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-24_03-main/obj/amd64/Y4MAEJ2R/var/j
Panic String: page fault
Dump Parity: 2857159027
Bounds: 0
Dump Status: good
-
Rebooted again... something's failing, I think.
SSD is still in a good state
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
-
Upload the crash report here: https://nc.netgate.com/nextcloud/s/mWWHieq9ZHL6seF
-
@stephenw10
Files uploaded. I also have a TAC ticket open. I'm not seeing any signs of hardware failure as suggested, but I could be wrong.
-
Doesn't look like hardware, all those crashes are almost identical.
Backtrace:
db:1:pfs> bt
Tracing pid 12 tid 100043 td 0xfffff80001688740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe00850ca270
panic() at panic+0x43/frame 0xfffffe00850ca2d0
trap_fatal() at trap_fatal+0x40f/frame 0xfffffe00850ca330
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850ca390
calltrap() at calltrap+0x8/frame 0xfffffe00850ca390
--- trap 0xc, rip = 0xffffffff846626a7, rsp = 0xfffffe00850ca460, rbp = 0xfffffe00850ca490 ---
export_pflow() at export_pflow+0x77/frame 0xfffffe00850ca490
pf_detach_state() at pf_detach_state+0x45b/frame 0xfffffe00850ca4d0
pf_state_insert() at pf_state_insert+0x854/frame 0xfffffe00850ca570
pf_test_rule() at pf_test_rule+0x28f8/frame 0xfffffe00850ca9c0
pf_test() at pf_test+0x1382/frame 0xfffffe00850cab90
pf_check_out() at pf_check_out+0x22/frame 0xfffffe00850cabb0
pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe00850cabe0
ip_output() at ip_output+0xb60/frame 0xfffffe00850cace0
ip_forward() at ip_forward+0x3c2/frame 0xfffffe00850cad90
ip_input() at ip_input+0x705/frame 0xfffffe00850cadf0
swi_net() at swi_net+0x138/frame 0xfffffe00850cae60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe00850caef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00850caf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00850caf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
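For anyone reading a backtrace like this for the first time: the frame listed right after the first `--- trap ... ---` marker is the function that was running when the fault was taken. Here that marker shows trap 0xc, which matches the "page fault" panic string in the dump header. Below is a minimal Python sketch of pulling that frame out of the text. The helper name `faulting_frame` is made up for illustration and is not part of any pfSense or FreeBSD tooling, and the regex assumes the frame layout seen in this particular dump.

```python
import re

# Excerpt of the backtrace posted above.
BACKTRACE = """\
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00850ca390
calltrap() at calltrap+0x8/frame 0xfffffe00850ca390
--- trap 0xc, rip = 0xffffffff846626a7, rsp = 0xfffffe00850ca460, rbp = 0xfffffe00850ca490 ---
export_pflow() at export_pflow+0x77/frame 0xfffffe00850ca490
pf_detach_state() at pf_detach_state+0x45b/frame 0xfffffe00850ca4d0
pf_state_insert() at pf_state_insert+0x854/frame 0xfffffe00850ca570
"""

# A "--- trap N, ... ---" marker followed by the next "func() at func+0xNN" frame.
# \s+ allows either a newline or a space between them, so the pattern also
# works on a backtrace that has been pasted as one long line.
TRAP_THEN_FRAME = re.compile(
    r"--- trap (0x[0-9a-f]+|\d+),.*?---\s+(\w+)\(\) at \w+\+(0x[0-9a-f]+)"
)


def faulting_frame(backtrace: str):
    """Hypothetical helper: return (trap_number, function, offset) for the
    first trap marker that is followed by a frame, or None if there is none."""
    match = TRAP_THEN_FRAME.search(backtrace)
    if match is None:
        return None
    trap_number = int(match.group(1), 0)  # base 0 accepts "0xc" as well as "12"
    return trap_number, match.group(2), match.group(3)


print(faulting_frame(BACKTRACE))
```

Run against the excerpt, this picks out `export_pflow` at offset `0x77`, with the `pf_detach_state` and `pf_state_insert` callers directly beneath it.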
Looks like an issue in pflow, do you have that enabled?
The only other thing I see is:
<6>pid 67263 (pftop), jid 0, uid 0: exited on signal 6 (core dumped)
That could just be a symptom of the panic though.
-
@stephenw10
I do have pflow enabled.
It's been working great since the 24. update. Why is it acting up now?
-
Good question. And it's set to NetFlow v5, so it's not this: https://redmine.pfsense.org/issues/15446
What else has changed?
-
@stephenw10
I can't see the config history, as it's now flooded with (system): related messages. The Auto Configuration Backup / Restore has no backups for the device. Is this normal?
This started yesterday during the work day, so for sure no changes were made then. Later that night I updated a pfblocker DNSBL feed, but it's not related to pfblocker.
Anything else I can check? Any other clues in the crash dumps?
-
Hmm, ACB not seeing backups is probably unrelated. But check general connectivity from the firewall itself. Also check whether a different box using the same key can see the backups.
This looks like a bug in pflow to me; we are looking into it.
How often is it panicking? Can you test disabling pflow?
-
@stephenw10
I can disable pflow for now. The restart events are below:
9/5 - 3:20pm EDT
9/5 - 3:40pm EDT
9/5 - 11:50pm EDT
9/6 - 03:30am EDT
9/6 - 05:40am EDT
9/6 - 07:00am EDT
-
Hmm, OK it appears it probably is that bug. Or at least the same fix applies.
Something must have changed though for it to suddenly start hitting it.
-
@stephenw10 Even though the redmine points to it being related to IPFIX?
The only thing that "recently" changed was a NAT Port Forward rule and DHCP settings on 9/5 @ 09:32am EDT
I see there is a patch created.
-
There is a patch but it's a compile time patch. It's fixed in 24.08 but would need a rebuild for 24.03.
Yes, in the original bug report it only affected IPFIX, which is why I initially thought it could not be that. But Kristof believes the root cause is the same here, and the fix is the same.
It is odd that you were not hitting it before though. Something must have changed. Hard to imagine a port forward would have done it.
-
@stephenw10
I honestly don't know what could've changed within 24 hours specifically with pflow. I added an additional collector configuration a while back. I reviewed my changes from yesterday and confirmed that only the changes I stated were made. Considering the bulk of the reboots happened while I was asleep, and as far as I know I don't sleepwalk (maybe I do), it wasn't anything I did overnight that caused those reboots.
So as of now the fix is ready, but it will only be released with 24.08?
And the workaround is to disable pflow?
-
Well, the first thing is to confirm it really is pflow by disabling it and making sure the panic doesn't happen again.