pfSense 2.5.2 keeps crashing periodically
-
As the title says... my pfSense physical machine keeps crashing and causing a network disconnect. I've tried replacing the boot drive, RAM, power supply (and power cable) and motherboard. The only thing I haven't changed is the NIC, an Intel PRO/1000 (I think).
Currently testing running off of a different power outlet not on my UPS.
Anyone able to help me decipher the logs? crashlog_pfsense.txt
-
<118>* WARNING! *
<118>* The current configuration has been created with a newer version of pfSense *
<118>* than this one! This can lead to serious misbehavior and even security *
<118>* holes! You are urged to either upgrade to a newer version of pfSense or *
<118>* revert to the default configuration immediately! *
<118>*************************************
And the crash says page fault while in kernel mode, which is very OS-related.
No idea what causes it, but is there any chance you downgraded?
-
I was getting the error when originally running on 2.5.x. Then I upgraded to 2.6.x to see if it would resolve the issue.
That didn't work, so I did a fresh install of 2.5.2 on a different boot disk and restored the 2.6.x configuration. So I would like to think that the downgrade didn't cause any of this... but who knows.
Starting to feel like a software issue (seeing as I have changed pretty much every part of my pfSense box). I've also tried disabling all of my packages (HAProxy, ntop, etc.). No luck though...
Is my only remaining option to start from scratch?
-
@urquhaty Wait for other opinions.
However, if you can do a fresh install, start the config from scratch, and it doesn't bomb, then we are getting somewhere.
2.5.2 is quite stable as a starting point, too.
-
The backtrace is the key part there:
db:0:kdb.enter.default> bt
Tracing pid 12 tid 100040 td 0xfffff8000538b740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00004ee280
vpanic() at vpanic+0x197/frame 0xfffffe00004ee2d0
panic() at panic+0x43/frame 0xfffffe00004ee330
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00004ee390
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004ee3e0
trap() at trap+0x286/frame 0xfffffe00004ee4f0
calltrap() at calltrap+0x8/frame 0xfffffe00004ee4f0
--- trap 0xc, rip = 0xffffffff8109c7fa, rsp = 0xfffffe00004ee5c0, rbp = 0xfffffe00004ee630 ---
pf_test_state_udp() at pf_test_state_udp+0x2ba/frame 0xfffffe00004ee630
pf_test() at pf_test+0x1db8/frame 0xfffffe00004ee870
pf_check_in() at pf_check_in+0x1d/frame 0xfffffe00004ee890
pfil_run_hooks() at pfil_run_hooks+0xa1/frame 0xfffffe00004ee930
ip_tryforward() at ip_tryforward+0x193/frame 0xfffffe00004ee9b0
ip_input() at ip_input+0x3fe/frame 0xfffffe00004eea60
swi_net() at swi_net+0x12b/frame 0xfffffe00004eead0
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe00004eeb30
fork_exit() at fork_exit+0x7e/frame 0xfffffe00004eeb70
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004eeb70
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> ps
That's not anything I recognise directly.
Do you have any other crashes? Are they all similar?
If it's a hardware issue the crashes will be random.
Steve
-
Every backtrace that I've looked at is the same (pf_test_state_udp()..., etc.).
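For anyone who wants to compare several crash dumps without reading each one by eye, here is a minimal Python sketch that pulls out the frame right after the trap marker in a ddb backtrace like the one above. The function name `faulting_frame` and the regular expression are my own illustration, not part of pfSense or FreeBSD, and it assumes the backtrace text looks like the one posted in this thread.

```python
import re

# Matches a ddb trap marker with a hex trap number, followed by the first
# frame after it, for example:
#   --- trap 0xc, rip = ..., rbp = ... ---
#   pf_test_state_udp() at pf_test_state_udp+0x2ba/frame ...
# The closing "--- trap 0, ..." marker is skipped because it has no 0x prefix.
TRAP_FRAME = re.compile(
    r"--- trap (0x[0-9a-f]+),.*?---\s+(\w+)\(\) at \2\+(0x[0-9a-f]+)"
)

def faulting_frame(crash_text):
    """Return (trap_number, function, offset) for the first trap, or None."""
    match = TRAP_FRAME.search(crash_text)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)

if __name__ == "__main__":
    # Excerpt copied from the backtrace in this thread.
    sample = (
        "calltrap() at calltrap+0x8/frame 0xfffffe00004ee4f0\n"
        "--- trap 0xc, rip = 0xffffffff8109c7fa, rsp = 0xfffffe00004ee5c0, "
        "rbp = 0xfffffe00004ee630 ---\n"
        "pf_test_state_udp() at pf_test_state_udp+0x2ba/frame 0xfffffe00004ee630\n"
        "pf_test() at pf_test+0x1db8/frame 0xfffffe00004ee870\n"
    )
    print(faulting_frame(sample))
```

If every dump returns the same function and offset, that points to a repeatable software path rather than random hardware faults, which is the distinction Steve draws above.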
-
@stephenw10 What is PID 12?
-
That's just the current process ID, it's not anything specific to the issue.
Nothing really jumps out from that boot log other than you have a load of devices enabled that don't need to be, sound card etc. I would disable all of that in the BIOS if you can.
Did this just start happening or has this hardware always crashed like this?
Steve
-
@stephenw10 This issue started happening on different hardware. Then I swapped everything out one by one to try and isolate what was causing the issue. Now the hardware is essentially a 'new' machine.
The only thing I can think of that changed elsewhere on my network would have been some Pi-hole configuration... I recently re-enabled my domain controller's DNS server to handle local domain requests and maybe accidentally caused a DNS loop when I told Pi-hole to forward domain requests to the DC? Not sure if this is even relevant, but it's what I'm trying right now... kind of desperate at this point.
-
I can't imagine anything DNS related causing a kernel panic like that. Something must have changed though if it was running on that same hardware fine previously.
-
@stephenw10 I didn't think it would have either. It's been running for an hour and 20 minutes so far without a crash. I'm going to wait and see what happens with this change before I try anything else... I'll reply with an update later.
-
Well, as much as I didn't think it could be the DNS, I think it might have been the problem. Going strong for 4+ hours.
A note for anyone else looking at this:
I don't remember the exact setting in Pi-hole, but I think it was enabling 'conditional forwarding' for my top-level domain (tld). That added new entries to my dnsmasq conf file: I think a 'rev-server=' line and a 'server=//domain.tld' entry. I commented both out and disabled conditional forwarding. This may not solve your problem or even be the exact cause, but if you changed DNS settings recently, just make sure they are correct.
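If you want to check your own setup for the same kind of entries, a rough Python sketch like the following lists the active (uncommented) `server=` and `rev-server=` lines in a dnsmasq config. The function name `forwarding_lines` is just for illustration, the sample values are made up, and which file Pi-hole writes these entries to on your install is something you would need to confirm yourself.

```python
def forwarding_lines(conf_text):
    """Return (line_number, line) pairs for active server= / rev-server= entries."""
    found = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and commented-out entries.
        if not line or line.startswith("#"):
            continue
        key = line.split("=", 1)[0].strip()
        if key in ("server", "rev-server"):
            found.append((number, line))
    return found

if __name__ == "__main__":
    # Hypothetical config text; the domain and addresses are placeholders.
    sample = (
        "# written by the DNS settings page\n"
        "server=/example.lan/192.168.1.10\n"
        "#rev-server=192.168.1.0/24,192.168.1.10\n"
        "cache-size=1000\n"
    )
    for number, line in forwarding_lines(sample):
        print(number, line)
```

Any forwarding target it reports that itself forwards back to the Pi-hole (or to pfSense, which then forwards to the Pi-hole) would be a candidate for the kind of DNS loop described above.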
-
Hmm, I wonder what that was causing that would trigger this...