pfSense 2.5.2 keeps crashing periodically

netblues

<118>* WARNING! *
<118>* The current configuration has been created with a newer version of pfSense *
<118>* than this one! This can lead to serious misbehavior and even security *
<118>* holes! You are urged to either upgrade to a newer version of pfSense or *
<118>* revert to the default configuration immediately! *
<118>*************************************

And the crash reports a page fault while in kernel mode, which is very OS related.
No idea what causes it, but is there any chance you downgraded?

urquhaty

@netblues

I was getting the error when originally running on 2.5.x. Then I upgraded to 2.6.x to see if it would resolve the issue.

That didn't work, so I did a fresh install of 2.5.2 on a different boot disk and restored the 2.6.x configuration. So I would like to think that the downgrade didn't cause any of this... but who knows.

Starting to feel like a software issue (seeing as I have changed pretty much every part of my pfSense box). I've also tried disabling all of my packages (HAProxy, ntop, etc.). No luck though...

Is my only remaining option to start from scratch?

netblues

@urquhaty Wait for other opinions.
However, if you can do a fresh install, start the config from scratch, and it doesn't bomb, then we are getting somewhere.
2.5.2 is quite stable as a starting point, too.

stephenw10

The backtrace is the key part there:

db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100040 td 0xfffff8000538b740
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00004ee280
vpanic() at vpanic+0x197/frame 0xfffffe00004ee2d0
panic() at panic+0x43/frame 0xfffffe00004ee330
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00004ee390
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00004ee3e0
trap() at trap+0x286/frame 0xfffffe00004ee4f0
calltrap() at calltrap+0x8/frame 0xfffffe00004ee4f0
--- trap 0xc, rip = 0xffffffff8109c7fa, rsp = 0xfffffe00004ee5c0, rbp = 0xfffffe00004ee630 ---
pf_test_state_udp() at pf_test_state_udp+0x2ba/frame 0xfffffe00004ee630
pf_test() at pf_test+0x1db8/frame 0xfffffe00004ee870
pf_check_in() at pf_check_in+0x1d/frame 0xfffffe00004ee890
pfil_run_hooks() at pfil_run_hooks+0xa1/frame 0xfffffe00004ee930
ip_tryforward() at ip_tryforward+0x193/frame 0xfffffe00004ee9b0
ip_input() at ip_input+0x3fe/frame 0xfffffe00004eea60
swi_net() at swi_net+0x12b/frame 0xfffffe00004eead0
ithread_loop() at ithread_loop+0x23c/frame 0xfffffe00004eeb30
fork_exit() at fork_exit+0x7e/frame 0xfffffe00004eeb70
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00004eeb70
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default>  ps

That's not anything I recognise directly.

Do you have any other crashes? Are they all similar?
If it's a hardware issue the crashes will be random.

Steve

urquhaty

@stephenw10

Every backtrace that I've looked at is the same (pf_test_state_udp(), etc.).
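For anyone who wants to check this across several crash dumps without reading each one by eye, here is a minimal sketch. It assumes the backtraces look like the ddb output quoted above, and it treats the first frame after the first "--- trap" marker as the faulting function. The names faulting_function and distinct_faults are made up for illustration; they are not part of pfSense or FreeBSD.

```python
import re

# ddb frame line, e.g. "pf_test() at pf_test+0x1db8/frame 0xfffffe00004ee870"
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+")
# trap marker line, e.g. "--- trap 0xc, rip = ..."
TRAP_RE = re.compile(r"^--- trap (0x[0-9a-f]+|\d+),")


def faulting_function(backtrace):
    """Return the first frame name after the first trap marker, or None."""
    seen_trap = False
    for raw in backtrace.splitlines():
        line = raw.strip()
        if TRAP_RE.match(line):
            seen_trap = True
            continue
        if seen_trap:
            match = FRAME_RE.match(line)
            if match:
                return match.group(1)
    return None


def distinct_faults(backtraces):
    """Return the set of faulting function names across several backtraces."""
    return {faulting_function(bt) for bt in backtraces}


# Excerpt of the backtrace posted earlier in this thread.
SAMPLE = """\
trap() at trap+0x286/frame 0xfffffe00004ee4f0
calltrap() at calltrap+0x8/frame 0xfffffe00004ee4f0
--- trap 0xc, rip = 0xffffffff8109c7fa, rsp = 0xfffffe00004ee5c0, rbp = 0xfffffe00004ee630 ---
pf_test_state_udp() at pf_test_state_udp+0x2ba/frame 0xfffffe00004ee630
pf_test() at pf_test+0x1db8/frame 0xfffffe00004ee870
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""

if __name__ == "__main__":
    print(faulting_function(SAMPLE))
    print(distinct_faults([SAMPLE, SAMPLE]))
```

If distinct_faults returns a single name for all of your dumps, the crashes are consistent rather than random, which is the distinction Steve was asking about.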

Cool_Corona

@stephenw10 What is PID 12?

stephenw10

That's just the current process ID, it's not anything specific to the issue.

Nothing really jumps out from that boot log other than you have a load of devices enabled that don't need to be, sound card etc. I would disable all of that in the BIOS if you can.

Did this just start happening or has this hardware always crashed like this?

Steve

urquhaty

@stephenw10 This issue started happening on different hardware. Then I swapped everything out one by one to try and isolate what was causing the issue. Now the hardware is essentially a 'new' machine.

The only thing I can think of that changed elsewhere on my network would have been some Pi-hole configuration... I recently re-enabled my domain controller's DNS server to handle local domain requests, and maybe accidentally caused a DNS loop when I told Pi-hole to forward domain requests to the DC? Not sure if this is even relevant, but it's what I'm trying right now... kind of desperate at this point.

stephenw10

I can't imagine anything DNS related causing a kernel panic like that. Something must have changed though if it was running on that same hardware fine previously.

urquhaty

@stephenw10 I didn't think it would have either. It's been running for an hour and 20 minutes so far without a crash. I'm going to wait and see what happens with this change before I try anything else... I'll reply with an update later.

urquhaty

@stephenw10

Well, as much as I didn't think it could be the DNS, I think it might have been the problem. Going strong for 4+ hours.

A note for anyone else looking at this:

I don't remember the exact setting in Pi-hole, but I think it was enabling 'conditional forwarding' for my top-level domain (tld). That had added new entries to my dnsmasq conf file: I think a 'rev-server=' line and a 'server=//domain.tld' entry. I commented both out and disabled conditional forwarding. This may not solve your problem or even be the exact cause, but if you changed DNS settings recently, just make sure they are correct.
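If you want to see quickly whether any such forwarding entries are still active, a small sketch like the one below lists the uncommented 'server=' and 'rev-server=' lines in dnsmasq-style config text. The function name active_forwarding_lines is hypothetical, and the domain and addresses in the example are placeholders (example.test and the 192.0.2.0/24 documentation range), not my real settings.

```python
def active_forwarding_lines(conf_text):
    """Return uncommented 'server=' and 'rev-server=' lines from config text."""
    hits = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and lines that have been commented out.
        if not line or line.startswith("#"):
            continue
        if line.startswith(("server=", "rev-server=")):
            hits.append(line)
    return hits


# Hypothetical config text; every value here is a placeholder.
EXAMPLE_CONF = """\
# conditional forwarding (placeholder values)
rev-server=192.0.2.0/24,192.0.2.1
#server=/example.test/192.0.2.1
server=/example.test/192.0.2.1
cache-size=10000
"""

if __name__ == "__main__":
    for entry in active_forwarding_lines(EXAMPLE_CONF):
        print(entry)
```

To run it against a real file, read the file's contents into a string and pass that in. An empty result means no forwarding entries of these two kinds are active in that file.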

stephenw10

Hmm, I wonder what that setting is causing that would trigger this...