pfSense crash 2.8.0
-
The backtrace is usually the important part of a crash report and isn't shown there. Did you get a full crash report?
-
@stephenw10 I was using the memory-based file option for /var, set at 1GB, but had never seen it get anywhere near that, usually 1% at most (right now it's at 55MB). But then the crash dump ended up in there, so a page fault occurred.
My suspicion is that the crash happened, but then the crash report caused a page fault when the dump filled up /var. But who knows, maybe something unusual filled /var up.
I guess /var isn't the best place for a crash dump, or if that's the only option then I'll have to remove the memory-based option. I did increase the size to 4GB, but who knows how big the dump could be.
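For reference, the fill level of the memory-backed /var can be checked from a shell (Diagnostics > Command Prompt or SSH); a minimal check:

    # Show size, used and available space for the /var RAM disk
    df -h /var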
Thanks,
Bill
-
@cayossarian said in pfSense crash 2.8.0:
the crash report caused a page fault when the dump filled up /var
Crash dumps are not stored 'somewhere' in /var/ but, AFAIK, in the swap space (partition).
Obtaining Panic Information for Developers
It starts by saying they are stored in /var/crash/
and at the bottom you'll find: Install without Swap Space, which tells me something different. And actually, as you said, it makes more sense: what happens when there is a file system issue? The system goes down with a trace.
Also: the small Netgate appliances don't even have 4 GB for their /var/... Maybe (me guessing even more) /var/crash/ contains some sort of symlink, or just a filename or indication that a crash dump exists in the swap?
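For what it's worth, the usual FreeBSD flow is: the kernel writes the panic dump to the configured dump device (typically the swap partition), and savecore(8) copies it into /var/crash on the next boot. A sketch to check both, assuming the stock FreeBSD tools are present:

    # List the device(s) configured to receive kernel crash dumps
    dumpon -l
    # Show anything savecore(8) has already extracted
    ls -lh /var/crash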
@cayossarian said in pfSense crash 2.8.0:
But who knows, maybe something unusual filled /var
Your mission, as an admin: go have a look. What folder contains GB-sized files?
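A sketch of that look from a shell (the paths are the usual suspects, not a definitive list):

    # Largest directories under /var, human-readable, biggest last
    du -xh /var | sort -h | tail -n 10
    # Log files are the usual growers
    ls -lhS /var/log | head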
-
You should still see the backtrace at the console if it panics even without SWAP to store it.
-
Backtrace:
db:1:pfs> bt
Tracing pid 11 tid 100003 td 0xfffff8026f5fd740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe008e21eb20
panic() at panic+0x43/frame 0xfffffe008e21eb80
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe008e21ebe0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe008e21ec30
calltrap() at calltrap+0x8/frame 0xfffffe008e21ec30
--- trap 0xc, rip = 0xffffffff80d15b8d, rsp = 0xfffffe008e21ed00, rbp = 0xfffffe008e21ed60 ---
callout_process() at callout_process+0x1ad/frame 0xfffffe008e21ed60
handleevents() at handleevents+0x186/frame 0xfffffe008e21eda0
cpu_activeclock() at cpu_activeclock+0x6a/frame 0xfffffe008e21edd0
cpu_idle() at cpu_idle+0xa6/frame 0xfffffe008e21edf0
sched_idletd() at sched_idletd+0x546/frame 0xfffffe008e21eef0
fork_exit() at fork_exit+0x7b/frame 0xfffffe008e21ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe008e21ef30
--- trap 0xf0db229f, rip = 0x2a49b49e2199f62b, rsp = 0x996070dacc2370c0, rbp = 0x468b9de920125c59 ---
Unfortunately that's not very revealing. Doesn't really point to anything specific.
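If the full vmcore was saved, the faulting address could in principle be resolved further with kgdb; a sketch, assuming the dump was extracted to /var/crash (vmcore.0 is an assumed filename) and kgdb is installed (it ships in the FreeBSD gdb package):

    # Open the kernel with the saved core; typing 'bt' at the
    # (kgdb) prompt then prints a symbolized trace
    kgdb /boot/kernel/kernel /var/crash/vmcore.0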
The message buffer has some entries I would investigate though.
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
What is igc0? Was the link intentionally being reconnected?
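A sketch to quantify the flapping, using the interface name from the log above:

    # Count igc0 link transitions still in the kernel message buffer
    dmesg | grep -c 'igc0: link state'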
<6>arp: 192.168.65.70 moved from 00:14:2d:e2:70:18 to 2c:3b:70:e9:08:61 on igc1.65
<3>arp: 2c:3b:70:e9:08:61 attempts to modify permanent entry for 192.168.65.70 on igc1.65
<6>arp: 192.168.65.70 moved from 00:14:2d:e2:70:18 to 2c:3b:70:e9:08:61 on igc1.65
What are those devices, and are they something that should be sharing an IP address? Also, that permanent entry implies either it's a local NIC or you're using static ARP, which is almost always a bad idea.
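A sketch for checking who currently owns that address and whether the entry really is static (FreeBSD flags such entries as 'permanent' in the output):

    # Inspect the ARP table for the contested address
    arp -an | grep 192.168.65.70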
<7>sonewconn: pcb 0xfffff801c6f85000 (127.0.0.1:853 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
<7>sonewconn: pcb 0xfffff801c6f85000 (127.0.0.1:853 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (6547 occurrences), euid 0, rgid 0, jail 0
<7>sonewconn: pcb 0xfffff801c6f85000 (127.0.0.1:853 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (1234 occurrences), euid 0, rgid 0, jail 0
It looks like Unbound is unable to answer queries over TLS fast enough and is exhausting the queue for some reason.
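A sketch for watching that queue directly, assuming the listener is Unbound's local DNS-over-TLS socket from the log above:

    # Show current/max listen queue depth (qlen/incqlen/maxqlen) for port 853
    netstat -Lan | grep 853

If the queue really is saturating, unbound.conf's incoming-num-tcp option (default 10) is one knob for the number of incoming TCP/TLS buffers, via the Custom Options box.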
-
Hmm, so 192.168.65.70 is the pfSense interface in that VLAN? And 2c:3b:70:e9:08:61 should not be using it?
None of that should ever cause a panic but you should address it at least to clean up the logs so other more important events aren't hidden.
-
Are both interfaces actually connected? Both on the same subnet? That's often asking for trouble. I would try to use only one interface there.
-
@stephenw10 I don't have control of the panel, but thanks for asking, as I can open a support ticket with SPAN.