pfsense crash 2.8.0
-
The backtrace is usually the important part of a crash report and isn't shown there. Did you get a full crash report?
-
@stephenw10 I was using the memory-based file option for /var, set at 1 GB, but had never seen it get anywhere near that; usually it is at 1% at most (right now at 55 MB). But then the crash dump ended up in there, so a page fault occurred.
My suspicion is that the crash happened, but then the crash report caused a page fault when the dump filled up /var. But who knows, maybe something unusual filled /var up.
I guess /var isn't the best place for a crash dump, or if that's the only option then I'll have to remove the memory-based option. I did increase the size to 4 GB, but who knows how big the dump could be.
Thanks,
Bill
-
@cayossarian said in pfsense crash 2.8.0:
the crash report caused a page fault when the dump filled up /var
Crash dumps are not stored 'somewhere' in /var/ but, afaik, in the swap space (partition).
Obtaining Panic Information for Developers
starts by saying they are stored in /var/crash/
and at the bottom you'll find: Install without Swap Space, which tells me something different. And actually, as you said, that is more logical: what happens when there is a file system issue? The system goes down with a trace.
Also: the small Netgate appliances don't even have '4 Gbytes' for their /var/ ... Maybe (me guessing even more) /var/crash/ contains some sort of symlink, or just a filename or an indication of whether a crash dump exists in the swap?
@cayossarian said in pfsense crash 2.8.0:
But who knows maybe something unusual filled /var
Your mission, as an admin: go have a look. Which folder contains files of 'Gbytes' size?
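One way to do that check, as a rough sketch: the Python below totals the file sizes under each top-level entry of a directory and sorts them, largest first. The function name `top_level_usage` is made up for this example, and it assumes a Python 3 interpreter is available where you run it. It reports apparent file sizes (`st_size`), not allocated blocks, so sparse files can look bigger than they are on disk.

```python
import os


def top_level_usage(root="/var"):
    """Return [(total_bytes, path)] for each entry directly under root, largest first."""
    totals = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            size = 0
            # os.walk does not follow symlinked directories by default.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        size += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
        else:
            try:
                size = entry.stat(follow_symlinks=False).st_size
            except OSError:
                size = 0
        totals.append((size, entry.path))
    return sorted(totals, reverse=True)
```

Calling `top_level_usage("/var")[:5]` would list the five biggest entries. From a shell, `du -sh /var/*` gives a similar picture without any script.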
-
You should still see the backtrace at the console if it panics even without SWAP to store it.
-
Backtrace:
db:1:pfs> bt
Tracing pid 11 tid 100003 td 0xfffff8026f5fd740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe008e21eb20
panic() at panic+0x43/frame 0xfffffe008e21eb80
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe008e21ebe0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe008e21ec30
calltrap() at calltrap+0x8/frame 0xfffffe008e21ec30
--- trap 0xc, rip = 0xffffffff80d15b8d, rsp = 0xfffffe008e21ed00, rbp = 0xfffffe008e21ed60 ---
callout_process() at callout_process+0x1ad/frame 0xfffffe008e21ed60
handleevents() at handleevents+0x186/frame 0xfffffe008e21eda0
cpu_activeclock() at cpu_activeclock+0x6a/frame 0xfffffe008e21edd0
cpu_idle() at cpu_idle+0xa6/frame 0xfffffe008e21edf0
sched_idletd() at sched_idletd+0x546/frame 0xfffffe008e21eef0
fork_exit() at fork_exit+0x7b/frame 0xfffffe008e21ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe008e21ef30
--- trap 0xf0db229f, rip = 0x2a49b49e2199f62b, rsp = 0x996070dacc2370c0, rbp = 0x468b9de920125c59 ---
Unfortunately that's not very revealing. Doesn't really point to anything specific.
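For anyone comparing several crash reports, here is a small Python sketch that pulls the function names out of a backtrace like the one above and picks the first frame after the first `--- trap` marker, which is where the fault was taken. The regular expression is an assumption based only on the line format shown in this thread, and `parse_backtrace` is a made-up name, not part of any pfSense or FreeBSD tool.

```python
import re

# Matches either a stack frame ("func() at func+0x33/frame 0x...")
# or a trap marker ("--- trap 0xc, rip = 0x...").
TOKEN_RE = re.compile(
    r"(?P<func>\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+"
    r"|--- trap (?P<trap>0x[0-9a-f]+), rip = 0x[0-9a-f]+"
)


def parse_backtrace(text):
    """Return (frames, faulting): all function names in order, and the
    first function listed after the first trap marker (or None)."""
    frames = []
    faulting = None
    after_trap = False
    for match in TOKEN_RE.finditer(text):
        if match.group("func") is None:
            after_trap = True  # this token was a trap marker
            continue
        frames.append(match.group("func"))
        if after_trap and faulting is None:
            faulting = match.group("func")
        after_trap = False
    return frames, faulting
```

Run on the trace above, it reports `callout_process` as the faulting frame. Trap 0xc is the amd64 page fault trap number, which matches the `trap_pfault` frame.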
The message buffer has some entries I would investigate though.
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
<6>igc0: link state changed to DOWN
<6>igc0: link state changed to UP
What is igc0? Was the link intentionally being reconnected?
<6>arp: 192.168.65.70 moved from 00:14:2d:e2:70:18 to 2c:3b:70:e9:08:61 on igc1.65
<3>arp: 2c:3b:70:e9:08:61 attempts to modify permanent entry for 192.168.65.70 on igc1.65
<6>arp: 192.168.65.70 moved from 00:14:2d:e2:70:18 to 2c:3b:70:e9:08:61 on igc1.65
What are those devices, and are they something that should be sharing an IP address? Also, that permanent entry implies either it's a local NIC or you're using static ARP, which is almost always a bad idea.
<7>sonewconn: pcb 0xfffff801c6f85000 (127.0.0.1:853 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
<7>sonewconn: pcb 0xfffff801c6f85000 (127.0.0.1:853 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (6547 occurrences), euid 0, rgid 0, jail 0
<7>sonewconn: pcb 0xfffff801c6f85000 (127.0.0.1:853 (proto 6)): Listen queue overflow: 193 already in queue awaiting acceptance (1234 occurrences), euid 0, rgid 0, jail 0
It looks like Unbound is unable to answer queries over TLS fast enough and is exhausting the queue for some reason.
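To get a feel for how bad the listen queue overflow is, the occurrence counts can be added up per listening socket. This is only a sketch: the pattern is written against the three message-buffer lines quoted above, `overflow_totals` is a made-up name, and it assumes each message's count covers a separate interval so the counts can simply be summed.

```python
import re
from collections import Counter

# Pattern derived from the sonewconn lines quoted in this thread.
OVERFLOW_RE = re.compile(
    r"sonewconn: pcb \S+ \((?P<addr>[^ ]+) \(proto \d+\)\): "
    r"Listen queue overflow: \d+ already in queue awaiting acceptance "
    r"\((?P<count>\d+) occurrences\)"
)


def overflow_totals(log_text):
    """Sum the reported overflow occurrences per listening address:port."""
    totals = Counter()
    for match in OVERFLOW_RE.finditer(log_text):
        totals[match.group("addr")] += int(match.group("count"))
    return totals
```

For the three lines above this gives 7782 overflows on 127.0.0.1:853 (1 + 6547 + 1234), all on the local TLS listener.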
-
Hmm, so 192.168.65.70 is the pfSense interface in that VLAN? And 2c:3b:70:e9:08:61 should not be using it?
None of that should ever cause a panic but you should address it at least to clean up the logs so other more important events aren't hidden.
-
Are both interfaces actually connected? Both on the same subnet? That's often asking for trouble. I would try to use only one interface there.
-
@stephenw10 I don't have control of the panel, but thanks for asking, as I can open a support ticket with SPAN.