Wan periodic reset causes system reboot.
-
I'm sorry. Yes, it will be much better when there's a GUI option.
You shouldn't need to add the debug kernel just to get the coredump.
The important steps are:
- Make sure you have enough SWAP space (you do).
- Edit /etc/pfSense-ddb.conf so it contains the different default line like:
# $FreeBSD$
#
# This file is read when going to multi-user and its contents piped thru
# ``ddb'' to define debugging scripts.
#
# see ``man 4 ddb'' and ``man 8 ddb'' for details.
#
script lockinfo=show locks; show alllocks; show lockedvnods
script pfs=bt ; show registers ; show pcpu ; run lockinfo ; acttrace ; ps ; alltrace
# kdb.enter.panic panic(9) was called.
#script kdb.enter.default=textdump set; capture on; run pfs ; capture off; textdump dump; reset
script kdb.enter.default=capture on; bt; show registers; show pcpu; capture off; dump; reset
# kdb.enter.witness witness(4) detected a locking error.
script kdb.enter.witness=run lockinfo
- Reboot.
- (Optionally) Run
sysctl debug.kdb.panic=1
to test the setup. You should see it writing out the coredump to swap in the console after all the backtraces scroll past.
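As a rough sanity check before rebooting, the edited file could be inspected with a small shell function. This is only a sketch: `ddb_dump_mode` is a hypothetical helper name, not a pfSense or FreeBSD tool, and it assumes the last uncommented `kdb.enter.default` line is the one that takes effect.

```shell
# Hypothetical helper (not part of pfSense): report what the active
# kdb.enter.default script in a ddb config file will write on panic.
ddb_dump_mode() {
    conf="$1"
    # Consider only uncommented "script kdb.enter.default=..." lines;
    # assume the last one is the one that applies.
    active=$(grep -E '^script[[:space:]]+kdb\.enter\.default=' "$conf" | tail -n 1)
    if [ -z "$active" ]; then
        echo "missing"
        return 1
    fi
    case "$active" in
        *"textdump dump"*) echo "textdump" ;;   # stock line: text dump only
        *"dump;"*)         echo "coredump" ;;   # edited line: full core dump
        *)                 echo "none" ;;       # no dump command found
    esac
}
```

Running `ddb_dump_mode /etc/pfSense-ddb.conf` should print `coredump` once the file has been edited as described, and `textdump` if the stock line is still active. Swap size can be checked separately with `swapinfo -h`.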
Steve
-
@stephenw10 said in Wan periodic reset causes system reboot.:
- Edit /etc/pSense-ddb.conf so it contains the different default line like:
Hmmm, no such file found on this device. No idea why!
-
Oh sorry I typo'd that.
Should be
/etc/pfSense-ddb.conf
-
Haha - should have spotted that.
[23.09-BETA]/root: cat /etc/pfSense-ddb.conf
# $FreeBSD$
#
# This file is read when going to multi-user and its contents piped thru
# ``ddb'' to define debugging scripts.
#
# see ``man 4 ddb'' and ``man 8 ddb'' for details.
#
script lockinfo=show locks; show alllocks; show lockedvnods
script pfs=bt ; show registers ; show pcpu ; run lockinfo ; acttrace ; ps ; alltrace
# kdb.enter.panic panic(9) was called.
# script kdb.enter.default=textdump set; capture on; run pfs ; capture off; textdump dump; reset
script kdb.enter.default=capture on; bt; show registers; show pcpu; capture off; dump; reset
# kdb.enter.witness witness(4) detected a locking error.
script kdb.enter.witness=run lockinfo
[23.09-BETA]/root:
Now, do I have a typo of my own?
-
Looks fine to me. Reboot to apply it and then try a test panic.
-
Wife watching Bake Off on catch-up; I would die a painful death.
I'll be brave when she is elsewhere.
-
@stephenw10
I think I have it. Now, how do I get this massive vmcore and info file to you?
<118>Netgate pfSense Plus 23.09-BETA amd64 20231020-0600
<118>Bootup complete
<6>ng0: changing name to 'pppoe0'
pf_test6: kif == NULL, if_xname pppoe0
<6>ng0: changing name to 'pppoe0'
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 18
fault virtual address = 0x10
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80f4e116
stack pointer = 0x0:0xfffffe00850b6b60
frame pointer = 0x0:0xfffffe00850b6b90
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2 (clock (3))
rdi: fffff80203712800 rsi: 000000000000001c rdx: fffff8013760d878
rcx: fffff8013760d878 r8: 00000000ffffffbd r9: 0000000000000018
rax: 0000000000000000 rbx: 0000000000000000 rbp: fffffe00850b6b90
r10: fffff802033dd8c0 r11: fffff8016f5e5000 r12: 0000000000010300
r13: fffff80203676b98 r14: fffffe00850b6b68 r15: 0000000000000018
trap number = 12
panic: page fault
cpuid = 3
time = 1697905286
KDB: enter: panic
-
You can upload it here: https://nc.netgate.com/nextcloud/index.php/s/ywzFPM3F8GZnRdb
Or I can download it from somewhere if that's easier, just send me a link in chat.
-
@stephenw10 said in Wan periodic reset causes system reboot.:
https://nc.netgate.com/nextcloud/index.php/s/ywzFPM3F8GZnRdb
Uploaded to your link. Usual privacy request, or I'll come looking for you.
If you can acknowledge they arrived ok, that would be great.
-
Great I see them.
-
@stephenw10
TVM.
I just used the GUI to command the WAN connection down and up again to trigger the panic. Give me a shout if you need a repeat or a different method.
(but quietly hoping that this is the last of it...)
-
More can't hurt!
Are you able to get the backtrace from the console for that? Just to confirm it's the same crash. I'm pretty sure it is though.
-
@stephenw10
I've since cleared the file and switched devices (to check that the qat_200xx revision was in place for the D-1736NT). It did look the same though.
-
not possible to keep testing after the latest news. thanks everyone
-
@AlexanderK said in Wan periodic reset causes system reboot.:
not possible to keep testing after the latest news. thanks everyone
That is a shame but understandable.
Hopefully this stupid mess is only short-term. It is hard to see why anyone testing should be expected either to pay a large sum of money every year or to buy additional Netgate devices for testing purposes, especially if home or lab users are then shut out of the end results.
I've got 11.5 months on my current pfSense+ licence so will continue to test, at least in the short term, so that Netgate can think this through. I accept though that many will dump any and all dev/beta testing on principle, even if they don't run from pfSense.
-
Updates coming....
-
@stephenw10 said in Wan periodic reset causes system reboot.:
Updates coming....
...and I remain optimistic.
-
@stephenw10 said in Wan periodic reset causes system reboot.:
Updates coming....
Optimism now dead.
Current pfSense+ subscription that I use for dev/beta testing apparently dead too.
Don't know what to say really.
-
It should not be. Existing subs should still function until they expire at least. Are you no longer able to see pkgs?
Also further updates may still happen here.
-
@stephenw10 said in Wan periodic reset causes system reboot.:
It should not be. Existing subs should still function until they expire at least. Are you no longer able to see pkgs?
Also further updates may still happen here.
I can see packages at the moment (I'm on 23.09-RC) but the notice about current users was only just announced:
Please note that existing Home+Lab users who choose not to purchase a TAC Lite subscription will not receive updates when they are released.
Presumably you guys will be pruning current subscriptions. Mine is dated 13 Oct 2023, no doubt easy to find and remove.
Still don't know what to say really. I've invested time and money into this move to Netgate. I just didn't see this coming.