Wan periodic reset causes system reboot.

stephenw10

Yes, it's big! The interesting part is right at the beginning of the report. For example:

debug.kdb.panic:panic: kdb_sysctl_panic
cpuid = 3
time = 1691688092
KDB: enter: panic
[ thread pid 30195 tid 100206 ]
Stopped at      kdb_enter+0x32: movq    $0,0x2342e13(%rip)
db:0:kdb.enter.default> textdump set
textdump set
db:0:kdb.enter.default>  capture on
db:0:kdb.enter.default>  run pfs
db:1:pfs> bt
Tracing pid 30195 tid 100206 td 0xfffffe00c73a4900
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c713baf0
vpanic() at vpanic+0x183/frame 0xfffffe00c713bb40
panic() at panic+0x43/frame 0xfffffe00c713bba0
kdb_sysctl_panic() at kdb_sysctl_panic+0x61/frame 0xfffffe00c713bbd0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x90/frame 0xfffffe00c713bc20
sysctl_root() at sysctl_root+0x216/frame 0xfffffe00c713bca0
userland_sysctl() at userland_sysctl+0x177/frame 0xfffffe00c713bd50
sys___sysctl() at sys___sysctl+0x5c/frame 0xfffffe00c713be00
amd64_syscall() at amd64_syscall+0x109/frame 0xfffffe00c713bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00c713bf30
--- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x3dada5b3892a, rsp = 0x3dada42e1158, rbp = 0x3dada42e1190 ---

That was a crash I triggered manually. Yours will probably be similar to that shown in the bug report linked above.
Unfortunately those pictures are all from after that part.
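
If you do manage to capture the console output, something along these lines could pull the function names out of the bt section so they are easier to compare against the bug report. It is only a rough sketch in Python: parse_backtrace and the regular expression are my own names and assumptions based on the frame lines shown above, not anything pfSense ships.

```python
import re

# Matches ddb frame lines of the form shown above, e.g.
#   kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c713baf0
# The pattern is an assumption derived from that sample output only.
FRAME_RE = re.compile(
    r"^(?P<func>\w+)\(\) at (?P=func)\+(?P<offset>0x[0-9a-f]+)"
    r"/frame (?P<frame>0x[0-9a-f]+)$"
)


def parse_backtrace(text):
    """Return (function, offset) tuples for every frame line found in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("func"), int(match.group("offset"), 16)))
    return frames


# Three frames copied from the manually triggered panic above.
sample = """\
kdb_enter() at kdb_enter+0x32/frame 0xfffffe00c713baf0
vpanic() at vpanic+0x183/frame 0xfffffe00c713bb40
panic() at panic+0x43/frame 0xfffffe00c713bba0
"""

frames = parse_backtrace(sample)
for func, offset in frames:
    print(f"{func} +{offset:#x}")
```

Lines that are not frames (prompts, the "Tracing pid" header, the syscall summary) are simply skipped, so the whole console capture can be passed in as-is.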

AlexanderK

@stephenw10
https://forums.freebsd.org/threads/tip-log-console-messages.10090/

I tried to enable console logging, with no result.
What can I do to help?

stephenw10

Are you not able to scroll back far enough to see the initial backtrace?

pfSense doesn't use the standard syslog conf file; it uses /var/etc/syslog.d/pfSense.conf instead.
So you'd have to make that change there. I have never tried that myself.

Steve

AlexanderK

@stephenw10 The first picture was as far back as I could scroll,
so that is not possible.
I will try the log.

stephenw10

Hmm, still odd that it doesn't save the panic as a crash report by default. The only reasons that wouldn't happen I can think of are if there's no SWAP configured (there is) or if the drive is failing and it's inaccessible at that point.
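
To rule out the swap side, the output of swapinfo can be checked for at least one device. A minimal sketch in Python, assuming the usual five-column swapinfo layout (Device, 1K-blocks, Used, Avail, Capacity); parse_swapinfo and the device name in the sample are made up for illustration and the sample is not output from this system.

```python
def parse_swapinfo(output):
    """Return one dict per swap device listed in swapinfo-style output."""
    devices = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the "Total" summary row.
        if len(fields) < 5 or fields[0] in ("Device", "Total"):
            continue
        devices.append(
            {
                "device": fields[0],
                "blocks_1k": int(fields[1]),
                "used_1k": int(fields[2]),
            }
        )
    return devices


# Hypothetical sample; the device name and sizes are placeholders.
sample = """\
Device          1K-blocks     Used    Avail Capacity
/dev/example0p3   4194304        0  4194304     0%
"""

swap_devices = parse_swapinfo(sample)
if swap_devices:
    print("swap configured on:", ", ".join(d["device"] for d in swap_devices))
else:
    print("no swap device found, so a crash dump has nowhere to go")
```

This only covers the "no SWAP configured" case; it says nothing about whether the drive is still accessible at the moment of the panic.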

AlexanderK

The reboots are still happening. Any idea what to do or check?

stephenw10

Did you manage to log anything?

If you are hitting that IPv6 issue, though, you could test that by simply disabling IPv6. If it still reboots when PPPoE resets, then you're not hitting that bug.

AlexanderK

@stephenw10
No, I couldn't log anything.
I am trying to enable console logs,
but it is not working.
I will disable IPv6 and check again.

YannTKO

@AlexanderK

Hi,
I faced a similar issue without understanding the root cause.
If I remember correctly, I disabled gateway monitoring.
dpinger (Gateway Monitoring Daemon) was probably my problem, but I don't know why.
Do you monitor your gateway? If so, you can try disabling it.
Regards.
Yann.

stephenw10

Potentially that might avoid it, if the only thing using IPv6 at that time is the gateway monitoring. It would be an interesting test.

AlexanderK

@YannTKO I enabled IPv6 again, disabled gateway monitoring for IPv6, and it rebooted again, so dpinger is not the issue.

AlexanderK

@stephenw10 With IPv6 disabled it rebooted again. I think it is something to do with the periodic reset.

stephenw10

Previously you said it was rebooting when you manually restarted the WAN. Is that still the case with IPv6 disabled?

AlexanderK

@stephenw10 Yes, but I need to check again.

AlexanderK

I triggered it again with a different procedure.
I simply disconnected the WAN interface with IPv6 enabled, and the system rebooted.

RobbieTT

@AlexanderK said in Wan periodic reset causes system reboot.:

I triggered it again with a different procedure.
I simply disconnected the WAN interface with IPv6 enabled, and the system rebooted.

Can you do each procedure with and without IPv6 as it is a bit scatter-gun at the moment?

It will help the diagnostics, as there is potentially more than one issue at play.
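
To make that less scatter-gun, the combinations can be written down as a checklist and ticked off one by one. A small sketch in Python, assuming the three procedures mentioned so far in this thread are the ones worth testing; build_test_matrix and the field names are just illustrative.

```python
from itertools import product

# Procedures taken from the posts above; adjust the list as needed.
PROCEDURES = [
    "periodic reset",
    "manual WAN restart",
    "WAN interface disconnect",
]
IPV6_STATES = [True, False]


def build_test_matrix(procedures, ipv6_states):
    """Pair every procedure with every IPv6 state; result is filled in by hand."""
    return [
        {"procedure": proc, "ipv6_enabled": v6, "rebooted": None}
        for proc, v6 in product(procedures, ipv6_states)
    ]


matrix = build_test_matrix(PROCEDURES, IPV6_STATES)
for row in matrix:
    state = "on" if row["ipv6_enabled"] else "off"
    print(f"{row['procedure']:<26} IPv6 {state:<3} rebooted: {row['rebooted']}")
```

Each row is one test run; set "rebooted" to True or False after trying it, so any pattern across the six combinations is obvious.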


AlexanderK

@RobbieTT Without IPv6 everything works perfectly.
With IPv6 enabled I get the reboots.

RobbieTT

@AlexanderK
OK, that sounds like one of the issues I have. It has been recognised by Netgate too, with an associated Redmine entry.

@stephenw10
Steve, another one for you, I guess. The absence of suitable logs is a bit puzzling though. Have you guys made any progress resolving this IPv6-related event?


stephenw10

We think we know what might cause it, but without being able to replicate it locally it's difficult to prove.

23.09 dev snapshots have been enabled for anyone able to test against that. It would be good to know if there is any difference there.

RobbieTT

@stephenw10
I only have one Netgate device and that is in production, but hopefully someone has a unit they can test with.

Unless you have a unit you can send for testing, Steve?
