Random kernel panic and restart on 2.7.2
-
Hi,
After a few years of trouble-free use, I moved my pfSense instance from a Chinese mini-PC to a 1RU mini-server with a Supermicro A2SDi-4C-HLN4F board (Intel Atom C3558). I reinstalled the instance from scratch with the 2.7.2 ISO and restored the config.xml file, with some changes applied manually (the network interface names differ between the two platforms).
Everything worked perfectly for a few hours, then the system started randomly rebooting and going into kernel panic.
I tried updating the BIOS and BMC to the latest versions (with default BIOS settings), and disabling AES-NI and RDRAND for encryption on the VPNs. No change.
I started the server with memtest and let the test run for 16 hours straight, no errors and no reboots.
I also ran a SMART self-test on the two disks (configured as a ZFS mirror at install time), no errors. I don't know what else to check or try.
Reboots happen at irregular intervals, sometimes 45 minutes apart, other times after 8-10 hours. No relation to system load (which is very very low, since it's just a home lab firewall with limited WAN bandwidth).
What's even stranger is that the same mini-server previously hosted another instance of pfSense ("plus", the only difference). It wasn't used continuously, but it never presented any problems of that kind.
-
Here is the crash dump and some information, in the hope that some of you can help me understand what is happening.
Thanks in advance!
Bye
version.txt
FreeBSD 14.0-CURRENT amd64 1400094 #1 RELENG_2_7_2-n255948-8d2b56da39c: Wed Dec 6 20:45:47 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/obj/amd64/StdASW5b/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/sources/FreeBSD-src-RELENG_2_7_2/amd64.amd64/sys/pfSense
config.txt
options CONFIG_AUTOGENERATED ident pfSense machine amd64 cpu HAMMER makeoptions WITH_CTF=1 makeoptions DEBUG=-g options RSS options RATELIMIT options MROUTING options MSGTQL=2048 options MSGSSZ=32 options MSGSEG=512 options MSGMNI=40 options MSGMNB=8192 options ALTQ_CODEL options ALTQ_NOPCC options ALTQ_FAIRQ options ALTQ_PRIQ options ALTQ_HFSC options ALTQ_RIO options ALTQ_RED options ALTQ_CBQ options ALTQ options IPSEC options TCP_SIGNATURE options NETGRAPH_PRED1 options NETGRAPH_DEFLATE options NETGRAPH_CAR options NETGRAPH_PIPE options NETGRAPH_TCPMSS options NETGRAPH_TEE options NETGRAPH_HOLE options NETGRAPH_FRAME_RELAY options NETGRAPH_ASYNC options NETGRAPH_ECHO options NETGRAPH_CISCO options NETGRAPH_BRIDGE options NETGRAPH_ONE2MANY options NETGRAPH_LMI options NETGRAPH_KSOCKET options NETGRAPH_VJC options NETGRAPH_UI options NETGRAPH_MPPC_ENCRYPTION options NETGRAPH_TTY options NETGRAPH_SOCKET options NETGRAPH_RFC1490 options NETGRAPH_PPTPGRE options NETGRAPH_PPPOE options NETGRAPH_PPP options NETGRAPH_EIFACE options NETGRAPH_IFACE options NETGRAPH_ETHER options NETGRAPH_BPF options NETGRAPH_L2TP options NETGRAPH_VLAN options NETGRAPH options IPSTEALTH options IPFIREWALL_VERBOSE options IPFIREWALL_DEFAULT_TO_ACCEPT options ALQ options NULLFS options GEOM_BDE options GEOM_ELI options GEOM_UZIP options GEOM_MIRROR options IICHID_SAMPLING options EVDEV_SUPPORT options XENHVM options ATH_ENABLE_11N options AH_AR5416_INTERRUPT_MITIGATION options IEEE80211_SUPPORT_MESH options SC_PIXEL_MODE options PPS_SYNC options COMPAT_LINUXKPI options PCI_IOV options PCI_HP options IOMMU options SMP options NETGDB options NETDUMP options DEBUGNET options ZSTDIO options GZIO options EKCD options VERBOSE_SYSINIT=0 options GDB options DDB options KDB options RCTL options RACCT_DEFAULT_TO_DISABLED options RACCT options INCLUDE_CONFIG_FILE options DDB_CTF options KDTRACE_HOOKS options KDTRACE_FRAME options CAPABILITIES options CAPABILITY_MODE options AUDIT options HWPMC_HOOKS 
options KBD_INSTALL_CDEV options PRINTF_BUFR_SIZE=128 options _KPOSIX_PRIORITY_SCHEDULING options SYSVSEM options SYSVMSG options SYSVSHM options STACK options KTRACE options SCSI_DELAY=5000 options COMPAT_FREEBSD13 options COMPAT_FREEBSD12 options COMPAT_FREEBSD11 options COMPAT_FREEBSD10 options COMPAT_FREEBSD9 options COMPAT_FREEBSD32 options EFIRT options GEOM_LABEL options GEOM_RAID options TMPFS options PSEUDOFS options PROCFS options CD9660 options MSDOSFS options NFS_ROOT options NFSLOCKD options NFSD options NFSCL options MD_ROOT options QUOTA options UFS_GJOURNAL options UFS_DIRHASH options UFS_ACL options SOFTUPDATES options FFS options KERN_TLS options SCTP_SUPPORT options TCP_RFC7413 options TCP_HHOOK options TCP_BLACKBOX options TCP_OFFLOAD options FIB_ALGO options ROUTE_MPATH options IPSEC_SUPPORT options INET6 options INET options VIMAGE options PREEMPTION options NUMA options SCHED_ULE options NETLINK options NEW_PCIB options CC_CUBIC options GEOM_PART_GPT options GEOM_PART_MBR options GEOM_PART_EBR options GEOM_PART_BSD options EARLY_AP_STARTUP options DDB options TMPFS options PPS_SYNC options COMPAT_LINUXKPI device isa device mem device io device uart_ns8250 device cpufreq device acpi device smbios device pci device fdc device ahci device ata device mvs device siis device ahc device ahd device hptiop device isp device mpt device mps device mpr device sym device isci device ocs_fc device pvscsi device scbus device ch device da device sa device cd device pass device ses device arcmsr device ciss device ips device smartpqi device tws device aac device aacp device aacraid device ida device mfi device mlx device mrsas device nvme device nvd device vmd device atkbdc device atkbd device psm device kbdmux device vga device splash device sc device vt device vt_vga device vt_efifb device vt_vbefb device agp device cbb device cardbus device uart device ppc device ppbus device lpt device ppi device puc device iflib device em device igc device ix device ixv 
device ixl device iavf device ice device vmx device axp device bxe device le device ti device mlx5 device mlxfw device mlx5en device miibus device ae device age device alc device ale device bce device bfe device bge device cas device dc device et device fxp device gem device jme device lge device msk device nfe device nge device re device rl device sge device sis device sk device ste device stge device vge device vr device xl device wlan device wlan_wep device wlan_ccmp device wlan_tkip device wlan_amrr device ath device ath_hal device ath_rate_sample device ipw device iwi device iwn device malo device mwl device ral device wpi device crypto device loop device padlock_rng device rdrand_rng device ether device vlan device tuntap device md device gif device firmware device xz device bpf device uhci device ohci device ehci device xhci device usb device ukbd device umass device sound device snd_cmi device snd_csa device snd_emu10kx device snd_es137x device snd_hda device snd_ich device snd_via8233 device mmc device mmcsd device sdhci device virtio device virtio_pci device vtnet device virtio_blk device virtio_scsi device virtio_balloon device kvm_clock device hyperv device xenefi device xenpci device xentimer device netmap device evdev device uinput device hid device wlan_rssadapt device wlan_xauth device wlan_acl device iwm device iwmfw device iwifw device ipwfw device wpifw device iwnfw device uath device ralfw device ural device urtw device rum device mwlfw device zyd device upgt device udav device axe device axge device aue device cue device kue device mos device rsu device rsufw device rtwn device rtwnfw device rtwn_pci device rtwn_usb device run device runfw device rue device bwn device bwi device ufoma device ucom device uslcom device uplcom device umct device uvisor device uark device uftdi device uvscom device umodem device u3g device cdce device uhid device firewire device sbp device gre device if_bridge device carp device lagg device vte device enc device pf 
device pflog device pfsync device rndtest device speaker device mxge device cxgb device cxgbe device oce device mlx4 device mlx4en device qlxgb device bnxt device virtio_console
panic.txt
double fault
-
-
Hmm, that backtrace is huge:
db:0:kdb.enter.default> bt Tracing pid 0 tid 100008 td 0xfffffe0020516000 kdb_enter() at kdb_enter+0x32/frame 0xfffffe002036acd0 vpanic() at vpanic+0x163/frame 0xfffffe002036ae00 panic() at panic+0x43/frame 0xfffffe002036ae60 dblfault_handler() at dblfault_handler+0x1ce/frame 0xfffffe002036af20 Xdblfault() at Xdblfault+0xd7/frame 0xfffffe002036af20 --- trap 0x17, rip = 0xffffffff80f6bc74, rsp = 0xfffffe001d7d2000, rbp = 0xfffffe001d7d2000 --- ipsec6_checkpolicy() at ipsec6_checkpolicy+0x4/frame 0xfffffe001d7d2000 ipsec6_common_output() at ipsec6_common_output+0x28/frame 0xfffffe001d7d2040 ip6_output() at ip6_output+0x102/frame 0xfffffe001d7d2260 pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d22c0 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d2490 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d24c0 pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d24f0 enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d2530 hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d25a0 ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d25c0 ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d2660 ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d26c0 ip6_output_send() at ip6_output_send+0xe3/frame 0xfffffe001d7d2700 ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d2920 pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d2980 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d2b50 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d2b80 pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d2bb0 enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d2bf0 hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d2c60 ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d2c80 ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d2d20 ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d2d80 ip6_output_send() at ip6_output_send+0xe3/frame 
0xfffffe001d7d2dc0 ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d2fe0 pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d3040 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d3210 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d3240 pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d3270 enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d32b0 hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d3320 ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d3340 ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d33e0 ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d3440 ip6_output_send() at ip6_output_send+0xe3/frame 0xfffffe001d7d3480 ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d36a0 pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d3700 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d38d0 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d3900 pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d3930 enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d3970 hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d39e0 ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d3a00 ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d3aa0 ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d3b00 ip6_output_send() at ip6_output_send+0xe3/frame 0xfffffe001d7d3b40 ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d3d60 pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d3dc0 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d3f90 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d3fc0 pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d3ff0 enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d4030 hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d40a0 ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d40c0 ipsec6_perform_request() at 
ipsec6_perform_request+0x76/frame 0xfffffe001d7d4160 ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d41c0 ip6_output_send() at ip6_output_send+0xe3/frame 0xfffffe001d7d4200 ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d4420 pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d4480 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d4650 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d4680 pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d46b0 enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d46f0 hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d4760 ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d4780 ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d4820 ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d4880 ip6_output_send() at ip6_output_send+0xe3/frame 0xfffffe001d7d48c0 ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d4ae0 pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d4b40 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d4d10 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d4d40 pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d4d70 enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d4db0 hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d4e20 ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d4e40 ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d4ee0 ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d4f40 ip6_output_send() at ip6_output_send+0xe3/frame 0xfffffe001d7d4f80 ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d51a0 pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d5200 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d53d0 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d5400 pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d5430 enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d5470 hhook_run_hooks() at 
hhook_run_hooks+0x61/frame 0xfffffe001d7d54e0 ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d5500 ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d55a0 ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d5600 ip6_forward() at ip6_forward+0x99c/frame 0xfffffe001d7d5700 pf_refragment6() at pf_refragment6+0x18d/frame 0xfffffe001d7d5760 pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d5930 pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d5960 pfil_mbuf_fwd() at pfil_mbuf_fwd+0x38/frame 0xfffffe001d7d5990 ip6_forward() at ip6_forward+0x3fd/frame 0xfffffe001d7d5a90 ip6_input() at ip6_input+0xa57/frame 0xfffffe001d7d5b70 netisr_dispatch_src() at netisr_dispatch_src+0x22c/frame 0xfffffe001d7d5bc0 ether_demux() at ether_demux+0x149/frame 0xfffffe001d7d5bf0 ether_nh_input() at ether_nh_input+0x36e/frame 0xfffffe001d7d5c50 netisr_dispatch_src() at netisr_dispatch_src+0xaf/frame 0xfffffe001d7d5ca0 ether_input() at ether_input+0x69/frame 0xfffffe001d7d5d00 iflib_rxeof() at iflib_rxeof+0xc46/frame 0xfffffe001d7d5e00 _task_fn_rx() at _task_fn_rx+0x72/frame 0xfffffe001d7d5e40 gtaskqueue_run_locked() at gtaskqueue_run_locked+0x14e/frame 0xfffffe001d7d5ec0 gtaskqueue_thread_loop() at gtaskqueue_thread_loop+0xc2/frame 0xfffffe001d7d5ef0 fork_exit() at fork_exit+0x7f/frame 0xfffffe001d7d5f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001d7d5f30 --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Do the crashes always show that? Or at least very similar to that?
It looks similar to this: https://redmine.pfsense.org/issues/14431 Though I'd expect to see something logged due to an interface or link going down. You didn't remove anything from the msgbuf output?
Steve
-
@stephenw10 except for PIDs, they are near identical from one crash to another.
I'll take a look at your link.
Thank you!
-
I would guess it's triggered by an IPSec tunnel carrying IPv6 if you have that?
Of course that should not panic...
-
Well, it seems the issue could well be related to that bug!
Here is some context for better understanding.
This is my home-office firewall.
I do a lot of remote work as an IT manager for some customers, and I connect to the internet via a dual 4G setup (using two different providers and pointing at different BTSs), since I live in a remote location (not a great choice for an IT guy, but hey...).
So I've set up many IPSec VPNs with dual tunnels for every customer, using routed VTI and BGP dynamic routing via FRR.
With this setup, I can continue to work even when one of my WANs goes down or loses packets (which is quite common).
All this is in the IPv4 world (no IPv6 connectivity from mobile providers here). But I also have the same connection scheme to a location with IPv6, from which I route a /64 subnet to my house for experimental purposes.
So one of the two tunnels to this location also routes this IPv6 subnet to my house over a second IPSec Phase 2. The whole system worked very well, but after learning about this bug, it could be that during a connection drop, even a brief one, some IPv6 traffic tries to pass through the offline ipsec interface...
This setup has actually been working for more than 1 year, but I previously tunneled via OpenVPN and only recently switched this IPv6 routing to IPSec.
I'll try shutting down the IPv6 stack entirely and see if that fixes it.
Thank you!
-
Yeah, that seems likely from that backtrace. If it always happens in less than 12hrs, that should be easy enough to test.
I've never been able to replicate that panic locally which means it's far more difficult to pin down.
-
Pretty frequent, so I think we'll know by tomorrow
Feb 8 17:36:18 root 26063 Bootup complete
Feb 8 16:49:44 root 428 Bootup complete
Feb 8 14:47:03 root 21766 Bootup complete
Feb 8 13:45:17 root 93044 Bootup complete
Feb 8 07:10:15 root 60172 Bootup complete
Feb 8 00:14:39 root 77642 Bootup complete
Feb 7 15:36:30 root 45258 Bootup complete
Feb 7 14:53:23 root 98846 Bootup complete
Feb 7 13:40:11 root 15021 Bootup complete
Feb 7 12:38:21 root 49059 Bootup complete
Feb 7 10:34:49 root 34494 Bootup complete
Feb 7 10:16:51 root 75312 Bootup complete
Feb 7 08:35:36 root 62958 Bootup complete
Feb 7 07:53:43 root 54954 Bootup complete
Feb 6 23:09:37 root 60224 Bootup complete
Feb 6 22:28:15 root 96734 Bootup complete
Feb 6 21:22:09 root 802 Bootup complete
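To put numbers on "pretty frequent", here is a minimal Python sketch that measures the gaps between the "Bootup complete" entries listed above. The log lines carry no year, so the year is an assumption (the names `ASSUMED_YEAR`, `boot_times` and `boot_intervals` are made up for this example). The gaps are boot-to-boot, so each one is roughly the uptime plus the time the box needed to come back up.

```python
import re
from datetime import datetime, timedelta

# "Bootup complete" entries copied from the list above (newest first).
BOOT_LOG = """\
Feb 8 17:36:18 root 26063 Bootup complete
Feb 8 16:49:44 root 428 Bootup complete
Feb 8 14:47:03 root 21766 Bootup complete
Feb 8 13:45:17 root 93044 Bootup complete
Feb 8 07:10:15 root 60172 Bootup complete
Feb 8 00:14:39 root 77642 Bootup complete
Feb 7 15:36:30 root 45258 Bootup complete
Feb 7 14:53:23 root 98846 Bootup complete
Feb 7 13:40:11 root 15021 Bootup complete
Feb 7 12:38:21 root 49059 Bootup complete
Feb 7 10:34:49 root 34494 Bootup complete
Feb 7 10:16:51 root 75312 Bootup complete
Feb 7 08:35:36 root 62958 Bootup complete
Feb 7 07:53:43 root 54954 Bootup complete
Feb 6 23:09:37 root 60224 Bootup complete
Feb 6 22:28:15 root 96734 Bootup complete
Feb 6 21:22:09 root 802 Bootup complete
"""

# Assumption: the log has no year; any year works here because the
# entries do not cross a month or year boundary.
ASSUMED_YEAR = 2024

ENTRY = re.compile(
    r"(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+\d+\s+Bootup complete"
)


def boot_times(log, year=ASSUMED_YEAR):
    """Return the boot timestamps found in the log, oldest first."""
    stamps = [
        datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        for m in ENTRY.finditer(log)
    ]
    return sorted(stamps)


def boot_intervals(log, year=ASSUMED_YEAR):
    """Return the time between each pair of consecutive boots."""
    times = boot_times(log, year)
    return [later - earlier for earlier, later in zip(times, times[1:])]


intervals = boot_intervals(BOOT_LOG)
print(f"{len(intervals)} intervals between boots")
print("shortest:", min(intervals))
print("longest: ", max(intervals))
print("mean:    ", sum(intervals, timedelta()) / len(intervals))
```

On this list the shortest gap is about 18 minutes and the longest about 8 hours 44 minutes, so a test window of 12 hours or more without a reboot would already be a meaningful signal.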
-
I have been crashing pretty regularly on 2.7.2 too, so I am curious what I can search for in my crash dumps to see if this is similar. Or which files would someone like to view?
-
The most telling line in the backtrace is probably:
ip6_output()
Though that doesn't always appear, as you can see in the bug report where it happens on ppp links.
-
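For anyone wanting to compare their own crash dumps against this one, a rough Python sketch like the following can help spot the pattern. It pulls the function names and frame addresses out of a ddb `bt` listing and looks for a block of frames that repeats back to back. The sample is an excerpt of the backtrace posted earlier in this thread; the helper names (`parse_frames`, `find_cycle`) are made up for this example and are not part of any pfSense or FreeBSD tool.

```python
import re

# Excerpt of the backtrace posted earlier in the thread (top of stack first).
BACKTRACE = """\
ipsec6_checkpolicy() at ipsec6_checkpolicy+0x4/frame 0xfffffe001d7d2000
ipsec6_common_output() at ipsec6_common_output+0x28/frame 0xfffffe001d7d2040
ip6_output() at ip6_output+0x102/frame 0xfffffe001d7d2260
pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d22c0
pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d2490
pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d24c0
pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d24f0
enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d2530
hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d25a0
ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d25c0
ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d2660
ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d26c0
ip6_output_send() at ip6_output_send+0xe3/frame 0xfffffe001d7d2700
ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d2920
pf_refragment6() at pf_refragment6+0x1ab/frame 0xfffffe001d7d2980
pf_test6() at pf_test6+0x153b/frame 0xfffffe001d7d2b50
pf_check6_out() at pf_check6_out+0x43/frame 0xfffffe001d7d2b80
pfil_mbuf_out() at pfil_mbuf_out+0x38/frame 0xfffffe001d7d2bb0
enc_hhook() at enc_hhook+0x262/frame 0xfffffe001d7d2bf0
hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe001d7d2c60
ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe001d7d2c80
ipsec6_perform_request() at ipsec6_perform_request+0x76/frame 0xfffffe001d7d2d20
ipsec_transmit() at ipsec_transmit+0x170/frame 0xfffffe001d7d2d80
ip6_output_send() at ip6_output_send+0xe3/frame 0xfffffe001d7d2dc0
ip6_output() at ip6_output+0x1d57/frame 0xfffffe001d7d2fe0
"""

# Matches "name() at name+0xOFF/frame 0xADDR", even if the dump was reflowed.
FRAME = re.compile(r"(\w+)\(\)\s+at\s+\1\+0x[0-9a-f]+/frame\s+(0x[0-9a-f]+)")


def parse_frames(text):
    """Return (function name, frame address) pairs in the order printed."""
    return [(m.group(1), int(m.group(2), 16)) for m in FRAME.finditer(text)]


def find_cycle(names, min_repeats=2):
    """Return (start, period, repeats) for the repeating block of names that
    covers the most frames, or None if nothing repeats back to back."""
    best = None
    n = len(names)
    for start in range(n):
        for period in range(1, (n - start) // min_repeats + 1):
            unit = names[start:start + period]
            repeats, pos = 1, start + period
            while names[pos:pos + period] == unit:
                repeats += 1
                pos += period
            covered = repeats * period
            if repeats >= min_repeats and (best is None or covered > best[3]):
                best = (start, period, repeats, covered)
    return best[:3] if best else None


frames = parse_frames(BACKTRACE)
names = [name for name, _ in frames]
cycle = find_cycle(names)
if cycle:
    start, period, repeats = cycle
    # Stack consumed by one pass through the loop, from the frame addresses.
    bytes_per_cycle = frames[start + period][1] - frames[start][1]
    print(f"{period}-frame loop starting at {names[start]}, seen {repeats} times")
    print(f"about {bytes_per_cycle} bytes of stack per pass")
else:
    print("no repeating block of frames found")
```

On the excerpt, this reports an 11-frame loop from ip6_output() through ip6_output_send() that costs 1728 bytes of stack per pass. In the full trace the frame addresses run from 0xfffffe001d7d2000 to 0xfffffe001d7d5f30, about 16 KB in total, which would be consistent with the thread running out of kernel stack before the double fault, though that reading is an inference from the addresses and not something confirmed in this thread.
-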
Uptime: 15h 44m
I'll keep an eye on it today too, but it seems that disabling IPv6 subnet tunneling solved the problem, so it's probably the same anomaly.
Do you know if there is a planned fix for this problem also on pfSense CE?
As a workaround on my specific problem, I could try to restore that routing on OpenVPN tunnels as before (which had never given this problem), but it would be nice if it were solved.
Thanks,
Edoardo
-
@EdoFede said in Random kernel panic and restart on 2.7.2:
I could try to restore that routing on OpenVPN tunnels as before
That would be a good test. I would expect both VPN types to go down at the same time so IPv6 sessions over both should behave similarly. So if it doesn't panic over OpenVPN then it's handling that differently which could be a clue.
-
I'm trying to replicate the setup of IPv6 routing on OpenVPN, but something doesn't work as expected.
I'm not doing it exactly the same way as before (which worked, tunneling both IPv4 and IPv6) because I've already set up IPv4 over IPSec + BGP for this site and don't want to break the whole setup. I'm trying to route only IPv6 traffic over the OpenVPN tunnel (which has IPv4 endpoints as before), but something is wrong, I think with the routing.
I'm able to ping6 google from the "remote" firewall via the tunnel, but not on the internal "remote" IPv6 network. I'll investigate and let you know if I can reproduce the issue on OpenVPN as well.
I don't think that will happen anyway, because nothing like this has ever happened to me before with IPv6 and OpenVPN.
Bye!
Edo
-
Sounds like a missing iroute at the server end.
-
yeah, missing iroute!
Now fixed, thanks! Now I'll monitor the system and let you know if the issue also happens on OpenVPN.
Bye,
Edo
-
It seems to be stable on OpenVPN... no reboots or crashes so far.
The only setup difference with the IPSec configuration is that on IPSec I had to manually enter the default route (route -6 add default <tunnel endpoint>) because for some strange reason it was not set automatically (even if I selected the gateway as default in the routing menu).
I'll write if it happens again, but I would say that the problem only seems to be present on IPSec.