pfSense 2.7.2 freezing with no crash report after reboot
-
Greetings,
We are currently observing freezing of pfSense 2.7.2 systems, which are only accessible again after a reboot via Hyper-v.
During freezing, the system is neither accessible via GUI nor via SSH. The console is also no longer accessible in some cases.
What makes error analysis particularly difficult is that no dumps or error messages are generated. The failures can only be traced via gaps in the system logs.
We also cannot see any spikes in resource usage during the outages. The affected systems are partly updated but there are also systems freshly installed on 2.7.2.
Another commonality is that so far all affected systems have HA proxy installed.Do you have any idea how to recreate or better analyze the freezes?
Here is an example of a log: (Gap between Nov 13 10:23:59 - Nov 13 10:18:57)
Nov 13 10:23:59 kernel Launching APs: 1 Nov 13 10:23:59 kernel ioapic0 <Version 1.1> irqs 0-23 Nov 13 10:23:59 kernel ioapic0: MADT APIC ID 0 != hw id 2 Nov 13 10:23:59 kernel arc4random: WARNING: initial seeding bypassed the cryptographic random device because it was not yet seeded and the knob 'bypass_before_seeding' was enabled. Nov 13 10:23:59 kernel random: fast provider: "Intel Secure Key RNG" Nov 13 10:23:59 kernel random: registering fast source Intel Secure Key RNG Nov 13 10:23:59 kernel FreeBSD/SMP: 1 package(s) x 1 core(s) x 2 hardware threads Nov 13 10:23:59 kernel FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs Nov 13 10:23:59 kernel ACPI APIC Table: <VRTUAL MICROSFT> Nov 13 10:23:59 kernel Event timer "LAPIC" quality 100 Nov 13 10:23:59 kernel avail memory = 979812352 (934 MB) Nov 13 10:23:59 kernel real memory = 1073741824 (1024 MB) Nov 13 10:23:59 kernel Hypervisor: Origin = "Microsoft Hv" Nov 13 10:23:59 kernel AMD Extended Feature Extensions ID EBX=0x6000d004<XSaveErPtr,IBPB,IBRS,STIBP> Nov 13 10:23:59 kernel XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES> Nov 13 10:23:59 kernel Structured Extended Features3=0x10<FSRM> Nov 13 10:23:59 kernel Structured Extended Features2=0x405f42<AVX512VBMI,AVX512VBMI2,GFNI,VAES,VPCLMULQDQ,AVX512VNNI,AVX512BITALG,AVX512VPOPCNTDQ,RDPID> Nov 13 10:23:59 kernel Structured Extended Features=0xf1bf03a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512IFMA,CLFLUSHOPT,CLWB,AVX512CD,SHA,AVX512BW,AVX512VL> Nov 13 10:23:59 kernel AMD Features2=0x4003f3<LAHF,CMP,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,Topology> Nov 13 10:23:59 kernel AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM> Nov 13 10:23:59 kernel Features2=0xfed83203<SSE3,PCLMULQDQ,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV> Nov 13 10:23:59 kernel Features=0x1783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2,HTT> Nov 13 10:23:59 kernel Origin="AuthenticAMD" Id=0xa10f11 Family=0x19 Model=0x11 Stepping=1 Nov 13 10:23:59 kernel CPU: AMD EPYC 9354 32-Core Processor (3245.57-MHz K8-class CPU) Nov 13 10:23:59 kernel Timecounter "Hyper-V" frequency 10000000 Hz quality 2000 Nov 13 10:23:59 kernel Features3=0xbed7b2<DEBUG,XMMHC,IDLE,NUMA,TMFREQ,SYNCMC,CRASH,NPIEP> Nov 13 10:23:59 kernel PM Features=0x0 [C2] Nov 13 10:23:59 kernel Features=0x2e7f<VPRUNTIME,TMREFCNT,SYNIC,SYNTM,APIC,HYPERCALL,VPINDEX,REFTSC,IDLE,TMFREQ> Nov 13 10:23:59 kernel Hyper-V Version: 10.0.17763 [SP5] Nov 13 10:23:59 kernel VT(vga): text 80x25 Nov 13 10:23:59 kernel SRAT: Ignoring memory at addr 0x1000000000 Nov 13 10:23:59 kernel SRAT: Ignoring memory at addr 0x100000000 Nov 13 10:23:59 kernel SRAT: Ignoring memory at addr 0x40000000 Nov 13 10:23:59 kernel FreeBSD clang version 16.0.6 (https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152) Nov 13 10:23:59 kernel root@freebsd:/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/obj/amd64/StdASW5b/var/jenkins/workspace/pfSense-CE-snapshots-2_7_2-main/sources/FreeBSD-src-RELENG_2_7_2/amd64.amd64/sys/pfSense amd64 Nov 13 10:23:59 kernel FreeBSD 14.0-CURRENT amd64 1400094 #1 RELENG_2_7_2-n255948-8d2b56da39c: Wed Dec 6 20:45:47 UTC 2023 Nov 13 10:23:59 kernel FreeBSD is a registered trademark of The FreeBSD Foundation. Nov 13 10:23:59 kernel The Regents of the University of California. All rights reserved. Nov 13 10:23:59 kernel Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 Nov 13 10:23:59 kernel Copyright (c) 1992-2023 The FreeBSD Project. Nov 13 10:23:59 kernel ---<<BOOT>>--- Nov 13 10:23:59 syslogd kernel boot file is /boot/kernel/kernel Nov 13 10:18:57 sshd 75283 Disconnected from user root xxxx:xxxx:x:x:xxx::x port 33640 Nov 13 10:18:57 sshd 75283 Received disconnect from xxxx:xxxx:x:x:xxx::x port 33640:11: disconnected by user Nov 13 10:18:57 sshd 75283 Accepted publickey for root from xxxx:xxxx:x:x:xxx::x port 33640 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxx Nov 13 10:18:52 sshd 72955 Disconnected from user root xxxx:xxxx:x:x:xxx::x port 40136 Nov 13 10:18:52 sshd 72955 Received disconnect from xxxx:xxxx:x:x:xxx::x port 40136:11: disconnected by user Nov 13 10:18:50 sshd 72955 Accepted publickey for root from xxxx:xxxx:x:x:xxx::x port 40136 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxx Nov 13 10:18:46 sshd 70399 Disconnected from user root xxxx:xxxx:x:x:xxx::x port 40132 Nov 13 10:18:46 sshd 70399 Received disconnect from xxxx:xxxx:x:x:xxx::x port 40132:11: disconnected by user Nov 13 10:18:46 sshd 70399 Accepted publickey for root from xxxx:xxxx:x:x:xxx::x port 40132 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxx Nov 13 10:18:39 sshd 68841 Disconnected from user root xxxx:xxxx:x:x:xxx::x port 60236 Nov 13 10:18:39 sshd 68841 Received disconnect from xxxx:xxxx:x:x:xxx::x port 60236:11: disconnected by user Nov 13 10:18:39 sshd 68841 Accepted publickey for root from xxxx:xxxx:x:x:xxx::x port 60236 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxx Nov 13 10:18:13 sshd 65925 Disconnected from user root xxxx:xxxx:x:x:xxx::x port 46936 Nov 13 10:18:13 sshd 65925 Received disconnect from xxxx:xxxx:x:x:xxx::x port 46936:11: disconnected by user Nov 13 10:18:13 sshd 65925 Accepted publickey for root from xxxx:xxxx:x:x:xxx::x port 46936 ssh2: RSA SHA256:xxxxxxxxxxxxxxxxxxxx Nov 13 10:17:44 sshd 31345 Connection closed by xxxx:xxxx:x:x:xxx::x port 58628 [preauth] Nov 13 10:15:59 sshd 76242 Disconnected from user root xxxx:xxxx:x:x:xxx::x port 59520 Nov 13 10:15:59 sshd 76242 Received disconnect from xxxx:xxxx:x:x:xxx::x port 59520:11: disconnected by user
Ressourcen:
-
Hmm, what are those ssh connections from? Not that it should be a problem.
-
Hello,
thanks for your reply.
Those are connections from our monitoring server. No other users conncted to the system before the freeze.
-
Anything shown at the console? Can you setup a serial console and log that output?
-
Yes I checked console connection, but all you see are syslogd messages of user logins sometimes multiple days old.
The affected systems are unfortunately all live systems, which have to be made functional again as soon as possible, so I don't have much time for longer tests.
I have not yet been able to cause a freeze on my test systems.
Normally a freeze occurs every 2 - 4 weeks. However, there is no guarantee of this.
-
We have currently moved the HA proxy from one of the affected systems to another system to see if the freezes are still occurring
Perhaps this will help to narrow down the potential sources of error
-
@Techniker_ctr said in pfSense 2.7.2 freezing with no crash report after reboot:
The console is also no longer accessible in some cases.
Hmm, so sometimes the console was still responsive? I assume you were not able to connect out at that point?
-
It seems so. In all the freezes I've seen, the console was no longer functional, but my colleague said he could access a device via console.
At the next freeze I will actively check if more info is available via console
-
Make sure you try using
ctl+t
at the console. That can often show a stuck process when nothing else is responding. -
Hey,
We had another outage today, unfortunately no further information could be retrieved via the console.
The system that failed is an updated system that runs on ufs and does not use a ha proxy.
Failure pattern is the same again. Normal load and suddenly no response from the system and total traffic failure. As alsways no crash report.
The following are the system loads
RAM
Proccesses:
Any other Idea how we can track down the cause?
-
Do you see anything in Hyper-V? Usage spikes etc?
-
I appreciate your help
I have checked our Grafana for the HyperV again and can indeed detect a CPU spike on the system, just before the freeze.
And a complete loss of traffic
-
Hmm, so like some process uses all the CPU cycles until it's rebooted. Yet nothing is logged....