PFsense hanging since version 2.4.4
-
Hello!
I was hoping someone has tips on how to find the root cause of my problems. I don't know whether it's software- or hardware-related.
Before 2.4.4 I had no problems with uptime. I have had my "router/firewall" for maybe 1.5 years now, and it crashed only once during that period. But since 2.4.4 was installed, the computer crashes after 1-7 days. Yesterday there were 2 crashes. Sometimes it can run up to a week without crashing...
At the moment I'm running 2.4.4-RELEASE-p1 (amd64).
I'm not sure what the best way to search for the error is. Does anyone have any suggestions where to start?
I have checked the S.M.A.R.T. logs and statistics of the SSD. The SMART check says Passed and there are no relocated sectors or anything. The drive's uptime is 588 days.
When the computer crashes, the image shown on the display is distorted, and I don't think I can see anything special in the logs. But I might have been checking the wrong place...
The pfSense box is running in my home. I'm running two OpenVPN tunnels on it: one PIA tunnel and one tunnel to my work.
Traffic is around 30 GB/day, so nothing extreme :)
dmesg below:
[2.4.4-RELEASE][root@pfSense.localdomain]/root: dmesg
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.2-RELEASE-p4 #2 b00c407ba5d(RELENG_2_4_4): Mon Nov 26 11:41:48 EST 2018
  root@buildbot2.nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
CPU: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz (2000.06-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x30678 Family=0x6 Model=0x37 Stepping=8
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x41d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x101<LAHF,Prefetch>
  Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS,NFPUSG>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory = 2147483648 (2048 MB)
avail memory = 1898496000 (1810 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
WARNING: L1 data cache covers less APIC IDs than a core 0 < 1
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s)
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 128/32 (20171214/tbfadt-748)
WARNING: Bogus Interrupt Polarity. Assume CONFORMS
ioapic0 <Version 2.0> irqs 0-86 on motherboard
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
Timecounter "TSC" frequency 2000056392 Hz quality 1000
ipw_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
ipw_bss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_bss_fw, 0xffffffff80680430, 0) error 1
random: entropy device external interface
ipw_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
ipw_ibss: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_ibss_fw, 0xffffffff806804e0, 0) error 1
ipw_monitor: You need to read the LICENSE file in /usr/share/doc/legal/intel_ipw.LICENSE.
ipw_monitor: If you agree with the license, set legal.intel_ipw.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (ipw_monitor_fw, 0xffffffff80680590, 0) error 1
iwi_bss: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi.LICENSE.
iwi_bss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_bss_fw, 0xffffffff806a7460, 0) error 1
iwi_ibss: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi.LICENSE.
iwi_ibss: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_ibss_fw, 0xffffffff806a7510, 0) error 1
iwi_monitor: You need to read the LICENSE file in /usr/share/doc/legal/intel_iwi.LICENSE.
iwi_monitor: If you agree with the license, set legal.intel_iwi.license_ack=1 in /boot/loader.conf.
module_register_init: MOD_LOAD (iwi_monitor_fw, 0xffffffff806a75c0, 0) error 1
wlan: mac acl policy registered
kbd0 at kbdmux0
netmap: loaded module
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
nexus0
cryptosoft0: <software crypto> on motherboard
padlock0: No ACE support.
acpi0: <ALASKA A M I > on motherboard
acpi0: Power Button (fixed)
unknown: I/O range not supported
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
atrtc0: <AT realtime clock> port 0x70-0x77 on acpi0
atrtc0: Warning: Couldn't map I/O.
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 450
Event timer "HPET1" frequency 14318180 Hz quality 440
Event timer "HPET2" frequency 14318180 Hz quality 440
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> port 0xf080-0xf087 mem 0xd0000000-0xd03fffff,0xc0000000-0xcfffffff irq 16 at device 2.0 on pci0
vgapci0: Boot video device
ahci0: <AHCI SATA controller> port 0xf070-0xf077,0xf060-0xf063,0xf050-0xf057,0xf040-0xf043,0xf020-0xf03f mem 0xd0a16000-0xd0a167ff irq 19 at device 19.0 on pci0
ahci0: AHCI v1.30 with 2 3Gbps ports, Port Multiplier not supported
ahcich1: <AHCI channel> at channel 1 on ahci0
xhci0: <Intel BayTrail USB 3.0 controller> mem 0xd0a00000-0xd0a0ffff irq 20 at device 20.0 on pci0
xhci0: 32 bytes context size, 64-bit DMA
xhci0: Port routing mask set to 0xffffffff
usbus0 on xhci0
usbus0: 5.0Gbps Super Speed USB v3.0
pci0: <encrypt/decrypt> at device 26.0 (no driver attached)
hdac0: <Intel BayTrail HDA Controller> mem 0xd0a10000-0xd0a13fff irq 22 at device 27.0 on pci0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pcib1: [GIANT-LOCKED]
pci1: <ACPI PCI bus> on pcib1
igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xe000-0xe01f mem 0xd0900000-0xd091ffff,0xd0920000-0xd0923fff irq 16 at device 0.0 on pci1
igb0: Using MSIX interrupts with 3 vectors
igb0: Ethernet address: 00:0e:c4:d0:64:a5
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: netmap queues/slots: TX 2/1024, RX 2/1024
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 28.1 on pci0
pcib2: [GIANT-LOCKED]
pci2: <ACPI PCI bus> on pcib2
igb1: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xd000-0xd01f mem 0xd0800000-0xd081ffff,0xd0820000-0xd0823fff irq 17 at device 0.0 on pci2
igb1: Using MSIX interrupts with 3 vectors
igb1: Ethernet address: 00:0e:c4:d0:64:a6
igb1: Bound queue 0 to cpu 2
igb1: Bound queue 1 to cpu 3
igb1: netmap queues/slots: TX 2/1024, RX 2/1024
pcib3: <ACPI PCI-PCI bridge> irq 18 at device 28.2 on pci0
pcib3: [GIANT-LOCKED]
pci3: <ACPI PCI bus> on pcib3
igb2: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xc000-0xc01f mem 0xd0700000-0xd071ffff,0xd0720000-0xd0723fff irq 18 at device 0.0 on pci3
igb2: Using MSIX interrupts with 3 vectors
igb2: Ethernet address: 00:0e:c4:d0:64:a7
igb2: Bound queue 0 to cpu 0
igb2: Bound queue 1 to cpu 1
igb2: netmap queues/slots: TX 2/1024, RX 2/1024
pcib4: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pcib4: [GIANT-LOCKED]
pci4: <ACPI PCI bus> on pcib4
igb3: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port 0xb000-0xb01f mem 0xd0600000-0xd061ffff,0xd0620000-0xd0623fff irq 19 at device 0.0 on pci4
igb3: Using MSIX interrupts with 3 vectors
igb3: Ethernet address: 00:0e:c4:d0:64:a8
igb3: Bound queue 0 to cpu 2
igb3: Bound queue 1 to cpu 3
igb3: netmap queues/slots: TX 2/1024, RX 2/1024
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
acpi_button0: <Power Button> on acpi0
acpi_button1: <Sleep Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 7c000000183e
device_attach: est0 attach returned 6
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 7c000000183e
device_attach: est1 attach returned 6
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 7c000000183e
device_attach: est2 attach returned 6
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 7c000000183e
device_attach: est3 attach returned 6
Timecounters tick every 1.000 msec
hdacc0: <Intel (0x2882) HDA CODEC> at cad 2 on hdac0
hdaa0: <Intel (0x2882) Audio Function Group> at nid 1 on hdacc0
pcm0: <Intel (0x2882) (HDMI/DP 8ch)> at nid 4 on hdaa0
ugen0.1: <0x8086 XHCI root HUB> at usbus0
uhub0: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
uhub0: 7 ports with 7 removable, self powered
ugen0.2: <Ralink 802.11 n WLAN> at usbus0
run0 on uhub0
run0: <1.0> on usbus0
run0: MAC/BBP RT3070 (rev 0x0201), RF RT3020 (MIMO 1T1R), address 74:f0:6d:87:29:54
ugen0.3: <vendor 0x05e3 USB2.0 Hub> at usbus0
uhub1 on uhub0
uhub1: <vendor 0x05e3 USB2.0 Hub, class 9/0, rev 2.00/85.36, addr 2> on usbus0
uhub1: 4 ports with 4 removable, self powered
ada0 at ahcich1 bus 0 scbus0 target 0 lun 0
ada0: <SanDisk SSD i110 32GB i212000> ACS-2 ATA SATA 3.x device
ada0: Serial Number 1611151144
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 30533MB (62533296 512 byte sectors)
Trying to mount root from ufs:/dev/ufsid/58d112b7ff55e501 [rw]...
WARNING: / was not properly dismounted
random: unblocking device.
CPU: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz (2000.06-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x30678 Family=0x6 Model=0x37 Stepping=8
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x41d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x101<LAHF,Prefetch>
  Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS,NFPUSG>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 7c000000183e
device_attach: est0 attach returned 6
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 7c000000183e
device_attach: est1 attach returned 6
coretemp2: <CPU On-Die Thermal Sensors> on cpu2
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 7c000000183e
device_attach: est2 attach returned 6
coretemp3: <CPU On-Die Thermal Sensors> on cpu3
est3: <Enhanced SpeedStep Frequency Control> on cpu3
est: CPU supports Enhanced Speedstep, but is not recognized.
est: cpu_vendor GenuineIntel, msr 7c000000183e
device_attach: est3 attach returned 6
igb0: link state changed to UP
tun1: changing name to 'ovpnc1'
ovpnc1: link state changed to UP
tun2: changing name to 'ovpnc2'
pflog0: promiscuous mode enabled
igb1: link state changed to UP
ovpnc2: link state changed to UP
-
I'm not sure dmesg is going to show us much; it might be better to get us a copy of the system logs if at all possible.
Looking over what you have posted, there are a few things I would do.
1.) The message
WARNING: / was not properly dismounted
makes me think you should boot into single user mode and run fsck a few times.
2.) There was an issue in Unbound with memory leaks; you should upgrade to 2.4.4-p1.
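The suggestion above turns on one line buried in a long boot log. Below is a minimal sketch, assuming the `dmesg` output has been saved to a text file (for example with `dmesg > dmesg.txt`) and copied to a machine with Python 3. It pulls out the unclean-dismount message along with any other line that mentions a warning. The function name and the plain substring matching are illustrative choices, not a pfSense or FreeBSD tool.

```python
import sys
from typing import Iterable, List, Tuple

# The message quoted above: the root filesystem was not unmounted cleanly,
# which is what a hang or sudden power loss leaves behind.
UNCLEAN_DISMOUNT = "was not properly dismounted"


def classify_warnings(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (kind, line) pairs for the lines worth a closer look.

    kind is "unclean-dismount" for the filesystem message quoted above and
    "warning" for any other line containing "warning" (case-insensitive).
    """
    results = []
    for raw in lines:
        line = raw.rstrip("\n")
        if UNCLEAN_DISMOUNT in line:
            results.append(("unclean-dismount", line))
        elif "warning" in line.lower():
            results.append(("warning", line))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_dmesg.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for kind, line in classify_warnings(fh):
            print(f"[{kind}] {line}")
```

The filter only narrows the list down; it does not tell you which of the flagged warnings actually matters.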
-
Thanks for your reply :)
Yes, I have run fsck a few times and repaired the filesystem, but it crashed again after a few days anyway. That message is probably from the crashes yesterday :)
Is Unbound the DNS Resolver? I'm running that, but I'm already on version 2.4.4-p1. Maybe I should try upgrading to something else?
What kind of system logs would be useful?
-
It's tricky to time it right; this is where an external logging server helps.
Basically, you want to get a copy of the system log right after a reboot; sometimes there is also helpful information logged in /var/crash.
If nothing is logged, chances are high that it's hardware-related and something is starting to fail (RAM, HDD, etc.).
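The triage described above (read what was logged just before the reboot, then check /var/crash) can be sketched in Python. This is a minimal sketch under several assumptions: the log has already been exported to plain text, since on 2.4.x the logs are binary circular files (for example `clog /var/log/system.log > /tmp/system.txt`, per the clog documentation linked later in this thread); the copy is analysed on another machine with Python 3; and a boot is recognised by the line FreeBSD's syslogd writes when it starts ("kernel boot file is ..."). The marker text and the helper names are illustrative, so adjust them to what your log actually shows.

```python
import sys
from pathlib import Path
from typing import List

# Assumption: FreeBSD's syslogd logs "syslogd: kernel boot file is ..."
# when it starts, so this text normally marks the start of each boot.
DEFAULT_BOOT_MARKER = "kernel boot file is"


def lines_before_each_boot(log_lines: List[str], context: int = 20,
                           marker: str = DEFAULT_BOOT_MARKER) -> List[List[str]]:
    """For every boot marker that is not the first line of the log, return
    up to `context` lines logged just before it, i.e. the last messages
    written before the hang or reboot."""
    chunks = []
    for i, line in enumerate(log_lines):
        if i > 0 and marker in line:
            chunks.append(log_lines[max(0, i - context):i])
    return chunks


def crash_files(crash_dir: str = "/var/crash") -> List[str]:
    """List the files in the crash directory, skipping the optional
    'minfree' settings file used by savecore(8). An empty list means
    no crash dump was saved there."""
    path = Path(crash_dir)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir()
                  if p.is_file() and p.name != "minfree")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 crash_triage.py system.txt [copy_of_var_crash]
    lines = Path(sys.argv[1]).read_text(errors="replace").splitlines()
    for n, chunk in enumerate(lines_before_each_boot(lines), start=1):
        print(f"--- last lines before reboot #{n} ---")
        print("\n".join(chunk))
    found = crash_files(sys.argv[2] if len(sys.argv) > 2 else "/var/crash")
    print("crash files:", ", ".join(found) if found else "none")
```

An empty crash directory combined with a log that simply stops before the next boot marker is the "nothing logged" case described above.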
-
I wish I could point you to something more specific, but I can tell you that I've been running pfSense for 10 years, since v1.2.3, and every problem I've had in that time has been linked to hardware issues.
I had similar symptoms a few weeks back, where it would appear to crash every 3-7 days. In my situation, the symptoms started with no internet; next I found I couldn't ping pfSense; then, while investigating the box, I would find the power supply fan whining loudly at full blast. My box is headless so I couldn't see the console, but a reboot seemed to set everything right for another 3-7 days.
After each crash there was a crash log waiting for me in the GUI. I'm not familiar with parsing those, so I started investigating my hardware before posting my crash logs in the forums. Upon opening the case, I found my CPU and front intake fans caked with dust and a loose SATA cable on my HDD. So, after blowing out all the dust in the case, re-seating the SATA cable on my HDD, re-seating the RAM, and replacing the power supply, the frequent crashing went away and pfSense has now been stable for the last 3+ weeks.
Before my recent hardware issues, my box would stay up for 6-12 months at a time, only going down for forced reboots on firmware updates. Otherwise, there's no telling how long my uptimes would be.
No software is perfect, but my money is on something in your hardware. I would do a deep dive into your hardware (e.g. blow out dust, re-seat connections, re-seat the RAM, try new RAM, etc.). After that, post the crash log here.
-
@chrismacmahon said in PFsense hanging since version 2.4.4:
It's tricky to time it right, if you have an external logging server this is where it gets helpful.
Basically you want to get a copy of the system log right after a reboot, sometimes there is info logged in /var/crash that can be helpful.
When you say system log, do you mean system.log in /var/log? If so, it does not contain anything from during or just before the crash. /var/crash is empty, so I guess that points to a hardware issue.
If there is nothing logged, chances are high it's hardware related, and it's starting to fail out (RAM, HDD, etc).
I have taken the mini PC apart. There's no dust in it since it's a fanless system. I removed the M.2 drive and the RAM and put them back again, just in case.
I also removed the Wi-Fi card since I don't use it anymore; that's one less possible cause to worry about :)
-
Our book has all the information you need on logs: https://www.netgate.com/docs/pfsense/monitoring/system-logs.html
To view the logs from the CLI: https://www.netgate.com/docs/pfsense/monitoring/working-with-binary-circular-logs-clog.html
-
@marvosa said in PFsense hanging since version 2.4.4:
No software is perfect, but my money is on something in your hardware. I would do a deep dive into your hardware (e.g. blow out dust, re-seat connections, re-seat the RAM, try new RAM, etc, etc). After that, post the crash log here.
Yeah... I had a nightmare with another system here a few months ago, a Windows system. It blue-screened now and then. After about 1-2 months of searching for the fault (checking drivers, scanning for viruses, replacing the HDD, reinstalling Windows on another HDD, replacing the RAM sticks, changing the graphics card), I finally replaced the PSU and that solved the problem! In my 25-30 years with PCs I had never had a faulty PSU cause blue screens. I even measured the old PSU with a Fluke meter, and the voltages seemed fine with no fluctuation that I could find. But the blue screens happened sporadically, 1-3 days apart...
Anyway... that's off topic, haha.
I opened the mini PC: no dust (it's fanless). I removed the RAM and the disk and put them back again, and removed the mini Wi-Fi card. I will let it run for a few days now and see. If it crashes again, I will move the machine to a stable 12 V power supply to rule out the PSU (that's the easiest thing for me to test right now). After that I have to decide whether to buy new RAM or a new drive first... What do you think? :)
-
How old is the system?
-
@chrismacmahon said in PFsense hanging since version 2.4.4:
How old is the system?
It's from 1 April 2017, so not very old.
It was ordered from AliExpress.com, though...
-
The computer hung again, after 1 day and 8 hours. I will update to 2.4.4-p2, and if it hangs again I will connect it to another PSU, testing one thing at a time. I'll post the results here in case it helps others :)
-
It sounds like it's hardware.
I'm not in your shoes, but chasing hardware faults can be difficult at best.
When this happened to me 10 years ago at home, my wife insisted I fix it... I ended up buying new hardware, as that was the fastest path to resolution. Good luck!
-
It hung again. So now I'm running on another PSU, and I'm also running a memory test with Memtest86+. I will keep you updated :) When it hung, the console screen was just frozen, with no special messages.
If it freezes again, the next step is to run it from a USB stick, I guess.
If it freezes then, I will test older releases of pfSense.
-
This is a hardware fault.
If you have another device to swap in, I would do so; or, if you can run a VM to test, I would do that.
-
@chrismacmahon said in PFsense hanging since version 2.4.4:
This is a hardware fault.
If you have another device to swap out I would do so, or if you can run a VM to test I would.
A VM to test what exactly? A VM of my current installation?
The router hung again an hour ago. I'm now making a USB stick to try running my config from it live, if possible.
-
If you have the hardware to spin up a virtual machine, you can import your working config into the VM and run off of that.
-
@chrismacmahon said in PFsense hanging since version 2.4.4:
If you have the hardware to spin up a virtual machine, you can import your working config into the VM and run off of that.
OK. I don't have an ESXi machine at home, only at work. I will try reinstalling 2.4.4 on the computer first, or run 2.3 if that fails.
-
I tried to create a USB stick with 2.4.4-p1, but after installing I ran into that crappy serial console bug. I pressed ESC to type "set kern.vty=sc", but when booting into multi-user mode I seem to get a crash; the text scrolls by too fast for me to read. So after ESC I want to boot into single-user mode to be able to see the boot errors. But how do I boot into single-user mode from the loader prompt (after pressing ESC)? I have tried to google it but cannot find anything, and I tried things like "boot single" without success. Please help!
I really want to try 2.4.4 before I go back and try 2.3 or something that might work better...
-
Not sure. I would re-burn the image and try again; if it happens again, the hardware is the problem.
We're fans of Etcher.io.
-
Found it: "boot -s" it is :) I will try it and review the logs.