pfSense crashes every few weeks - log is blank
-
Hi folks,
I've happily been using pfSense for home for about 6 months now. I am using the following:
2.2-RELEASE (amd64)
built on Thu Jan 22 14:03:54 CST 2015
FreeBSD 10.1-RELEASE-p4
I am running this on a Gigabyte GA-J1900N-D3V with 4GB RAM and a 64GB SSD.
I have a problem where I lose the internet connection from some clients. Oddly, it affected most clients on my network but some were OK.
When I tried logging in to the GUI, I got PHP errors saying /tmp/session <something or other> was not found.
I SSH'd into the box and could see the menu, but trying to select an option (8 for shell) spewed back PHP errors and left me still in the menu.
So I did a hard reboot and all is well now. It came up and nothing seems wrong.
Having a look at system.log in /var/log/ shows a massive blank between about the time my wife said "there is something wrong with the internet" and the reboot. It's as if the machine was off. (pasted below)
This has happened twice when I was using the alpha builds way back when and now on the release. I said nothing back then because it was alpha.
I'm more curious now as to why it happened and whether there is any interest from the (awesome) writers of pfSense in why it happens. Is it a hardware or a software bug?
Thanks,
Fred
Mar 14 09:03:07 pfSense kernel: ue0_vlan10: link state changed to UP
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0_vlan10
Mar 14 09:03:07 pfSense kernel: ue0: link state changed to DOWN
Mar 14 09:03:07 pfSense kernel: ue0_vlan10: link state changed to DOWN
Mar 14 09:03:07 pfSense kernel: ue0: link state changed to UP
Mar 14 09:03:07 pfSense kernel: ue0_vlan10: link state changed to UP
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0_vlan10
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0_vlan10
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0_vlan10
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0
Mar 14 09:03:07 pfSense check_reload_status: Linkup starting ue0_vlan10
Mar 14 09:03:08 pfSense php-fpm[8651]: /rc.linkup: Linkup detected on disabled interface...Ignoring
Mar 14 09:03:08 pfSense php-fpm[8651]: /rc.linkup: Linkup detected on disabled interface...Ignoring
Mar 14 09:03:08 pfSense php-fpm[8651]: /rc.linkup: Linkup detected on disabled interface...Ignoring
Mar 14 09:03:08 pfSense php-fpm[8651]: /rc.linkup: Linkup detected on disabled interface...Ignoring
Mar 14 09:03:08 pfSense php-fpm[8651]: /rc.linkup: Linkup detected on disabled interface...Ignoring
Mar 14 09:03:08 pfSense php-fpm[8651]: /rc.linkup: Linkup detected on disabled interface...Ignoring
Mar 14 12:52:08 pfSense syslogd: kernel boot file is /boot/kernel/kernel
Mar 14 12:52:08 pfSense kernel: Copyright (c) 1992-2014 The FreeBSD Project.
Mar 14 12:52:08 pfSense kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Mar 14 12:52:08 pfSense kernel: The Regents of the University of California. All rights reserved.
Mar 14 12:52:08 pfSense kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Mar 14 12:52:08 pfSense kernel: FreeBSD 10.1-RELEASE-p4 #0 36d7dec(releng/10.1)-dirty: Thu Jan 22 15:12:35 CST 2015
Mar 14 12:52:08 pfSense kernel: root@pfsense-22-amd64-builder:/usr/obj.amd64/usr/pfSensesrc/src/sys/pfSense_SMP.10 amd64
Mar 14 12:52:08 pfSense kernel: FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
Mar 14 12:52:08 pfSense kernel: CPU: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz (2000.05-MHz K8-class CPU)
Mar 14 12:52:08 pfSense kernel: Origin = "GenuineIntel" Id = 0x30673 Family = 0x6 Model = 0x37 Stepping = 3
Mar 14 12:52:08 pfSense kernel: Features=0xbfebfbff <fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,dts,acpi,mmx,fxsr,sse,sse2,ss,htt,tm,pbe>
Mar 14 12:52:08 pfSense kernel: Features2=0x41d8e3bf <sse3,pclmulqdq,dtes64,mon,ds_cpl,vmx,est,tm2,ssse3,cx16,xtpr,pdcm,sse4.1,sse4.2,movbe,popcnt,tscdlt,rdrand>
Mar 14 12:52:08 pfSense kernel: AMD Features=0x28100800 <syscall,nx,rdtscp,lm>
Mar 14 12:52:08 pfSense kernel: AMD Features2=0x101 <lahf,prefetch>
Mar 14 12:52:08 pfSense kernel: Structured Extended Features=0x2282 <tscadj,smep,erms>
Mar 14 12:52:08 pfSense kernel: VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
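For anyone wanting to locate a silence like this without scrolling, here is a minimal Python sketch. It assumes the log is available as plain text lines in the usual syslog layout, and that every entry falls in the same year (syslog timestamps carry no year, so `ASSUMED_YEAR` is a guess). The names `parse_stamp` and `largest_gap` are made up for illustration and are not part of pfSense.

```python
from datetime import datetime

ASSUMED_YEAR = 2015  # assumption: syslog timestamps omit the year


def parse_stamp(line, year=ASSUMED_YEAR):
    """Return the datetime at the start of a syslog line, or None if it has none."""
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        stamp = " ".join(parts[:3])  # e.g. "Mar 14 09:03:08"
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence, or None."""
    best = None
    previous_stamp = None
    previous_line = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # skip lines without a readable timestamp
        if previous_stamp is not None:
            gap = stamp - previous_stamp
            if best is None or gap > best[0]:
                best = (gap, previous_line, line)
        previous_stamp, previous_line = stamp, line
    return best
```

Fed the excerpt above, this should point at the jump from the last 09:03:08 entry to the first 12:52:08 boot message. It only says where the log went quiet, not why.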
-
USB Ethernet acting up? Try a real card.
-
Nope. Not using that. It's plugged in but not actually used for anything.
-
@FarmerB3rd:
Nope. Not using that. It's plugged in but not actually used for anything.
So remove it!!!
-
No - Don't unplug the USB thingy. Investigate it for a few weeks. Troubleshoot for another couple of months. Try compiling a dozen different drivers… Don't give up on the USB NIC (that you aren't using)
-
I've had the USB NIC in there for a while. There is a problem with pfSense going belly-up, and I am trying to understand why it is doing that. If it is the NIC, and the logs or something else point towards it, I will happily drive over it. However, I am more interested in understanding why it went belly-up and why the logs have a black hole in them, and, if there is a bug, in logging it so pfSense can be improved further.
I have no doubt that the box will stay up for a few months now without a problem. It works really well, handles 5 concurrent VPNs, multiple VPN servers and moves about 50GB a day through the WAN. pfSense is very good.
-
Yeah, as said above, don't give up. The galore of "Linkup detected on disabled interface" log entries is definitely not a good enough reason to remove crappy unused hardware! ;D ;D ;D
-
How do you know it's not the USB port going bad or the USB device going bad? It's every bit as likely as some other piece of hardware going bad.
Plus they are crap… If you don't need it, that alone would motivate me to unplug it.
Will it cost you something to unplug it?
-
ok ok, it's out now :)
-
Cool - Now let's wait for a crash. Might it be a pfSense 2.2 distro issue? Sure. Or maybe some other hardware issue? Maybe. We are 1 step closer to finding out. (-:
-
Ok, happy to wait for the next crash (well, not much choice there ;) ) but what can be done now to look for why it crashed previously? Any other logs I don't know about?
-
I don't know. Hardware doesn't always crash in a graceful way that lets you know what's going on. Plus, I'm not a super expert at finding the cause of weird crashes.
It is good advice to keep your hardware limited to exactly what you need and to take away anything not needed, especially if support for said hardware is flaky at best.
-
A fanless computer makes me think of heat issues. Is pfSense reading your CPU temps? I might poke the heatsinks and any chips on the board while the system is running to see if they're heating up. If you have a cruddy power supply, now would be an excellent time to get a proper one: 80 Plus Bronze rating, and if an affordable unit claims to be C6/C7 or Haswell ready, that's really important for this application. Most systems will draw 80 watts or more at idle, so many cheap PSUs don't bother testing lower power draw (like an Atom board!).
A 64GB SSD sounds old… if it's an old OCZ drive or something sketchy and you don't need the space, try cloning pfSense to a flash drive. I had a Vector Plus R2 in my pfSense box and every few weeks the partition table decided to not exist. It's pretty hard to troubleshoot a bootloader error when the machine is buried in a closet. SMART info in the GUI should be able to check for bad sectors, maybe even run a surface test. My OCZ completes an extended test in 10 seconds (64GB @ SATA II... impossibru!), and incidentally has a bad SMART checksum according to gsmartcontrol.
If you can afford to take the machine down for a day or so, running Memtest would be a good idea.
-
@FarmerB3rd:
ok ok, it's out now :)
Good! :) I suspect there's a good chance that's the root of the issue, given it was triggering log noise before the reboot.
Did you get a crash report prompt? Having the backtrace should significantly reduce the possible causes.
-
The board hovers around 44C so I don't think heat is a problem. While it is fanless, it is in a very perforated case: http://linitx.com/images/products/M350_Universal_Mini-ITX_Enclosure_main_large.jpg
Yes, the SSD is an old repurposed one but was healthy (SMART) when I took it out of the previous machine. I'll check SMART again and see what it says.
I had another look at the log file - it's not missing sections. It has whole blocks of information out of order, as if it wrote in the middle of the file, then the end, and then back in the middle. I can only assume the partition table is dodgy… Will focus on that.
-
CLOGs… Perhaps?
https://doc.pfsense.org/index.php/Why_can%27t_I_view_view_log_files_with_cat/grep/etc%3F_%28clog%29
Don't break it thinking you have an issue there. At first glance, this seems normal to me.
I'd leave it alone and wait for more crashes since you removed that USB thingy. Give it a chance to be stable. Unless it's already crashing again?
-
ok, it may well be that. The "writing in the middle" continues.
Mar 17 08:10:06 pfSense php-fpm[2384]: /index.php: Successful login for user 'admin' from: 10.10.50.X
Mar 17 08:10:06 pfSense php-fpm[2384]: /index.php: Successful login for user 'admin' from: 10.10.50.X
Mar 17 09:06:43 pfSense sshd[14419]: error: PAM: authentication error for root from 10.10.50.X
Mar 17 09:06:43 pfSense sshd[14419]: error: PAM: authentication error for root from 10.10.50.X
Mar 17 09:06:49 pfSense sshd[14419]: Accepted keyboard-interactive/pam for root from 10.10.50.X port XXXX ssh2
ad_status: Syncing firewall
Feb 1 12:10:30 pfSense kernel: ovpns2: link state changed to DOWN
Feb 1 12:10:30 pfSense check_reload_status: Reloading filter
Feb 1 12:10:30 pfSense kernel: ovpns2: changing name to 'tun2'
Feb 1 12:10:31 pfSense check_reload_status: Syncing firewall
I see most of my log files are exactly 500KB, so it stops at that point and writes from the top again.
Thanks for that - removes my biggest worry.
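To illustrate the wrap-around, here is a rough Python sketch that finds the point in a plain-text dump where the timestamps jump backwards and puts the two halves back in time order. This only shows the idea and is not how clog itself works; it assumes all entries fall in one year (`ASSUMED_YEAR` is a guess, since syslog lines carry no year), and the function names are made up for illustration.

```python
from datetime import datetime

ASSUMED_YEAR = 2015  # assumption: syslog timestamps omit the year


def parse_stamp(line, year=ASSUMED_YEAR):
    """Return the datetime at the start of a syslog line, or None if it has none."""
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        stamp = " ".join(parts[:3])  # e.g. "Feb 1 12:10:30"
        return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_wrap_index(lines):
    """Index of the first line older than the one before it, or 0 if none."""
    previous = None
    for index, line in enumerate(lines):
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # e.g. a record cut in half where old and new data meet
        if previous is not None and stamp < previous:
            return index
        previous = stamp
    return 0


def chronological(lines):
    """Rotate the lines so the oldest entries come first."""
    wrap = find_wrap_index(lines)
    return lines[wrap:] + lines[:wrap]
```

On the snippet above, the wrap point would be the first "Feb 1" line, which follows the "Mar 17" entries in the file. A half-overwritten fragment such as "ad_status: Syncing firewall" has no timestamp, so this sketch just leaves it attached to the newer half. A log that really spans New Year would also confuse it.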
It's not crashing and I don't expect it to crash for a long time. This is the second or third time it has crashed since early June - most of which was running Alpha nightlies.
Also, the crash, as far as I can see, is not a panic (as I know it). The system is still up and working but just really badly.
thanks
FB
-
Well - with nightlies I'd be expecting some glitches anyway. Basically you are beta testing, which is nice, but it's certainly not the way I would start out. I think stable releases are a better bet for someone just getting to know pfSense.
Did I say beta testing… I should have said alpha testing :o
-
TBH, I was surprised how good the Alphas were. Only issue was this one I have now. I needed Alpha because the hardware I bought was not supported by the previous release of BSD.
-
Are the alphas using a different version of BSD than 2.2? (I'm not sure - I haven't tried)