2.1.5 --> 2.2, nanobsd 64, random crashes on Atom N280 (HP T5740/T5745)
I just upgraded an HP T5740/T5745 thin client from 64bit 2.1.5 nanobsd to 64bit 2.2 nanobsd, and the box now crashes at random after 10-30 minutes of uptime. I had been running pfblocker, but that has now been completely removed, and the crashes persist. The only special thing going on is a single IPSec tunnel.
The hardware is a single core Intel Atom N280 (single core HT), with a Broadcom based gigabit interface (bge), and a dual port Intel NIC (em) in the pcie expansion bay.
This setup was rock solid running 2.1.5 64bit nanobsd, uses little power, and maintains a CPU core temp around 24-26C.
I tried different powerd settings (adaptive, hiadaptive, max), and the crashes persist in 2.2. All of the preventative tuning measures for bge and em cards are in place.
I put a monitor on the box, and the crash dump scrolls past too fast to read until it finally stops and reboots.
Are you using a full install, or the NanoBSD image version?
As indicated in the title and body of the post, I am using nanobsd.
We need some crash info here. Does it lock completely or kernel panic and dump you at the db> prompt?
You should be able to scroll back to the first part of the panic log.
I'm using 2.2 on HP t5730 Thin Clients without a single issue.
How did you install the NanoBSD image to the built-in flash module?
The machine had a screen attached, but no keyboard. A crash dump scrolls the screen for a couple of minutes, and then the machine reboots. I was not able to stop the scrolling when I plugged in a keyboard. It did not resolve to a db> prompt, so far as I can tell. I would notice the lack of network connectivity, and then look across the room to see a crash dump furiously scrolling across the screen.
I set up remote logging on the local network, but the crash is not preceded by anything suspicious in the log files I was able to capture on the syslog server.
-------------- UNRELATED INFO BELOW
Note! This is an HP T5740 which is Intel Atom based. T5730s are AMD based, and work just fine, I have 5 of them. Please do not confuse this thread with T5730 material.
To install I used the YUMI multiboot USB tool to create a puppylinux USB boot stick and copied the nanobsd install image to the free space on the stick. Boot into puppy via USB, and use dd to write the nanobsd image to the internal storage. YUMI compartmentalizes each bootable ISO image that is installed on the stick, and leaves the free space accessible as a FAT partition on the root of the USB stick. I have found it to be very handy for working with the thin clients. http://www.pendrivelinux.com/yumi-multiboot-usb-creator/
bfeitell, I know very well it's a different model. I just noted that for the record.
I'm also using a prepared FAT32 USB stick with syslinux and a small Debian image, which allows me to dd in both directions (write an image to the internal DOM storage, or create an image from it) while gunzipping/gzipping on the fly, so there is no need to gunzip first. (Sorry for being off-topic.)
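For anyone who wants the same on-the-fly gunzip/gzip behaviour without a dd pipeline, here is a minimal Python sketch. It is not what either poster above actually ran; the function names are made up for illustration, and the device path in the comment is a placeholder you would have to replace with the real internal storage device. Writing to the wrong device destroys its contents, so double check the target first.

```python
import gzip
import shutil

CHUNK = 1024 * 1024  # copy in 1 MiB chunks so the whole image never sits in RAM


def write_gzipped_image(src_gz, dest, chunk_size=CHUNK):
    """Decompress src_gz while writing it to dest (a file or a raw device)."""
    with gzip.open(src_gz, "rb") as src, open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


def read_image_to_gzip(src, dest_gz, chunk_size=CHUNK):
    """Read src (a file or a raw device) and store it gzip-compressed."""
    with open(src, "rb") as s, gzip.open(dest_gz, "wb") as d:
        shutil.copyfileobj(s, d, chunk_size)


# Hypothetical usage from a live Linux USB stick, run as root.
# "/dev/sdX" and the image file names are placeholders, not real names:
#   write_gzipped_image("nanobsd-image.img.gz", "/dev/sdX")
#   read_image_to_gzip("/dev/sdX", "backup.img.gz")
```

Both functions stream the data, so a 2G image needs no temporary uncompressed copy on the stick.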
Did you try to change BIOS settings? Factory defaults, disable or enable HT on the CPU?
It looks like that box has a serial port, so one thing you could do is hook up a serial terminal and set a big buffer so you can scroll back through it.
One important change going to 2.2 is that there is now only one 64bit kernel and it has both serial and vga consoles. You could try disabling the serial port (if it's enabled) in the BIOS to force a VGA console only. Perhaps some buffer is filling after ~30mins. You could switch to 32bit which still has separate kernels and I can't imagine you have >4GB RAM.
I will look tomorrow when I have access to the box again. I do not think that there is any way to disable HT in the bios. I will also try disabling PowerD completely.
I have seen another report indicating spontaneous reboot problems with 2.2 on Atom hardware.
I believe that serial is already disabled in the bios, but I will double check. I do not have the cables handy to hook up a serial connection, but I will start digging for them.
I remember reading somewhere that early Atoms don't support the entire 64-bit instruction set, and it also depends on the motherboard how they implemented it (bios, chipset). My guess would be that probably old FreeBSD 8 didn't actually use any of these and that's why it worked until now. I'd check it with i386 NanoBSD image too.
I did a little research, and I believe you are right. I'm going to prep a fresh 2G DOM with the i386 version of 2.2 nanobsd, and swap it in when I get to work. I could have sworn that the Atom N280 supported 64bit instructions, but the data I'm seeing indicate that in fact it doesn't.
I made a note here for others to read about it: https://forum.pfsense.org/index.php?topic=84679.0
The Atom N2xx, Z5xx and Z6xx series models don't support 64-bit. I'm running a D510, which does support it.
Unfortunately the information on Intel's ARK website is confusing. It lists many Atom CPUs as 64-bit capable, while in reality whether a system built around one can run 64-bit code also depends on the motherboard implementation.
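Rather than relying on ARK, you can check what the running system itself reports. The sketch below assumes the FreeBSD boot messages (normally in /var/log/dmesg.boot) contain an "AMD Features=0x...<...>" line listing the extended CPUID flags, and that the LM (long mode) flag in that list means the CPU advertises 64-bit support. The function name and the sample lines are illustrative only and were not captured from the T5740.

```python
import re

# Matches e.g. "  AMD Features=0x20100800<SYSCALL,NX,LM>".
# It does not match the separate "AMD Features2=" line.
_FEATURES_RE = re.compile(r"AMD Features=0x[0-9a-fA-F]+<([^>]*)>")


def has_long_mode(dmesg_text):
    """Return True/False if an 'AMD Features' line is found, else None."""
    match = _FEATURES_RE.search(dmesg_text)
    if match is None:
        return None  # the line is absent, so nothing can be concluded
    flags = match.group(1).split(",")
    return "LM" in flags


# Illustrative input, not real output from the hardware in this thread:
sample_64 = "  AMD Features=0x20100800<SYSCALL,NX,LM>\n  AMD Features2=0x1<LAHF>"
sample_32 = "  AMD Features=0x100000<NX>"

if __name__ == "__main__":
    print(has_long_mode(sample_64))
    print(has_long_mode(sample_32))
    # On a real box (path assumed):
    #   print(has_long_mode(open("/var/log/dmesg.boot").read()))
```

A True result only says the CPU advertises long mode. It does not prove that the BIOS and chipset of a particular board handle a 64-bit kernel correctly.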
I have reinstalled a fresh copy of pfSense 2.2 i386 nanobsd on the HP T5740, and thus far things seem to be working correctly. If I see any spurious reboots running the i386 code, I will update this thread.
The odd thing is that 2.1.5 64bit nanobsd ran beautifully on this hardware. I must have just gotten lucky.
I would like to thank those who chimed in for their helpful advice.
i386 nanobsd seems broken as well. The machine just rebooted spontaneously after about 38 minutes of uptime. I was not in time to catch any of the crash dump.
Have you tried to disable IPSEC?
As posted here https://forum.pfsense.org/index.php?topic=87499 that has fixed my problem. Not a single reboot since I switched to OVPN.
Another user reported reboot problems when trying to access the web interface via IPsec: https://forum.pfsense.org/index.php?topic=87391.0
I switched to IPSEC from OVPN some time ago, as I found that the OVPN tunnels would not stay up, while the IPSEC ones, at least under 2.1.5, have been comparatively bulletproof.
I have taken the HP T5740 (Atom N280) out of production in favor of a more power hungry HP T5730 (AMD Sempron), which doesn't seem to suffer from the reboot issue.
At this point I'm inclined to wait for the next update before I try the Atom hardware again. I need a VPN tunnel that stays up to carry critical VOIP traffic.
I have now done some additional testing, and even with powerd disabled the system spontaneously reboots when there is heavy traffic across the IPsec tunnel. I will dig up a SATA drive and install a full version of pfSense so that I can collect a proper crash dump.
I think I may have figured this out, even though I never managed to collect a proper crash dump. After reading through the responses to my posts I checked in the bios and found that on the HP T5740 / T5745 thin client there is no way to disable hyperthreading (HT) in the bios. I did a bit of research and found a sysctl toggle for hyperthreading.
By adding that toggle to /boot/loader.local.conf I was able to disable hyperthreading.
This seems to have quelled the random crashes on my N280 Atom cpu when running i386 nanobsd in conjunction with ipsec.
Big thanks to all those who chimed in, and especially robi, who put me on to exploring hyperthreading. I hope that this info helps someone else as well. If I run into any more problems I will update this thread.
After some further reading about IPSec crashes on i386, I came across a thread here indicating that a different sysctl toggle may alleviate the problem in a more finely targeted manner than shutting off hyperthreading.
I have now removed the HT tweak discussed above, and I'm testing to see whether the setting described in that thread fixes the issue.
Here is the thread and relevant post: https://forum.pfsense.org/index.php?topic=88606.msg501050#msg501050