2.7.0-devel running on mini-pc keeps crashing.
-
I purchased the hardware from this site: Firewall Device. It is solidly built, but for some reason pfSense keeps crashing.
I am running 2.7.0-Devel to get the drivers for the i226-V NIC on the board. I suspect the CPU may be overheating (the environment reaches 90°F during peak), but I am not versed in reading logs, so I could use some help with that. I'm not sure if I can post the crash report logs here; there were 4 documents in the crash reporter. Any assistance would be greatly appreciated.
-
@PaPaStarr I'm sure it's just a typo, but depending on where you are in the world and what your stance is on using A/C, 90F could be just a little bit higher than ambient temperature. That is considered very cool actually. I assume you meant 90C (194F)? In that case, yes, it's cooking.
-
@cybrnook Yeah, and upon further investigation it appears the top part of the case is acting as the heatsink. Not a bad design; it gives the CPU a massive heatsink. The Intel N5100 CPU does run hot.
I noticed that no matter what I do, the CPU temperature reported in the pfSense dashboard never changes. It stays at 27.9°C even if I put the box right next to an AC vent, while the top of the case gets really hot.
I have it sitting in my office now. To stress it, a spare PC is transferring a 500 GB zip file from my server over a 2.5 Gbps connection, a TV is streaming, and a cloud connection is transferring files to a remote server. If it crashes again I've got a major problem, because I cannot deploy it at home. I am using a backup router (an old Netgear Orbi RBR50), but I really want to expand my network and this router is important. I really need someone to analyze the logs so I can be sure it's not the pfSense install. It would actually be great if it was, since then I could just reinstall the software. Right now I don't know.
-
You got the Intel sensor selected in System/Advanced/Miscellaneous?
Also usually those fanless china-boxes have crap thermal compound on the cpu...
-
@PaPaStarr some of those boxes are notorious for bad heatsink connectivity with the CPU and sometimes inferior or ineffective thermal paste. That's definitely something to look into.
Also the N5100 supports Speed Shift and with FreeBSD 14 on which pfSense 2.7 is based, this takes priority over PowerD and may be running the processor harder than necessary. Take a look at the thread below for some System Tunables you can try to help with the heat.
If noise is not a problem, you could also purchase a USB fan and strap it to the top of the case.
-
@cappie Thank you for the information. I read it over and will add it to the list of things to investigate. So far the unit has not crashed or frozen. It is trucking along, so I'm going to give it the night back in the garage since my wife and I are not working. I talked with the vendor and they are going to send me a box with a J4125 chip but with less RAM (8 GB vs 16 GB). The hardware is overkill, but I like having maxed-out hardware for the just-in-case or new project ideas.
-
@cappie So I followed the post you linked to, and I did have that issue. My N5100 was running at 2.2 GHz, floating up and down a little but staying around 2.2 GHz. I put the tunables in and my CPU went down to 1 GHz, which is closer to the 1.1 GHz base of the CPU. I had to go with a 90 to get it right. So thank you very much for that. Unfortunately, it still crashed last night sitting in my office. Temperature-wise, it was hovering at 43°C when it crashed. The seller did say the J4125 chip would be better suited for my environment: it has an advanced cooling chassis and the CPU runs cooler. I still want to know what is crashing it. Can I post my logs in here? If so, which one should I post, since there were a few files to download in the crash reporter?
-
Also from the crash last night I pulled this from the logs:
MCA: Bank 3, Status 0xbe00000006110135
MCA: Global Cap 0x0000000000000c09, Status 0x0000000000000004
MCA: Vendor "GenuineIntel", ID 0x906c0, APIC ID 2
MCA: CPU 1 UNCOR EN PCC DCACHE L1 DRD error
MCA: Address 0x248c98018
MCA: Misc 0x80
panic: Unrecoverable machine check exception
cpuid = 1
time = 1686123907
KDB: enter: panic
panic: Assertion kstack_contains(td, (vm_offset_t)et, sizeof(*et)) failed at /var/jenkins/workspace/pfSense-CE-snapshots-master-main/sources/FreeBSD-src-devel-main/sys/kern/subr_epoch.c:473
cpuid = 1
time = 1686123907
KDB: enter: panic
panic: Assertion kstack_contains(td, (vm_offset_t)et, sizeof(*et)) failed at /var/jenkins/workspace/pfSense-CE-snapshots-master-main/sources/FreeBSD-src-devel-main/sys/kern/subr_epoch.c:473
cpuid = 1
time = 1686123907
KDB: enter: panic
panic: Assertion kstack_contains(td, (vm_offset_t)et, sizeof(*et)) failed at /var/jenkins/workspace/pfSense-CE-snapshots-master-main/sources/FreeBSD-src-devel-main/sys/kern/subr_epoch.c:473
cpuid = 1
time = 1686123907
KDB: enter: panic
Does this mean anything to anybody?
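For lining the panic up with temperature readings or other logs, the `time =` value can be converted to a readable timestamp. This is a minimal sketch, assuming the value is seconds since the Unix epoch; the helper name `panic_time_utc` is made up for illustration.

```python
from datetime import datetime, timezone


def panic_time_utc(epoch_seconds: int) -> datetime:
    """Convert the 'time = ...' value from a panic message to a UTC datetime.

    Assumption: the value is seconds since the Unix epoch.
    """
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Value taken from the panic output above.
    print(panic_time_utc(1686123907).isoformat())  # 2023-06-07T07:45:07+00:00
```

The result is in UTC, so it needs shifting to local time before comparing with a wall clock.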
-
@PaPaStarr Not sure, to be honest, but it looks like an on-chip data cache error. It could still be heat related, but you are currently running 5°C cooler than my setup (48°C). That said, I'm on an i3-9100T with a fan, not an embedded chip.
Things you could try until you get the other box:
Swap out the power supply. Don't discount that you got a flakey power supply, it does happen.
Swap out the RAM. Try single sticks in different slots.
Change the thermal paste and make sure the entire die has good contact with the top of the case which acts as the fin stack.
Are you on the latest BIOS? Over at ServeTheHome the N5100 and N5105 exhibited some strange/random problems until people fussed with the BIOS, and still crashed randomly for some. It's a long thread but still worth checking out for a potential solution.
Sorry I can't offer more specific help.
-
@cappie I swapped power supplies (I had one with the right power specs lying around). I will pass on the heatsink reapplication, mainly since I am swapping the box, but I will pass it on in my notes to the seller. I will run with the new power supply for a while and, if there is no change, swap the RAM sticks. If I have time I will go through the BIOS and confirm it is the latest version. I love ServeTheHome and will for sure go through the post. Thank you once again.
-
@PaPaStarr
MCA: CPU 1
UNCOR = uncorrected error
PCC = processor context corrupted
DCACHE L1 DRD error = data read from L1 cache
Unrecoverable machine check exception
I'd say 99% it's the CPU
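For anyone who wants to check that reading against the raw status word, here is a rough sketch that pulls the same fields out of `0xbe00000006110135`. It assumes the architectural bit layout of Intel's IA32_MCi_STATUS register as I understand it (flag bits 63 to 57, MCA error code in the low 16 bits, cache-hierarchy codes shaped `000F 0001 RRRR TTLL`); the function name and the dictionaries are made up for illustration, and the flag names follow Intel's naming (UC is what the FreeBSD output prints as UNCOR).

```python
# Assumed layout: architectural flag bits of IA32_MCi_STATUS.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Assumed sub-fields of a cache-hierarchy error code (000F 0001 RRRR TTLL).
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
CACHE_TYPE = {0: "ICACHE", 1: "DCACHE", 2: "GCACHE"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_mca_status(status: int) -> dict:
    """Decode the flag bits and, if present, a cache-hierarchy error code."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # MCA error code lives in bits 15:0
    result = {"flags": flags, "error_code": hex(code), "cache_error": None}

    # Cache-hierarchy pattern: bits 15:13 clear and bits 11:8 == 0001
    # (bit 12 is a filtering flag and is ignored here).
    if (code & 0xEF00) == 0x0100:
        request = REQUEST.get((code >> 4) & 0xF, "reserved")
        cache_type = CACHE_TYPE.get((code >> 2) & 0x3, "reserved")
        level = LEVEL[code & 0x3]
        result["cache_error"] = f"{cache_type} {level} {request}"
    return result


if __name__ == "__main__":
    decoded = decode_mca_status(0xBE00000006110135)
    print(decoded["flags"])        # ['VAL', 'UC', 'EN', 'MISCV', 'ADDRV', 'PCC']
    print(decoded["cache_error"])  # DCACHE L1 DRD
```

UC, EN and PCC being set together with a DCACHE L1 DRD code is the same thing the kernel printed, so the decode agrees with the explanation above.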
-
@kiokoman Excellent. Thank you for the knowledge on the error code. I don't see much I can do to fix it, and hopefully the new box with a J4125 CPU will do better. I believe it's my Uncle Murphy showing up and I got a wonky CPU.