Upgraded to 2.4.4_3 - system now unstable
-
UPDATE: the system board completely failed. The timing relative to the reboot for the 2.4.4_3 update was coincidental; the issue had nothing to do with pfSense. The Intel C2000 bug strikes again. Kudos to Supermicro for their speedy response.
Woke up this morning to a completely hung pfSense node. It had been stable for 2 years, but today it was dead. Completely DOA: serial was dead, VGA was dead, no output at all. Even the old-fashioned "press Num Lock on the keyboard and see if the LED toggles" trick didn't work. Dead dead dead.
Edit to add: Upgraded to 2.4.4_3 a few days ago, and everything seemed OK initially. That's the only change that's happened here in months.
Rebooted, everything came up OK, and about 20 minutes later, hung again.
The cycle continued all morning. Every 20 minutes or so, it would hard hang and the system would be completely unresponsive.
I ended up putting an old, slow Supermicro Atom D510 system back in place temporarily until I can figure out what's wrong. Restored a configuration file from the "dead" system (grabbed it between hangs) ...
Any suggestions on how to troubleshoot such a beast? system.log shows no entries at all around the hang: the last entry is an arpwatch message, followed by the kernel messages from the reboot. Nothing of interest.
All that said, I've now booted it with no network cables attached, and it seems stable. It hasn't hung in the last half hour or so.
If this were some oddball configuration problem, I'd expect the "new" D510 to have the same issues since I just restored the config file from the "broken" node - but it's working fine (if slow - can't keep up with the gigabit fiber WAN).
Any ideas at all? I don't get a kernel dump or panic; it's just hung hard. The only thing I can think of is that maybe the PCIe network card (Sun X1109A / Intel 82599 10GbE) is cooked somehow...?
Hardware configuration:
Supermicro 5018A-TN7B (Intel Atom C2758 CPU)
32GB RAM
2x SanDisk 120GB SSDs (Zpool mirrored)
1x Sun X1109A / Intel 82599 10GbE PCIe adapter (dual port; running at 1Gb/sec per port)
7x Onboard 1GbE ports

Software configuration:
pfSense 2.4.4-RELEASE-p3

Package list:
acme
arpwatch
bandwidthd
Cron
haproxy
iftop
mtr-nox11
nmap
ntopng
nut
openvpn-client-export
pfBlockerNG
RRD_Summary

Networks:
WAN (Verizon FiOS, symmetric 1 Gb/s)
LAN (1 Gb connection to HPE 1920-48G-PoE+ switch)
OPT1 (1 Gb connection to Guest VLAN)
DMZ (1 Gb connection to 'hosting' VLAN for external services)
-
I would see whether a different OS has the same issue when live booted. For that matter, try live booting the pfSense installer; if it still hangs, you have a hardware issue.
-
@dmurphynj said in Upgraded to 2.4.4_3 - system now unstable:
Edit to add: Upgraded to 2.4.4_3 a few days ago, and everything seemed OK initially.
Upgraded from what?
-
Thanks guys. I think the C2000 bug has gotten to us... now it won't power back up at all. I ran a memtest86+ and it passed. Powered down, and now won't power back on at all.
Previously was on 2.4.4_2 - so not much of a change there.
Looks like cooked hardware - just opened a case with Supermicro. But it was bizarre that the problem only appeared under load at first - booting single-user mode or even multi-user mode without network cables attached, the system stabilized. Once I put a load on it, that's when it hung. Now we're just dead as a doornail.
Dang Atoms.... should've chosen a different CPU.
-
Everything works until it breaks, and coincidence is the mortal enemy of troubleshooting.
-
@KOM FOR SURE ...
I didn't think it was really the upgrade, but wanted to see if anyone else has run into a hard fault scenario like this.
If it were really pfSense, I'd expect at least a kernel panic. Just was curious if anyone could see a scenario where the software would hard-stop the CPU without a panic. I haven't seen that kind of hard fault in years ...
Love ya, Intel. Really.
-
Unstable is an understatement. I had a completely unusable box. Fortunately my installation was virtualized, so I restored the whole image. I don't know what happened, but if somebody needs the image with the issues I will provide it, so that someone with deep knowledge of pfSense can understand why it just broke.
-
@albgen said in Upgraded to 2.4.4_3 - system now unstable:
Unstable is an understatement. I had a completely unusable box. Fortunately my installation was virtualized, so I restored the whole image. I don't know what happened, but if somebody needs the image with the issues I will provide it, so that someone with deep knowledge of pfSense can understand why it just broke.
Spouting things in a thread that revolved around a hardware issue and has nothing to do with yours is bad manners. I understand the frustration after an update gone wrong. I don't understand spreading FUD without doing the necessary debugging yourself. Standing around and shouting "I won't upgrade anymore, do it yourself if you want" helps neither you nor anyone else.
-
@JeGr
OK, fair enough. Let me repeat what I wrote in the other discussion:
This instance of pfSense was installed in January 2018. After that I did the regular updates from the UI. That's it. I arrived today and had a dead box. What happened? Well, I don't know; it seems this version (2.4.4_3) is breaking something.
This has never happened in all my years using m0n0wall and pfSense. First time!
Let me also point out that I'm the only one who can manage this box, so the chance that somebody else has done something on it is zero! The web interface and SSH are not even exposed on the WAN interface.
-
You're hijacking another thread for your topic, that has clearly nothing to do with it. Just stop that, please.
-
@albgen said in Upgraded to 2.4.4_3 - system now unstable:
I arrived today and had a dead box.
'Dead box' is not a helpful term for debugging. Was it powered off with no LEDs? Was it totally unresponsive via console? Was it showing any errors? Did you reboot it? etc etc
What happened? Well, I don't know; it seems this version (2.4.4_3) is breaking something.
And what have you seen that makes you believe this? I've been here for many years, and if I had a dollar for every user who complained about pfSense - only for the real fault to be bad hardware - then I could retire already.
This has never happened in all my years using m0n0wall and pfSense. First time!
Like I said earlier, everything works until it breaks.
As JeGr said, if you have a problem then post a new thread.