After replacing a failed (cache only) Cisco 12G SAS controller, the following: kdb_enter+0x32: movq
-
Re: pfSense as VM > Stopped at kdb_enter+0x32
It is not a virtual machine, it is a bare metal installation. We received a "Moderate Fault" log from the Cisco CIMC of one of our pfSense machines, indicating a RAID controller cache failure.
We had a replacement RAID card on the shelf, with a 1GB cache module and exactly the same FW.
After a quick replacement, pfSense starts with this error (below) and cannot be used. If I put the faulty RAID card back in, it works without any problems, though of course with the original fault.
Has anyone come across something similar?
Is it possible to replace a RAID card without a full pfS reinstall?
-
This seems unrelated to the linked thread about Proxmox?
MCA errors like that are almost exclusively a hardware issue.
Steve
-
@stephenw10 said in After replacing a failed (cache only) Cisco 12G SAS controller, the following: kdb_enter+0x32: movq:
This seems unrelated to the linked thread about Proxmox?
Yes, as I wrote, it is not a virtual environment: "It is not a virtual machine, it is a bare metal installation"
What is identical in both cases (and why a Google search turned up that thread) is: kdb_enter+0x32: movq
We clearly have a HW error: the RAID card cache is degraded ("Moderate Fault").
So we replaced it with a completely identical RAID controller, but pfSense won't start with it. I think pfSense should "survive" the replacement of a HW component.
I could replace the RAID controller and reinstall pfSense from scratch, but I wanted to avoid that because of the longer downtime.
I was wondering if anyone had any ideas...
PS:
Currently this pfS install is running with the original (faulty) RAID controller, but something needs to be done, as it is already predicting other possible failures.
-
Right but you're seeing that MCA error with the replacement card as I understand it?
Very occasionally you will see that when a software update exposes some hardware conflict that was always there but never hit.
Here though if the hardware is identical you know that shouldn't happen so it pretty much has to be bad hardware.
-
@stephenw10 said in After replacing a failed (cache only) Cisco 12G SAS controller, the following: kdb_enter+0x32: movq:
Right but you're seeing that MCA error with the replacement card as I understand it?
I only get this error when I start with the new (i.e. replacement) card.
When I put the partially faulty card back (it is only a moderate fault, so it still works, just without cache), there is no MCA error and pfSense boots fine.
+++edit:
With the replacement card, the Cisco HW check (in CIMC) runs and does not find the new RAID card faulty, but pfS does not start.
I should also mention that we have 3 sets of these spare cards (Cisco UCS 12G SAS, since we have several Cisco UCS-C2xxM4 servers), and all of them behave like this.
-
I would look for hardware revision differences or firmware differences then.
There is nothing you can do in pfSense to correct that other than disabling the driver entirely so it never tries to access it.
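One way to act on the revision/firmware suggestion is to write down what each card reports and compare the two field by field. This is only a minimal sketch: it assumes the attributes were copied by hand into dictionaries, and every field name and value below is a hypothetical placeholder, not real output from CIMC or any controller tool.

```python
def diff_inventory(original, replacement):
    """Return {field: (original_value, replacement_value)} for fields that differ.

    A field present on only one card is reported with None for the other side.
    """
    fields = sorted(set(original) | set(replacement))
    return {
        field: (original.get(field), replacement.get(field))
        for field in fields
        if original.get(field) != replacement.get(field)
    }


# Hypothetical placeholder values, typed in by hand for illustration only.
original_card = {
    "firmware": "FW-PLACEHOLDER-1",
    "hw_revision": "REV-A",
    "cache_size": "1GB",
}
replacement_card = {
    "firmware": "FW-PLACEHOLDER-1",
    "hw_revision": "REV-B",
    "cache_size": "1GB",
}

for field, (old, new) in diff_inventory(original_card, replacement_card).items():
    print(f"{field}: original={old!r} replacement={new!r}")
```

An empty result only means the fields you wrote down match; it says nothing about attributes that were not recorded.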
-
Yes, I was afraid of that...
By the way, I compared the cards: all of them were made on the same day and their serial numbers are within a thousand of each other. I chose the one whose serial number differs from the original (faulty) card's by only about two hundred, so they might have been on the production line at the same time :)
Well, I suspect the cause is how this installation is configured: Cisco does RAID1 across two physical SAS disks (which is not good), and pfSense only sees the Cisco VD as its boot drive, configured for plain ZFS.
This is where something goes wrong: when another RAID controller is inserted, pfSense can't handle it...
(Unfortunately I inherited this setup from a colleague who no longer works with us.)
Since I have to reinstall pfSense anyway (this is now clear to me), I think I will skip the Cisco RAID1 and install the new pfSense with ZFS RAID on the two SAS disks.
One more question: this Cisco is running the CE version and we plan to switch to the paid version.
I haven't followed the updates here for a long time, so my question is: is a 2.7.2 config.xml compatible with 24.03? I would have to make this version switch now.
-
Yes you can import a 2.7.2 config into 24.03.
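Before importing, it can be worth confirming that the backup file parses cleanly and noting which config revision it carries. A minimal sketch, assuming the backup has a `<pfsense>` root element with a top-level `<version>` element (as far as I know this is a config revision number, not the release number); the helper name `config_version` and the sample document with its values are made-up placeholders, not a real backup.

```python
import xml.etree.ElementTree as ET


def config_version(xml_text):
    """Return the text of the top-level <version> element, or None if absent.

    Raises ValueError if the root element is not <pfsense>, and
    xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "pfsense":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    node = root.find("version")
    if node is None or not node.text:
        return None
    return node.text.strip()


# Placeholder document for illustration; "0.0" is not a real config revision.
sample = (
    "<pfsense>"
    "<version>0.0</version>"
    "<system><hostname>fw-example</hostname></system>"
    "</pfsense>"
)

print(config_version(sample))
```

For a real backup, read the file contents into a string first and pass that in; a parse error at this stage is cheaper to find than one during the restore.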
-
Thanks for the help and info