Replaced MOBO and now can't get install to complete
-
[If I have this in the wrong place, let me know. This just seemed like the right one for this weirdness.]
I have 2.7.1 running on a backup box because the main tower I use for pfSense had a MOBO go bad (don't ask me, this was my third one in 18 months for this tower). This tower has an AMD Ryzen; I forget which CPU, but it has at least 8 cores....
And the real weirdness started when I attempted to upgrade to 2.7.1 on my main tower.
[The backup box is running just fine]
Ok, to the problem. This system had some odd motherboard issue and started spitting out DAT errors and other weirdness (I do mainframe computers for a living and used to work at the firmware level -- I've never seen this kind of problem, and I took care of the Machine Check code/logic). Long story short, I did all the diag work I could and finally took it to Microcenter, where they tested the CPU, RAM, and MOBO. Finally they decided the MOBO was bad. Now I have a new ASRock B550 Pro4 MOBO. [They ran some diagnostic test software for a while once the new MOBO was in]
So to be absolutely sure any corruption is gone from the disk drives, I took a SUSE Leap Live DVD and reformatted both hard drives. Then I loaded my pfSense DVD and am now getting this error:
ufs:: /dev/da0s1a
zfs:zroot/ROOT/default
cd9660: /dev/cd0 ro
(which is equivalent to: mount -t cd9660 -o ro /dev/cd0 ) << that is what is on the screen...
? List valid disk boot devices
. Yield 1 second (for background tasks)
<empty line> Abort manual input
mountroot>
I have tried UEFI and normal install. And I have no idea what this problem is. I have been searching the online manuals.
I know that this DVD is the one I had used for both of these units to get them working.
One last thing, the prior message in doing a full install was:
Mounting from zfs:pfSense/ROOT/default failed with error 6; retrying for 3 more seconds
Mounting from zfs:pfSense/ROOT/default failed with error 6.
Just point me to where this is in a manual. I've spent several hours trying different USB DVD readers (I have 2), different length cables, and different USB ports. No joy. And I would not be surprised if this is a bad MOBO.
-
Are you not able to use the memstick image on a USB flash drive?
Been a while since I've tested the ISO image on an actual DVD drive. Longer for a USB connected one.
Steve
-
I've been using these with various computers since I have so many DVDs around (music, doc, Knoppix/Suse LIVE, etc.).
But I will find one of my thumb drives I can clear, and get it set up to do an install.
L8r.
-
@Wylbur
Ok, built a USB thumb drive for the install. The installer identified the USB drive, and I could see the system was loading and things were going by on the screen rapidly, and then....
It got pretty much the same failure as from a USB DVD install:
Mounting from zfs:pfSense/ROOT/default failed with error 6; retrying for 3 more seconds......
:
:
Abort manual input...
mountroot>
And we had such a good thing going.
Any more suggestions?
-
Exactly which image did you use?
Did it complete the install and then fail to boot the result?
The odd thing there is that it's trying to mount ZFS and the installer itself is UFS.
What does it show for available devices there when you enter ? at the mountroot prompt?
Does it show the USB device (probably da0) detected in the boot logs before that point?
It's possible it's simply booting too fast and there is a delay from the USB devices initialising. In that case you can add a boot delay, see:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/installation.html#booting-from-usb
Steve
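For reference, the "error 6" in that mountroot message can be decoded as an errno value. This is only a small sketch and assumes the kernel is reporting a standard errno number; `MOUNTROOT_ERROR` is just an illustrative name, and the message text printed depends on the OS the script runs on (FreeBSD words this one as "Device not configured").

```python
import errno
import os

# The number taken from "Mounting from ... failed with error 6".
# Assumption: it follows the standard errno numbering.
MOUNTROOT_ERROR = 6

# Look up the symbolic name for the number.
name = errno.errorcode.get(MOUNTROOT_ERROR, "unknown")

# The human-readable text is platform dependent, so it is printed rather than assumed.
print(f"error {MOUNTROOT_ERROR}: {name} ({os.strerror(MOUNTROOT_ERROR)})")
```

If that reading is right, the kernel could not find the root device or pool it was told to mount. That fits both a USB device that has not initialised yet and a pool name that does not exist on the install media.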
-
@stephenw10 said in Replaced MOBO and now can't get install to complete:
Did it complete the install and then fail to boot the result?
The odd thing there is that it's trying to mount ZFS and the installer itself is UFS.
What does it show for available devices there when you enter ?
Does it show the USB device (probably da0) detected in the boot logs before that point?
It's possible it's simply booting too fast and there is a delay from the USB devices initialising. In that case you can add a boot delay, see:
https://docs.netgate.com/pfsense/en/latest/troubleshooting/installation.html#booting-from-usb
First - As best I can tell, it was still on init stuff. I didn't see a boot message where it was booting from what it laid down.
There are 2 hard drives on this system that I reformatted (mentioned that before). I do not have any SSD in this system.
Either power on or hit <reset> and we start.....
Ok, <F11> to select boot -- USB: SanDisk
Then stuff goes streaming by, and finally I get this:
ada1: 476940MB (976773168 512 byte sectors)
ada1: quirks=0x1<4K>
da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
da0: <SanDisk Cruzer Glide 1.00> Removable Direct Access SPC-4 SCSI device
da0: Serial Number 4C531001371208106022
da0: quirks=0x2<NO_6_BYTE>
Mounting from zfs:pfSense/ROOT/default failed with error 6; retrying for 3 more seconds
:
:
:
finally
mountroot>
I gave it the command as the manual listed. But the manual mentioned needing to escape, and I don't know what the escape key/character is. I tried a few things, but no joy (I looked for an escape char, etc., and didn't see one defined). So it couldn't make use of the delay info I gave it -- it "took it", it just didn't know what to do with it.
So it had to fetch a boot loader from somewhere to get started and it was reading the SanDisk (thumb drive).
I put that thumb drive on a Suse Leap 15.5 system to query it and I get this:
sdd1 14.59 GiB Ext4 Partition 2918 (start) 30603263 (end)
Maybe someone should do some testing of DVD read and install, because that is how I got pfSense to initially load on this machine (before the MOBO went weird).
Wylbur.
-
You probably need to clean the target drives. One or both still have ZFS data on them and it's trying to mount that. Especially if it was previously a mirror across both drives.
See: https://docs.netgate.com/pfsense/en/latest/troubleshooting/multiple-disks.html#clear-the-disk
-
As I said, I had reformatted them before I started this. So, what should I reformat them with? EXT(?) or XFS, or what? I will be using the partitioner from either Suse Leap 15.6 or Knoppix 9.1.
I would imagine, then, that the first thing the installer will do is check the file system, and if it doesn't recognize it, it will do its own formatting. And perhaps this is the problem -- could it be using the wrong file system?
Wylbur
-
You don't need to format them; the installer does that. By default it uses ZFS.
The problem is that most disk formatting tools only remove the data structures and do not actually write zeros to the disk. ZFS will search the disk for existing ZFS pool metadata and try to use it.
Hence you need to clean the disk to remove that.
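To illustrate why an ordinary reformat can leave that metadata behind: as far as I understand the on-disk format, ZFS keeps four 256 KiB labels per vdev, two at the very start and two at the very end, so a tool that only rewrites the beginning of the disk can miss the trailing copies. Below is a minimal Python sketch of where those labels would sit. `label_offsets` is a hypothetical helper name, the sector count is the ada1 figure from the boot log earlier in the thread, and the offsets are relative to whatever device or partition held the pool, which on a real install is usually a partition rather than the whole disk.

```python
# Assumed constants: four labels of 256 KiB each per vdev.
VDEV_LABEL_SIZE = 256 * 1024
VDEV_LABELS = 4


def label_offsets(vdev_size_bytes):
    """Return the byte offsets of the four ZFS labels on a vdev of this size."""
    if vdev_size_bytes < VDEV_LABELS * VDEV_LABEL_SIZE:
        raise ValueError("device too small to hold four labels")
    # The usable size is rounded down to a multiple of the label size.
    usable = vdev_size_bytes - (vdev_size_bytes % VDEV_LABEL_SIZE)
    # Labels 0 and 1 sit at the start of the device.
    front = [i * VDEV_LABEL_SIZE for i in range(2)]
    # Labels 2 and 3 sit at the end of the usable area.
    back = [usable - (VDEV_LABELS - i) * VDEV_LABEL_SIZE for i in range(2, 4)]
    return front + back


if __name__ == "__main__":
    # ada1 from the boot log: 976773168 sectors of 512 bytes.
    size = 976773168 * 512
    for index, offset in enumerate(label_offsets(size)):
        print(f"label {index}: byte offset {offset}")
```

This only shows where to look. Actually removing the labels should still be done with the procedure in the linked doc.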
-
Cleared the disk drives (2), and then somehow the thumb drive had been erased (it was not plugged into the system while it was clearing the drives -- Rod Serling has to be in my LAN somewhere). So I had to drop back to the DVD reader, and the install ran to completion. BTW, 2TB drives take a very long time (doing the clean in parallel), and I had them set up for only 2 passes, not the 10 (default).
So this whole exercise appears to have been caused by the disk drives being corrupted in some fashion.
Bottom line is, it seems to be working for now. Next stop -- upgrade to 2.7.2
-
Nice. Yeah if you actually have to write zeros to the entire drive that can take....a while!
That shouldn't be required, as removing the ZFS data and the partition tables is sufficient if they are accessible. Writing zeros does that of course.
Anyway glad you're up and running.
-
@stephenw10
I'm trying to find all the items I've had open and put in a final post, such as this one:
After installing Service_Watchdog, and with it now running, we no longer see issues in our logs, the principal one being the WAN interface failing. This is not the fix for what appears to have been a MOBO and later a power supply problem. But it is interesting that this fixed the problems on the backup pfSense server and the primary server with the ISP on the WAN. And the backup server is NOT running with a dual port Intel ethernet adapter.
-
What services did you enable it for? Do you see it logging restarting them?
-
I have it set up for LAN. And I've seen the error message that I used to get, but the system keeps running after that now. I didn't use WAN because after reading what it says I thought that might be a bad idea.