ZFS ISSUES!!! built on Wed Jul 11 16:46:22 EDT 2018



  • Glad to see you guys are bringing this bug back on the latest snapshot.

    https://redmine.pfsense.org/issues/6929



  • https://forum.netgate.com/topic/132727/latest-snap-stopping-with-error-2-4-on-watchguard-xcs-box

    I was running a snap from earlier this morning, then tried the one labeled around 11:30, and that's when I started seeing this.



  • Someone has filed a Redmine ticket for this. I suggest tracking progress there.

    https://redmine.pfsense.org/issues/8639



  • If I manually install at least the build from Thu Jul 12 08:51:04 EDT 2018, I can boot pfSense again. I don't know if it is an update problem.



  • I have not updated mine yet, but I will be getting out of work soon to test. I don't have remote IPMI set up yet and am afraid to update remotely right now, with no way to get it back up if it fails.



  • @raul-ramos It's a no-go for me on that snapshot, although I did not manually modify the config afterward to add the lines back in.



  • Yes, this is an upgrade bug. A clean install works, but if you upgrade it to the next snapshot it fails on boot.



  • Updated redmine as well...

    1. Clean install of 2.4.4-DEVELOPMENT (amd64), built on Fri Jul 13 10:56:07 EDT 2018, FreeBSD 11.2-RELEASE
    2. Restore config
    3. Reboot
    4. Upgrade and reboot (2.4.4.a.20180713.1957)
    5. Borked system; ZFS won't mount


  • @maverick_slo
    Actually you don't need to restore the config; at least it is not necessary to reproduce the problem. I used a clean install in a VM and, right after boot, updated it with option 13 without configuring anything.



  • Yeah I know.
    I just wanted to note exactly what I did.

    But the lack of an official response makes me concerned...
    At least they could pull the upgrades.



  • @maverick_slo
    Maybe it's vacation time. Delays are expected in the summer ☺
    And this is the development branch, which I think has low priority, because everyone who uses it must be prepared for failures. The other thing that bothers me is that 2.4.4 has too many major changes for a minor version.



  • Should be fixed with recent snapshot.



  • Tried today's and got the same mountroot error.



  • @maverick_slo
    Before upgrade check that your /boot/loader.conf contains

    zfs_load="YES"
    

    Mine was just missing it. I added it manually and upgraded successfully to the Jul 17 07:07 version.
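    That check-and-append step can be scripted. A minimal sketch, assuming a POSIX shell; the `ensure_zfs_load` helper name is my own, and on a real box you would point it at /boot/loader.conf and run it as root:

```shell
#!/bin/sh
# ensure_zfs_load FILE - append zfs_load="YES" to FILE if it is not already there.
# Hypothetical helper for illustration; FILE would be /boot/loader.conf in practice.
ensure_zfs_load() {
    conf="$1"
    if grep -q '^zfs_load="YES"' "$conf" 2>/dev/null; then
        echo "zfs_load already present in $conf"
    else
        printf 'zfs_load="YES"\n' >> "$conf"
        echo "zfs_load added to $conf"
    fi
}

# On a real system (as root):
#   ensure_zfs_load /boot/loader.conf
```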



  • I do have zfs_load="YES" in loader.conf and it failed to upgrade from:

    2.4.4-DEVELOPMENT (amd64)
    built on Fri Jul 13 19:58:06 EDT 2018
    FreeBSD 11.2-RELEASE

    to:

    I can't be sure of the snapshot version. It was around 14:00 GMT+1.



  • 0_1531844987826_VirtualBox_pfSense-244 testing bug_17_07_2018_19_25_49.png

    Clean install and upgrade, both versions are on the screenshot. No problem after boot and reboot.



  • Well.... it didn't work for me.

    EDIT: sorry. Adding pics.

    EDIT2: Now it should be here.

    0_1531846918393_pic1.jpg 0_1531846947671_pic2.jpg



  • Didn't work for me either... same error:

    Updating repositories metadata...
    Updating pfSense-core repository catalogue...
    pfSense-core repository is up to date.
    Updating pfSense repository catalogue...
    pfSense repository is up to date.
    All repositories are up to date.

    Setting vital flag on pkg... done.
    Downloading upgrade packages...
    Updating pfSense-core repository catalogue...
    pfSense-core repository is up to date.
    Updating pfSense repository catalogue...
    pfSense repository is up to date.
    All repositories are up to date.
    Checking for upgrades (9 candidates): ......... done
    Processing candidates (9 candidates): ......... done
    The following 10 package(s) will be affected (of 0 checked):

    New packages to be INSTALLED:
    openvpn-auth-script: 1.0.0.3 [pfSense]

    Installed packages to be UPGRADED:
    snort: 2.9.11.1_1 -> 2.9.11.1_2 [pfSense]
    php72-pfSense-module: 0.62_1 -> 0.62_5 [pfSense]
    pfSense-rc: 2.4.4.a.20180713.1056 -> 2.4.4.a.20180717.0756 [pfSense-core]
    pfSense-kernel-pfSense: 2.4.4.a.20180713.1056 -> 2.4.4.a.20180717.0756 [pfSense-core]
    pfSense-default-config: 2.4.4.a.20180713.1056 -> 2.4.4.a.20180717.0756 [pfSense-core]
    pfSense-base: 2.4.4.a.20180713.1056 -> 2.4.4.a.20180717.0756 [pfSense-core]
    pfSense: 2.4.4.a.20180713.0955 -> 2.4.4.a.20180717.0730 [pfSense]
    git: 2.18.0 -> 2.18.0_1 [pfSense]
    e2fsprogs-libuuid: 1.44.2_1 -> 1.44.3 [pfSense]

    Number of packages to be installed: 1
    Number of packages to be upgraded: 9

    57 MiB to be downloaded.
    [1/10] Fetching snort-2.9.11.1_2.txz: .......... done
    [2/10] Fetching php72-pfSense-module-0.62_5.txz: ...... done
    [3/10] Fetching pfSense-rc-2.4.4.a.20180717.0756.txz: .. done
    [4/10] Fetching pfSense-kernel-pfSense-2.4.4.a.20180717.0756.txz: .......... done
    [5/10] Fetching pfSense-default-config-2.4.4.a.20180717.0756.txz: . done
    [6/10] Fetching pfSense-base-2.4.4.a.20180717.0756.txz: .......... done
    [7/10] Fetching pfSense-2.4.4.a.20180717.0730.txz: . done
    [8/10] Fetching git-2.18.0_1.txz: .......... done
    [9/10] Fetching e2fsprogs-libuuid-1.44.3.txz: ..... done
    [10/10] Fetching openvpn-auth-script-1.0.0.3.txz: . done
    Checking integrity... done (0 conflicting)

    Upgrading pfSense kernel...
    Checking integrity... done (0 conflicting)
    The following 1 package(s) will be affected (of 0 checked):

    Installed packages to be UPGRADED:
    pfSense-kernel-pfSense: 2.4.4.a.20180713.1056 -> 2.4.4.a.20180717.0756 [pfSense-core]

    Number of packages to be upgraded: 1
    [1/1] Upgrading pfSense-kernel-pfSense from 2.4.4.a.20180713.1056 to 2.4.4.a.20180717.0756...
    [1/1] Extracting pfSense-kernel-pfSense-2.4.4.a.20180717.0756: .......... done
    ===> Keeping a copy of current kernel in /boot/kernel.old
    Upgrade is complete. Rebooting in 10 seconds.
    Success



  • Will try exactly 2.4.4.a.20180713.1056 to see what happens. ☺



  • @maverick_slo
    Yes, it refuses to boot with the same error after the upgrade.
    I don't understand how this is happening... Please ask on Redmine.



  • ZFS needs opensolaris.ko.

    /boot/loader.conf should have this line.

    opensolaris_load="YES"
    zfs_load="YES"
    


  • I have a real-hardware installation and it has never had that first line.

    kern.cam.boot_delay=10000
    kern.geom.label.disk_ident.enable="0"
    kern.geom.label.gptid.enable="0"
    vfs.zfs.min_auto_ashift=12
    zfs_load="YES"
    autoboot_delay="3"
    hw.usb.no_pf="1"
    
    

    That's all I have, I think, and it's been that way for years, since ZFS was introduced.



  • Just now I fixed a broken installation by booting it manually, loading all the needed kernel modules, and editing loader.conf to add one line:

    zfs_load="YES"
    

    Works like a charm.



  • I have a 2.4.3 that doesn't have the "opensolaris_load="YES"" line, and an earlier 2.4.4 with an untouched loader.conf that does have it.
    When this problem began, I worked around the boot by loading ZFS manually at startup: I had to load the kernel, opensolaris.ko, and zfs.ko. Loading zfs.ko without opensolaris.ko gives the "ZFS needs opensolaris.ko" error message.
    So I don't know...

    Edit:
    @w0w said in ZFS ISSUES!!! built on Wed Jul 11 16:46:22 EDT 2018:

    Works like a charm.

    Nice
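    For reference, the manual recovery described above happens at the FreeBSD loader prompt (the `OK` prompt). A rough sketch of the sequence, assuming the modules live in the default /boot/kernel directory:

```
load /boot/kernel/kernel
load /boot/kernel/opensolaris.ko
load /boot/kernel/zfs.ko
boot
```

    Order matters: opensolaris.ko must be loaded before zfs.ko, which is why loading zfs.ko alone produces the "ZFS needs opensolaris.ko" error.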



  • I downloaded pfSense-CE-memstick-2.4.4-DEVELOPMENT-amd64-20180710-0609; I think this version was compiled before the bug was introduced. I installed it and upgraded to the latest, and it boots just fine. So I expect only versions containing the bad static config file are affected, and a clean install of the latest version should fix everything. Maybe there is some other way to fix it... I don't know.


  • Rebel Alliance Developer Netgate

    You don't need opensolaris_load="YES", only zfs_load="YES". When loading the .ko files at the loader prompt you need to load them both manually, but not when using the loader.conf entry.

    The problem is that the kernel package was including its own copy of /boot/loader.conf which clobbered the copy made by the installer which included the zfs line.

    Another fix, which should take care of any remaining issues, went in this morning but has not made it into a snapshot yet. Before you upgrade, make sure your /boot/loader.conf or /boot/loader.conf.local contains zfs_load="YES".
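    That pre-upgrade check can be sketched as a small shell function; `check_zfs_line` is a hypothetical name, and on a real box you would pass /boot/loader.conf and /boot/loader.conf.local:

```shell
#!/bin/sh
# check_zfs_line FILE... - succeed if any given file contains zfs_load="YES".
# Illustrative sketch; the real files are /boot/loader.conf and /boot/loader.conf.local.
check_zfs_line() {
    for f in "$@"; do
        if grep -q '^zfs_load="YES"' "$f" 2>/dev/null; then
            echo "ok: found in $f"
            return 0
        fi
    done
    echo 'WARNING: zfs_load="YES" not found - add it before upgrading' >&2
    return 1
}

# On a real system:
#   check_zfs_line /boot/loader.conf /boot/loader.conf.local
```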


  • Rebel Alliance Developer Netgate

    For me, the latest snapshot upgrades OK from a VM that previously failed. Everything should be OK now, but additional feedback for upgrades/fresh installs would help.



  • Previously failed 2.4.4.a.20180713.1056 now upgraded successfully.



  • Yup all fine.

    Thanks!



  • Me too. I've upgraded from 2.4.4.a.20180713 without a problem.



  • Just updating hasn't cured it for me - I'm still having issues on 2.4.4.a.20180723.2155

    I'm adding the zfs_load line to loader.conf now and will see how that goes next time I need to reboot.



  • Looks like editing the config manually sorted it (just in case anyone else is in the same boat.)



  • If you already had a broken installation, where the config entry was missing, it is expected that you will fail to boot until you fix it manually. An upgrade does not "fix" this, because it shouldn't. Globally it's not broken; it has only broken once or twice before, and only on development releases. There is no need to add code to fix something that is never going to break itself on stable releases. ☺



  • You can say that, but it wasn't the behaviour I was expecting - I expected that if an update broke it, then an update would fix it, and I included the info for others who may have had the same expectation.


  • Rebel Alliance Developer Netgate

    @motific said in ZFS ISSUES!!! built on Wed Jul 11 16:46:22 EDT 2018:

    You can say that, but it wasn't the behaviour I was expecting - I expected if an update broke it then an update will fix it and included the info for others who may have had the same expectation.

    Except this update broke things in a way that leaves the firewall no good way to determine what went missing. We could maybe have guessed ZFS from the mounted filesystems and tossed the entry back in, but that's a lot of extra work to fix a small number of systems that landed on a problem snapshot, which was only a problem for a few days.

    They are development snapshots, there will always be risk involved with running them.



  • Broken system is a broken system and with these development snapshots you should be prepared to nuke your system at short notice and reinstall/restore from a backup if problems arise. Surely you're not expecting these snapshots to be production ready?



  • @jimp - that's entirely fine by me. I mentioned it in case others had the same expectations I did... nothing more, nothing less. Loving your work, as ever.