Warning! Do not update with today's 1/10 snapshot!



  • I just tried to update the firewall from the 27/9 snapshot to 1/10. That kills the FW, at least on my hardware (Intel).

    After the update the computer could no longer find a bootable device. I reinstalled the 27/9 version.

    Louis


  • LAYER 8

    Probably something went wrong during your upgrade; it's OK for me.

    Immagine.jpg



  • No problem on Proxmox VMs (QEMU/KVM): 2 updates in 2 different Proxmox VMs.
    built on Thu Oct 01 00:53:52 EDT 2020



  • I did try three times:

    • one fresh install
    • two upgrades from 27/9

    All three failed, though perhaps not all in exactly the same way. I noticed:

    • once the system started but kept restarting before it was fully up
      (I think that was with the clean install)
    • and not finding an OS at all
      (I think that was after the upgrades)

    I do not know what is causing this. But good to hear that others do not have problems.

    Louis



  • I just noticed from kiokoman's screenshot that the OS has changed from:

    FreeBSD 12.2-PRERELEASE to 12.2-STABLE.

    That could be a clue; it could, for example, be related to a package I installed. I have to investigate.

    Louis


  • LAYER 8

    @louis2 said in Warning! Do not update with today's 1/10 snapshot!:

    not finding an OS at all

    Strange. That is usually caused by a wrong BIOS setting, such as EFI instead of legacy BIOS or vice versa, combined with a mismatched partition table (MBR/GPT).


  • LAYER 8

    OK, it seems that it does not work.
    If I set EFI firmware on a VM, it stops booting after upgrading.
    The virtual machine also powers off by itself after the error.

    Immagine.jpg

    I will try again to be sure



  • What is also strange is that 12.2-STABLE does not exist at all, just as 12.2-PRERELEASE did not exist!

    See https://www.freebsd.org/where.html

    It would help me if the OS versions shown in pfSense were in line with formally existing FreeBSD versions.

    Louis



  • I always boot from EFI.

    Louis


  • LAYER 8

    Nice, there is a new build now: 20201001.0650. Now I need to test it again 😥



  • I would not do that, since the chance that it is fixed is ... very small.


  • LAYER 8

    Yeah, leave it to me, I can test it in a VM.

    OK, with the new build you don't lose access to the WebGUI after the reboot done by the upgrade process, but if you reboot again it no longer boots.



  • 2.5.0-DEVELOPMENT (amd64)
    built on Fri Oct 02 06:54:02 EDT 2020
    FreeBSD 12.2-STABLE

    Looks OK in a VM (VirtualBox) with UEFI enabled.



  • Same for the real hardware.



  • @w0w Seeing your positive findings, I retested.

    Sorry to say, but NOT OK here!
    The system does not boot after updating to the current snapshot!

    Louis


  • LAYER 8

    @w0w
    did you reboot twice?



  • Given w0w's positive message, I started an update from the GUI.

    I noticed that the FW did not come back up, so I looked at my KVM switch. It was trying to boot from the network, since it could not find any valid local boot partition.

    So I forced a restart via "Ctrl-Alt-Del" and again saw no valid boot. I did that once more, with the same result.

    So ... not OK yet. Note that an "urgent" issue related to this problem has been opened on Redmine (10943), and it is not yet reported as fixed.

    Louis



  • When I upgraded this morning, the system was caught in a reboot loop trying to load:

    FreeBSD 12.2-STABLE 2c7ab6a3c3f(devel-12) pfSense amd64

    When I switched to kernel.old it came up with:

    2.5.0-DEVELOPMENT (amd64)
    built on Sat Oct 03 06:53:59 EDT 2020
    FreeBSD 12.2-PRERELEASE

    Crashes are here: https://gist.github.com/emes/cba7dfb53e7127f1b122c6045e341983



  • Thanks for the info. I almost upgraded.


  • LAYER 8

    @msm I think you have a different problem with kernel.old:
    e1000_reset_hw_82571
    There is a driver problem with your e1000,
    but maybe the old kernel is incompatible with other upgrades done by pfSense.



  • Yes, was just filing under warnings for others.

    You meant to say "new kernel", no?


  • LAYER 8

    With the new kernel you have a boot loop as we do, right?
    You said that switching to kernel.old led to a crash dump.



  • @kiokoman said in Warning! Do not update with today's 1/10 snapshot!:

    @w0w
    did you reboot twice?

    Three times. In case it matters, I use ZFS.
    On the real hardware I have a mirrored ZFS volume. Sometimes after an upgrade it comes back up with the old version, so I sometimes need to update twice. With the latest snapshots I had to do that every time today, but I didn't attach any importance to it. Maybe the upgrade really did fail and the system only booted because of ZFS, I don't know. That does not explain why the VM on ZFS without a mirror works fine.


  • LAYER 8

    Could be a combination of EFI and UFS.



  • I installed the snapshot from Sep 16 19:03 in an Oracle VirtualBox VM using BIOS emulation, selecting AUTO UEFI UFS. After the installation completed, I switched the VM to UEFI, booted it, and started the upgrade from the console (option 13). It loaded the latest files from 4 Oct and installed them without any error. I rebooted 5 times with no problem. Am I the lucky one, or are we just missing something?
    Here is the screenshot of my VM settings:
    sets.jpg
    Install process:
    https://streamable.com/v81y0j
    Boot, upgrade and several reboots:
    https://streamable.com/50idjx



  • I just did a lot of testing using today's snapshot.

    The first test was an upgrade from the GUI. Not successful, but it did reboot 😊

    The second test was a new install from a USB stick without any config. That did run, but I could not use it (no config).

    The third test was a new install from USB, with my config on a second USB stick, and that did not work 😢

    I also noticed two other problems:

    • The config file had invalid entries according to the parser. I noticed that "dhcpddata" and "dhcpdv6data" both occurred twice! I think I agree with the parser 😊 and deleted the "redundant" (different!) entries.
    • Then there was another issue related to laggs. As soon as the startup sequence reached "Configuring LAGG interfaces" it crashed and started again.
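
The duplicate entries described above could be spotted before a restore with a few lines of Python. This is only a sketch, under the assumption that the duplicated elements sit directly under the root `<pfsense>` element of config.xml, which the post does not confirm. The function name and the sample document are made up for illustration. Elements that are legitimately repeated deeper in the tree are not checked.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicated_top_level_tags(xml_text):
    """Return the sorted tag names that occur more than once directly under the root."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter(child.tag for child in root)
    return sorted(tag for tag, n in counts.items() if n > 1)


# Hypothetical, heavily shortened stand-in for a config.xml; not a real pfSense config.
SAMPLE = """
<pfsense>
  <system/>
  <dhcpddata>first</dhcpddata>
  <dhcpddata>second</dhcpddata>
  <dhcpdv6data>first</dhcpdv6data>
  <dhcpdv6data>second</dhcpdv6data>
</pfsense>
"""

print(duplicated_top_level_tags(SAMPLE))  # ['dhcpddata', 'dhcpdv6data']
```

A generic XML parser accepts repeated elements without complaint, so a check like this has to count the tags itself. Which of the two copies to keep still has to be decided by hand, since in the case above the copies were different.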

    So it seems that there is more than one issue

    Anyway, I have stopped testing for now and am back on the older version.

    Louis



  • Hmmm... it is possible that you have a broken config. The Netgate team made a lot of changes in the middle of September, adding new features for the XML parser and backup. A lot of changes in the code. I have problems with CARP now, and there is still something broken. What version do you have now, 27/9?



  • A few remarks:

    Be aware that I do not know what is causing the problems I have.

    Given my findings reported a few hours ago, I decided to look at what the next action is after "Configuring LAGG interfaces".
    So I booted and noticed that it is "Configuring VLAN interfaces".

    Very recently I noticed I had a crash dump when running the 27/9 version, which might be related to laggs as well.
    <118>Configuring VLAN interfaces...
    <6>vlan0: changing name to 'lagg0.10'
    panic: sleeping in an epoch section
    cpuid = 2
    time = 1601543446
    KDB: enter: panic
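
The `time` value in the panic excerpt above can be turned into a readable date. This is a small sketch, assuming the field is a Unix timestamp in seconds (the usual form in FreeBSD panic output). The helper name is made up for illustration.

```python
from datetime import datetime, timezone

# Value copied from the `time =` line of the panic excerpt above.
panic_time = 1601543446


def panic_time_utc(epoch_seconds):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


print(panic_time_utc(panic_time).isoformat())  # 2020-10-01T09:10:46+00:00
```

Under that assumption the panic happened on 1 October 2020 at 09:10:46 UTC.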

    However, despite that dump, the 27/9 version seems to work OK.

    After the config changes I made earlier today, related to the XML parser issues, I decided to test what happens when I make changes to the lagg.

    So I removed one of the interfaces from the lagg and assigned all VLANs related to the lagg to that "now free" interface.
    => no issues
    => reboot no issues
    => upgrade to today's snapshot => no issues

    Then I moved one of the VLANs back to the lagg
    => crash
    => recovered with the VLAN assigned to its previous interface

    So maybe ...:

    • there was a boot issue
    • and a config and/or parser issue
    • and a lagg issue

    Those are my findings so far; not 100% clear yet!

    Louis



  • Starting to narrow things down, Louis2. Forgive me if I missed this or am just conflating responses: this only seems to bite on your setup if you have a lagg, but does having a lagg cause the issue on bare metal and in a VM, or just in a VM?



  • @vesalius I do not have VMs here, so all my findings relate to real hardware.

    For info

    • the LAGG is normally IGB0 plus IGB1
    • and part of my vlans are assigned to that LAGG
    • since the LAGG causes trouble I removed IGB0 from the LAGG and moved those VLANs to IGB0
    • then I removed the LAGG completely

    That is the test config I am using now. The FW seems to work that way 😊
    My network does not, since the attached switch expects a LAGG 😖

    So I have to return to a working config tomorrow (probably the older snapshot and the older config).

    My main computer is attached to another switch, that's why I can write this reply :)

    Louis





  • I would not be surprised 😊
    It is OS-related and severe, that is for sure.

    Netgate should have a look, so I uploaded one of my crash dumps for analysis.

    Louis



  • I've also had issues with upgrading to the latest snapshots from a previous snapshot; I've had to wipe my device and restore the config.

    It seems to be fine when upgrading from 2.4.5-p1 to the latest snapshot, but I bet it'll break again if I then upgrade to another snapshot.

    Sorry, I didn't log exactly what the error was, but my Google search history points to

    "Cannot open /boot/lua/cli
    Lua Error loading module, file not found"

    If it happens again I'll take a screenshot.



  • Guys, it's a bug. I tried updating my BIOS and using basic settings (all turbo boost/overclocking disabled). It still crashes.
    I think someone added "(" somewhere by accident. ;)



  • I have pfSense installed on bare metal using EFI/GPT/ZFS (no mirroring), currently on an AMD EPYC 3251 SuperMicro build and previously (a few days ago) on an Intel D-1541 based firewall. I have not had issues rolling from one build to another daily since August. After seeing this thread I recently tested extra reboots after upgrades and saw no issues. I finally did a clean install of the Oct 3 morning snapshot for another issue and still have not run into this crash problem, having upgraded to the latest builds each day since. My network adapter is an Intel X710-T2L and I have no VLANs or LAGG running, if that helps. Maybe it has something to do with specific configurations or hardware?



  • Per Jim Pingle on Redmine, it seems Netgate knows about the LAGG issue hitting the OP @louis2 and may have a fix for it.
    https://redmine.pfsense.org/issues/10956

    There is also another EFI bug that needs to be tracked down.
    https://redmine.pfsense.org/issues/10943



  • Good news!

    Yesterday afternoon "jim/redmine" asked me to test/confirm a fix for the lagg problem.

    Prompted by that, I upgraded to yesterday evening's snapshot. That worked!
    This morning I checked my logs and did not notice anything disturbing.

    So

    • boot issue seems to be solved
    • lagg issue seems to be solved (I do not know if that is by a temporary patch or with the final solution)

    About my third problem, "xml-parser and/or config issues": I did not notice that problem any more after I "fixed" my config. I do not know what caused those issues or whether things have been fixed. I can only advise monitoring the boot/startup messages on the console in case you have boot/startup problems.

    Louis



  • OK, I'll give it a test later tonight when I get home and I'll report back.



  • Boot, lagg and e1000 issues all seem to be resolved with the latest build.

    I did notice that the DHCPv6 client stopped working for me after the 10/1 updates, but I will look at that separately.



  • @Quarkz I doubt pfSense activates invariants.

