Netgate Discussion Forum

    Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7

    General pfSense Questions
      rfranzke

      So I did the 2.8.1 upgrade and let the boxes do what they do. After everything was stable, I shut them both down and brought them up again. Within 5 minutes the backup FW crashed. So unfortunately, it seems the beta did not fix my issue. I no longer have any option to go back to 2.8.0, so it seems I'm stuck with a beta that doesn't fix anything for me. It was worth a try, but I guess I'm relegated to starting over unless someone can sort out what the posted crash reports say. I can't make heads or tails of them.

        netblues @rfranzke

        @rfranzke As already stated, you can't go back while on a beta or RC.
        When the release comes out, you will be able to upgrade to the final version.

        It's unfortunate that you have to dig around hardware issues; however, pfSense is also offered as a complete hardware solution.
        If you opted for that, you would have a complete, enterprise-ready platform.
        Running it on any other platform carries some risk.
        However, there are options.

        a. Virtualize the whole thing under any hypervisor. There are lots of options here.
        Then everything is a file and you also get snapshots.
        The overhead is negligible, and many have been doing this for years in HA setups.
        I'm one of them.
        On top of that, pfSense being available on AWS and Azure also means it runs well under a hypervisor and is more or less supported.

        b. Spend $120 per instance and upgrade to Plus.
        You will get TAC support (low on calories, too) and, most of all, boot environments,
        which allow you to go back.

        IMHO, virtualization gives you better control than boot environments by design; however, running virtualized pfSense in an HA production environment does require some know-how, especially in a crisis.

        Don't worry too much about going back to 2.8; the 2.8.1 beta is quite stable (under KVM) too.

          SteveITS Rebel Alliance @netblues

          I keep forgetting, but I believe the bectl command works on CE; it's just that boot environments (BEs) are exposed in the web GUI only in Plus. You just have to create the BE first (sorry).
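A minimal sketch of the "create the BE first" step, assuming a ZFS-based CE install where FreeBSD's bectl(8) is available from the shell. The helper names and the BE name `pre-2.8.1-upgrade` are hypothetical; by default the script only prints the commands so they can be reviewed before anything runs.

```python
import subprocess

# Hypothetical helpers; assumes a ZFS install where bectl(8) exists.


def be_commands(be_name: str) -> list[list[str]]:
    """bectl invocations to save the running system as a new BE."""
    if not be_name or "/" in be_name:
        raise ValueError("BE name must be non-empty and contain no '/'")
    return [
        ["bectl", "create", be_name],  # clone the currently booted BE
        ["bectl", "list"],             # confirm the new BE is listed
    ]


def rollback_commands(be_name: str) -> list[list[str]]:
    """Make the saved BE the default for the next boot, then reboot."""
    return [["bectl", "activate", be_name], ["shutdown", "-r", "now"]]


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    for argv in cmds:
        if dry_run:
            print(" ".join(argv))  # review before running for real
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(be_commands("pre-2.8.1-upgrade"))
```

Under the same assumptions, `run(rollback_commands("pre-2.8.1-upgrade"), dry_run=False)` would make the saved BE active and reboot into it.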

          Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
          When upgrading, allow 10-15 minutes to reboot, or more depending on packages, and device or disk speed.
          Upvote 👍 helpful posts!

            rfranzke @netblues

            @netblues Yes, fair point on the hardware bit. Buying Netgate hardware would likely solve my issues here. Getting all this to work on commodity hardware is a challenge, likely more to do with the BSD foundation it sits on than pfSense itself. It's the leg up the Palo Altos and Ciscos of the world have in the enterprise space. I am grateful there is a CE option at all. The real issue is that I am not sure what to do with this crash report.

            Would it be fair to say a support contract would likely get me answers on the crash report, or will this boil down to 'start over' while paying for the privilege?

            Honestly, my real disappointment is not being able to get this thing deployed. It looks like it would work great, but I cannot even get it out of the stable.

            Thanks for all the replies.

              rfranzke @SteveITS

              @SteveITS Well, good to know for the future. That would help with worries about doing upgrades once this finally goes into production. I'll keep it in mind for the next upgrade. Thanks for the tip.

              Is there any chance my 2.8 config would work if I reinstalled 2.7 and restored it? Thinking about it now, the only real issue I seemed to have with moving to new hardware was the change in NIC hardware. Could be wrong. I know you said that restoring newer configs to older versions is generally a no-go, but...

                SteveITS Rebel Alliance @rfranzke

                @rfranzke Normally no, sorry. There is a table linked on https://docs.netgate.com/pfsense/en/latest/backup/restore-different-version.html, and 2.8.x uses config file version 24.0.

                The files are XML, so you could compare them and edit manually. Or make the recent changes again.
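A minimal sketch of a first-pass comparison, assuming both backups are plain (unencrypted) config.xml files whose root element holds a `<version>` child (e.g. 24.0 for 2.8.x). It only reports the config versions and which top-level sections exist in one file but not the other; it converts nothing, and the function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def config_summary(xml_text: str) -> tuple[str, set[str]]:
    """Return (config version, set of top-level section tags)."""
    root = ET.fromstring(xml_text)
    version = root.findtext("version", default="unknown").strip()
    return version, {child.tag for child in root}


def diff_configs(old_text: str, new_text: str) -> dict:
    """Compare two config files at the top-level-section granularity."""
    old_ver, old_tags = config_summary(old_text)
    new_ver, new_tags = config_summary(new_text)
    return {
        "versions": (old_ver, new_ver),
        "only_in_new": sorted(new_tags - old_tags),
        "only_in_old": sorted(old_tags - new_tags),
    }


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as a, open(sys.argv[2], encoding="utf-8") as b:
        print(diff_configs(a.read(), b.read()))
```

Sections present in both files can still differ inside; a plain text diff of those sections is the next step.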

                The times I've run into trouble restoring to different hardware, as I recall now, came from clicking Apply before clicking Save on the page where you assign interfaces. I don't know if they fixed that. In that state, pfSense will stop during boot and ask you to reassign interfaces via the console.
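Since a NIC change is the suspected culprit, here is a sketch that lists which NIC device each assigned interface points at, assuming the `<interfaces>` section of config.xml holds one child per interface (wan, lan, optN) with an `<if>` element naming the device. The set of NICs on the new box would come from something like `ifconfig -l`; VLAN, LAGG, and other virtual devices are not handled. Function names are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def interface_assignments(xml_text: str) -> dict[str, str]:
    """Map assigned interfaces (wan, lan, optN) to their NIC device names."""
    interfaces = ET.fromstring(xml_text).find("interfaces")
    if interfaces is None:
        return {}
    return {iface.tag: iface.findtext("if", default="").strip() for iface in interfaces}


def missing_nics(xml_text: str, present: set[str]) -> dict[str, str]:
    """Assignments whose device does not exist on the new hardware."""
    return {name: dev for name, dev in interface_assignments(xml_text).items()
            if dev not in present}


if __name__ == "__main__":
    # argv[1]: config.xml; remaining args: NIC names, e.g. from `ifconfig -l`
    with open(sys.argv[1], encoding="utf-8") as f:
        print(missing_nics(f.read(), set(sys.argv[2:])))
```

Any interface reported here is one the console would likely ask you to reassign.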

                  rfranzke

                  I probably have 2.7 backups somewhere, taken before the upgrade, that will get me most of the way there. I've forgotten the status of the various installed packages. It seems like there was a way to have the needed packages installed as part of the restore, but I cannot remember whether there was a checkbox you had to tick when doing the backup to support that. Maybe I am dreaming that.

                  So, should I get base pfSense installed, then install packages, then restore the config for the correct version? Or should I get the base installed, install packages, get HA/sync working, and then restore? Or maybe I can simply install and restore the backups I have. The backups I have cover everything.

                  Thanks all for the help. I'm committed to getting this going.

                    SteveITS Rebel Alliance @rfranzke

                    @rfranzke The restore will install packages that were in the backup. There's no need to manually install packages.

                    It is possible to skip backing up packages, or to restore only parts of a config file.
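To see what a restore would try to reinstall, a sketch that lists the package names recorded in a backup, assuming they appear as `<installedpackages><package><name>` entries in config.xml. Treat that element path as an assumption to check against a real backup; the function name is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def packages_in_backup(xml_text: str) -> list[str]:
    """Sorted package names found under installedpackages/package/name."""
    root = ET.fromstring(xml_text)
    names = (
        pkg.findtext("name", default="").strip()
        for pkg in root.iterfind("installedpackages/package")
    )
    return sorted(name for name in names if name)  # skip unnamed entries


if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        print("\n".join(packages_in_backup(f.read())))
```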

                    This may help:
                    https://docs.netgate.com/pfsense/en/latest/backup/restore-during-install.html#restore-using-the-external-configuration-locator-ecl
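A sketch of staging a backup for the External Configuration Locator, assuming the layout described on the linked page is a `config` directory at the root of the USB drive containing the backup saved as `config.xml`. The mount point and helper names are hypothetical; verify the expected layout and filesystem type against the linked docs before relying on it.

```python
import shutil
import sys
from pathlib import Path


def ecl_destination(usb_mount: str) -> Path:
    """Where the ECL is assumed to look: <usb root>/config/config.xml."""
    return Path(usb_mount) / "config" / "config.xml"


def stage_for_ecl(backup_file: str, usb_mount: str) -> Path:
    """Copy a backup to the assumed ECL location on a mounted USB drive."""
    src = Path(backup_file)
    if not src.is_file():
        raise FileNotFoundError(src)
    dest = ecl_destination(usb_mount)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)  # the backup's original filename does not matter
    return dest


if __name__ == "__main__":
    print(stage_for_ecl(sys.argv[1], sys.argv[2]))
```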

                      rfranzke @SteveITS

                      @SteveITS Thanks for this. I am going to try it. I found backups labelled with version 23.3, which I think is for 2.7.2.

                      Is there any value in trying to download the 2.8.0 installer, on the theory that the upgrade process itself was responsible for the issues I'm having, or should I just go back to 2.7.2? My issue is that if I get this going again on 2.7.2, I'd be too afraid to ever upgrade this setup in the future. I really would like to be able to upgrade this install.

                      I still really wish I could find some guidance on using this dump file to figure out what's causing this specifically.
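As a first step with the dump file, a sketch that pulls the human-readable files out of the textdump tarball, assuming it follows FreeBSD's textdump(4) layout (files such as panic.txt, ddb.txt, and msgbuf.txt, with the `bt` backtrace expected in ddb.txt). The function name is hypothetical.

```python
import sys
import tarfile

WANTED = ("panic.txt", "ddb.txt", "msgbuf.txt")


def read_textdump(path=None, fileobj=None, wanted=WANTED) -> dict[str, str]:
    """Return {filename: text} for the wanted members of a textdump tar."""
    texts = {}
    with tarfile.open(name=path, fileobj=fileobj) as tar:
        for member in tar.getmembers():
            base = member.name.rsplit("/", 1)[-1]
            if member.isfile() and base in wanted:
                handle = tar.extractfile(member)
                if handle is not None:
                    texts[base] = handle.read().decode("utf-8", errors="replace")
    return texts


if __name__ == "__main__":
    # e.g. python read_textdump.py "textdump.tar.0"
    for name, text in read_textdump(sys.argv[1]).items():
        print(f"===== {name} =====\n{text}")
```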

                        netblues @rfranzke

                        @rfranzke When running in HA, you can upgrade the secondary node only and fail over to it.
                        Having a backup of the secondary is doable.
                        If it does not crash, you can upgrade the primary too, or restore.

                        This is how upgrades are done.

                        As for the installer, I doubt a failed package install can crash the whole thing.
                        This is FreeBSD, not Windows ME.
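The secondary-first procedure described above can be written down as a small decision function. This is only a sketch of the order of operations as stated in this post; the names are hypothetical, and it does not talk to pfSense or CARP in any way.

```python
from dataclasses import dataclass


@dataclass
class SecondaryStatus:
    backed_up: bool
    upgraded: bool
    stable_after_failover: bool


def next_step(s: SecondaryStatus) -> str:
    """Next action in an upgrade-the-secondary-first HA rollout."""
    if not s.backed_up:
        return "back up the secondary"
    if not s.upgraded:
        return "upgrade the secondary and fail over to it"
    if s.stable_after_failover:
        return "upgrade the primary"
    return "restore the secondary from its backup"
```

The point of the ordering is that the primary is never touched until the upgraded secondary has carried traffic without crashing.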

                          stephenw10 Netgate Administrator

                          Do you have other crash reports?

                          That one is not very revealing. However, if they are all identical crashes, it's probably a software issue.

                            SteveITS Rebel Alliance @rfranzke

                            @rfranzke said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                            value in trying to download the 2.8.0 installer

                            There is not a "2.8.0 installer"; there is a 2.7.2 installer, and the new Netgate Installer, which lets you choose versions.

                            @netblues said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                            When running in HA, you can upgrade the secondary node only and fail over to it.
                            Having a backup of the secondary is doable.
                            If it does not crash, you can upgrade the primary too, or restore.

                            Generally, yes, but per the docs pf may not sync states correctly between FreeBSD versions:
                            https://docs.netgate.com/pfsense/en/latest/install/upgrade-guide-ha.html#pfsync-considerations
                            I've never run different versions long enough for a failover to matter, so I can't say offhand whether that's actually a common problem or just a possibility.

                              rfranzke @stephenw10

                              @stephenw10 I have one that happened on the secondary FW right after the upgrade to 2.8.1. Basically the same scenario: it boots up, runs for about 5-10 minutes, and then panics. What is strange is that after this initial panic, it runs solidly for a while with no crashes. It seems to happen only around boot, so I'm not sure if it's something starting up that does this. I have FRR running an OSPF process to a lab switch to exchange some internal subnet routes, and I had set the process to start when the CARP status is master. That was a somewhat recent config change, but it ran fine with no panics on 2.7. So maybe it's trying to figure out whether it should start that process while watching the CARP status between the two machines. This just seems like something the two FWs are trying to work out between each other early in the boot process, like a late-starting daemon. In the morning I'll try starting one box and letting it run for a bit before starting the other, and see what we get.

                              See new dump attached. Thanks for checking here.

                              textdump.tar (2).0

                                netblues @SteveITS

                                @SteveITS said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                                Generally, yes, but per the docs pf may not sync states correctly between FreeBSD versions:

                                I agree; however, 2.7.2 and 2.8 are on the same FreeBSD version, so it's safe to do so.

                                And I have tested it recently too, with no (obvious) issues.

                                  kprovost @netblues

                                  @netblues said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                                  @SteveITS said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:

                                  Generally, yes, but per the docs pf may not sync states correctly between FreeBSD versions:

                                  pfsync tries very hard to be compatible between versions too, but bugs do happen.

                                  I agree; however, 2.7.2 and 2.8 are on the same FreeBSD version, so it's safe to do so.

                                  They are not.
                                  They may both say "15", but they're not the same "15".

                                    netblues @kprovost

                                    @kprovost Still, those are minor versions/differences.

                                    No one would do that long term, but considering the situation above, that's the least of the problems.

                                      stephenw10 Netgate Administrator

                                      Hmm, well that's a completely different backtrace.

                                      First Panic:

                                      db:1:pfs> bt
                                      Tracing pid 2 tid 100058 td 0xfffff8006c1a5740
                                      kdb_enter() at kdb_enter+0x33/frame 0xfffffe015852db10
                                      panic() at panic+0x43/frame 0xfffffe015852db70
                                      trap_fatal() at trap_fatal+0x40b/frame 0xfffffe015852dbd0
                                      trap_pfault() at trap_pfault+0x46/frame 0xfffffe015852dc20
                                      calltrap() at calltrap+0x8/frame 0xfffffe015852dc20
                                      --- trap 0xc, rip = 0xffffffff80cf2042, rsp = 0xfffffe015852dcf0, rbp = 0xfffffe015852dd90 ---
                                      __rw_wlock_hard() at __rw_wlock_hard+0x152/frame 0xfffffe015852dd90
                                      arptimer() at arptimer+0x252/frame 0xfffffe015852de10
                                      softclock_call_cc() at softclock_call_cc+0x16d/frame 0xfffffe015852dec0
                                      softclock_thread() at softclock_thread+0xe5/frame 0xfffffe015852def0
                                      fork_exit() at fork_exit+0x7b/frame 0xfffffe015852df30
                                      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015852df30
                                      --- trap 0xafafafaf, rip = 0xafafafafafafafaf, rsp = 0xafafafafafafafaf, rbp = 0xafafafafafafafaf ---
                                      

                                      2nd Panic:

                                      db:1:pfs> bt
                                      Tracing pid 11 tid 100003 td 0xfffff8006c179740
                                      kdb_enter() at kdb_enter+0x33/frame 0xfffffe015840b9c0
                                      panic() at panic+0x43/frame 0xfffffe015840ba20
                                      trap_fatal() at trap_fatal+0x40b/frame 0xfffffe015840ba80
                                      trap_pfault() at trap_pfault+0x46/frame 0xfffffe015840bad0
                                      calltrap() at calltrap+0x8/frame 0xfffffe015840bad0
                                      --- trap 0xc, rip = 0xffffffff80d15bdd, rsp = 0xfffffe015840bba0, rbp = 0xfffffe015840bc00 ---
                                      callout_process() at callout_process+0x1ad/frame 0xfffffe015840bc00
                                      handleevents() at handleevents+0x186/frame 0xfffffe015840bc40
                                      timercb() at timercb+0x236/frame 0xfffffe015840bc90
                                      lapic_handle_timer() at lapic_handle_timer+0xab/frame 0xfffffe015840bcb0
                                      Xtimerint() at Xtimerint+0xb1/frame 0xfffffe015840bcb0
                                      --- interrupt, rip = 0xffffffff804eb162, rsp = 0xfffffe015840bd80, rbp = 0xfffffe015840bdb0 ---
                                      acpi_cpu_idle() at acpi_cpu_idle+0x2e2/frame 0xfffffe015840bdb0
                                      cpu_idle_acpi() at cpu_idle_acpi+0x46/frame 0xfffffe015840bdd0
                                      cpu_idle() at cpu_idle+0x9d/frame 0xfffffe015840bdf0
                                      sched_idletd() at sched_idletd+0x546/frame 0xfffffe015840bef0
                                      fork_exit() at fork_exit+0x7b/frame 0xfffffe015840bf30
                                      fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015840bf30
                                      --- trap 0xafafafaf, rip = 0xafafafafafafafaf, rsp = 0xafafafafafafafaf, rbp = 0xafafafafafafafaf ---
                                      

                                      I would try to compare more crashes if you can. Different, random backtraces like that usually point to a hardware issue. But that one is from a different version, so it may simply be different there.
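One way to compare crashes is to reduce each `bt` to a short signature: the first few frames below the panic/trap handling. A sketch, assuming ddb frames look like the ones above (`func() at func+0xNN/frame 0x...`); the set of frames treated as generic and the default depth of 3 are arbitrary choices for this sketch, not anything defined by pfSense or FreeBSD.

```python
import re

# One ddb frame line, e.g. "arptimer() at arptimer+0x252/frame 0xfffffe..."
FRAME_RE = re.compile(r"^\s*(\w+)\(\) at \1\+0x[0-9a-f]+/frame", re.MULTILINE)

# Panic/trap plumbing seen at the top of both traces above.
GENERIC = {"kdb_enter", "panic", "trap_fatal", "trap_pfault", "calltrap"}


def crash_signature(bt_text: str, depth: int = 3) -> tuple[str, ...]:
    """First `depth` non-generic function names of a ddb backtrace."""
    frames = [f for f in FRAME_RE.findall(bt_text) if f not in GENERIC]
    return tuple(frames[:depth])


def same_crash(bt_a: str, bt_b: str, depth: int = 3) -> bool:
    return crash_signature(bt_a, depth) == crash_signature(bt_b, depth)
```

With the two backtraces above, the signatures come out as (`__rw_wlock_hard`, `arptimer`, `softclock_call_cc`) and (`callout_process`, `handleevents`, `timercb`), so `same_crash` returns False. Many dumps sharing one signature would point toward software; a scatter of unrelated signatures fits the hardware explanation.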

                                        rfranzke

                                        OK, quick update here. I ran through the reinstall process on the backup firewall, using the USB stick for the install as well as the USB config import on first boot. It seems to have mostly worked. However, none of the packages got reinstalled with this method, so I reinstalled them manually. Not sure why that did not happen... maybe it just doesn't do that with this method. No matter; I know the process now. Very cool that it works this way, so thanks for the heads-up. For those keeping track at home: I have reinstalled the backup FW from a fresh USB stick using the 2.8.0 option, so at the moment I have 2.8.1 on the primary firewall and 2.8.0 on the backup firewall. I'll see how this goes in terms of crash dumps for a bit, and then upgrade the primary if all goes well. I wanted to do this to see if I can get onto a proper, latest stable build of CE. If the crashes continue, I will have new dumps from the same OS to post. 2.8.0 is where I think I want to be, as the changes in the new .1 beta did not help me here. But if nothing else, it's good for this n00b to run through the backup/restore process. Will see how it goes. Thanks for the help, all.

                                          stephenw10 Netgate Administrator

                                          Hmm, the packages should get reinstalled when importing a config. The only reason they wouldn't is if there was no access to the repo after the boot, when the install is triggered. But you should see something logged if that happens.
                                          I sometimes see that if the interface config is very different and that's still in progress. Restoring the config again after boot will correctly reinstall the packages in that case.
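Before restoring the config again, a quick way to confirm that a host is reachable through the current gateway is a plain TCP connect. A sketch only, assuming Python is available wherever you run it: the thread does not name the package repository host, so the host is a parameter you supply, and a successful connect does not prove the package install itself will work.

```python
import socket
import sys


def can_reach(host: str, port: int = 443, timeout: float = 5.0,
              connect=socket.create_connection) -> bool:
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:  # DNS failure, refused, unreachable, or timed out
        return False
    conn.close()
    return True


if __name__ == "__main__":
    print(can_reach(sys.argv[1]))
```

The `connect` parameter exists only so the check can be exercised without a network.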

                                            rfranzke @stephenw10

                                            @stephenw10 Yes, I think this is exactly what happened here. After my last post I realized that the gateway I currently use to get to the Internet was not configured. I have a third link I use to get off-net in my test environment. This config is for our data center environment and has IP addresses that do not exist here. So I created a third DHCP interface to tie this into the actual LAN the boxes are currently on, and I switch to using this interface as the gateway when I need them to download pfBlockerNG updates, access Netgate servers, etc. For some reason, in the config I imported, the gateway was set to the normal gateway, which works in my DC setup but doesn't work here. So I had to manually switch to the secondary gateway to download and install the missing packages.

                                            I got impatient and went ahead and reinstalled the primary with 2.8.0 and restored the config there. This time I saw a message on first login saying it was reinstalling the packages in the background. I switched to the OPT1 interface gateway and all the packages installed perfectly. Not sure why I had so much trouble with previous backups/restores, but this works slick today.

                                            So now both nodes of the HA pair are running fresh installs of 2.8.0 (not upgraded from 2.7.0). I'll let this bake for today and see what we get, and I'll post any additional dumps here.

                                            Thanks all for the help.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.