Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7
-
@SteveITS Thanks for the reply. Being as new as I am with this, I'm not sure how I would get the 2.8.1 version on here. Assuming this is current RC status? You have to be part of the beta team to get access to this or is there a link readily available somewhere? Assuming I have to build a new installer USB stick to get this on there (my UI does not list this RC as available).
I can appreciate trying to figure this out via forum posts but am not beyond spending the coin for a support sub, which I believe requires Plus to utilize. As I am still 'trialing' this move to pfSense, I was hoping to get past this issue before fully getting into bed with Netgate for something like that but could be convinced if they could figure this out. We need a new firewall for our org, and I was hoping pfSense was the answer. Not to rant but at this point I would have likely been fired if I had these actually in production with as many times as I have had to 'start over' now. I just cannot believe the answer here is re-install. Makes me feel like this is still just a hobbyist's tool. Really want to like this but it's been a fight so far.
Thanks for all replies.
-
@rfranzke On page System > Update there's a choice of branches and you should be able to see the beta. I don't have a 2.x install to check but I've seen others post about it including Netgate so I'd think it should be there? It's a beta not RC.
https://www.reddit.com/r/PFSENSE/comments/1m2g8k2/pfsense_ce_281_beta_now_available/
Just be aware that packages also use that setting so if you change it and don't upgrade, change it back before updating or installing any packages. (see my sig)
Technically I think (assume) you can get the beta from the Netgate Installer but 1) you can just upgrade, and 2) you probably don't need a new Installer as I doubt that would be required...the Installer is separate from the product versions it installs.
-
@SteveITS Yes thanks here I see it now in the UI. I'll give this a try. Thanks for setting me straight.
Can you downgrade from 2.8.1 once you upgrade or are you stuck there running a beta version? What happens when this comes out of beta? Will it just show I'm in the stable branch when it does?
-
So did the 2.8.1 upgrade and let the boxes do what they do. After everything was stable, I shut them both down and brought them up again. Within 5 minutes the backup FW crashed. So unfortunately, the beta version did not fix my issue it seems. I no longer have any option to go back to 2.8.0 so seems I'm stuck with a beta that doesn't fix anything for me. Worth a try I guess but I guess I am relegated to starting over unless someone can sort out what the posted crash reports say. I can't make heads or tails of it.
-
@rfranzke As already stated, you can't go back while on a beta or RC.
When the release comes out you will be able to upgrade to final.
It's sad you have to dig around hardware issues; however, pfSense is available as a complete hardware solution.
If you opted for such a solution you would have the complete enterprise-ready platform.
Running it on any other platform has some risks.
However, there are options.
a. Virtualize the whole thing under any hypervisor. Lots of options here.
Then everything is a file and you also get snapshots.
The overhead is negligible, and many have been doing this for years in HA setups.
I'm one of them.
On top of that, pfSense being available on AWS and Azure also means it runs well under a hypervisor and is more or less supported.
b. Spend $120 for each instance and upgrade to Plus.
You will get TAC support (low on calories too) and, most of all, boot environments, which allow you to go back.
IMHO virtualization gives you better control than boot environments by design; however, knowing your way around running virtualized pfSense in an HA production environment does require some knowledge, especially in a crisis.
Don't worry too much about going back to 2.8; the 2.8.1 beta is quite stable (under KVM) too.
-
I keep forgetting but I believe the bectl command works on CE, it's just that BEs are in the web GUI in Plus. Just gotta make the BE first (sorry).
-
@netblues Yes fair point on the hardware bit. Buying the NetGate hardware would likely solve my issues here. Trying to get all this stuff to work on commodity hardware is a challenge, likely more to do with the BSD foundation it sits on than PFSense itself. It's the leg up the Palo Altos and Cisco's of the world have in the enterprise space. I am grateful there is a CE option at all here. The real issue is that I am not sure what to do with this crash report.
Would it be fair to say the support contract would likely get me answers on the crash report or will this boil down to 'start over' while paying for the privilege?
Honestly my real disappointment is not being able to have this thing deployed. Looks like it would work great, but I cannot even get it out of the stable.
Thanks for all the replies.
-
@SteveITS Well good to know for future. Would help with worries about doing upgrades when this finally goes to production. I'll keep it in mind for next upgrade process. Thanks for that tip.
Is there any chance my config for 2.8 would work if I reinstalled to 2.7 and restored it? I guess thinking about it now my only real issue I seemed to have with moving to new hardware was to do with the change in NIC hardware. Could be wrong. I know you said generally newer configs to older versions is a no go but.......
-
@rfranzke Normally no, sorry. They have a table linked on page https://docs.netgate.com/pfsense/en/latest/backup/restore-different-version.html and 2.8.x is config file v24.0.
The files are XML so you could compare them and edit manually. Or make recent changes again.
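Since the backups are plain XML, a first pass at comparing two of them can be scripted. Below is a minimal Python sketch, not a tested migration tool. It assumes the config version sits in a top-level `<version>` element under the root (check that against your own file), and the two sample configs, including the `em0`/`igc0` interface names, are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample configs (real files are far larger).
OLD_CONFIG = (
    "<pfsense><version>23.3</version>"
    "<system><hostname>fw1</hostname></system>"
    "<interfaces><wan><if>em0</if></wan></interfaces></pfsense>"
)
NEW_CONFIG = (
    "<pfsense><version>24.0</version>"
    "<system><hostname>fw1</hostname></system>"
    "<interfaces><wan><if>igc0</if></wan></interfaces></pfsense>"
)


def config_version(xml_text):
    """Return the text of the top-level <version> element, or None."""
    node = ET.fromstring(xml_text).find("version")
    return node.text.strip() if node is not None and node.text else None


def changed_sections(old_xml, new_xml):
    """Return sorted names of top-level sections that differ or are missing.

    Caveat: if a tag repeats at the top level, only the last one is compared.
    """
    def sections(xml_text):
        result = {}
        for child in ET.fromstring(xml_text):
            child.tail = None  # ignore whitespace between sections
            result[child.tag] = ET.tostring(child)
        return result

    old, new = sections(old_xml), sections(new_xml)
    return sorted(t for t in old.keys() | new.keys() if old.get(t) != new.get(t))


if __name__ == "__main__":
    print(config_version(OLD_CONFIG), "->", config_version(NEW_CONFIG))
    print("sections to review:", changed_sections(OLD_CONFIG, NEW_CONFIG))
```

This only tells you which sections to look at by hand. It does not convert a newer config into an older format.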
The time I've run into trouble restoring to different hardware, as I recall now, is by clicking Apply before clicking Save, on the page where you assign interfaces. I don't know if they fixed that. But then in that state pfSense will stop during boot to ask you to reassign interfaces via the console.
-
I prolly have 2.7 backups somewhere that I took before the upgrade that will get me most of the way there. I forget now the status on the various installed packages. Seems like there was a way to have packages installed that are needed as part of the restore but cannot remember if there was some backup tick you had to check when doing the backup to support that. Maybe I am dreaming on that.
So, should I get the base pfSense installed, then packages installed, then restore the config for the correct version? Or should I get base installed, install packages, get HA/sync working, and then restore? Or maybe I can simply install and restore the backups I have to make this work. The backups I have are for everything.
Thanks all for the help. I'm committed to getting this going.
-
@rfranzke The restore will install packages that were in the backup. There's no need to manually install packages.
It is possible to skip backing up packages. Or restore parts of a config file.
This may help:
https://docs.netgate.com/pfsense/en/latest/backup/restore-during-install.html#restore-using-the-external-configuration-locator-ecl
-
@SteveITS Thanks for this. I am gonna try this. I found backups labelled with version 23.3 which I think is for 2.7.2.
Is there any value in trying to download the 2.8.0 installer, thinking that the upgrade process itself was responsible for the issues I'm having, or should I just get back to 2.7.2? My issue here is that if I get this going again on 2.7.2, I'd be too afraid to ever upgrade this setup in the future. I really would like to be able to upgrade this install.
Still really wish I could find some guidance on using this dump file to figure out what's causing this specifically.
-
@rfranzke When running in HA, you can upgrade the secondary node only and fail over to it.
Having a backup of the secondary is doable.
If it does not crash, you can upgrade the primary too, or restore.
This is how upgrades are done.
As for the installer, I doubt a failed package install can crash the whole thing.
This is FreeBSD, not Windows ME.
-
Do you have other crash reports?
That one is not very revealing. However if they are all identical crashes it's probably a software issue.
-
@rfranzke said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:
value in trying to download the 2.8.0 installer
There is not a "2.8.0 installer"...there is a 2.7.2 installer, and the new Netgate Installer which lets you choose versions.
@netblues said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:
When running in ha, you can upgrade secondary node only and failover to it.
Having a backup of secondary is doable.
If it does not crash, you can upgrade the primary too, or restore.
Generally, yes, but per the docs pf may not sync states correctly between FreeBSD versions:
https://docs.netgate.com/pfsense/en/latest/install/upgrade-guide-ha.html#pfsync-considerations
I've never tried to run a different version for long enough for a failover to matter so I can't say offhand if that's actually a normal problem or just a possibility.
-
@stephenw10 I have one that happened on the secondary FW right after the upgrade to 2.8.1. Same scenario basically. Boots up, runs for about 5-10 minutes and then panics. What is strange is that after this initial panic, it will run solid for a while with no crashes. Seems to be just as it boots up, so not sure if it's something starting up that does this. I have FRR running an OSPF process to a lab switch to exchange some internal subnet routes. I had set the process to start when the CARP status is master. That was a somewhat recent config change, but it ran fine with no panics on 2.7. So maybe it's trying to figure out if it should start that process or not while watching CARP status between the two machines. This just seems like something the two FWs are trying to work out between each other early in the boot process. Like a late-starting daemon or something. I'll try in the morning to start one box up and let it run for a bit before starting the other one and see what we get.
See new dump attached. Thanks for checking here.
-
@SteveITS said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:
Generally, yes, but per the docs pf may not sync states correctly between FreeBSD versions:
I agree, however 2.7.2 and 2.8 are on the same FreeBSD version, so it's safe to do so.
And I have tested it recently too with no (obvious) issues.
-
@netblues said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:
@SteveITS said in Frequent Crashing (Page Fault) After Upgrade to 2.8.0 From Latest 2.7:
Generally, yes, but per the docs pf may not sync states correctly between FreeBSD versions:
pfsync tries very hard to be compatible between versions too, but bugs do happen.
I agree, however 2.7.2 and 2.8 are on the same freebsd version, so it's safe to do so.
They are not.
They may both say "15", but they're not the same "15".
-
@kprovost Still, minor versions/differences.
No one would do that long term, but considering the situation above, that's the least of problems too.
-
Hmm, well that's a completely different backtrace.
First Panic:
db:1:pfs> bt
Tracing pid 2 tid 100058 td 0xfffff8006c1a5740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe015852db10
panic() at panic+0x43/frame 0xfffffe015852db70
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe015852dbd0
trap_pfault() at trap_pfault+0x46/frame 0xfffffe015852dc20
calltrap() at calltrap+0x8/frame 0xfffffe015852dc20
--- trap 0xc, rip = 0xffffffff80cf2042, rsp = 0xfffffe015852dcf0, rbp = 0xfffffe015852dd90 ---
__rw_wlock_hard() at __rw_wlock_hard+0x152/frame 0xfffffe015852dd90
arptimer() at arptimer+0x252/frame 0xfffffe015852de10
softclock_call_cc() at softclock_call_cc+0x16d/frame 0xfffffe015852dec0
softclock_thread() at softclock_thread+0xe5/frame 0xfffffe015852def0
fork_exit() at fork_exit+0x7b/frame 0xfffffe015852df30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015852df30
--- trap 0xafafafaf, rip = 0xafafafafafafafaf, rsp = 0xafafafafafafafaf, rbp = 0xafafafafafafafaf ---
2nd Panic:
db:1:pfs> bt
Tracing pid 11 tid 100003 td 0xfffff8006c179740
kdb_enter() at kdb_enter+0x33/frame 0xfffffe015840b9c0
panic() at panic+0x43/frame 0xfffffe015840ba20
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe015840ba80
trap_pfault() at trap_pfault+0x46/frame 0xfffffe015840bad0
calltrap() at calltrap+0x8/frame 0xfffffe015840bad0
--- trap 0xc, rip = 0xffffffff80d15bdd, rsp = 0xfffffe015840bba0, rbp = 0xfffffe015840bc00 ---
callout_process() at callout_process+0x1ad/frame 0xfffffe015840bc00
handleevents() at handleevents+0x186/frame 0xfffffe015840bc40
timercb() at timercb+0x236/frame 0xfffffe015840bc90
lapic_handle_timer() at lapic_handle_timer+0xab/frame 0xfffffe015840bcb0
Xtimerint() at Xtimerint+0xb1/frame 0xfffffe015840bcb0
--- interrupt, rip = 0xffffffff804eb162, rsp = 0xfffffe015840bd80, rbp = 0xfffffe015840bdb0 ---
acpi_cpu_idle() at acpi_cpu_idle+0x2e2/frame 0xfffffe015840bdb0
cpu_idle_acpi() at cpu_idle_acpi+0x46/frame 0xfffffe015840bdd0
cpu_idle() at cpu_idle+0x9d/frame 0xfffffe015840bdf0
sched_idletd() at sched_idletd+0x546/frame 0xfffffe015840bef0
fork_exit() at fork_exit+0x7b/frame 0xfffffe015840bf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe015840bf30
--- trap 0xafafafaf, rip = 0xafafafafafafafaf, rsp = 0xafafafafafafafaf, rbp = 0xafafafafafafafaf ---
I would try to compare more crashes if you can. Different and random backtraces like that usually point to a hardware issue. But that's from a different version so it may simply be different there.
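If more crash reports pile up, comparing them by hand gets tedious. Here is a rough Python sketch of one way to pull the function names out of each `bt` listing and see which function faulted. It assumes the frames look like the ones posted above (`name() at name+0xNN/frame ...`) and treats the first frame after the first `--- trap` line as the faulting one. The two samples are trimmed excerpts of the panics in this thread, and the helper names are my own.

```python
import re

# One DDB frame, e.g. "arptimer() at arptimer+0x252/frame 0x..."
FRAME_RE = re.compile(r"(\w+)\(\) at \1\+0x[0-9a-f]+")

# Trimmed excerpts of the two backtraces posted in this thread.
FIRST_PANIC = """\
trap_pfault() at trap_pfault+0x46/frame 0xfffffe015852dc20
calltrap() at calltrap+0x8/frame 0xfffffe015852dc20
--- trap 0xc, rip = 0xffffffff80cf2042, rsp = 0xfffffe015852dcf0, rbp = 0xfffffe015852dd90 ---
__rw_wlock_hard() at __rw_wlock_hard+0x152/frame 0xfffffe015852dd90
arptimer() at arptimer+0x252/frame 0xfffffe015852de10
"""
SECOND_PANIC = """\
trap_pfault() at trap_pfault+0x46/frame 0xfffffe015840bad0
calltrap() at calltrap+0x8/frame 0xfffffe015840bad0
--- trap 0xc, rip = 0xffffffff80d15bdd, rsp = 0xfffffe015840bba0, rbp = 0xfffffe015840bc00 ---
callout_process() at callout_process+0x1ad/frame 0xfffffe015840bc00
handleevents() at handleevents+0x186/frame 0xfffffe015840bc40
"""


def frames(backtrace):
    """Return the function names of all frames, in listed order."""
    return FRAME_RE.findall(backtrace)


def faulting_function(backtrace):
    """Return the first function listed after the first '--- trap' marker."""
    _, marker, after = backtrace.partition("--- trap")
    found = FRAME_RE.findall(after) if marker else []
    return found[0] if found else None


if __name__ == "__main__":
    for name, bt in (("first", FIRST_PANIC), ("second", SECOND_PANIC)):
        print(name, "panic faulted in:", faulting_function(bt))
```

If every report names the same faulting function, that leans toward a software issue. A different one each time fits the hardware theory better.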