Upgrade to 21.02-RELEASE borked on SG-3100
-
@jimp said in Upgrade to 21.02-RELEASE borked on SG-3100:
Usually dogfooding the snapshots catches most things, but there are many more real-world configurations than we can possibly test.
I don't think my config is all that exotic, but should I share it with you all? Do you know what the issue is?
-
We found a way to reliably trigger it here in lab conditions so we can work on it. There is no need to provide more info at the moment.
-
@ahking19 Clearly. I usually don't mind being a testing guinea pig, but not this week; too much going on.
-
Hi All,
I did the update yesterday and, as you can guess, it did not go well for me either.
My SG-3100 is stuck in a boot loop. With the serial console connected, I can see that the Marvell U-Boot is OK.
When it starts to boot pfSense, I get this error. So I need to create a ticket to reinstall pfSense.
For now my network has to rely on my ubnt ER-8 pro.
Greetings, Dennis
-
Ran into an interesting problem.
I can't add back my packages after moving back to an earlier version and I think it's because the place where it's checking for packages is the old place?
Here is the error
[2.4.5-RELEASE][admin@pfSense.i.lacy.ie]/root: pkg search FreeRadius
pkg: Warning: Major OS version upgrade detected. Running "pkg bootstrap -f" recommended
pkg: Repository pfSense-core missing. 'pkg update' required
pkg: https://files01.netgate.com/pkg/pfSense_plus-v21_02_armv7-core/meta.txz: Not Found
pkg: https://files01.netgate.com/pkg/pfSense_plus-v21_02_armv7-core/packagesite.txz: Not Found
pkg: https://files01.netgate.com/pkg/pfSense_plus-v21_02_armv7-pfSense_plus-v21_02/meta.txz: Not Found
pkg: https://files01.netgate.com/pkg/pfSense_plus-v21_02_armv7-pfSense_plus-v21_02/packagesite.txz: Not Found
-
The update to 21.02 for SG-3100 appliances has been temporarily pulled to fix a bug. So that has probably confused the package system as well on those appliances.
If you can, just be patient for a day or two and Netgate should get a new update for the SG-3100 posted. If you can't wait, there are instructions for rolling back to the previous release. You will need to open a ticket with Netgate for that. There is no charge for providing you a rollback image, and they will send you the link via email shortly after you open a ticket. Start the process here: https://go.netgate.com/support/login.
-
Under System | Update, set to Previous 2.4.5 to get the correct package repos for 2.4.5-p1.
-
A follow-up after opening my support ticket.
Within two minutes I got an answer from the Netgate team. Great work on that ;)
Within 10 minutes my SG-3100 was running again. I am going to wait a few days before putting it back in production, though.
Mainly because I still want to test this upgrade and there are still no packages available.
It's what Bill said: "give Netgate the time to work it out"
-
@evaknievel Sometimes if you switch the branch back, it may be the case that pkg itself gets stuck. A fix, once you are on the previous branch, is to issue "pkg-static install -f pkg"
Small write-up about that problem, which often shows up as: Shared object "libarchive.so.7" not found, required by "pkg"
For my SG-3100 it's strangely fine now. No crash for more than 24 hours, but I have to say I disabled all packages. I hope there is a "fix" soonish; I don't want to roll back completely.
-
@yaminb said in Upgrade to 21.02-RELEASE borked on SG-3100:
@jimp said in Upgrade to 21.02-RELEASE borked on SG-3100:
hw.ncpu=1
Really silly question just for completeness.
I don't have a loader.conf.local.
I made the change in loader.conf. I'm assuming this is good.
Our 3100 locked up after about 16 hours. Then twice more in the next 8 hours. We applied the "hw.ncpu=1" fix to loader.conf and it has been running fine for the last 8 hours. (Like @yaminb we could not find a loader.conf.local.)
-
@warlordzico said in Upgrade to 21.02-RELEASE borked on SG-3100:
I upgraded my SG-3100 yesterday evening. After that I never got a response from the SG-3100 anymore.
This morning I saw three blue LEDs slowly blinking in sync.
I pulled the power plug. The firewall restarted and finished the setup. So far everything works, with one remark: Snort doesn't start up anymore. Tonight I'm going to remove the Snort package and do a clean install of it.
Running 21.02 without any package installed, no hiccups so far.
Everything works fine except that I can't install any package. The package manager shows the following error: unable to retrieve package information.
Next step: trying to get a WireGuard VPN running.
-
@lnguyen perfect thank you!
-
@deltaone said in Upgrade to 21.02-RELEASE borked on SG-3100:
Our 3100 locked up after about 16 hours. Then twice more in the next 8 hours. We applied the "hw.ncpu=1" fix to loader.conf and it has been running fine for the last 8 hours. (Like @yaminb we could not find a loader.conf.local.)
I think what @jimp stated was:
Create /boot/loader.conf.local if it doesn't exist, as loader.conf can be overwritten by pfSense.
echo hw.ncpu=1 >> /boot/loader.conf.local
I agree with this, as it won't be overwritten and is easily reverted once a patch is released by simply issuing:
rm /boot/loader.conf.local
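One small caveat with the plain echo above: running it twice appends the line twice. A minimal sketch of a repeat-safe version follows. Note that add_loader_tunable is a hypothetical helper name made up for illustration, not a pfSense or FreeBSD command.

```shell
#!/bin/sh
# Sketch only: append a tunable to a loader override file if it is not
# already present. add_loader_tunable is a hypothetical helper name.
add_loader_tunable() {
    file="$1"      # e.g. /boot/loader.conf.local
    tunable="$2"   # e.g. hw.ncpu=1

    # Create the file if it is missing so grep has something to read.
    [ -f "$file" ] || : > "$file"

    # -x matches the whole line, -F treats the tunable as a literal string.
    if ! grep -qxF "$tunable" "$file"; then
        printf '%s\n' "$tunable" >> "$file"
    fi
}

# Intended use on the firewall (as root), followed by a reboot:
#   add_loader_tunable /boot/loader.conf.local hw.ncpu=1
```

Reverting is still just the rm shown above plus a reboot.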
-
@lnguyen said in Upgrade to 21.02-RELEASE borked on SG-3100:
@deltaone said in Upgrade to 21.02-RELEASE borked on SG-3100:
Our 3100 locked up after about 16 hours. Then twice more in the next 8 hours. We applied the "hw.ncpu=1" fix to loader.conf and it has been running fine for the last 8 hours. (Like @yaminb we could not find a loader.conf.local.)
I think what @jimp stated was:
Create /boot/loader.conf.local if it doesn't exist, as loader.conf can be overwritten by pfSense.
echo hw.ncpu=1 >> /boot/loader.conf.local
I agree with this, as it won't be overwritten and is easily reverted once a patch is released by simply issuing:
rm /boot/loader.conf.local
Good catch. In our case, we do want loader.conf to be over-written so we are again back at two CPUs.
-
@deltaone Just issue the command above and reboot. Once an official patch is released, you can issue the rm command and reboot.
-
If you make the loader.conf.local file, does the appliance use it since it's the last one listed in the string?
loader_conf_files="/boot/device.hints /boot/loader.conf /boot/loader.conf.local"
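As far as I understand the FreeBSD loader, the files in loader_conf_files are processed in the order listed, so a setting in a later file overrides the same setting in an earlier one. The sketch below only imitates that "last one wins" rule in plain sh so you can check which value would take effect. effective_value is a hypothetical helper, it is not how the loader itself is implemented, and it does not handle quoted values or comments.

```shell
#!/bin/sh
# Sketch only: print the value a key ends up with when several config files
# are read in order and later assignments override earlier ones.
# effective_value is a hypothetical helper name.
effective_value() {
    key="$1"
    shift
    value=""
    for f in "$@"; do
        # Files that do not exist are simply skipped.
        [ -f "$f" ] || continue
        # Keep the last "key=value" line in this file, matching the key exactly.
        found=$(awk -F= -v k="$key" \
            '$1 == k { v = substr($0, length(k) + 2) } END { print v }' "$f")
        if [ -n "$found" ]; then
            value="$found"
        fi
    done
    printf '%s\n' "$value"
}

# Intended use, with the files in the same order as loader_conf_files:
#   effective_value hw.ncpu /boot/device.hints /boot/loader.conf /boot/loader.conf.local
```

If loader.conf.local contains hw.ncpu=1, that is the value this prints, whatever loader.conf says.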
-
@lnguyen said in Upgrade to 21.02-RELEASE borked on SG-3100:
@deltaone Just issue the command above and reboot. Once an official patch is released, you can issue the rm command and reboot.
Have done so. Thanks.
-
@rsherwood_va I have been running smoothly since the original problem. I just had a lockup this morning - tried creating /boot/loader.conf.local and adding hw.ncpu=1, then rebooting as recommended by the team here, but that did not resolve the issue. As before, the following process worked:
- ssh in to the router (you may need to use the IP address)
- choose option 11 (Restart webConfigurator)
- log in to the web gui
- (optional - not sure if this helped) Under status -> services, start the DNS and DHCPD services
- Under System -> Setup Wizard, accept all the answers (it will remember what you chose last time, except for the admin password; re-enter the old one)
Rerunning the pfSense first-time setup wizard (accepting all the previously chosen values and choosing to enter the existing admin password as the "new" admin password) resolved the issue. I am now running smoothly.
I haven't seen the new update yet, so I'm assuming the issue is trickier than they thought. If you are stuck, try running the wizard again.
-
Another update.
With the hw.ncpu=1 fix, it seemed to run fine, but now has locked up twice.
It feels like ncpu=1 has helped, but I don't think it's the issue.
I've put in a scheduled cron reboot every night to see if that keeps it up during working hours.
-
@yaminb said in Upgrade to 21.02-RELEASE borked on SG-3100:
Another update.
With the hw.ncpu=1 fix, it seemed to run fine, but now has locked up twice.
It feels like ncpu=1 has helped, but I don't think it's the issue.
I've put in a scheduled cron reboot every night to see if that keeps it up during working hours.
The switch to using a single CPU is a workaround that minimizes the chance of hitting the bug, but it does not eliminate the chance.
The actual problem has been identified and a fix is being tested. Here is a link to the discussion by the FreeBSD kernel programming nerds* of the problem and the fix: https://reviews.freebsd.org/D28821. I believe the pfSense team is now vigorously testing images with this fix applied to be sure the fix is really "the fix". And from the activity on the Redmine bug site for pfSense, it looks like a few other bugs are being addressed as well.
* Note -- I don't mean "nerds" in an insulting sense. But when you live in the world of kernel spin locks and mutexes, and actually understand all that stuff, you are obligated to proudly wear the title of "kernel programming nerd".