Upgrade to 21.02-RELEASE borked on SG-3100
-
This whole update makes me a bit frustrated. It's one reason I chose to buy original Netgate appliances instead of building my own. I hoped that at least for their own hardware products they would test very thoroughly before rolling out an update.
No question, you can't foresee everything, but if you read through the forum and see how many people have trouble, it looks like a bad release / quality control problem.
I hope they find the issue and learn from it.
For my upgraded test SG-3100 it's not important, since it is a non-live lab system, but until this one is rock-solid stable I will not start rolling out CPE upgrades :)
-
@solarizde My system has now been running for 21h after failing about 10x at random 5m-6h intervals. Probably not related, but the only difference is that a USB console is connected for debugging purposes...
-
Hello,
Yesterday I reinstalled my SG-3100 with the 21.02 firmware.
I contacted support and they sent me the image to flash the device.
I redid the configuration. For 2 hours there was no problem, but then the system blew up and everything went down, just like the previous time....
I will try to downgrade the version...
-
Same problem here: my SG-3100 (21.02) runs for anything up to a few hours and then locks up completely. A hard reset is required, and the logs do not yield any clues at this stage. I've only got two packages installed, pfBlockerNG and BandwidthD. I will try another clean install, but I'm not very hopeful, as others have found that this does not solve the problem.
-
Also, I'm wondering whether this problem occurs primarily on the SG-3100.
-
I have two devices. The first experienced problems right away; the other, a remote install, just crashed/became unresponsive after ~1.5 days.
After a power cycle OpenVPN now works, but the device stopped routing internal subnets/forwarding traffic. I am unable to connect to the GUI.
-
Netgate just posted this on their Twitter account:
Netgate
@NetgateUSA
A problem has been reported by some users of the Netgate SG-3100 appliance who have upgraded to pfSense Plus version 21.02. Our engineering team is working to correct the issue as quickly as possible. In the meantime, we have suspended the upgrade for the SG-3100.
-
As a temporary workaround until we can put out a fix, you can reduce the number of CPUs used by the OS to 1 by adding
hw.ncpu=1
to /boot/loader.conf.local
and then rebooting. You'll lose some performance, but so far it appears not to trigger the issue when set that way. Otherwise, you can step back to 2.4.5-p1 and wait for a fix.
-
@jimp How do we roll back to the previous version?
-
@evaknievel Information on how to downgrade is here:
https://docs.netgate.com/pfsense/en/latest/solutions/sg-3100/reinstall-pfsense.html
-
@jsnaza So we are talking a minimum of 2 days for a fix, right?
-
@evaknievel I am not sure; I am just another person who ran into this issue and downgraded. I am not one of the developers, just sharing the knowledge I have.
-
@jimp Thank you.
What may also be helpful is to inform the user base about what the release QA process is and how this may have slipped through testing.
Infrastructure component updates are highly sensitive (as you're well aware), and thus confidence in quality is key to keeping "customers" confident in the product.
Shit happens, but there should be a post-mortem to figure out how this slipped by.
-
@jimp So we are talking a minimum of 2 days for this fix, right?
-
@jimp If we make the single-CPU change in 2.4.5-p1 before we install the update, will that setting persist after the initial reboot into 21.02?
-
@yaminb
I should update my own report: it looks like my SG-3100 just crashed as well, after about 1 day. A reboot seems to have put it back in order. I'll try the 1-CPU temporary fix.
-
@lohphat If you have an SG-3100, don't upgrade until a fix is available. Stay on 2.4.5-p1.
-
@jimp said in Upgrade to 21.02-RELEASE borked on SG-3100:
hw.ncpu=1
Really silly question, just for completeness:
I don't have a loader.conf.local.
I made the change in loader.conf. I'm assuming this is good.
/boot: cat loader.conf
kern.shutdown.secure_halt=1
kern.cam.boot_delay=10000
kern.ipc.nmbclusters="1000000"
hw.ncpu=1
boot_serial="YES"
console="comconsole"
comconsole_speed="115200"
hw.e6000sw.default_disabled=1
autoboot_delay="3"
hw.hn.vf_transparent="0"
hw.hn.use_if_start="1"
-
Create /boot/loader.conf.local if it doesn't exist, as loader.conf can be overwritten by pfSense.

@lohphat said in Upgrade to 21.02-RELEASE borked on SG-3100:
what the release QA process is and how this may have slipped through testing.
Several of us have 3100s and use them in various ways, including a couple of us using them on the edge, running snapshots, but this problem takes a specific load and setup to trigger, which somehow none of us hit. Usually dogfooding the snapshots catches most things, but there are many more real-world configurations than we can possibly test.
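For anyone following the loader.conf.local advice earlier in this post, here is a minimal POSIX sh sketch, assuming shell access to the box (console or SSH). It appends a tunable such as hw.ncpu=1 to /boot/loader.conf.local, creates the file if it is missing, and does nothing if the exact line is already present. The function name add_loader_tunable is hypothetical; it is not a pfSense command.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper, not a pfSense command): append a
# loader tunable line to a loader.conf-style file, creating the file if it
# does not exist and skipping the write if the exact line is already there.
add_loader_tunable() {
    conf="$1"   # target file, e.g. /boot/loader.conf.local
    line="$2"   # tunable, e.g. hw.ncpu=1

    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    if [ -f "$conf" ] && grep -qxF "$line" "$conf"; then
        return 0
    fi

    # If the file is non-empty and does not end in a newline, add one first
    # so the new tunable is not glued onto the end of the previous line.
    if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
        printf '\n' >> "$conf"
    fi

    printf '%s\n' "$line" >> "$conf"
}
```

Usage would be `add_loader_tunable /boot/loader.conf.local hw.ncpu=1` followed by a reboot; afterwards `sysctl -n hw.ncpu` should report 1 if the tunable took effect. Run it under /bin/sh, since the root login shell may not be a Bourne shell. The sketch only checks for the exact line, so it will not remove a conflicting hw.ncpu value that is already in the file.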
-
@jimp said in Upgrade to 21.02-RELEASE borked on SG-3100:
Usually dogfooding the snapshots catches most things, but there are many more real-world configurations than we can possibly test.
I don't think my config is all that exotic, but should I share it with you all? Do you know what the issue is?
-
We found a way to reliably trigger it here in lab conditions so we can work on it, no need to provide more info at the moment.