Upgrade to 21.02-RELEASE borked on SG-3100
All, in case any of you have this problem, my SG-3100 got borked during the upgrade. The upgrade appeared to work perfectly, but internet connectivity was not working. I initially believed that the issue was DNS, since the service was no longer running, but even after I reconfigured DNS and renewed my leases, I could resolve addresses without issue yet still had no internet access.
Reboots failed, reloading firewall rules failed, even after making some minor changes to be sure a new set was loaded.
I was able to SSH into the device and from there I could access all subnets and ping internet resources, but nothing in my LAN was able to speak to anything but the gateway device.
After a few hours of troubleshooting, I ran the pfSense General Setup Wizard again. After it was completed, everything just came back to life.
I have no idea if that was the fix, or if something coincidentally started working, but if you are stuck like I was, try the General Setup under "System".
pfSense, please fix this in future versions. I am terrified to upgrade my device. This is the first time I've done it and I thought my device was bricked.
Yeah, I just had a similar issue. My upgrade went fine earlier, but a few minutes ago my internet went down completely, and the only way to fix it is a hard restart of the device. My devices all show no internet access. I can't even load the web UI.
I reported this issue back when I was testing 2.5, and it appears it's still there in 21.06.
All interfaces still show ipv4/ipv6 addresses. It feels like routes are falling out of the routing table completely.
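One way to test the routing-table hunch the next time it hangs, assuming a serial console or SSH session still responds: check whether the output of netstat -rn still lists a default route. This is only a minimal sketch. has_default_route is a made-up helper name, not a pfSense tool, and it assumes the usual FreeBSD netstat layout where the destination column reads "default".

```shell
#!/bin/sh
# Succeeds (exit 0) if the routing table text on stdin contains a default route.
# Hypothetical helper; it expects the column layout printed by FreeBSD's `netstat -rn`.
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}

# On the firewall itself one might run:
#   netstat -rn | has_default_route || echo "default route is missing"
```

If the default route is still there while LAN clients are cut off, the problem is more likely in forwarding than in the routing table itself.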
I just did the update (like an idiot) on the SG-3100 and everything went fine, as @rsherwood_va and @behemyth both said, but after a few mins the GUI was unresponsive, I couldn't ping it and everything died. I had to hard reset it a few times and it seems to have been stable now for about 10 mins. If it dies again, I'll try the General Setup Wizard or probably just request the 2.4.5 image from Netgate and roll it back. I have to work tomorrow so I'm not too impressed right now.
Exactly the issue I've been having on my SG-3100 since yesterday's update. Got some weird logs which look unrelated.
Everything looked "normal". I can ping internet addresses from Diagnostics > Ping but not from the LAN or any VLAN. It seems like all packets coming in on the interfaces are suddenly discarded without any log or notice.
Port error and collision counters are still 0 and the packet counter is increasing, so it "physically" receives packets but then seems to drop all of them. The only thing that helps is a reboot.
Posted some more info here:
I have the same problem after the upgrade.
I have rebooted many times and I still have the problem.
For around 10 minutes the system is OK, and after that it crashes and DHCP loses everything and changes all the IP addresses on the different devices...
When I can connect to the pfSense dashboard, I can see that all the parameters are OK, like in the previous version...
I don't understand.
I will try to downgrade the version.
Just a quick morning update. It’s been 6 hrs and still stable, I don’t know how but I’ll take it. I will request 2.4.5p1 image today just in case.
Another update, dead after 7 hrs. Another hard reset and request for 2.4.5p1 download link. Will roll back until this is resolved.
Another fail here on a 2.4.5 to 21.02 upgrade.
Looked like the upgrade worked but failed to reboot.
Waited for over 30 minutes and tried a hard power cycle and ... nothing just 3 blue flashing lights.
I couldn't take it anymore, I have to work so I did a downgrade and reinstalled my config. All good on 2.4.5p1 again!
Thanks, I'm trying to submit a ticket right now.
For what it's worth, I updated my SG-3100 this afternoon and everything went smooth.
The only issue was that for maybe 5 minutes, it was giving an error on checking for the update server or something like that. It went away on its own.
I upgraded my SG-3100 yesterday evening. I never got a response from the SG-3100 after that.
This morning I saw three blue LEDs slowly blinking in sync.
I pulled the power plug. The firewall restarted and finished the setup. So far everything works, with only one remark: Snort doesn't start up anymore. Tonight I'm going to remove the Snort package and do a clean install of it.
Well, I was up and running on 21.02 for about 3 hours, when the network went down again.
The SG-3100 had locked up. (This was after a clean install, with no pfBlockerNG or Snort installed.)
Power Cycle brought everything back online but for how long?
We shall see.
Error during upgrade to 21.02
pkg-static: https://files01.netgate.com/pkg/pfSense_plus-v21_02_armv7-pfSense_plus-v21_02/All/libdaemon-0.14_1.txz: No route to host Failed
Did a DNS look up on files01.netgate.com = 184.108.40.206
Looked in the firewall log for any blockage and there were only allows. Nothing blocking.
I started a packet capture on the Netgate SG-3100 (unlimited full frame), re-ran the update and stopped the trace after it failed.
I searched for the IP address and this is what I see:
This looks like Netgate is blocking the traffic. Maybe we are hitting a limit or something. Before this point I was able to have full two-way TCP/TLS traffic with this IP.
Same here SG-3100 upgraded to 21.02-RELEASE (arm)
Throughout today, after a while the GUI becomes unresponsive and both Ethernet and wireless clients are unable to even ping the pfSense router IP address. All LEDs look normal. After powering the router off and back on, everything works again. This has happened 3-4 times today already.
Does anyone know how to force a downgrade back to 2.4?
@sabennett Having similar problems with a 3100. Thanks for opening a ticket.
@softcoder Information on a downgrade is here:
I have the same issue with my SG-3100:
It upgraded fine to 21.02 and all looked good at first, then it hung: no DHCP, no DNS and no response from the GUI. The only way to get it back was to unplug it and plug it back in. Then it froze again after 1 hr, then again after 3 hrs (just normal usage). There is no info in the System Logs that points to what hung it; they just register the bootup.
I will downgrade too.
All of this update trouble makes me a bit frustrated. It's one reason why I chose to buy original Netgate appliances instead of building my own. I hoped that at least for their own products they would do very intense testing before rolling out an update.
No question, you can't foresee everything, but if you read through the forum and see how many have trouble, it seems like a bad release / quality control.
Hope they will find the issue and learn from it.
For my upgraded testing SG-3100 it's not important, it is a non live LAB system but until this one is rock solid stable I will not start rolling out CPE Upgrades :)
@solarizde My system has now been running for 21h after failing about 10x at random 5m-6h intervals. Probably not related, but the only difference is that USB console is connected for debug purposes...
Yesterday I reinstalled my SG-3100 with the 21.02 firmware.
I contacted support and they sent me the image to flash the device.
I redid the configuration. For 2 hours there was no problem, but then the system blew up and everything went down like the other time....
I will try to downgrade the version...
Same problem here, SG-3100 (21.02) runs for anything up to a few hours and then just locks up completely. Hard reset required, logs do not yield any clues at this stage. I've only got two packages installed, pfBlockerNG and BandwidthD. Will try another clean install but not very hopeful, as others have found this has not solved the problem.
Also just wondering if this problem is primarily occurring with the SG-3100.
I have two devices available. The first experienced problems right away; the other, a remote install, just crashed/became unresponsive after ~1.5 days.
After a power cycle OpenVPN now works, but it stopped routing internal subnets/forwarding traffic. Unable to connect to the GUI.
Netgate just updated their Twitter account with this.
A problem has been reported by some users of the Netgate SG-3100 appliance who have upgraded to pfSense Plus version 21.02. Our engineering team is working to correct the issue as quickly as possible. In the meantime, we have suspended the upgrade for the SG-3100.
As a temporary workaround until we can put out a fix, you can reduce the number of CPUs used by the OS to 1 by adding hw.ncpu=1 to /boot/loader.conf.local and then rebooting. You'll lose some performance but it appears (so far) to not trigger the issue when set that way. Otherwise, you can step back to 2.4.5-p1 and wait for a fix.
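For anyone doing this from a shell rather than the GUI file editor, here is a rough sketch of adding the line without duplicating it on repeat runs. It assumes the setting is hw.ncpu=1, as it appears in the loader.conf listing posted further down the thread. set_loader_tunable is a made-up helper name, not something pfSense ships.

```shell
#!/bin/sh
# Set NAME=VALUE in a loader configuration file, creating the file if needed.
# Hypothetical helper for illustration; not part of pfSense or FreeBSD.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    tmp="${file}.tmp.$$"
    # Create the file if it does not exist yet.
    [ -e "$file" ] || : > "$file"
    # Copy every line except an earlier assignment of the same name.
    awk -F= -v n="$name" '$1 != n' "$file" > "$tmp"
    # Append the new assignment.
    printf '%s=%s\n' "$name" "$value" >> "$tmp"
    # Write back in place so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# On the firewall itself (as root), the call would be:
#   set_loader_tunable /boot/loader.conf.local hw.ncpu 1
```

After the reboot, sysctl hw.ncpu should report 1 if the setting took effect.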
@jimp How do we rollback to the previous version?
@evaknievel Information on how to downgrade is here:
@jsnaza we are talking minimum 2 days for a fix right?
@evaknievel I am not sure. I am just another person who ran into this issue and downgraded. I am not one of the developers. I am just sharing the knowledge I have.
@jimp Thank you.
What may also be helpful is to inform the user base on what the release QA process is and how this may have slipped through testing.
Infrastructure component updates are highly sensitive (as you're well aware) and thus confidence in quality is key to keeping "customers" confident in the product.
Shit happens but there should be a post-mortem to figure out how this slipped by.
@jimp we are talking minimum 2 days for this fix right?
@jimp If we make the single CPU change in 2.4.5-p1 before we install the update, will that setting persist after the initial reboot into 21.02?
Should update myself. Looks like my SG-3100 just crashed as well after about 1 day. A reboot seems to have it back in order.
I'll try the 1 cpu temporary fix
@lohphat if you have a SG-3100 don't upgrade until a fix is available. Stay on 2.4.5-p1.
Really silly question just for completeness.
I don't have a loader.conf.local.
I made the change in loader.conf. I'm assuming this is good.
/boot: cat loader.conf
kern.shutdown.secure_halt=1
kern.cam.boot_delay=10000
kern.ipc.nmbclusters="1000000"
hw.ncpu=1
boot_serial="YES"
console="comconsole"
comconsole_speed="115200"
hw.e6000sw.default_disabled=1
autoboot_delay="3"
hw.hn.vf_transparent="0"
hw.hn.use_if_start="1"
Create /boot/loader.conf.local if it doesn't exist, as loader.conf can be overwritten by pfSense.
"...what the release QA process is and how this may have slipped through testing."
Several of us have 3100s and use them in various ways, including a couple of us using them on the edge, running snapshots, but this problem takes a specific load and setup to trigger that apparently none of us hit somehow. Usually dogfooding the snapshots catches most things, but there are many more real-world configurations than we can possibly test.
We found a way to reliably trigger it here in lab conditions so we can work on it, no need to provide more info at the moment.
@ahking19 Clearly. I usually don't mind being a testing guinea pig but not this week, too much going on.
I did the update yesterday and, as you can guess, it did not go well for me either.
My SG-3100 is stuck and keeps hanging in a boot loop....
With the serial console connected, I can see that the Marvell U-Boot is OK.
When it starts to boot pfSense I get this error.
So I need to create a ticket for reinstalling pfSense.
So my network needs to rely on my ubnt ER-8 pro.