sg-3100 upgrade failure to 21.02-p1
-
I ran the auto-update through the web dashboard to 21.02-p1. Everything seemed to update normally with no errors, except that it never got past "waiting to reboot". It did not recover, so I logged in via the console, but I couldn't reach the pfSense console menu without hitting errors:
Welcome to Netgate pfSense Plus 21.02-RELEASE (Patch 1)...
.ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib /usr/local/lib/compat/pkg /usr/lib/engines /usr/local/lib/
>>> Removing vital flag from php74... .. done.
Initializing........... done.
done.
Updating configuration........5_to_206()
#3 /etc/rc.bootup(132): convert_config()
#4 {main}
  thrown in /etc/inc/pfsense-utils.inc on line 1471
PHP ERROR: Type: 1, File: /etc/inc/pfsense-utils.inc, Line: 1471, Message: Uncaught Error: Call to undefined function posix_kill() in /etc/inc/pfsense-utils.inc:1471
Stack trace:
#0 /etc/inc/pfsense-utils.inc(1452): reload_ttys()
#1 /etc/inc/upgrade_config.inc(6289): console_configure()
#2 /etc/inc/config.lib.inc(482): upgrade_205_to_206()
#3 /etc/rc.bootup(132): convert_config()
#4 {main}
  thrown
Starting CRON... done.
Netgate pfSense Plus 21.02-RELEASE (Patch 1) arm Mon Feb 22 09:38:52 EST 2021
Bootup complete
I also tried to start the console menu manually via /etc/rc.initial, but it errors out and the box can't obtain a WAN IP, so there is no internet access:
*** Welcome to Netgate pfSense Plus 21.02-RELEASE-p1 (arm) on home ***

 WAN (wan)       -> mvneta2    ->
 LAN (lan)       -> mvneta1    ->
 LAB (opt1)      -> mvneta0    -> v4: 192.168.1.1/27

Fatal error: Uncaught Error: Call to undefined function ctype_digit() in /etc/inc/util.inc:246
Stack trace:
#0 /etc/inc/interfaces.inc(217): is_numericint('10')
#1 /etc/inc/interfaces.inc(308): vlan_valid_tag('10')
#2 /etc/inc/interfaces.inc(6062): interface_is_vlan('mvneta1.10')
#3 /etc/inc/interfaces.inc(6530): get_real_interface('mvneta1.10')
#4 /etc/inc/interfaces.inc(6782): find_interface_ipv6('mvneta1.10', false)
#5 /etc/rc.banner(147): get_interface_ipv6('opt2')
#6 {main}
  thrown in /etc/inc/util.inc on line 246
PHP ERROR: Type: 1, File: /etc/inc/util.inc, Line: 246, Message: Uncaught Error: Call to undefined function ctype_digit() in /etc/inc/util.inc:246
Stack trace:
#0 /etc/inc/interfaces.inc(217): is_numericint('10')
#1 /etc/inc/interfaces.inc(308): vlan_valid_tag('10')
#2 /etc/inc/interfaces.inc(6062): interface_is_vlan('mvneta1.10')
#3 /etc/inc/interfaces.inc(6530): get_real_interface('mvneta1.10')
#4 /etc/inc/interfaces.inc(6782): find_interface_ipv6('mvneta1.10', false)
#5 /etc/rc.banner(147): get_interface_ipv6('opt2')
#6 {main}
  thrown

 0) Logout (SSH only)                  9) pfTop
 1) Assign Interfaces                 10) Filter Logs
 2) Set interface(s) IP address       11) Restart webConfigurator
 3) Reset webConfigurator password    12) PHP shell + Netgate pfSense Plus tools
 4) Reset to factory defaults         13) Update from console
 5) Reboot system                     14) Enable Secure Shell (sshd)
 6) Halt system                       15) Restore recent configuration
 7) Ping host                         16) Restart PHP-FPM
 8) Shell

Enter an option:
You can also see that mvneta0 has an IP (192.168.1.1), but I can't connect via web or SSH.
I also ran a file system check five times (fsck -fy /). No issues found.
Thoughts on what I can do next?
Also, I'd like to at least recover the backup files that are on the SG-3100, but I can't figure out how to download them to my laptop over the console cable.
-
Found a couple of semi-related posts that recommended filing a support ticket with "General Problem Description" set to "Firmware Access". Create an account and open a ticket here: https://go.netgate.com/support/login
They responded within an hour or so with a link to the firmware. Followed the instructions here: https://docs.netgate.com/pfsense/en/latest/solutions/sg-3100/reinstall-pfsense.html
One thing I would recommend (if you only have console access) is to copy the newest config backups on the 3100 to a different folder before the reinstall, as they're all wiped during the process. Luckily I had an older saved config backup.
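For anyone stuck with only a serial console, here is a rough sketch of one way to get a config off the box before reinstalling. It assumes the usual pfSense locations (/cf/conf/config.xml for the live config, /cf/conf/backup/ for the automatic backups), that your terminal program can log the session to a file (for example PuTTY session logging or screen -L), and that you ran cat on the file while logging. The function names and the log/output paths are hypothetical, and the approach is untested on a 21.02-p1 box in this state. The script runs on the laptop and pulls the XML document out of the captured log.

```python
import re
import xml.etree.ElementTree as ET


def extract_config(captured: str) -> str:
    """Return the first complete pfSense config document found in a terminal capture.

    Assumes the capture contains the output of `cat /cf/conf/config.xml`
    (or one of the files under /cf/conf/backup/), surrounded by shell prompts.
    """
    # Serial terminals usually deliver CRLF line endings; normalise them to LF.
    text = captured.replace("\r\n", "\n").replace("\r", "\n")

    # A pfSense config starts with an XML declaration and ends with </pfsense>.
    match = re.search(r"<\?xml.*?</pfsense>", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no complete <pfsense>...</pfsense> document in capture")

    xml_text = match.group(0) + "\n"

    # Raises xml.etree.ElementTree.ParseError if the capture was mangled,
    # for example by the terminal hard-wrapping long lines.
    ET.fromstring(xml_text.encode("utf-8"))
    return xml_text


def save_config(capture_path: str, output_path: str) -> None:
    """Read a session log (hypothetical path) and write the extracted config."""
    with open(capture_path, encoding="utf-8", errors="replace") as fh:
        xml_text = extract_config(fh.read())
    with open(output_path, "w", encoding="utf-8") as fh:
        fh.write(xml_text)
```

Calling save_config("capture.log", "config-recovered.xml") would then leave a file to try restoring after the reinstall. If the parse step fails, the log was probably altered in transit, so widen the terminal or turn off line wrapping and capture again.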
Left the console plugged in for ~10 hours and no errors have shown up; it seems stable now.
Still, I'm very disappointed that this release made it through to a main release and that recovering from it still requires digging through the forum. Many users have documented issues, and my own install is quite a vanilla setup (a couple VLANs, pfNG, a few firewall rules).
-
@neatneat To that end, is the SG-3100 going to be discontinued? Why is it not being tested?
-
@s0m3f00l said in sg-3100 upgrade failure to 21.02-p1:
@neatneat To that end, is the SG-3100 going to be discontinued? Why is it not being tested?
The EOL is from 1 to 3 years..
Some users are, to this day, still using the SG-2440, which has been discontinued, and they are still updating it to the latest firmware. Based on this, and on the fact that several Redmine bug reports are being worked on, I don't think they are 'abandoning' the SG-3100.
They are working to get things fixed, and they are working hard..
I think there are at least 5 Redmine bug reports being worked on.
-
The SG-3100 has a 32-bit ARM CPU. That platform has some "unique challenges" in terms of software caused by its 32-bit architecture - especially when compared to Intel platforms or aarch64 64-bit platforms.
For example, things that are currently working just fine on other pfSense hardware platforms (both generic hardware and Netgate appliances) are failing on the 32-bit ARM hardware. It's the same source code, but compiled using a different compiler. The resulting binary code is where the problems are.
-
@bmeeks said in sg-3100 upgrade failure to 21.02-p1:
The SG-3100 has a 32-bit ARM CPU. That platform has some "unique challenges" in terms of software caused by its 32-bit architecture - especially when compared to Intel platforms or aarch64 64-bit platforms.
For example, things that are currently working just fine on other pfSense hardware platforms (both generic hardware and Netgate appliances) are failing on the 32-bit ARM hardware. It's the same source code, but compiled using a different compiler. The resulting binary code is where the problems are.
Do you think this is possible to fix?
It seems to be a very complex thing to do, right? I mean, a problem with the compiler doesn't sound trivial to me.
-
@mcury said in sg-3100 upgrade failure to 21.02-p1:
@bmeeks said in sg-3100 upgrade failure to 21.02-p1:
The SG-3100 has a 32-bit ARM CPU. That platform has some "unique challenges" in terms of software caused by its 32-bit architecture - especially when compared to Intel platforms or aarch64 64-bit platforms.
For example, things that are currently working just fine on other pfSense hardware platforms (both generic hardware and Netgate appliances) are failing on the 32-bit ARM hardware. It's the same source code, but compiled using a different compiler. The resulting binary code is where the problems are.
Do you think this is possible to fix?
It seems to be a very complex thing to do, right? I mean, a problem with the compiler doesn't sound trivial to me.
No, it's not a trivial fix. It's also not something we package maintainers can reasonably fix. IMHO it will require a team effort similar to what was put together to find and fix the first SG-3100 bug with the network traffic stalling (I'm talking about the bug fixed by the 21.02_1 release for SG-3100).
With Snort and Suricata it's two different bugs. One is the Signal 10 unaligned access violation, and the other is something very funny happening within PHP itself that causes PHP to issue a Signal 11 Segmentation Fault. The PHP bug also appears to impact pfBlockerNG-devel. There are perhaps other impacted packages as well. For the PHP errors, the exact same PHP code runs flawlessly on Intel hardware (Atom, i-series and Xeon chips) as well as on 64-bit ARM hardware (aarch64). So the natural conclusion is that the core PHP code is likely okay, since it is the same everywhere. The difference is the hardware it runs on and the binary code the compiler produces for that hardware.
-
@bmeeks Thanks bmeeks nice to hear your words about it
It's nice to hear that an effort team has been put together once again, they already showed that they can do it.. just like they did with 21.02p1..
I wish I could help, but unfortunately I don't have any skills in developing..
Maybe I'll downgrade to 2.4.5p1, that is always an option too.. It was working perfectly.
What I don't know is if the packages will still work in 2.4.5p1, like pfblockerng 3.0.0_15 for instance, that will be released within a few days.. Do you think it will run in 2.4.5p1? Or the new releases will only be for the 21.02?
-
@mcury said in sg-3100 upgrade failure to 21.02-p1:
@bmeeks Thanks bmeeks nice to hear your words about it
It's nice to hear that an effort team has been put together once again, they already showed that they can do it.. just like they did with 21.02p1..
I don't mean to imply a team exists for that work. I have no idea. I am not affiliated with Netgate. I am just a volunteer package creator/maintainer. I don't even know if the Snort and Suricata tickets are being actively worked. I did the initial troubleshooting, but then hit my limit.
What I don't know is if the packages will still work in 2.4.5p1, like pfblockerng 3.0.0_15 for instance, that will be released within a few days.. Do you think it will run in 2.4.5p1? Or the new releases will only be for the 21.02?
No, updated package code will not generally be backported into the older repos. The whole point of the older repo is it contains older versions of everything such as shared libraries and such used by the older packages.
So if you downgrade, you will find whatever Snort, Suricata and pfBlockerNG-devel versions were "current" when 2.4.5 was current. And there will be no updates from there, unless some extraordinary circumstance occurred and the pfSense team elected to backport something.
-
@bmeeks Nice, thanks again