SG-3100 automatically upgraded itself from 22.05 to 23.01, now in broken state
-
The title of this post sounds bizarre, but my team and I have been struggling to come up with an explanation for how this happened. We have close to 30 Netgate pfSense appliances of various types deployed and have been using pfSense for more than 10 years. I've tried to do complete due diligence before posting here. Has anyone else experienced anything similar?
General symptoms:
Our monitoring alerted us that this gateway stopped responding yesterday, March 4, at 4:15 PM (EST).
Upon investigation, I could log onto the GUI but was presented with obviously broken PHP. Logging on via SSH was successful but also showed broken PHP. We discovered the upgrade had been performed.
We 'fixed' the PHP issues, but after rebooting again we have lost the GUI entirely (no login screen at all; I believe this is still related to PHP).
Management access is restricted to our source IPs - management of the unit is not globally accessible. Besides that, there are no logged admin sign-ins. There are three of us with access and we follow pretty strict change control procedures.
We also have an automated backup process that grabs the config directory nightly using SSH keys and a read-only account. That was confirmed to have run at 1:30 AM each night, not Saturday afternoon.
I do have remote OOB serial console access, but after spending most of my Sunday trying to recover normal access, it looks like the unit will need a re-install of 22.05, which requires an on-site visit or remote hands.
For reference, here's the initial PHP issue and our workaround:
Before:
[23.01-RELEASE][admin@my.pfsense]/usr/local/lib/php/build: php --version
Warning: Failed loading Zend extension 'opcache.so' (tried: /usr/local/lib/php/20190902/opcache.so (Cannot open "/usr/local/lib/php/20190902/opcache.so"), /usr/local/lib/php/20190902/opcache.so.so (Cannot open "/usr/local/lib/php/20190902/opcache.so.so")) in Unknown on line 0
Warning: PHP Startup: Unable to load dynamic library 'session.so' (tried: /usr/local/lib/php/20190902/session.so (Cannot open "/usr/local/lib/php/20190902/session.so"), /usr/local/lib/php/20190902/session.so.so (Cannot open "/usr/local/lib/php/20190902/session.so.so")) in Unknown on line 0
Softlink workaround:
[23.01-RELEASE][admin@my.pfsense]/usr/local/lib/php: ln -s 20210902/ 20190902
[23.01-RELEASE][admin@my.pfsense]/usr/local/lib/php: php --version
PHP 8.1.11 (cli) (built: Feb 17 2023 16:24:39) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.11, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.11, Copyright (c), by Zend Technologies
[23.01-RELEASE][admin@my.pfsense]/usr/local/lib/php:
After restarting PHP-FPM we could log in normally at the GUI, until it disappeared completely after the reboot.
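The warnings above show a PHP 8.1 binary being pointed at `20190902`, which is the module API directory of PHP 7.4, while `20210902` is the one that belongs to PHP 8.1. A minimal sketch for inspecting that mismatch without creating a symlink is below. The function names `list_php_api_dirs` and `has_module` are made up for illustration and are not part of pfSense; the default path is simply the one from the output above.

```shell
# Hypothetical helpers for illustration only; not part of pfSense.

# Print the dated module-API directories (e.g. 20190902, 20210902)
# found under a PHP library directory.
list_php_api_dirs() {
    base="${1:-/usr/local/lib/php}"
    for d in "$base"/[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        # When nothing matches, the glob stays unexpanded; skip it.
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

# Succeed if module $3 (without ".so") exists under $1/$2.
has_module() {
    [ -f "$1/$2/$3.so" ]
}
```

On the affected unit, `list_php_api_dirs /usr/local/lib/php` would show which API directories actually exist, and `has_module /usr/local/lib/php 20190902 opcache` would confirm whether the file PHP is complaining about is really absent. Comparing that with `php -i | grep extension_dir` should show which directory the running configuration asks for.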
There are other subtle ways the system seems broken and inconsistent (besides the missing web GUI), for example:
- after the above workaround uname -a showed 22.05 still
- now it shows:
FreeBSD my.pfsense 14.0-CURRENT FreeBSD 14.0-CURRENT #0 plus-RELENG_23_01-n256037-6e914874a5e: Fri Feb 10 20:27:02 UTC 2023 root@freebsd:/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/obj/armv7/W6AbNMMs/var/jenkins/workspace/pfSense-Plus-snapshots-23_01-main/sources/FreeBSD-src-plus-RELENG_23_01/arm.armv7/sys/pfSense-3100 arm
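A running kernel that reports one release while the rest of the system reports another is one sign of a half-applied upgrade. The following is a rough sketch of how the two could be compared. It assumes the kernel string carries a `RELENG_<major>_<minor>` tag as in the output above and that the installed release string is available in a file such as `/etc/version`; both function names are hypothetical.

```shell
# Hypothetical helpers for illustration only; not part of pfSense.

# Extract the branch (e.g. "23.01") from a kernel version string
# that contains a tag like "RELENG_23_01".
kernel_branch() {
    printf '%s\n' "$1" |
        sed -n 's/.*RELENG_\([0-9][0-9]*\)_\([0-9][0-9]*\).*/\1.\2/p'
}

# $1: kernel version string, $2: installed release string.
# Returns 0 if they agree, 1 if they differ, 2 if no branch tag was found.
versions_agree() {
    branch=$(kernel_branch "$1")
    [ -n "$branch" ] || return 2
    case "$2" in
        "$branch"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Something like `versions_agree "$(uname -v)" "$(cat /etc/version)" || echo "kernel and userland disagree"` could then be run from the serial console before and after a reboot.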
This system is configured for pfSense+ automatic configuration backups.
The upgrade logs contain content that seems consistent with a manually invoked upgrade.
Any ideas as to a possible explanation? Any more info anyone would like from the system?
-
@avr-it I’ve seen people here claim auto upgrades in the past but afaik there’s no mechanism for that and I’ve never experienced it at any clients.
Does the upgrade log show anything?
Did someone connect in and start doing things while it was upgrading? (First 15 minutes give or take) Restarting during the upgrade tends to break things.
-
@steveits I've attached the upgrade log, it doesn't seem to show anything unusual in terms of what invoked the upgrade.
I've accepted that 23.01 would break things: it was never our intention to upgrade. I've failed over to the backup pfSense SG-1100 out there.
No one was connected during the upgrade period. The offline alert came in yesterday, Sat Mar 4 @ 16:15, and we started investigating this morning, Sun Mar 5, which was the first interactive login in at least a few weeks.
Also, I wonder how it even got 23.01. Did Netgate not pull it temporarily, or was that only for 1100s and 2100s? The kernel reported by uname is certainly pretty strange as well.
[23.01-RELEASE][admin@my.pfsense]/cf/conf: stat upgrade_log.latest.txt
79 29826 -rw-r--r-- 1 root wheel 69688 273358 "Mar 4 16:22:33 2023" "Mar 4 16:22:33 2023" "Mar 4 16:22:33 2023" "Mar 4 16:22:33 2023" 32768 536 0 upgrade_log.latest.txt
[upgrade_log.latest.txt](/assets/uploads/files/1678072786894-upgrade_log.latest.txt)
-
@avr-it They only paused 1100/2100.