24.03 Stuck at Not Ready Yet.
-
After the 23.09.1 -> 24.03 upgrade on a 2100, the GUI now seems to be stuck at the "not ready yet" screen. It has been well over 10 minutes since the console displayed the menu.
Clearly the system is "running", as I'm able to connect here without re-routing traffic.
At this point I can ssh in and, as stated, the console displays the menu, but there is no web access from this or another system.
It would be nice if I could tell exactly what is "not ready yet". My guess is the web interface, but there is no indication of anything being wrong.
No errors that I can see on the console or in the updating-system window where it sits now; I'm just watching the 20-second retry count loop.
How long do I wait, and what are the next steps? Restart the GUI from ssh or the console?
>>> Removing unnecessary packages...done.
>>> Cleanup pkg cache...done.
>>> Deferring package installation scripts...done.
>>> Upgrading boot code...
System Configuration
Architecture: arm64
Boot Devices: /dev/ada0
Boot Method: uefi
Filesystem: zfs
Platform: Netgate 2100
Updating boot code...
/usr/local/sbin/../libexec/install-boot.sh -b uefi -d /tmp/be_mount.rUGy -f zfs -s mbr ada0
ESP /dev/ada0s1 mounted on /tmp/stand-test.gZXMRB
203824KB space remaining on ESP: renaming old bootaa64.efi file /efi/boot/bootaa64.efi /efi/boot/bootaa64-old.efi
Copying loader.efi to /EFI/freebsd on ESP
Copying bootaa64.efi to /efi/boot on ESP
Unmounting and cleaning up temporary mount point
Finished updating ESP
Done.
>>> Copying upgrade log...done.
>>> Unmounting upgraded boot environment...done.
>>> Activating default for the next boot only...done.
System is going to be upgraded. Rebooting in 10 seconds.
>>> Unlocking package pkg...done.
Success
-
If you're running ZFS, the new upgrade system does most of the work before rebooting, so it should not take anywhere near 10 minutes to come back.
However, some packages try to update at that point and have to time out, which can extend that: pfBlocker, Snort, Suricata.
If you can ssh in and see the console menu, though, it sounds like maybe the GUI is just not available for some reason.
The upgrade screen in the GUI just attempts to reconnect, and if it cannot, it shows 'not ready yet'. It can't display a reason because it has no further info at that point.
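For anyone watching that retry counter: the same kind of check can be run by hand from an ssh session to see whether the web server answers at all. This is only a minimal sketch of a retry loop, not how the GUI itself is implemented. `retry_probe` is a hypothetical helper name, and the address in the usage line below is a placeholder for your own LAN address.

```sh
#!/bin/sh
# retry_probe TRIES DELAY COMMAND...
# Runs COMMAND until it succeeds, at most TRIES times, sleeping DELAY
# seconds between attempts. Returns 0 on the first success, 1 otherwise.
retry_probe() {
    tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "attempt $attempt: ok"
            return 0
        fi
        echo "attempt $attempt: not ready yet"
        # Do not sleep after the final failed attempt.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

Something like `retry_probe 30 20 curl -ks -o /dev/null https://192.168.1.1/` would then try every 20 seconds. curl exits non-zero when it cannot connect, so a refused connection shows up as "not ready yet" while any HTTP answer counts as ok.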
Did it eventually complete?
Steve
-
@stephenw10 said in 24.03 Stuck at Not Ready Yet.:
Did it eventually complete?
Not without me giving it a good swift kick.
The problem seems to be pfblocker and, more importantly, the certificate that is used when pfb is pulling some files from another local system. This was causing each of the downloads to retry and take far longer than normal (i.e. it likely would have completed some day, but all the downloads would have failed):
SSL certificate problem: unable to get local issuer certificate
This is odd, because the certificate in question is set as "Add this certificate to the Operating Systems Trust Store" and was working fine.
It is an Acme certificate (not that it should matter). For some reason the OS is no longer seeing it for its own "OS" use.
Next step is to uncheck it from the trust store, reboot, and re-add it.
-
Ah interesting. So that's a custom list in pfBlocker it pulls from some local server? How many lists?
That looks like something that would be solved by certctl rehash, but that likely hasn't run yet at the point where pfBlocker is running its install script. However, it would likely still fail to pull the lists at that point anyway because of the lack of connectivity: https://redmine.pfsense.org/issues/15396
What did you do to allow it to complete?
-
@stephenw10 said in 24.03 Stuck at Not Ready Yet.:
What did you do to allow it to complete?
Allow what to complete? The pfblocker pulls?
I just stopped it to prevent pfblocker from pulling anything at this point while I figure out the cert issue. There was nothing really wrong with the upgrade as such that I can see in the logs, just the cert issue, which I wouldn't have expected to be a problem.
-
Well, this is interesting: the following is in the pfblocker error.log, timestamped right around the time the upgrade had completed.
It also seems this has nothing to do with my certificate:
[1713961838] unbound[84846:0] error: Error for server-cert-file: /var/unbound/unbound_server.pem
[1713961838] unbound[84846:0] error: Error in SSL_CTX use_certificate_chain_file crypto error:80000002:system library::No such file or directory
[1713961838] unbound[84846:0] error: and additionally crypto error:10080002:BIO routines::system lib
[1713961838] unbound[84846:0] error: and additionally crypto error:0A080002:SSL routines::system lib
[1713961838] unbound[84846:0] fatal error: could not set up remote-control
edit:
I don't see any errors like those displayed in the redmine you referenced, just the ones above, and they were in the pfblocker error.log.
Strange also that now, after this latest reboot (removing/re-adding the local trust and certctl rehash), I see the "Verify question" at the top of the dashboard.
Still seems to be running 24.03
-
@jrey Nothing new in those unbound errors, had and reported them already in BETA.
-
I ran into the same error with my 6100 yesterday:
Unbound stopped in 1 sec.
Additional mounts (DNSBL python):
Starting Unbound Resolver Not completed.
[1713888880] unbound[71102:0] error: Error for server-cert-file: /var/unbound/unbound_server.pem
[1713888880] unbound[71102:0] error: Error in SSL_CTX use_certificate_chain_file crypto error:80000002:system library::No such file or directory
[1713888880] unbound[71102:0] error: and additionally crypto error:10080002:BIO routines::system lib
[1713888880] unbound[71102:0] error: and additionally crypto error:0A080002:SSL routines::system lib
[1713888880] unbound[71102:0] fatal error: could not set up remote-control
Unbound is broken and couldn't be started.
Apr 23 18:19:13 php-fpm 29637 /services_unbound.php: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1713889153] unbound[49797:0] error: Error for server-cert-file: /var/unbound/unbound_server.pem [1713889153] unbound[49797:0] error: Error in SSL_CTX use_certificate_chain_file crypto error:80000002:system library::No such file or directory [1713889153] unbound[49797:0] error: and additionally crypto error:10080002:BIO routines::system lib [1713889153] unbound[49797:0] error: and additionally crypto error:0A080002:SSL routines::system lib [1713889153] unbound[49797:0] fatal error: could not set up remote-control'
Apr 23 18:19:05 php-fpm 645 /status_services.php: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1713889145] unbound[27355:0] error: Error for server-cert-file: /var/unbound/unbound_server.pem [1713889145] unbound[27355:0] error: Error in SSL_CTX use_certificate_chain_file crypto error:80000002:system library::No such file or directory [1713889145] unbound[27355:0] error: and additionally crypto error:10080002:BIO routines::system lib [1713889145] unbound[27355:0] error: and additionally crypto error:0A080002:SSL routines::system lib [1713889145] unbound[27355:0] fatal error: could not set up remote-control'
Apr 23 18:18:56 php-fpm 48050 /services_unbound.php: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1713889136] unbound[47160:0] error: Error for server-cert-file: /var/unbound/unbound_server.pem [1713889136] unbound[47160:0] error: Error in SSL_CTX use_certificate_chain_file crypto error:80000002:system library::No such file or directory [1713889136] unbound[47160:0] error: and additionally crypto error:10080002:BIO routines::system lib [1713889136] unbound[47160:0] error: and additionally crypto error:0A080002:SSL routines::system lib [1713889136] unbound[47160:0] fatal error: could not set up remote-control'
I used the boot environment to go back to 23.09.1 and installed the patches.
-
@NOCling A manual reboot at that point would have resolved the unbound issue; I have done it many times since BETA.
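Before rebooting, it can be worth confirming that the file named in the error really is absent, as the "No such file or directory" part suggests. A minimal sketch; `missing_files` is a hypothetical helper name, and the only file name used in the usage line is the one from the log above.

```sh
#!/bin/sh
# missing_files DIR NAME...
# Prints each NAME that does not exist as a regular file under DIR.
# Returns 0 if everything is present, 1 if anything is missing.
missing_files() {
    dir=$1
    shift
    rc=0
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing: $dir/$name"
            rc=1
        fi
    done
    return "$rc"
}
```

Run as `missing_files /var/unbound unbound_server.pem`. If it prints nothing, the file exists and the error probably has some other cause than a missing file.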
-
Thanks, but at the end of the day this is not the issue causing the problem for me.
Everything is working except the CA for the Let's Encrypt certificate when adding it to the local store, so that pfblocker can use it when talking to another local server.
There are two certificates:
- the one with CN=ISRG Root X1
- and the second with CN=R3
Both were created by the Let's Encrypt process and both were working previously.
If I uncheck "add this certificate authority to the operating system trust store"
and look at the /etc/ssl "stuff", neither is there (expected).
Then if I check the "add this.. " again, only the ISRG Root X1 cert shows up in the "stuff"; the R3 one no longer seems to want to add to the local store. I've recreated that cert per the Let's Encrypt info (the data is exactly the same as what was there) but it still won't go to the local store.
As a result, when pfblocker attempts to pull from the local server using that cert it now returns
curl: (60) SSL certificate problem: unable to get local issuer certificate
Makes sense, it is not there anymore! Debugging the curl connection does indeed say the cert needed in the CA is not there:
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection
The cert is valid, works from other machines, and of course is not expired. It just doesn't seem to want to go to the local cert store, and therefore curl can't find it. If the R3 cert were there, it would then chain up to the ISRG one, which is there.
-
@pfsjap
Thx, it's resolved.
So simple, but I didn't think of it.
-
So I can confirm that the CA GUI is not adding the certificate to the OS trust store, but that the certificate itself is working.
curl -Iv https://123.sample.com
fails:
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS alert, unknown CA (560):
* SSL certificate problem: unable to get local issuer certificate
* Closing connection
Export the certificate from the GUI, put it in a pem file in the tmp directory, and
curl -Iv --cacert /tmp/lets-encrypt-r3.pem https://123.sample.com
works like a charm:
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
So I guess the question is: why isn't the GUI getting this into the local trust store when saved? I can see the certs directory rebuilding on save, but clearly when it finishes this cert is not there. Curl can't find it and neither can I.
I suspect I could just drop a copy of the file in /usr/share/certs/trusted and then link the file into /etc/ssl/certs and it would be fine, but that seems like something that should be happening on save?
There is nothing "fun" in the logs that I can see. The general log entry from php-fpm just says there was a configuration change when I hit save to toggle the option on/off:
Updated Certificate Authority Acmecert: O=Lets Encrypt, CN=R3, C=US
Guess I need to look at system_camanager.php to see exactly what it is (not) doing.
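One way to check this from the shell rather than eyeballing the directory: OpenSSL looks up CAs in a hashed directory by the certificate's subject hash, so the exported PEM can be asked what file name it should appear under. A minimal sketch, assuming `openssl` is on the path; `ca_hash_name` and `ca_in_store` are hypothetical helper names, and the PEM path in the usage line is the one from the curl test above.

```sh
#!/bin/sh
# ca_hash_name PEM
# Prints the subject hash OpenSSL uses to name this certificate in a
# hashed CA directory (the part before the ".0" suffix).
ca_hash_name() {
    openssl x509 -noout -subject_hash -in "$1"
}

# ca_in_store PEM [DIR]
# Returns 0 if DIR (default /etc/ssl/certs) holds an entry named after
# the certificate's subject hash, 1 if not, 2 if the PEM cannot be read.
ca_in_store() {
    pem=$1
    dir=${2:-/etc/ssl/certs}
    subj_hash=$(ca_hash_name "$pem") || return 2
    for f in "$dir/$subj_hash".*; do
        # -e follows symlinks; an unmatched glob stays literal and fails here.
        if [ -e "$f" ]; then
            echo "found: $f"
            return 0
        fi
    done
    echo "not found: $dir/$subj_hash.0"
    return 1
}
```

Usage would be `ca_in_store /tmp/lets-encrypt-r3.pem`. A match only shows that a certificate with the same subject name is present, so it is a quick presence check rather than proof that the exact same certificate was written.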
-
Have you opened a bug for this?
-
Yup, it is broken for sure. I fired up my 2.7.2 test virtual machine (it had no certs on it at all).
I added the CA by cutting and pasting the values from the cert on the 24.03 (24.03_1) box into a new certificate, checked the box and hit save. Immediately the hashed file name appeared in /etc/ssl/certs (the file name in green is the cert "Acmecert: O=Lets Encrypt, CN=R3, C=US"; the data in the file looks fine),
and curl on the 2.7.2 test box immediately recognized it:
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
Works as expected, no issues.
-
Nope, have not. Too busy trying to validate the issue and make it go! (This is a production box that is borked, so getting it back in production is my number 1 priority at this time.)
I'll likely go back to 23.09.1 through boot environments and wait for the release to be ready for prime time.
Two updates in one day (24.03 and 24.03_1) and hours of downtime is actually enough for me.
Seems in order to create a bug report, I'd have to have yet another account on the bug tracker. That's not happening today.
-
@stephenw10 said in 24.03 Stuck at Not Ready Yet.:
Have you opened a bug for this?
Have now:
https://redmine.pfsense.org/issues/15440
-
jimp (Rebel Alliance, Developer, Netgate), Apr 25, 2024, 4:50 PM, last edited Apr 25, 2024, 4:51 PM
Looks like something changed in certctl and now it wipes the directory when writing the CAs, which also wipes out the CA files from the cert manager.
Try this diff and see if it helps, and keep in mind that certctl rehash can take up to a few minutes to run, so check ps uxaww | grep certctl and make sure it's done before you try running any commands.
diff --git a/src/etc/inc/certs.inc b/src/etc/inc/certs.inc
index be5a0de777..5b7735d191 100644
--- a/src/etc/inc/certs.inc
+++ b/src/etc/inc/certs.inc
@@ -2371,6 +2371,8 @@ function ca_setup_trust_store() {
 	safe_mkdir($trust_store_directory);
 	unlink_if_exists("{$trust_store_directory}/*.0");
 
+	mwexec_bg('/usr/sbin/certctl rehash');
+
 	foreach (config_get_path('ca', []) as $ca) {
 		/* If the entry is invalid or is not trusted, skip it. */
 		if (!is_array($ca) ||
@@ -2382,7 +2384,6 @@ function ca_setup_trust_store() {
 		ca_setup_capath($ca, $trust_store_directory);
 	}
 
-	mwexec_bg('/usr/sbin/certctl rehash');
 }
 
 /****f* certs/ca_setup_capath
If that doesn't help, there are more options; it just needs some experimentation to figure out the best path forward.
EDIT: It's also worth noting that Let's Encrypt CA chains are in the OS default trust store already so any custom entries that are duplicating Let's Encrypt CAs in the chain may be conflicting with the built-in copies.
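The "make sure it's done" step can be scripted rather than checked by eye. A minimal sketch that uses pgrep in place of the ps/grep pipeline; `wait_for_exit` is a hypothetical helper name, and the 300-second default cap is an arbitrary assumption, not a documented limit for certctl.

```sh
#!/bin/sh
# wait_for_exit PATTERN [MAX_SECONDS]
# Polls once per second until no process command line matches PATTERN.
# Returns 0 once nothing matches, 1 if MAX_SECONDS (default 300) pass first.
wait_for_exit() {
    pattern=$1
    max=${2:-300}
    waited=0
    # pgrep -f matches against the full command line and never reports itself.
    while pgrep -f "$pattern" >/dev/null 2>&1; do
        if [ "$waited" -ge "$max" ]; then
            echo "still running after ${max}s: $pattern"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "no process matching: $pattern"
    return 0
}
```

For example, `wait_for_exit 'certctl rehash' && curl -Iv https://123.sample.com` would hold off the curl test until the rehash is no longer in the process list.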
-
@jimp said in 24.03 Stuck at Not Ready Yet.:
and now it wipes the directory when writing the CAs
This patch didn't help.
I watched the /etc/ssl/certs directory repopulate and then tried again; still the
unable to get local issuer certificate
error.
What does work, in the short term, is to add the .pem file to /usr/share/certs/trusted (this is the certificate that is regenerated every 60 days and the one causing the issue).
I just exported it from the GUI CA screen, renamed the file to "Lets_Encrypt_R3.pem" and copied it to /usr/share/certs/trusted. Now when /etc/ssl/certs gets cleared and rehashed, the system treats it as one of the root certificates and it gets processed.
It will work for about 50 more days, until the next renewal.
I'll leave the "patch" in place for now. Let me know if there is something else you want me to try.
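The stopgap described above, written out as a sketch. `stage_trusted_ca` is a hypothetical helper name; the destination directory is the one from this post, and the "BEGIN CERTIFICATE" test is only a basic sanity check, not validation of the certificate.

```sh
#!/bin/sh
# stage_trusted_ca PEM [DEST_DIR]
# Copies an exported CA certificate into DEST_DIR (default
# /usr/share/certs/trusted) so the next rehash picks it up.
stage_trusted_ca() {
    src=$1
    dest_dir=${2:-/usr/share/certs/trusted}
    if [ ! -r "$src" ]; then
        echo "cannot read: $src" >&2
        return 1
    fi
    # Refuse anything that does not at least look like a PEM certificate.
    if ! grep -q -- '-----BEGIN CERTIFICATE-----' "$src"; then
        echo "not a PEM certificate: $src" >&2
        return 1
    fi
    cp "$src" "$dest_dir/$(basename "$src")" || return 1
    echo "staged: $dest_dir/$(basename "$src")"
}
```

Assuming the exported file was saved as /tmp/Lets_Encrypt_R3.pem, `stage_trusted_ca /tmp/Lets_Encrypt_R3.pem && certctl rehash` would repeat the manual steps. As noted, this has to be redone after every renewal, so it is a workaround rather than a fix.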
-
Revert the previous diff, then use the System Patches package and create an entry for
27fc5a3020fe981b7a5bc98fc9b1660e8773fc7d
to apply the fix I committed.
-
@jimp said in 24.03 Stuck at Not Ready Yet.:
Revert the previous diff
Of course, reverted. I was already reading your note about the /usr/local/etc/ssl/certs directory and was going to comment that the directory does not exist, but I see you have changed the path reference in the new commit.
Testing that as I type... So I now see the file in /usr/local/etc/ssl/certs:
total 5
-rw-r--r-- 1 root wheel 1856 Apr 25 14:04 8d33f237.crt
I see /etc/ssl/certs rebuilding,
and after the rehash the file is now in the mix:
-r--r--r-- 2 root wheel 2629 Apr 18 20:47 8cb5ee0f.0
-rw-r--r-- 2 root wheel 1856 Apr 25 14:04 8d33f237.0 <<--
-r--r--r-- 2 root wheel 4616 Apr 18 20:47 8d86cdd1.0
-r--r--r-- 1 root wheel 875 Apr 25 14:04 8d89cda1.0
Works.
I take it that, since the change is in certs.inc, an auto renew will be handled by this change as well?
Thanks so much.
-
It should be handled properly any time the CA trust store gets refreshed now.
I'll get that into the system patches package in the near future as well.
-