ACME certificate PHP Fatal Error
-
Do you have an exact set of steps to reproduce this error?
I've tried a few different things, creating a new cert from scratch, having an update fail and then retrying, etc. So far I haven't been able to make it fail here.
Before anything else, please try uninstalling and reinstalling the package to ensure the files are all completely up-to-date.
If you can find a reliable way to reproduce the error condition, please note the steps with as much detail as possible, including any necessary starting conditions, such as whether a cert entry exists in the cert manager and/or in ACME, and the settings for the ACME cert like key length.
-
In order to reproduce it, I would first need to find a way to make it work again and then a way to make it fail. I simply cannot get ANY certificate to renew; I gave up completely.
I tried clearing the config in Cert. Manager and ACME, uninstalled the ACME package, rebooted the router, installed ACME again, and configured everything from scratch. Still no go.
The only thing to mention is that the previous setup was done on 23.01 and the renewal happened on 23.05. The renewal never worked (always with an error) and all subsequent clearing, uninstalling and reinstalling also never worked.
-
So you get the same error every time, and it's the identical error posted above?
If so, check the following:
- Look under System > Certificates, on the Certificates tab and find the entry that corresponds to the ACME entry. See what it looks like there, does it have any properties listed like other certs? If you edit it, does it have anything in the certificate and/or private key data fields?
- What settings do you have for the Private Key size/type on the ACME entry? If it is set to Custom, is there a valid key with armor lines filled in the Custom Private Key box?
- If you remove the entry from System > Certificates, on the Certificates tab and then try to issue/renew the cert from the ACME package, is the error the same?
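As a rough illustration of the "valid key with armor lines" check above, here is a minimal Python sketch. It only looks at the text (matching BEGIN/END lines around base64 data) and does not prove the key actually parses. The function names are hypothetical, and the base64-wrapped storage format is an assumption based on how the package appears to store keys.

```python
import base64
import binascii
import re

# A PEM private key block: BEGIN/END armor lines that carry the same label,
# e.g. "PRIVATE KEY", "RSA PRIVATE KEY" or "EC PRIVATE KEY".
_PEM_KEY_RE = re.compile(
    r"-----BEGIN ((?:[A-Z]+ )?PRIVATE KEY)-----\s+"
    r"[A-Za-z0-9+/=\s]+"
    r"-----END \1-----"
)


def looks_like_pem_private_key(text: str) -> bool:
    """Return True if text contains a key body between matching armor lines."""
    return bool(_PEM_KEY_RE.search(text or ""))


def stored_key_is_usable(stored_b64: str) -> bool:
    """Check a base64-wrapped PEM key (assumed storage format)."""
    if not stored_b64:
        return False  # empty field: nothing to parse
    try:
        # Drop any line wrapping before the strict decode.
        compact = "".join(stored_b64.split())
        pem = base64.b64decode(compact, validate=True).decode("ascii")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return False  # not valid base64, or not plain ASCII PEM text
    return looks_like_pem_private_key(pem)
```

An empty or armor-less value fails this check, which is the state worth looking for in the key fields.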
-
I may have managed to reproduce it by manually editing out the private key from an existing entry. I still don't know how it could have ended up in that state but this diff should help if anyone can test it:
index 52665279978cb..0a31eb794b413 100644
--- a/a/usr/local/pkg/acme/acme.inc
+++ b/b/usr/local/pkg/acme/acme.inc
@@ -1728,12 +1728,22 @@ function & get_certificate($name) {
 			echo "\n getCertificatePSK updating custom key";
 			$cert['prv'] = $certificate['keypaste']; // (already base64-encoded)
 		} elseif ($certificate['keylength'] != 'custom') {
-			$res_key = openssl_pkey_get_private(base64_decode($cert['prv']));
-			$key_details = openssl_pkey_get_details($res_key);
-			if ($key_details['type'] == OPENSSL_KEYTYPE_RSA) {
-				$keybits = $key_details['bits'];
-			} else {
-				$keybits = $key_details['ec']['curve_name'];
+			/* Default value to ensure it gets updated if missing/invalid */
+			$keybits = -1;
+			/* Only attempt to fetch the private key if one is present. */
+			if (!empty($cert['prv'])) {
+				$res_key = @openssl_pkey_get_private(base64_decode($cert['prv']));
+			}
+			/* Only check the key details if the private key could be fetched */
+			if ($res_key) {
+				$key_details = @openssl_pkey_get_details($res_key);
+				if (is_array($key_details)) {
+					if ($key_details['type'] == OPENSSL_KEYTYPE_RSA) {
+						$keybits = $key_details['bits'];
+					} else {
+						$keybits = $key_details['ec']['curve_name'];
+					}
+				}
 			}
 			if (is_numeric($certificate['keylength'])) {
 				$newkeybits = $certificate['keylength'];
You can install the System Patches package and then create an entry for that diff to apply the fix.
-
I'm pretty sure the above change should take care of it. I also found some other problems with ACME private key handling.
I opened https://redmine.pfsense.org/issues/14592 to track this and pushed a fix, it's building now and will be available soon in ACME package v0.7.5 on Plus 23.05.1 and CE 2.7.0.
-
Thanks for this. I will test as soon as I see the v0.7.5 update pushed.
-
Well I've upgraded to v0.7.5 and unfortunately there's no change. The error is still there, I really don't know what's going on.
I cleared everything out again in Cert Manager, I've deleted everything, including the Let's Encrypt CA and intermediate CA, cleared the expired certificate that I wanted to renew. I've removed all items in ACME certs and private keys. Uninstalled the package, rebooted the router, installed the package again, created a staged account key, added a new certificate, 2048-bit RSA, I'm using DNS-Gandi LiveDNS challenge, and when I click on Issue/Renew, I get the same error.
Any other ideas on what I could try? My config is now blank, there's nothing in my Cert Manager page, and pfSense itself is now back to using a self-signed certificate.
I've tried using different private key lengths, RSA or ECDSA, tried staging, production, tried switching to a different domain (I have multiple domains for which I wanted to create a certificate), tried single domain, wildcard, everything I could think of. Nothing works.
To answer a few questions in your previous reply:
- Look under System > Certificates, on the Certificates tab and find the entry that corresponds to the ACME entry. See what it looks like there, does it have any properties listed like other certs? If you edit it, does it have anything in the certificate and/or private key data fields?
- What settings do you have for the Private Key size/type on the ACME entry? If it is set to Custom, is there a valid key with armor lines filled in the Custom Private Key box?
I used 384-bit ECDSA before and tried to renew like that, but I've since tried 256-bit ECDSA as well as 4096-bit RSA and 2048-bit RSA. Nothing worked.
- If you remove the entry from System > Certificates, on the Certificates tab and then try to issue/renew the cert from the ACME package, is the error the same?
Yes, I now only have the ability to issue a new certificate since I've cleared everything and am trying to issue a brand new certificate. Error is the same.
-
@IonutIT The error has to be different or your package didn't actually update. At the very least the line numbers had to have changed, but even so it's not possible for the error to be identical unless you are somehow still running the old code.
What is the exact text of the error message you receive now?
-
Actually now there's no error displayed on the ACME Certificate page anymore. I click on Issue/Renew and after 5-6 minutes of waiting the button switches from a checkmark to a broken link icon (I think) and nothing happens.
Looking at the general system logs I also came across this:
2023/07/20 16:06:38 [error] 14494#100378: *843 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.40.10.10, server: , request: "POST /acme/acme_certificates.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "<REDACTED>", referrer: "https://<REDACTED>/acme/acme_certificates.php"
In this case 10.40.10.10 is the IP of the device I'm using to browse the pfSense UI and issue the certificate, and <REDACTED> is my pfSense's FQDN.
-
There will be a complete log of the process at
/tmp/acme/<cert name>/acme_issuecert.log
that you can check and see what happened in ACME. It sounds like maybe something is taking too long (usually DNS) but the log will hopefully give you a better idea about what is happening.
-
Yeah, I think you're right, the issue is now completely different from before.
It seems now I'm dealing with a DNS check issue. It's trying to check the TXT record and fails somehow.
-
@IonutIT said in ACME certificate PHP Fatal Error:
It seems now I'm dealing with a DNS check issue. It's trying to check the TXT record and fails somehow.
The TXT record was added successfully.
But a dnssleep of "20" is a bit too fast. Use "120".
-
The DNS-Sleep field in the ACME certificate setting is empty. I don't have any time set there.
Also, the log shows it waits 20 seconds only the first time, then another 10 seconds, then another 10 seconds. It does that at least 12-14 times in a row and still fails at the end.
Also, looking at my actual DNS records, it seems that the TXT string it's looking for is there about 20-30 times (it probably got added again every time I clicked the Issue/Renew button). So the TXT record should match instantly, since it has been there for weeks now.
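The wait pattern described above (20 seconds first, then repeated 10-second waits) can be pictured with a small Python sketch. The numbers are simply what the log showed, not values taken from the acme.sh source, and `poll_until` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Tuple


def poll_until(
    check: Callable[[], bool],
    first_wait: float = 20.0,
    interval: float = 10.0,
    max_attempts: int = 14,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, float]:
    """Wait, then check; repeat until check() succeeds or attempts run out.

    Returns (found, total_seconds_waited).
    """
    waited = 0.0
    for attempt in range(max_attempts):
        # The first wait is longer than the ones that follow.
        delay = first_wait if attempt == 0 else interval
        sleep(delay)
        waited += delay
        if check():
            return True, waited
    return False, waited
```

With these assumed defaults, a check that never succeeds gives up after 150 seconds. The loop only knows whether the check succeeded, so a check that cannot complete at all looks the same as a record that is missing.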
-
Well it seems I found the issue. It was the DNS-Sleep setting.
I set that manually to 120s (i.e. not leaving it blank) and it magically started working. I tried clearing it, and it again failed to renew.
So something is broken with automatic DNS polling. When you disable it (by manually entering a value in the DNS-Sleep field), everything starts working again.
When automatic DNS polling is enabled, it seems that acme has a curl issue that blocks stuff (see the above acme_issuecert.log.zip).
-
It depends on your local setup/environment. It works fine for me with it unset, but I have heard of others who need to set that as well. When you leave it blank, it defaults to using DoH/DoT queries to Cloudflare and Quad9 IIRC, and if your setup (or upstream) blocks those then it would constantly fail.
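To test whether DoH lookups get through from your network, something like the following Python sketch could be run from a machine behind the same firewall. It uses Cloudflare's public JSON DoH endpoint. The helper names are hypothetical, and this is not necessarily how acme.sh performs its own check.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"
TXT_TYPE = 16  # DNS record type number for TXT


def build_doh_url(name: str, endpoint: str = DOH_ENDPOINT) -> str:
    """Build the JSON-API query URL for a TXT lookup."""
    query = urllib.parse.urlencode({"name": name, "type": "TXT"})
    return f"{endpoint}?{query}"


def txt_values(response: Dict[str, Any]) -> List[str]:
    """Extract TXT strings from a decoded DoH JSON response."""
    values = []
    for answer in response.get("Answer", []):
        if answer.get("type") == TXT_TYPE:
            # TXT data arrives wrapped in double quotes.
            values.append(answer.get("data", "").strip('"'))
    return values


def query_txt(name: str, timeout: float = 10.0) -> List[str]:
    """Perform the lookup; raises on network errors or timeouts."""
    request = urllib.request.Request(
        build_doh_url(name), headers={"accept": "application/dns-json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return txt_values(json.load(reply))
```

If `query_txt("_acme-challenge.example.com")` (with your own domain substituted) raises a timeout or connection error, DoH is probably being blocked somewhere on the path, and setting a DNS-Sleep value is the likely workaround.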
-
@IonutIT said in ACME certificate PHP Fatal Error:
Well it seems I found the issue. It was the DNS-Sleep setting.
Yep.
If there is no DNS-Sleep value set, you more or less presume that DNS is available with the correct, updated zone data right after the update.
This might be possible if you run the primary ('master') DNS server for your domain yourself (locally!?), the secondary ('slave') for your zone is also close by, and syncing between the two happens "right away".
A DNS-Sleep=0, which is a special case, implies that DoH is used against a known public DNS resolver (Cloudflare) and no classic resolving (initiated by Let's Encrypt, to check the TXT records of your zone) is done.
My question always was: how should Cloudflare be aware of a change to a domain name zone that fast? If the resolving request hits one of the slaves of your domain name, it might not be synced yet ...
But now I see in your logs that it is testing several times, waiting 10 more seconds each time. But surprise, DoH to Cloudflare is a free service with no guaranteed result ;) And there was no result, even after minutes: Let's Encrypt bails out => acme.sh bails out => your renewal fails.
@IonutIT said in ACME certificate PHP Fatal Error:
(probably it got added again and again every time I clicked the Issue/Renew button)
That's not a real issue.
What gets added to the "_acme-challenge" subdomain is a random but known file name. This file has to contain a random but known "number". Both the file name and number are generated by Let's Encrypt and handed over to acme.sh, and acme.sh uses them to add the file using the DNS method you chose.
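In DNS terms the "file" is a TXT record. For reference, the ACME specification (RFC 8555, section 8.4) places it at `_acme-challenge.<domain>` and derives its value from the challenge token and the account key thumbprint. A minimal Python sketch of that derivation follows; the function names are hypothetical and the thumbprint is taken as an already-computed input.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """URL-safe base64 without trailing '=' padding, as ACME uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """TXT value: base64url(SHA-256(token + '.' + thumbprint))."""
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)


def challenge_record_name(domain: str) -> str:
    """Record name to publish; wildcards validate against the base domain."""
    if domain.startswith("*."):
        domain = domain[2:]
    return f"_acme-challenge.{domain}"
```

The same token combined with the same account key always produces the same TXT value.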
Done or fail, the TXT record (filename and content) are deleted afterwards.
-
@Gertjan said in ACME certificate PHP Fatal Error:
A DNS-Sleep=0, which is a special case, implies that DoH is used against a known public DNS resolver (Cloudflare) and no classic resolving (initiated by Let's Encrypt, to check the TXT records of your zone) is done.
Actually, it was DNS-Sleep left empty (not 0) that triggered the DoH/DoT Cloudflare check.
What gets added to the "_acme-challenge" subdomain is a random but known file name. This file has to contain a random but known "number". Both the file name and number are generated by Let's Encrypt and handed over to acme.sh, and acme.sh uses them to add the file using the DNS method you chose.
Yeah, but in my case, when DNS-Sleep was empty and the Cloudflare DoH/DoT check failed, it never actually got around to removing the added TXT record. The whole script failed. So I ended up with 20-30 instances of the "_acme-challenge" TXT record with the same random code (it was always the same code) that were never actually removed.
Done or fail, the TXT record (filename and content) are deleted afterwards.
It seems that in this case, a fail in the script did not delete the TXT record afterwards.
What I don't understand is why the Cloudflare DoH/DoT check fails for my external DNS record. The master NS for that domain is hosted on Gandi. Especially since it seemed to work just fine 2 months ago (the last time the certificate renewed successfully on its own).
-
@jimp said in ACME certificate PHP Fatal Error:
When you leave it blank it defaults to using DoH/DoT queries to cloudflare and quad9 IIRC
Aha ... the log tells me just that : it's the local acme.sh that is checking regularly - like some kind of 'active waiting'.
And when found, it then informs Let's Encrypt to do the domain zone TXT verification. If a local policy forbids DoH activity, then acme.sh will fail.