ACME certificate PHP Fatal Error
-
Actually, there's no longer any error displayed on the ACME Certificate page. I click on Issue/Renew and after 5-6 minutes of waiting the button switches from a checkmark to a broken link icon (I think) and nothing happens.
Looking at the general system logs I also came across this:
2023/07/20 16:06:38 [error] 14494#100378: *843 upstream timed out (60: Operation timed out) while reading response header from upstream, client: 10.40.10.10, server: , request: "POST /acme/acme_certificates.php HTTP/2.0", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "<REDACTED>", referrer: "https://<REDACTED>/acme/acme_certificates.php"
In this case 10.40.10.10 is the IP of the device I'm using to browse the pfSense UI and issue the certificate, and <REDACTED> is my pfSense's FQDN.
-
There will be a complete log of the process at
/tmp/acme/<cert name>/acme_issuecert.log
that you can check and see what happened in ACME. It sounds like maybe something is taking too long (usually DNS) but the log will hopefully give you a better idea about what is happening.
-
Yeah, I think you're right, the issue is now completely different from before.
It seems now I'm dealing with a DNS check issue. It's trying to check the TXT record and fails somehow.
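For anyone following along, here is a minimal sketch of how the acme_issuecert.log mentioned above could be skimmed for the lines that matter. The keyword list is my own guess at what is worth looking for, not something the ACME package defines, so adjust it to whatever your log actually contains.

```python
import sys
from pathlib import Path

# Hypothetical keywords; tune them after looking at your own log.
KEYWORDS = ("error", "sleep", "txt", "timed out")


def interesting_lines(log_path, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines mentioning any keyword."""
    hits = []
    text = Path(log_path).read_text(errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path to the log as the first argument.
    for number, line in interesting_lines(sys.argv[1]):
        print(f"{number}: {line}")
```

Run it with the path to the log as the only argument. It only filters lines; it does not interpret them.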
-
@IonutIT said in ACME certificate PHP Fatal Error:
It seems now I'm dealing with a DNS check issue. It's trying to check the TXT record and fails somehow.
Adding the TXT record was successful.
But a dnssleep of "20" is a bit too fast. Use "120".
-
The DNS-Sleep field in the ACME certificate setting is empty. I don't have any time set there.
Also, the log shows it waits 20 seconds only the first time, then another 10 seconds, then another 10 seconds. It does that at least 12-14 times in a row and still fails at the end.
Also, looking at my actual DNS records, it seems that the TXT string it's looking for is there about 20-30 times (probably it got added again and again every time I clicked the Issue/Renew button). So the TXT record should match instantly, since it has been there for weeks now.
-
Well it seems I found the issue. It was the DNS-Sleep setting.
I set that manually to 120s (i.e. not leaving it blank) and it magically started working. I tried clearing it, and it again failed to renew.
So something is broken with automatic DNS polling. When you disable it (by manually entering a value in the DNS-Sleep field) everything starts working again.
When automatic DNS polling is enabled, it seems that acme has a curl issue that blocks stuff (see the above acme_issuecert.log.zip)
-
It depends on your local setup/environment. It works fine for me with it unset, but I have heard of others who need to set that as well. When you leave it blank it defaults to using DoH/DoT queries to cloudflare and quad9 IIRC and if your setup (or upstream) blocks those then it would constantly fail.
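To test whether DoH is reachable from your network at all, a rough sketch like the one below may help. It uses Cloudflare's public JSON DoH endpoint; it is not what acme.sh runs internally, just an independent check. The record name in the usage note is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Cloudflare's public DNS-over-HTTPS JSON endpoint.
DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"


def build_doh_url(name, record_type="TXT", endpoint=DOH_ENDPOINT):
    """Build the query URL for a DoH JSON lookup."""
    query = urllib.parse.urlencode({"name": name, "type": record_type})
    return f"{endpoint}?{query}"


def txt_values(response_body):
    """Extract TXT strings from a DoH JSON response body."""
    payload = json.loads(response_body)
    values = []
    for answer in payload.get("Answer", []):
        if answer.get("type") == 16:  # 16 is the TXT record type
            values.append(answer["data"].strip('"'))
    return values


def lookup_txt(name, timeout=10):
    """Query the resolver over HTTPS; raises OSError if DoH is blocked."""
    request = urllib.request.Request(
        build_doh_url(name), headers={"Accept": "application/dns-json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return txt_values(response.read().decode("utf-8"))
```

Calling `lookup_txt("_acme-challenge.example.com")` from a machine behind the same firewall should either return the TXT strings or raise an error. A timeout or connection error here would suggest that DoH is being blocked somewhere along the path.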
-
@IonutIT said in ACME certificate PHP Fatal Error:
Well it seems I found the issue. It was the DNS-Sleep setting.
Yep.
If there is no DNS-Sleep value set, you somewhat presume that DNS is available with the correct, updated zone data right after the update.
This might be possible if you run your own DNS 'master' for the domain locally, with the 'slave' for your zone also close by, and the syncing between the two happens "right away".
DNS-Sleep=0 is a special case: it implies that DoH is used against a known public DNS resolver (Cloudflare) and that no classic resolving (initiated by Letsencrypt, to check the TXT records of your zone) is done.
My question always was: how should Cloudflare be aware of a change to a domain name zone that fast? If the resolving request hits one of the slaves of your domain name, it might not be synced yet ...
But now I see in your logs that it is testing several times, waiting 10 more seconds each time. But surprise, DoH to Cloudflare is a free service with no guaranteed result ;) And there was no result, even after minutes: Letsencrypt bails out => acme.sh bails out => your renewal fails.
@IonutIT said in ACME certificate PHP Fatal Error:
(probably it got added again and again every time I clicked the Issue/Renew button)
That's not a real issue.
What gets added to the "_acme-challenge" subdomain is a random, known file name. This file has to contain a random, but known "number". Both the file name and number are generated by Letsencrypt, handed over to acme.sh, and acme.sh uses them to add the file using the DNS method you chose.
Success or failure, the TXT record (file name and content) is deleted afterwards.
-
@Gertjan said in ACME certificate PHP Fatal Error:
DNS-Sleep=0 is a special case: it implies that DoH is used against a known public DNS resolver (Cloudflare) and that no classic resolving (initiated by Letsencrypt, to check the TXT records of your zone) is done.
Actually, it was DNS-Sleep left empty (not 0) that triggered the DoH/DoT Cloudflare check.
What gets added to the "_acme-challenge" subdomain is a random, known file name. This file has to contain a random, but known "number". Both the file name and number are generated by Letsencrypt, handed over to acme.sh, and acme.sh uses them to add the file using the DNS method you chose.
Yeah, but in my case, when DNS-Sleep was empty and the Cloudflare DoH/DoT check failed, it never actually got around to removing the added TXT record. The whole script failed. So I ended up with 20-30 instances of the "_acme-challenge" TXT record with the same random code (it was always the same code), none of which were ever removed.
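As a side note on why the code can be identical every time: per RFC 8555 (section 8.4), the dns-01 TXT value is not random on each attempt but derived from the challenge token and the account key thumbprint. A small sketch of that derivation, with made-up token and thumbprint values in the usage note, looks like this. Whether the CA actually handed back the same token on each retry here is an assumption on my part; I have not verified it against the log.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url-encode without trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_value(token: str, account_thumbprint: str) -> str:
    """Compute the TXT value for a dns-01 challenge.

    key authorization = token + "." + thumbprint of the account key;
    the TXT value is the base64url SHA-256 digest of that string.
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return b64url(digest)
```

For example, `dns01_txt_value("hypothetical-token", "hypothetical-thumbprint")` returns the same 43-character string every time it is called, so the same token and account key always give the same TXT value.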
Success or failure, the TXT record (file name and content) is deleted afterwards.
It seems that in this case, a fail in the script did not delete the TXT record afterwards.
What I don't understand is why Cloudflare DoH/DoT fails to check my external DNS record. The master NS for that domain is hosted on Gandi. It's especially odd since it seemed to work just fine 2 months ago (the last time the certificate renewed successfully on its own).
-
@jimp said in ACME certificate PHP Fatal Error:
When you leave it blank it defaults to using DoH/DoT queries to cloudflare and quad9 IIRC
Aha ... the log tells me just that: it's the local acme.sh that is checking regularly, like some kind of 'active waiting'.
And when the record is found, it informs Letsencrypt to do the TXT verification of the domain name zone. If a local policy forbids DoH activity, then 'acme.sh' will fail.
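The 'active waiting' pattern can be sketched roughly as below. This is not acme.sh's actual code; the 20 second first wait, the 10 second retries and the attempt count are simply the numbers reported from the log earlier in this thread, and `lookup` is a hypothetical function you supply that returns the TXT strings for a name.

```python
import time


def wait_for_txt(lookup, name, expected, first_wait=20, retry_wait=10,
                 max_attempts=12, sleep=time.sleep):
    """Poll until `expected` appears among the TXT values for `name`.

    Returns the number of lookups performed on success, or None if the
    record never became visible within `max_attempts` lookups.
    """
    sleep(first_wait)
    for attempt in range(1, max_attempts + 1):
        try:
            if expected in lookup(name):
                return attempt
        except OSError:
            # A blocked or timed-out lookup counts as "not visible yet".
            pass
        if attempt < max_attempts:
            sleep(retry_wait)
    return None
```

If every lookup is blocked, the loop burns through all attempts and gives up, which matches the symptom of the renewal hanging for minutes and then failing. A fixed DNS-Sleep value skips this kind of loop and just waits once.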