Daily rc.update_bogons.sh results in zombie procs
-
Yep, it's the fetch that is kinda "hanging":
root 92573 0.0 0.0 6368 2296 - Is  30Apr20 0:03.89 |-- /usr/sbin/cron -s
root 91981 0.0 0.0 8416 2316 - I   03:01   0:00.00 |   `-- cron: running job (cron)
root 72702 0.0 0.0    0    0 - Z   11:52   0:00.00 |     |-- <defunct>
root 92534 0.0 0.0 6968 2828 - INs 03:01   0:00.00 |     `-- /bin/sh /etc/rc.update_bogons.sh
root 87274 0.0 0.0 9264 6536 - IN  17:13   0:00.01 |       `-- /usr/bin/fetch -a -w 600 -T 30 -q -o /tmp/bogonsv6 https://files.pfsense.org/lists/fullbogons-ipv6.txt
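For anyone wanting to check their own box, something like this should list the zombies and the cron tree around them (just a sketch using standard FreeBSD ps keywords):

# list all processes currently in zombie ("Z") state together with their parent PID
ps -axwwo pid,ppid,state,command | awk '$3 ~ "Z"'

# show the tree in descendancy order to see which cron job spawned the defunct child
ps -axwwd -o pid,ppid,state,command | grep -B1 -A3 defunct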
Problems with the "files" server perhaps? I'll try running it manually...
Edit: before running the RC manually, I tried the URL by hand - the browser takes ages to load, and a wget from another pfSense instance sits for ages in "connecting to files.pfsense.org..." and times out after multiple minutes:

[2.5.0-DEVELOPMENT][root@mirage.....to]/root: wget https://files.pfsense.org/lists/fullbogons-ipv6.txt
--2020-06-05 17:17:54--  https://files.pfsense.org/lists/fullbogons-ipv6.txt
Resolving files.pfsense.org (files.pfsense.org)... 162.208.119.41, 162.208.119.40, 2607:ee80:10::119:40, ...
Connecting to files.pfsense.org (files.pfsense.org)|162.208.119.41|:443... failed: Operation timed out.
Connecting to files.pfsense.org (files.pfsense.org)|162.208.119.40|:443...
-
The zombie and the bogons update are at the same level, though. But if you kill the fetch, do the others go away?
We have had some issues with the files server which we're working to resolve, but I'm not aware of it making anything hang like that repeatedly.
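Something like this would tell you (a sketch, using the PIDs from your listing above; the zombie itself cannot be killed, only checked):

# kill the hung fetch, then see whether the defunct child and the sh wrapper go away
kill 87274
sleep 2
ps -o pid,ppid,state,command -p 72702,92534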
-
See my edit above: it seems the fetch/curl/wget takes ages, fails over to the next IP, etc.
[2.5.0-DEVELOPMENT][root@mirage.....to]/root: wget https://files.pfsense.org/lists/fullbogons-ipv6.txt
--2020-06-05 17:17:54--  https://files.pfsense.org/lists/fullbogons-ipv6.txt
Resolving files.pfsense.org (files.pfsense.org)... 162.208.119.41, 162.208.119.40, 2607:ee80:10::119:40, ...
Connecting to files.pfsense.org (files.pfsense.org)|162.208.119.41|:443... failed: Operation timed out.
Connecting to files.pfsense.org (files.pfsense.org)|162.208.119.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1841962 (1.8M) [text/plain]
Saving to: 'fullbogons-ipv6.txt'

fullbogons-ipv6.txt   1%[ ]  23.66K  5.61KB/s   eta 5m 17s
That took around 6 min until it started downloading at all - definitely not normal, as regular package updates etc. are way faster and have no problem failing over to another IP?
I guess the whole process takes so long that the PHP process that started it times out or goes zombie. As this only reoccurred recently, that would fall in line with you having problems on the "files" server?
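If anyone wants to reproduce how long the address hopping takes, something along these lines should show it (a sketch; --connect-timeout just bounds the connect phase so the test doesn't sit there for many minutes):

# time a full download of the v6 bogons list, with a capped connect phase
time curl -sS --connect-timeout 15 -o /dev/null https://files.pfsense.org/lists/fullbogons-ipv6.txt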
-
Maybe so. Though there is a problem right this moment, there wasn't one overnight. So the behavior may be different at the moment. It's already being investigated here, so hopefully resolved shortly.
-
Interesting. The download breaks off halfway, retries and fails to reach the IPv4 addresses, then switches to v6, fails again, and finally connects via the v6 ::119:41 - at which point it instantly hops to ~2MB/s and finishes without a hitch:
Resolving files.pfsense.org (files.pfsense.org)... 162.208.119.41, 162.208.119.40, 2607:ee80:10::119:40, ...
Connecting to files.pfsense.org (files.pfsense.org)|162.208.119.41|:443... failed: Operation timed out.
Connecting to files.pfsense.org (files.pfsense.org)|162.208.119.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1841962 (1.8M) [text/plain]
Saving to: 'fullbogons-ipv6.txt'

fullbogons-ipv6.txt   84%[=========================================>          ]   1.48M  3.84KB/s   in 6m 12s

2020-06-05 17:26:39 (4.09 KB/s) - Connection closed at byte 1556131. Retrying.

--2020-06-05 17:26:40--  (try: 2)  https://files.pfsense.org/lists/fullbogons-ipv6.txt
Connecting to files.pfsense.org (files.pfsense.org)|162.208.119.40|:443... failed: Connection refused.
Connecting to files.pfsense.org (files.pfsense.org)|2607:ee80:10::119:40|:443... failed: Connection refused.
Connecting to files.pfsense.org (files.pfsense.org)|2607:ee80:10::119:41|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 1841962 (1.8M), 285831 (279K) remaining [text/plain]
Saving to: 'fullbogons-ipv6.txt'

fullbogons-ipv6.txt  100%[++++++++++++++++++++++++++++++++++++++++++=======>]   1.76M   496KB/s   in 0.6s

2020-06-05 17:26:50 (496 KB/s) - 'fullbogons-ipv6.txt' saved [1841962/1841962]
Another download now also reaches the IPv4 of .41 - seems .40 is a bit faulty atm, and .41 had some issues but responds well again now. But if that happened while updating the bogons via cron, it could explain the hanging fetch process with all those timeouts, failures, retries etc.
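To check which of the published addresses actually misbehaves, each one can be tested individually, e.g. like this (a sketch using curl's --resolve to pin the connection to one address):

# hit each backend address separately while keeping the proper Host/SNI
for ip in 162.208.119.40 162.208.119.41; do
    echo "== $ip =="
    curl -sS --connect-timeout 15 --resolve files.pfsense.org:443:$ip \
         -o /dev/null -w '%{http_code} %{speed_download} bytes/s\n' \
         https://files.pfsense.org/lists/fullbogons-ipv6.txt
done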
-
Ah, so I was running the update process with
sh -x /etc/rc.update_bogons.sh nosleep
(otherwise it goes to sleep for minutes to hours...) and it fails immediately with an authentication error:
+ /usr/bin/fetch -a -w 600 -T 30 -q -o /tmp/bogons https://files.pfsense.org/lists/fullbogons-ipv4.txt
Certificate verification failed for /C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
34374274104:error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed:/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/crypto/openssl/ssl/s3_clnt.c:1269:
fetch: https://files.pfsense.org/lists/fullbogons-ipv4.txt: Authentication error
I'll check other systems where the download failed, but I assume they could all have that problem.
Funny: the process/script doesn't go further. It won't exit and it won't skip or go away. Fetch just sits there doing nothing at all anymore.
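For anyone hitting the same certificate error, the chain the server actually presents can be inspected directly (a sketch with openssl s_client, which is in the FreeBSD base anyway):

# print subject, issuer and validity dates of the certificate files.pfsense.org presents
echo | openssl s_client -connect files.pfsense.org:443 -servername files.pfsense.org 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates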
-
Yep, confirmed. Other systems (2.4.4-p3 and 2.4.5 alike) have the same problem:
[2.4.5-RELEASE][root@fwl01.....de]/root: sh -x /etc/rc.update_bogons.sh nosleep
+ proc_error=''
+ /usr/local/sbin/read_xml_tag.sh boolean system/do_not_send_uniqueid
+ do_not_send_uniqueid=false
+ [ false '!=' true ]
+ /usr/sbin/gnid
+ uniqueid=1c3a576e6ca2d88ad608
+ export 'HTTP_USER_AGENT=/:1c3a576e6ca2d88ad608'
+ echo 'rc.update_bogons.sh is starting up.'
+ logger
+ [ nosleep '=' '' ]
+ echo 'rc.update_bogons.sh is beginning the update cycle.'
+ logger
+ [ -f /var/etc/bogon_custom ]
+ v4url=https://files.pfsense.org/lists/fullbogons-ipv4.txt
+ v6url=https://files.pfsense.org/lists/fullbogons-ipv6.txt
+ v4urlcksum=https://files.pfsense.org/lists/fullbogons-ipv4.txt.md5
+ v6urlcksum=https://files.pfsense.org/lists/fullbogons-ipv6.txt.md5
+ process_url /tmp/bogons https://files.pfsense.org/lists/fullbogons-ipv4.txt
+ local 'file=/tmp/bogons'
+ local 'url=https://files.pfsense.org/lists/fullbogons-ipv4.txt'
+ local 'filename=fullbogons-ipv4.txt'
+ local 'ext=txt'
+ /usr/bin/fetch -a -w 600 -T 30 -q -o /tmp/bogons https://files.pfsense.org/lists/fullbogons-ipv4.txt
Certificate verification failed for /C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
34374270280:error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed:/build/ce-crossbuild-245/sources/FreeBSD-src/crypto/openssl/ssl/s3_clnt.c:1269:
fetch: https://files.pfsense.org/lists/fullbogons-ipv4.txt: Authentication error
fetch isn't coming back from the auth error and doesn't seem to quit/exit; the cron job goes stale and the rc shell script goes zombie after enough waiting.
So it seems the problem is two-fold:
- SSL auth error on files.pfsense.org - things can happen
- fetch not exiting after a failure and thus blocking/zombifying the parent processes

To be correct: fetch is configured to retry with "-a" and "-w 600" gives it 10 min until the next try. But it never stops retrying.
Anything to help there?
-
Well, 1 should be fixed shortly. Not sure about 2.
-
I was a bit off on 2). It seems it's the way fetch works with "-a" and "-w": "-a" tells it to retry (seemingly infinitely!) and "-w 600" makes it wait 10 min for the next try. So it throws the auth failure, waits 10 min to fail again, and again, and again, and somewhere along the way loses its parent to a zombie.
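If the endless retry is really the culprit, a bounded retry at shell level (i.e. dropping "-a" and looping a fixed number of times) would at least guarantee the thing eventually gives up - just a rough sketch, not what the stock script does, with retry count and pause picked arbitrarily:

# try the download up to 3 times with fetch's own 30s timeout, then give up
url=https://files.pfsense.org/lists/fullbogons-ipv6.txt
tries=0
while [ $tries -lt 3 ]; do
    /usr/bin/fetch -T 30 -q -o /tmp/bogonsv6 "$url" && break
    tries=$((tries + 1))
    sleep 60
done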
It only seems that, once the old run has gone zombie, the script's own mechanism for detecting a running bogon update fails to see it still running and thus starts a new one (which becomes a zombie, too).
-
My own fix/solution: locate the section below and replace it, if the commented-out parts match your script.
/etc/rc.update_bogons.sh

# Set default values if not overriden
v4url=${v4url:-"https://files.pfsense.org/lists/fullbogons-ipv4.txt"}
v6url=${v6url:-"https://files.pfsense.org/lists/fullbogons-ipv6.txt"}
v4urlcksum=${v4urlcksum:-"${v4url}.md5"}
v6urlcksum=${v6urlcksum:-"${v6url}.md5"}

# process_url /tmp/bogons "${v4url}"
# process_url /tmp/bogonsv6 "${v6url}"
rm /tmp/bogons
rm /tmp/fullbogons-ipv4.txt.md5
rm /tmp/bogonsv6
rm /tmp/fullbogons-ipv6.txt.md5
curl --max-time 120 -k https://files.pfsense.org/lists/fullbogons-ipv4.txt -o /tmp/bogons
curl --max-time 120 -k https://files.pfsense.org/lists/fullbogons-ipv4.txt.md5 -o /tmp/fullbogons-ipv4.txt.md5
curl --max-time 120 -k https://files.pfsense.org/lists/fullbogons-ipv6.txt -o /tmp/bogonsv6
curl --max-time 120 -k https://files.pfsense.org/lists/fullbogons-ipv6.txt.md5 -o /tmp/fullbogons-ipv6.txt.md5

if [ "$proc_error" != "" ]; then
	# Relaunch and sleep
	sh /etc/rc.update_bogons.sh &
	exit
fi

# BOGON_V4_CKSUM=`/usr/bin/fetch -T 30 -q -o - "${v4urlcksum}" | awk '{ print $4 }'`
# ON_DISK_V4_CKSUM=`md5 /tmp/bogons | awk '{ print $4 }'`
# BOGON_V6_CKSUM=`/usr/bin/fetch -T 30 -q -o - "${v6urlcksum}" | awk '{ print $4 }'`
# ON_DISK_V6_CKSUM=`md5 /tmp/bogonsv6 | awk '{ print $4 }'`
BOGON_V4_CKSUM=`cat /tmp/fullbogons-ipv4.txt.md5 | awk '{ print $4 }'`
ON_DISK_V4_CKSUM=`md5 /tmp/bogons | awk '{ print $4 }'`
BOGON_V6_CKSUM=`cat /tmp/fullbogons-ipv6.txt.md5 | awk '{ print $4 }'`
ON_DISK_V6_CKSUM=`md5 /tmp/bogonsv6 | awk '{ print $4 }'`

if [ "$BOGON_V4_CKSUM" = "$ON_DISK_V4_CKSUM" ] || [ "$BOGON_V6_CKSUM" = "$ON_DISK_V6_CKSUM" ]; then
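On top of that, the duplicate runs piling up behind a zombie could probably be avoided by serializing the cron job under a lock - a sketch, assuming FreeBSD's lockf(1) and a lock file path of my own choosing:

# run the updater under an exclusive lock; -t 0 makes a second instance
# bail out immediately instead of queueing behind a hung/zombied run
# (the lock file path is arbitrary)
lockf -t 0 /tmp/bogonupdate.lock /bin/sh /etc/rc.update_bogons.sh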