pfBlockerNG-devel v3.0.0.10 causes Internet outage on SG-3100 at school.
-
After installing pfSense 21.02-RELEASE-p1 (arm) on the SG-3100 on Saturday, and pfBlockerNG-devel v3.0.0.10 two hours later, everything seemed fine.
However, at 12:22 today (Wednesday), Internet access that depends on DNS stopped working. The error logs show DNS errors, starting at 09:58, that look like this:
resolver.log:Mar 3 09:58:19 pfsense1-crsdhs unbound[52182]: [52182:1] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
resolver.log:Mar 3 09:58:19 pfsense1-crsdhs unbound[52182]: [52182:1] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
resolver.log:Mar 3 09:58:25 pfsense1-crsdhs unbound[52182]: [52182:0] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
I was able to ping the ISP's two DNS servers by IP address from the SSH command line, but I was unable to ping anything by domain name.
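A long resolver.log like this is easier to read with a small script that counts the refused upstream connections and records when Unbound stopped. Below is a minimal Python sketch. It assumes the log has been copied off the firewall as plain text and that every line matches the format of the excerpts in this thread. Only IPv4 upstreams are matched. `summarize` and the regexes are hypothetical helpers, not part of pfSense or Unbound.

```python
import re
from collections import Counter

# Assumption: lines look like the excerpts in this thread, optionally prefixed
# with "resolver.log:" when produced by grep across several files, e.g.
# Mar 3 09:58:19 pfsense1-crsdhs unbound[52182]: [52182:1] error: read (in tcp s): ...
LINE_RE = re.compile(
    r"^(?:[\w.-]+\.log:)?"                      # optional "file.log:" prefix from grep
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})"  # syslog timestamp
    r"\s+\S+\s+unbound\[\d+\]:\s+"              # hostname and process
    r"\[\d+:\d+\]\s+(?P<level>\w+):\s+(?P<msg>.*)$"
)
# IPv4 upstreams only (assumption based on the excerpts above)
REFUSED_RE = re.compile(r"Connection refused for (?P<ip>[\d.]+) port (?P<port>\d+)")


def summarize(lines):
    """Count refused upstream connections and collect Unbound stop times."""
    refused = Counter()  # (upstream IP, port) -> number of errors
    stops = []           # timestamps of "service stopped" messages
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # not an Unbound line in the expected format
        msg = m.group("msg")
        hit = REFUSED_RE.search(msg)
        if m.group("level") == "error" and hit:
            refused[(hit.group("ip"), int(hit.group("port")))] += 1
        elif msg.startswith("service stopped"):
            stops.append(m.group("ts"))
    return refused, stops
```

To use it, open a local copy of the log and pass the file object, e.g. `summarize(open("resolver.log"))`. A steady stream of refusals from the same upstream, combined with repeated stop times, points at the resolver itself rather than at a single blocked domain.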
I tried to stop pfBlocker by disabling it on the Firewall > pfBlockerNG > General Settings screen and then stopping the service. I could not stop the service through the GUI, so I connected over SSH and restarted the firewall from the command line.
The firewall restarted, pfBlocker remained disabled, and now everyone has access via DNS.
I have logs that I can send to BBcan177.
I will wait to install a newer version of pfBlocker until I hear that the issue has been resolved.
-
One more bit of info: the Unbound service periodically stopped and restarted on its own during the outage.
Notice the messages after the Unbound service is stopped:
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: server stats for thread 1: 403 queries, 150 answers from cache, 253 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: server stats for thread 1: requestlist max 59 avg 9.35573 exceeded 0 jostled 0
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: average recursion processing time 0.532239 sec
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: histogram of recursion processing times
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: [25%]=0.0301511 median[50%]=0.107565 [75%]=0.39533
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: lower(secs) upper(secs) recursions
Mar 3 10:01:19 pfsense1-crsdhs unbound[52182]: [52182:0] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
Mar 3 10:01:19 pfsense1-crsdhs unbound[52182]: [52182:0] error: read (in tcp s): Connection refused for 199.249.112.1 port 53
Mar 3 10:03:08 pfsense1-crsdhs unbound[52182]: [52182:0] info: service stopped (unbound 1.13.1).
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: server stats for thread 0: 687 queries, 302 answers from cache, 385 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: server stats for thread 0: requestlist max 55 avg 9.43117 exceeded 0 jostled 0
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: average recursion processing time 0.479928 sec
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: histogram of recursion processing times
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: [25%]=0.029184 median[50%]=0.109773 [75%]=0.407905
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: lower(secs) upper(secs) recursions
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.000000 0.000001 35
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.000512 0.001024 1
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.001024 0.002048 1
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.008192 0.016384 15
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.016384 0.032768 56
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.032768 0.065536 43
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.065536 0.131072 60
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.131072 0.262144 44
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.262144 0.524288 58
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 0.524288 1.000000 27
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 1.000000 2.000000 14
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 2.000000 4.000000 21
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: 4.000000 8.000000 8
Mar 3 10:03:09 pfsense1-crsdhs unbound[52182]: [52182:0] info: server stats for thread 1: 403 queries, 150 answers from cache, 253 recursions, 0 prefetch, 0 rejected by ip ratelimiting
-
@gregbinsd Sent you a PM
-
@bbcan177 What was the outcome of this issue? I'm seeing exactly the same thing in my logs and experiencing the same Internet outage as @GregBinSD.
Is the answer, reluctantly, to disable pfBlockerNG for now?
-
@ctmarsh said in pfBlockerNG-devel v3.0.0.10 causes Internet outage on SG-3100 at school.:
@bbcan177 what was the outcome for this issue? I'm seeing exactly the same in my logs and experiencing internet outage just the same as @GregBinSD
Is the answer to ...reluctantly...disable pfBlockerNG for now?
I am sorry that you are having these issues. I would first suggest contacting pfSense/Netgate support about the issues with the SG-3100 unit. I don't have first-hand knowledge of the cause of the issues or of the steps being taken to resolve them.
Other options include installing a previous pfSense version, or disabling the pfB package until it has been rectified.
-
CT,
In my case, the source of the problem turned out to be the pfSense 21.02-RELEASE-p1 (arm) OS update on the SG-3100. Two days after I turned off pfBlockerNG, the Unbound resolver on the SG-3100 was still stopping and starting under 21.02-RELEASE-p1. The problem was not pfBlockerNG!
I opened a ticket with Netgate. They were very helpful and provided a link to a "recovery" version of 2.4.5_1 made specifically for the SG-3100. It took me about three hours to work through all of the steps, but I was able to restore the SG-3100 and the XML configuration.
So the SG-3100 works perfectly now. I have yet to reinstall pfBlockerNG, since this is a production environment and I have only one SG-3100. I plan to wait until the school's spring break so that I can install pfBlockerNG and let it run. There will probably be no issues, but after this incident I have learned to exercise an "abundance of caution" ;-) with all things!
BBcan177 replied to me immediately by PM and advised me to watch out for SG-3100 issues with the new software.
I plan to install the pfBlockerNG-devel version during Easter break, because it is a good content management system that is worth testing and verifying.
-
@gregbinsd said in pfBlockerNG-devel v3.0.0.10 causes Internet outage on SG-3100 at school.:
2.4.5_1
Thanks, Greg, for your response, and @BBcan177 for yours. However, I am "just" a home user with a virtual pfSense box, so I don't have a support contract, and it's nowhere near as important as a school's firewall!
I guess I have to start again with the older version :( but before I do, I will install pfB-devel to see if that fixes the issue.
Thanks to you both; your responses have been excellent, and quick!
-
@ctmarsh
I didn't have a support contract either. This is something that support will routinely do when an update causes a problem with one of their products.
Google this: "How to open a Netgate support ticket". You will see the top response:
"To open a ticket, go to the Ticket Portal, create an account, then submit your ticket."
It is really that easy if you own an SG-3100.
Good luck.
-
CT,
I'm sorry, I didn't read your message thoroughly; I see now that you are running pfSense as a VM, something that I also do.
As I said, Netgate will support its hardware products, but we are on our own when using the CE software.
The best you and I can do with CE is back up the existing pfSense before upgrading. If the update crashes and burns, destroy the upgraded VM and build another one from the original version, which you should also have "backed up" by keeping the ISO you downloaded when building the original VM. Then restore the XML config backup file and you are back in business.
With some hypervisors, you could also keep the original VM as a backup, and just turn off the crashed VM and turn on the original VM.
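The XML restore step only works if the file you saved is intact. Here is a minimal Python sketch for keeping dated copies of the config exported from Diagnostics > Backup & Restore. The function name, the file names, and the check for a `<pfsense>` root element are all assumptions for illustration. This is not a pfSense tool.

```python
import shutil
import xml.etree.ElementTree as ET
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_config(exported: Path, backup_dir: Path,
                   now: Optional[datetime] = None) -> Path:
    """Keep a timestamped copy of an exported pfSense XML config (hypothetical helper).

    Refuses files that are not well-formed XML or whose root element is not
    <pfsense> (an assumption about the export format), so a truncated or wrong
    download is not kept as a restore point.
    """
    root = ET.parse(exported).getroot()  # raises ET.ParseError on malformed XML
    if root.tag != "pfsense":
        raise ValueError(f"unexpected root element <{root.tag}> in {exported}")
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{exported.stem}-{stamp}{exported.suffix}"
    shutil.copy2(exported, target)  # copy2 also preserves the file's timestamps
    return target
```

Run it on whatever machine stores the exports, e.g. `archive_config(Path("config-pfsense.xml"), Path("backups"))`. That way a copy from before the upgrade is still around if the new version misbehaves.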
Too bad it didn't work out perfectly for us, but it did help us to see the value of having restoration plans in place ;-)
-
And in a VM world, don't forget its most useful feature: snapshots! Create a snapshot before upgrading, and if the upgrade goes south, just roll back to the snapshot in the hypervisor.
-
@bbcan177 I believe the problem is PHP. See this thread and this Redmine bug report, which in turn references another Redmine bug report that is the likely root cause.
-
@GregBinSD @CTMarsh @BBcan177 @bldnightowl
Just FYI, after upgrading to 2.5.1 I am seeing exactly the same issue, but I am not running pfBlocker. I opened a separate thread here:
https://forum.netgate.com/topic/162973/upgraded-to-2-5-1-unbound-dns-stops-working
-
@mods @CTMarsh @BBcan177 @bldnightowl
I wish there were some way for me to change the title of this topic. At the time I wrote the original post, pfBlocker seemed to be the culprit, but as we have all learned, it was the OS upgrade on the SG-3100.
I reverted to 2.4.5_1 and am holding there until something positive happens with the new OS.