Should I be using Unbound Python mode? Is it stable?
-
@code4food23 I have an SG-1100 on 21.05, with pfBlockerNG-devel 3.0.0_16 as the only package installed.
Enabled unbound python mode some weeks ago.
This caused disk usage to increase slowly: ~4%/day.
Have reverted to unbound mode, and problem has gone away.
Many others report no problem, so no idea why I did.
-
@nocling Nice. If you don't mind, what exactly do I lose by disabling the DNS resolver to enable python mode?
-
@anthonys Interesting, I will have to give it a try and see if I encounter any issues. My Netgate 5100 seems to be doing just fine with regular unbound, which is why I was wondering if I was missing out.
-
@code4food23 said in Should I be using Unbound Python mode? Is it stable?:
@anthonys Interesting, I will have to give it a try and see if I encounter any issues. My Netgate 5100 seems to be doing just fine with regular unbound, which is why I was wondering if I was missing out.
Some installs also have a noticeable sustained disk write issue (between 100 and 400 KB/s) even though all pfBlockerNG/Unbound logging is disabled. This level of writing is “deadly” for very small eMMC/SSD drives in the 8 to 32 GB range. A sustained rate like that will wear out a typical 8 GB eMMC in about a year.
In my case (an SG-2100), the sustained write and the “disk filling issue” were both present.
So there are issues, but they seem to be specific to some setups.
-
@keyser said in Should I be using Unbound Python mode? Is it stable?:
Some installs also have a noticeable sustained disk write issue (between 100 and 400 KB/s) even though all pfBlockerNG/Unbound logging is disabled.
Do you mean the ZFS issue?
The Netgate ARM Appliances use UFS.
-
@nocling said in Should I be using Unbound Python mode? Is it stable?:
@keyser said in Should I be using Unbound Python mode? Is it stable?:
Some installs also have a noticeable sustained disk write issue (between 100 and 400 KB/s) even though all pfBlockerNG/Unbound logging is disabled.
Do you mean the ZFS issue?
The Netgate ARM Appliances use UFS.
No, not the ZFS issue.
I have been forced to disable python mode on three separate installs I have running because pfBlockerNG’s script interaction with python causes a sustained write to disk (UFS filesystem) of about 100 to 400 KB/s.
This happens on all my installs, and looking at “top -m io” it’s an Unbound command that causes the IO.
I have been unable to prevent it from happening in my setups, regardless of disabling all logging and so forth. It even happens with no active clients and no DNS lookups being done.
So it seems to be some kind of loop caused by my pfBlockerNG config/pfSense setup.
Stopping pfBlockerNG stops the writing - Stopping unbound does not, so it’s something happening within pfBlockerNG.
Disabling python mode also prevents the issue.
-
@keyser said in Should I be using Unbound Python mode? Is it stable?:
pfBlockerNG’s script interaction with python causes a sustained write to disk
How can I check this? Definitely something I'd like to keep an eye out for
-
@code4food23 said in Should I be using Unbound Python mode? Is it stable?:
@keyser said in Should I be using Unbound Python mode? Is it stable?:
pfBlockerNG’s script interaction with python causes a sustained write to disk
How can I check this? Definitely something I'd like to keep an eye out for
To get a hint about your box’s average disk activity you can run the shell command: “iostat -x”
It will return the average IO since boot. If your write figure is in the 0.2 MB/s and up range, you need to investigate further.
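As a rough illustration of that check, here is a minimal Python sketch that reads text in the style of "iostat -x" output and flags devices writing at 0.2 MB/s or more. The sample text, the device names and the function name `write_rates_mb_per_s` are made up for this example; the sketch assumes the output has a header row starting with "device" and a "kw/s" (kilobytes written per second) column, and that 1 MB = 1024 KB.

```python
def write_rates_mb_per_s(iostat_output):
    """Return {device: write rate in MB/s} parsed from `iostat -x` style text."""
    rates = {}
    kw_index = None
    for line in iostat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The header row tells us which column holds kilobytes written per second.
        if fields[0] == "device" and "kw/s" in fields:
            kw_index = fields.index("kw/s")
            continue
        if kw_index is not None and len(fields) > kw_index:
            try:
                rates[fields[0]] = float(fields[kw_index]) / 1024
            except ValueError:
                continue  # not a data row
    return rates


# Illustrative sample only; these are not measurements from this thread.
SAMPLE = """\
                        extended device statistics
device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
mmcsd0         0      12      3.1    307.2     1     4     0     4    0   3
md0            0       0      0.0      0.0     0     0     0     0    0   0
"""

THRESHOLD_MB_PER_S = 0.2  # the "investigate further" level mentioned above
busy = {dev: rate for dev, rate in write_rates_mb_per_s(SAMPLE).items()
        if rate >= THRESHOLD_MB_PER_S}
print(busy)
```

On a real box you would paste in (or pipe in) your own "iostat -x" output instead of the sample.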
Do a search here on this forum to get further details.
-
@keyser said in Should I be using Unbound Python mode? Is it stable?:
Stopping pfBlockerNG stops the writing - Stopping unbound does not, so it’s something happening within pfBlockerNG
pfBlockerNG, by itself, does nothing.
When you add some feeds, it will load them once and keep them updated. That means an initial "write" of a file and (very) little afterwards.
The only thing pfBlockerNG actually does is make unbound more verbose: not by reading what unbound logs in its log file, but by using internal functionality that unbound exposes, by adding an "addon" (written in Python) to it.
pfBlockerNG acts on the data it sees flowing through unbound. Acting on it means accepting or refusing a request; on a refusal, unbound does not really resolve the DNS request, so the host name appears to be blocked.
If unbound doesn't run, you have no DNS resolution any more (this is already a bad situation). This means pfBlockerNG stops producing data. Without unbound, pfBlockerNG can't do anything.
pfBlockerNG is a DNSBL tool. To make it work, it needs to have access to the DNS activity.
pfBlockerNG also makes nice stats, charts and lists so you can see what it does now, over the last hours and yesterday. So it stores and keeps data.
This is the graph of my disk space (140 GB total) - the last day, week, month and year.
I've been using pfBlockerNG in python mode for the last year or so.
On my network, just 10 PCs and some phones / tablets, we don't tend to visit sites that need to be blocked (why would we visit sites we don't want to look at in the first place?), so that's why my pfBlockerNG doesn't do (== log!) much.
If your network has devices (users!) that try to visit all the sites YOU try to block, pfBlockerNG will start to log all these events. That's what you want, right? ;)
-
@gertjan said in Should I be using Unbound Python mode? Is it stable?:
@keyser said in Should I be using Unbound Python mode? Is it stable?:
Stopping pfBlockerNG stops the writing - Stopping unbound does not, so it’s something happening within pfBlockerNG
pfBlockerNG, by itself, does nothing.
When you add some feeds, it will load them once and keep them updated. That means an initial "write" of a file and (very) little afterwards.
The only thing pfBlockerNG actually does is make unbound more verbose: not by reading what unbound logs in its log file, but by using internal functionality that unbound exposes, by adding an "addon" (written in Python) to it.
pfBlockerNG acts on the data it sees flowing through unbound. Acting on it means accepting or refusing a request; on a refusal, unbound does not really resolve the DNS request, so the host name appears to be blocked.
If unbound doesn't run, you have no DNS resolution any more (this is already a bad situation). This means pfBlockerNG stops producing data. Without unbound, pfBlockerNG can't do anything.
pfBlockerNG is a DNSBL tool. To make it work, it needs to have access to the DNS activity.
pfBlockerNG also makes nice stats, charts and lists so you can see what it does now, over the last hours and yesterday. So it stores and keeps data.
This is the graph of my disk space (140 GB total) - the last day, week, month and year.
I've been using pfBlockerNG in python mode for the last year or so.
On my network, just 10 PCs and some phones / tablets, we don't tend to visit sites that need to be blocked (why would we visit sites we don't want to look at in the first place?), so that's why my pfBlockerNG doesn't do (== log!) much.
If your network has devices (users!) that try to visit all the sites YOU try to block, pfBlockerNG will start to log all these events. That's what you want, right? ;)
Thank you for the detailed explanation. I'm well versed in which component does what, and that's why I have been posting in detail that the issue is with pfBlockerNG:
If I have no clients active at all, and I stop Unbound (no DNS available), I still have a command named "Unbound" in "top -m io" that does about 200 to 300 KB/s of writes to disk. It only goes away if I stop pfBlockerNG.
-
@keyser said in Should I be using Unbound Python mode? Is it stable?:
I still have a command named "Unbound" in "top -m io" that does about 200 to 300 KB/s of writes to disk. It only goes away if I stop pfBlockerNG.
That's not what I see.
When I
Then unbound isn't running any more.
This:
ps ax | grep 'unbound'
should not list any 'unbound' instances.
And when unbound isn't running, the underlying pfBlockerNG python script isn't running any more, which means the python script doesn't produce any output any more.
pfBlockerNG may stop and restart unbound as part of its internal update procedure, once a day, more or less.
What do you see when you execute:
cat /var/log/resolver.log | grep 'start'
-
@gertjan said in Should I be using Unbound Python mode? Is it stable?:
@keyser said in Should I be using Unbound Python mode? Is it stable?:
I still have a command named "Unbound" in "top -m io" that does about 200 to 300 KB/s of writes to disk. It only goes away if I stop pfBlockerNG.
That's not what I see.
When I
Then unbound isn't running any more.
This:
ps ax | grep 'unbound'
should not list any 'unbound' instances.
And when unbound isn't running, the underlying pfBlockerNG python script isn't running any more, which means the python script doesn't produce any output any more.
pfBlockerNG may stop and restart unbound as part of its internal update procedure, once a day, more or less.
What do you see when you execute:
cat /var/log/resolver.log | grep 'start'
I understand what you are getting at, and for some bizarre reason I can no longer replicate the issue where the writing continues even though "Unbound" is stopped.
I agree with you that the writing should stop if unbound is stopped, but on two separate occasions I saw it continue. It would seem that was just impatience on my side: I did not give the residual parts of the unbound shutdown time to complete.
Right now, the writing does stop if I stop unbound (just as it does if I only stop pfBlockerNG).
However: The problem with excessive writing is still very much present when using "Unbound python mode".
Right now I have 2 feeds active, and I have them both set to "null blocking (no logging)". As expected I'm no longer seeing new log entries in dnsbl.log. I have disabled DNS reply logging, and no logfiles are seeing new entries apart from the odd hit on my IP block list/log.
Yet, Unbound is still sustaining about 350 KB/s writing to my SSD. So in the 2 min it took me to write this reply, some 30MB of data was written. Data that is not present in any logfiles.
Sidenote: There are only 8 clients on my network in total, and they were all asleep during this little test.
-
@keyser said in Should I be using Unbound Python mode? Is it stable?:
Yet, Unbound is still sustaining about 350 KB/s writing to my SSD. So in the 2 min it took me to write this reply, some 30MB of data was written. Data that is not present in any logfiles.
Sidenote: There are only 8 clients on my network in total, and they were all asleep during this little test.
Try to look with this command:
top -SH -o write (and after that "m")
to see which process is most active in writing.
Just an idea ...
-
@fireodo said in Should I be using Unbound Python mode? Is it stable?:
@keyser said in Should I be using Unbound Python mode? Is it stable?:
Yet, Unbound is still sustaining about 350 KB/s writing to my SSD. So in the 2 min it took me to write this reply, some 30MB of data was written. Data that is not present in any logfiles.
Sidenote: There are only 8 clients on my network in total, and they were all asleep during this little test.
Try to look with this command:
top -SH -o write (and after that "m")
to see which process is most active in writing.
Just an idea ...
Unbound by far. It's more or less permanently at the top, with between 7 and 60+ writes per refresh. At times there are two unbound lines, each with its own writes, but they belong to the same PID.
Syncer is the second most active, with the occasional 1 or 2 writes whenever unbound is quiet for one second.
-
@keyser Hmm, just noticed something:
1: If I enable DNS Reply logging it has a huge impact on the sustained write issue - it more or less doubles the write rate.
2: With DNS reply logging active I noticed that Unbound is receiving a noticeable amount of DNS requests every second even though I have no clients.... Tracked it down and it turns out it is my Zabbix monitoring server that is requesting name resolution repeatedly for names of monitored devices (Seems Zabbix on Linux does not cache DNS resolutions by default).
With that in mind - and the writing almost exactly doubling:
Could it be that the python integration script is actually temporarily storing DNS replies to disk (1x write) even if DNS reply logging is disabled?
Enabling DNS reply logging would then require another write of the reply (2x write).
Just a hunch.....
-
@keyser said in Should I be using Unbound Python mode? Is it stable?:
.....
So it boils down to "you do something I (we) don't".
Then, "do what we do" ;) and you'll see that unbound or unbound related traffic becomes close to nothing.
My company network doesn't even show 'unbound' in the top 20 of "top -o write + m" - and most of my colleagues are back from holiday and are all trying to give the impression that they work.
@keyser said in Should I be using Unbound Python mode? Is it stable?:
Zabbix monitoring
Is that permanently doing something to gather stats?
I'm using Munin myself, but Munin's node scripts only run every 5 minutes, and never last longer than 30 seconds or so.
-
@gertjan said in Should I be using Unbound Python mode? Is it stable?:
@keyser said in Should I be using Unbound Python mode? Is it stable?:
.....
So it boils down to "you do something I (we) don't".
Then, "do what we do" ;) and you'll see that unbound or unbound related traffic becomes close to nothing.
My company network doesn't even show 'unbound' in the top 20 of "top -o write + m" - and most of my colleagues are back from holiday and are all trying to give the impression that they work.
I would love to if we can make this problem go away. The thing is, I don’t think I have done anything particularly custom in my pfBlockerNG setup. It is more or less vanilla/default settings in DNSBL. In IP I have the one exception of having it make alias lists instead of auto rules.
This just crossed my mind: Are you perhaps using RAMdisk for /var? That could explain why your Unbound python script does not touch the disk.
@keyser said in Should I be using Unbound Python mode? Is it stable?:
Zabbix monitoring
Is that permanently doing something to gather stats ?
I'm using Munin myself, but Munin's node scripts only run every 5 minutes, and never last longer than 30 seconds or so.
Yeah, it's monitoring a bunch of network equipment and servers for bandwidth and utilization statistics.
-
@keyser said in Should I be using Unbound Python mode? Is it stable?:
...I would love to if we can make this problem go away...
I agree.
I have a similar problem with my SG-5100. I changed from Unbound to Unbound Python mode about 3 weeks ago and my disk usage (ufs) has steadily increased from about 45% to 75% currently.
Nothing fancy in my setup: about 30 - 35 clients, a few VLANs, VPN server, and minimal packages (avahi, nut, pfBlockerng-devel, snort, traffic stats).
I am considering changing back to non-Python mode.
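To put that growth in perspective, a minimal Python sketch of a straight-line extrapolation using the figures above (45% to 75% over about 3 weeks). The function name `days_until_full` is made up for this example, and it assumes the growth stays linear, which may well not hold in practice.

```python
def days_until_full(start_pct, current_pct, elapsed_days, full_pct=100.0):
    """Linear extrapolation: days left before usage reaches full_pct.

    Returns None when usage is flat or shrinking.
    """
    growth_per_day = (current_pct - start_pct) / elapsed_days
    if growth_per_day <= 0:
        return None
    return (full_pct - current_pct) / growth_per_day


# Figures from the post above: 45% -> 75% in roughly 3 weeks (21 days).
remaining = days_until_full(45.0, 75.0, 21)
print(f"about {remaining:.1f} days until the disk is full at this rate")
```

At roughly 1.4% per day that leaves a bit over two weeks, so reverting (or at least watching the log sizes) sooner rather than later seems sensible.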
-
@azdeltawye said in Should I be using Unbound Python mode? Is it stable?:
@keyser said in Should I be using Unbound Python mode? Is it stable?:
...I would love to if we can make this problem go away...
I agree.
I have a similar problem with my SG-5100. I changed from Unbound to Unbound Python mode about 3 weeks ago and my disk usage (ufs) has steadily increased from about 45% to 75% currently.
Nothing fancy in my setup: about 30 - 35 clients, a few VLANs, VPN server, and minimal packages (avahi, nut, pfBlockerng-devel, snort, traffic stats).
I am considering changing back to non-Python mode.
Yeah, that’s the disk filling issue with pfBlockerNG. I have that as well.
There is definitely something fishy with the python script and how it handles data. The filling issue in my case is related to doing “DNS Reply Logging” in Unbound Python Mode.
When Unbound is under moderate DNS resolution load, the python script fails to respect the default 20,000 lines max setting for the DNS_reply log. Once it's past the 20,000 line limit, it no longer attempts to delete the old entries, and the log file just keeps growing. Mine had some 900,000+ lines in it when my disk was almost full. Stopping pfBlockerNG and starting it again causes the script to truncate the logfile again (and regain your lost disk space). But in my case it’s only a matter of time before DNS resolution load causes the script to miss truncation again, and you are on your way to a full disk.
Disabling DNS reply logging has so far prevented me from seeing the disk running full again. But who knows if it can happen to some of the other logfiles?
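For anyone wondering what "respecting a max line setting" amounts to, here is a minimal Python sketch of capping a log at its newest N lines. This is not pfBlockerNG's code; the function name `trim_log` and the demo file are made up for this example, and the 20,000 default simply mirrors the setting mentioned above.

```python
import os
import tempfile
from collections import deque


def trim_log(path, max_lines=20000):
    """Keep only the newest max_lines lines of a text log.

    Returns the number of lines removed (0 if the file was within the limit).
    """
    total = 0
    tail = deque(maxlen=max_lines)  # holds only the newest max_lines lines
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for line in f:
            total += 1
            tail.append(line)
    if total <= max_lines:
        return 0
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.writelines(tail)
    os.replace(tmp_path, path)  # swap the trimmed copy into place
    return total - max_lines


# Small demo on a throwaway file: 25 lines trimmed down to 20.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_reply.log")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.writelines(f"entry {i}\n" for i in range(25))
    removed = trim_log(demo_path, max_lines=20)
    with open(demo_path, encoding="utf-8") as f:
        kept = f.read().splitlines()

print(removed, kept[0], kept[-1])
```

Note that replacing a file while another process still has it open for writing does not stop that process from writing to the old file, so a sketch like this is only meaningful while the writer is stopped or reopens the log afterwards.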
-
As a little follow-up on my investigations, I have taken measures to “quiet” down my Zabbix server so it does not flood my Unbound resolver.
Doing that had a pretty big impact on the sustained write rate, which has now dropped to somewhere around 150 KB/s (from around 380 KB/s).
So there is no doubt the issue is somewhat related to DNS lookup activity in Unbound. The thing is - I have disabled DNS Reply logging, DNSBL block logging and so forth, and none of my logfiles are growing now. What is the Python script doing that touches/writes so much to disk when it’s not logging?
I suspect there is some temporary storing of DNS lookup data going on inside the script.
What DNSBL configuration could cause my script/python integration to store temporary data, that most other installations do not?
At 150 KB/s I’m looking at more than 4 TB of write IO per year to my 8 GB eMMC - that will wear through its endurance in about 3 years. Still too fast in my opinion.
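The arithmetic behind those wear figures can be sketched in a few lines of Python. The function names are made up for this example, 1 KB is taken as 1000 bytes, and the 12 TB endurance value is purely an assumed placeholder (roughly what "about 3 years at about 4 TB/year" implies); the real rating has to come from the eMMC's datasheet.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def tb_written_per_year(rate_kb_per_s):
    """Terabytes written per year at a sustained rate (1 KB = 1000 bytes)."""
    return rate_kb_per_s * 1000 * SECONDS_PER_YEAR / 1e12


def years_to_wear_out(rate_kb_per_s, endurance_tbw):
    """Years until the rated write endurance (in TB written) is used up."""
    return endurance_tbw / tb_written_per_year(rate_kb_per_s)


ASSUMED_ENDURANCE_TBW = 12.0  # hypothetical placeholder, not a datasheet value

for rate in (150, 380):  # KB/s after and before quieting Zabbix
    per_year = tb_written_per_year(rate)
    years = years_to_wear_out(rate, ASSUMED_ENDURANCE_TBW)
    print(f"{rate} KB/s -> {per_year:.2f} TB/year, ~{years:.1f} years")
```

Under these assumptions 150 KB/s comes out at a little under 5 TB per year, and the earlier 380 KB/s at roughly 12 TB per year, which is in line with the "about a year" estimate given earlier in the thread.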