Unbound TCP buffer settings not sticky
-
I access the forum from a different PC than the one that can reach the router logs, so it will take me a while to transfer files around and post them here. For now, here is the SNMP log: http://pastebin.com/embed_js.php?i=kTkuRnzX I hope I pastebinned it correctly using a hyperlink. As mentioned, a manual restart did nothing; the service remained stopped.
A quick look via the GUI's Edit File shows that php_errors.txt is empty, even though the SNMP log shows some PHP errors. The router was rebooted early this morning by cron. The Unbound log only contains "CLOG" with some odd symbol after the word. The system log has plenty in it but won't cooperate over remote login; it's a large file, I suppose. I may have to post the system log this evening.
-
I forgot to mention the installed packages. Apart from Unbound, all of these have been installed and running fine for a few months:
Cron
PhpSysInfo
Service Watchdog
SipProxD
Unbound
-
Just grab WinSCP and use the same credentials as SSH. It works just like an FTP client. Unbound should be complaining about something somewhere.
-
Oh, it complained all right. I take it you didn't look at the SNMP file. First it installed, then ran normally for a bit, then the median time started to lag, then PHP errors started, including a warning about needing to increase the open files limit. Finally it reported that there was no such file or directory. In short, the wheels eventually fell off. I've never seen anything like this. I presume this slow failure occurred as more demand hit the Unbound package. I didn't get up at 4 AM to see whether Unbound would start again after the reboot. Cleaning up the XML required me to repopulate the Unbound settings via the GUI. I matched what was in the backup XML, so it should have the same configuration as before the upgrade. The SNMP log reports that version 1.4.21 installed, not 1.4.23. I presume the reported version isn't an issue.
-
Check the unbound-control script and see where it is trying to start the binary from. Is the path correct? What is the actual path?
I'm flying out for some IT work tonight, so I may not be around for a bit. Darn this labor market!
-
What happens if you execute "/usr/pbi/unbound-amd64/sbin/unbound-control start" from the command line?
Do you get a warning about 'too many file descriptors requested'?
The second issue is that there are two watchdog services running. The Unbound package has its own watchdog, called unbound_monitor.sh (its log lines contain Unbound_Alarm). That script starts Unbound, and so does the Service Watchdog package, which is why you are seeing the "bind: address already in use" error message.
You should see unbound-1.4.21_1-amd64 using 'pbi_info'. The _X suffix doesn't make a difference; however, I have bumped it on the package servers to avoid confusion in the future. Don't reinstall just yet, as the package builders still need to build the updated version.
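The double-start problem described above (unbound_monitor.sh and the Service Watchdog package both launching Unbound, so the second launch fails to bind) is usually avoided by having a supervisor check whether the daemon is already alive before starting it. Below is a minimal Python sketch of that check, assuming the daemon records its PID in a pidfile; `should_start` and `process_alive` are hypothetical names, not code from either package.

```python
import os


def process_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 sends nothing)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def should_start(pidfile: str) -> bool:
    """Hypothetical watchdog check: start the daemon only if the PID recorded
    in pidfile is missing, unreadable, or no longer running."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return True
    return not process_alive(pid)
```

This only helps if every supervisor performs the same check, and a stale pidfile whose PID has been reused by an unrelated process would be reported as "running"; leaving a single supervisor in charge, as suggested above, sidesteps both issues.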
-
Thanks guys for all your help on this, much appreciated!
Bryan, have a safe flight. I find these:
/usr/local/sbin/unbound-control
/usr/pbi/unbound-amd64/sbin/unbound-control
/usr/pbi/unbound-amd64/.sbin/unbound-control
I presume it's the …amd64/sbin... one.
Within it I have:
unbound
unbound-anchor
unbound-checkconf
unbound-control
unbound-control-setup
unbound-host
I'm not sure which file is the script. The unbound-control file is not readable. The unbound-control-setup file has this line:
DESTDIR=/usr/pbi/unbound-amd64/etc/unbound
Wagonza,
I didn't know Unbound had its own watchdog; that explains a lot. Perhaps packages that would cause a conflict could be removed from the Service Watchdog app list. If I execute "unbound-control start", I imagine Unbound will halt because the DNS Forwarder is currently enabled and functioning in its place. The GUI complains if the forwarder is enabled when Unbound is enabled. This box is in active service, so I have to be careful not to execute anything that will dump states or crash services. Can you assure me nothing unexpected will happen except a possible warning?
Thanks again,
Mark
-
Ah OK, so you got dnsmasq running again. No worries: running that command will just cause an error saying it cannot bind to port 53 because it's in use.
Other than that it won't cause any harm. By the way, all those binaries you found are correct, so nothing looks out of the ordinary.
As for the Unbound watchdog, it should probably be removed, leaving it up to the Service Watchdog package to handle. However, since Unbound is going into 2.2, a different approach will be looked at.
-
What happens if you execute "/usr/pbi/unbound-amd64/sbin/unbound-control start" from the command line?
Do you get a warning on 'too many file descriptors requested'?
Wagonza, I get this:
[1393946424] unbound[43135:0] warning: increased limit(open files) from 11095 to 16418
[1393946424] unbound[43135:0] error: bind: address already in use
[1393946424] unbound[43135:0] fatal error: could not open ports
I presume the bind error is because the DNS Forwarder is in use. I can follow up on that by rechecking for open ports when I try to start the Unbound package again. But what about the increased open files limit? I've never seen this message before.
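For rechecking open ports before retrying Unbound, `sockstat -l` on the pfSense shell lists listening sockets. As a portable alternative, here is a small Python sketch that tries a TCP connection and reports whether anything answers. It assumes the DNS Forwarder listens on 127.0.0.1 (probe the LAN address instead if it does not), it only probes TCP, and `tcp_port_in_use` is a hypothetical helper.

```python
import socket


def tcp_port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Only TCP is probed; a UDP-only listener on the same port is not detected.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: nothing answered.
        return False


if __name__ == "__main__":
    # Port 53 is where both dnsmasq (DNS Forwarder) and Unbound want to listen.
    state = "in use" if tcp_port_in_use("127.0.0.1", 53) else "free"
    print(f"127.0.0.1:53/tcp looks {state}")
```

If the port still reports "in use" after the forwarder is disabled, another Unbound instance (for example one started by a second watchdog) is the likely owner.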
-
The limit warning means Unbound is raising its own process resource limit (open files) beyond what the system sets by default. It did so successfully.
This is driven by the outgoing range and number-of-queries settings. Still, your Unbound is trying to take a ton of resources: 16000+. Maybe try stripping out your custom outgoing range and number-of-queries options. If you Google a bit, your limit increase looks oddball. It might have to do with multiple identical lines.
Do these settings match your other stand-alone box?
If so, how much RAM does it use? Do you see warnings?
What is the ulimit set to on your stand-alone box?
You could try removing the duplicate default settings in the script that generates the Unbound configuration file. It may not handle duplicates well, which could be why we see strange issues.
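Bryan's point that outgoing-range drives the increase can be made concrete with a back-of-the-envelope estimate. The sketch below is modeled loosely on how Unbound sizes its descriptor needs at startup (per thread: listening sockets, outgoing TCP slots and the full outgoing-range of UDP ports, plus a few extras). The small overhead constants are assumptions, the exact accounting may differ between Unbound versions, and `estimate_open_files` and `soft_limit_too_low` are hypothetical helpers. The `resource` module is Unix-only.

```python
import resource


def estimate_open_files(num_threads: int, outgoing_range: int,
                        outgoing_num_tcp: int = 10, incoming_num_tcp: int = 10,
                        num_interfaces: int = 1,
                        do_udp: bool = True, do_tcp: bool = True) -> int:
    """Rough, hypothetical estimate of how many descriptors Unbound plans for."""
    # Listening sockets per interface: one UDP socket, plus a TCP listener
    # and one slot per incoming TCP connection.
    listen_per_if = (1 if do_udp else 0) + ((1 + incoming_num_tcp) if do_tcp else 0)
    listening = listen_per_if * num_interfaces
    misc = 4      # assumed: log file, pid file, stdio and the like
    overhead = 4  # assumed: command pipe plus event-loop internals
    # Every thread gets its own outgoing UDP port range and TCP slots.
    per_thread = listening + outgoing_num_tcp + outgoing_range + overhead + misc
    return num_threads * per_thread + misc


def soft_limit_too_low(required: int) -> bool:
    """True if this process's soft RLIMIT_NOFILE is below `required`."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft != resource.RLIM_INFINITY and soft < required


if __name__ == "__main__":
    # Assumed inputs: two threads, the 8192 range, both TCP buffer settings at 0.
    need = estimate_open_files(2, 8192, outgoing_num_tcp=0, incoming_num_tcp=0)
    print(need, soft_limit_too_low(need))
```

With two threads (a guess; pfSense derives the thread count from the cores), an 8192 range and both TCP buffer settings at 0, the estimate is 16408: well above the 11095 open files limit and consistent with the 16418 in the warning above if Unbound adds a small safety margin. The dominant term is threads × outgoing-range, so 900 on a single thread needs only about 914.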
-
Thanks, Bryan, for the explanation. I have 8 GB of RAM serving ~200 PC users. An outgoing range of 900 on a single thread works fine on a stand-alone Unbound service in a 2G P4 box with 2 GB of RAM. The GUI doesn't allow setting the outgoing range but sets it automatically to 8192, as I recall. Setting a custom outgoing range in the Unbound Options prevents the service from starting, or at least it did. This was my original concern when I first tried the Unbound package last year using my script from the P4: as a package, Unbound has limited tuning.
If it's automatic, then perhaps I shouldn't be concerned. I'll plan an early-morning restart of Unbound without a separate watchdog service active and see if the wheels stay on. The only other post-upgrade change I made was setting the incoming/outgoing buffers from 10 to 0, now that they are reflected in the XML.
-
@Markn62 follow @bryan.paradis' advice. You can try reducing the number of buffers it uses, and other resources.
The outgoing-range is a bit high, and that value is automatically calculated based on the Unbound docs.
So possibly the maths is wrong... erm, nope: for some reason it has the value 8192 hardcoded o_0.
Will fix that.
-
Unbound works well for others with the 8192 default.
Unbound is not working for you.
Try running without changing the outgoing range. If that works, having a duplicate of that setting in the conf file may be the issue. The remedy would be to edit the .inc file.
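The suspected bug is a setting written to unbound.conf twice: once from the hardcoded defaults and once from the custom options box. The real fix would live in the package's .inc file (PHP); as an illustration only, here is a Python sketch of merging the two sources so a custom value replaces the default instead of duplicating it. The `MULTI_VALUED` set and `merge_server_options` are assumptions, not taken from the package, and only plain `key: value` lines for the server: clause are handled.

```python
# Options that Unbound accepts more than once (partial list, assumed for this sketch).
MULTI_VALUED = {
    "access-control", "interface", "outgoing-interface", "private-address",
    "private-domain", "local-zone", "local-data", "domain-insecure",
}


def merge_server_options(defaults: dict[str, str], custom_text: str) -> list[str]:
    """Hypothetical merge of hardcoded defaults with user 'key: value' lines.

    A custom single-valued option replaces the default instead of being
    appended after it, so each such key is written exactly once.
    """
    merged = dict(defaults)  # copy so the caller's defaults are untouched
    repeated: list[str] = []
    for raw in custom_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if not value:
            # Clause headers such as "forward-zone:" are out of scope here.
            continue
        if key in MULTI_VALUED:
            repeated.append(f"{key}: {value}")
        else:
            merged[key] = value  # custom value overrides the default in place
    return [f"{key}: {value}" for key, value in merged.items()] + repeated
```

Splitting only on the first colon keeps IPv6 values such as `fd00::/8` intact, and the repeatable options are passed through unchanged because writing them several times is legitimate.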
-
Mark - I need to go through a few things again and will update the package, so look for an update sometime tomorrow.
There was a problem where Unbound wasn't compiled with libevent, and the value of 8192 would not work in those environments.
AFAIR it was fixed about three versions ago.
I'll also add outgoing-range to the advanced section; pop me a message with any other options you would like to see in the GUI.
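Since a hardcoded 8192 only suits builds with libevent, the value could instead be derived from the build and the thread count. The sketch below assumes the rule of thumb from Unbound's performance-tuning guidance as recalled here: a large value such as 8192 with libevent, and, with the built-in mini-event loop (limited to 1024 descriptors), roughly 1024 divided by the number of threads minus 50. `suggested_outgoing_range` is a hypothetical helper, not the package's code.

```python
def suggested_outgoing_range(num_threads: int, has_libevent: bool) -> int:
    """Hypothetical helper for choosing outgoing-range (ports per thread).

    Assumed rule of thumb: with libevent use a large value such as 8192;
    with the built-in mini-event loop, which cannot handle more than 1024
    descriptors, share 1024 between the threads and keep about 50 spare.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    if has_libevent:
        return 8192
    # Never go below 1, even for very high thread counts.
    return max(1, 1024 // num_threads - 50)
```

Under this rule, a single-threaded build without libevent gets 974, in the same range as the 900 used on the stand-alone P4, while 8192 is only chosen when libevent is present.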
-
It might be duplicate lines in the configuration, from the hardcoded plus the custom ones.
Mark, you should pastebin your resolver.log so we can have a look. It should have all the Unbound error information.
-
Could you give us the output of
limits
-
Wagonza,
The following are some of the custom options I've needed in other, non-packaged Unbound services. Outgoing range has already been addressed. pfSense appears to calculate threads and slabs based on the number of cores detected, but some may want to use only one core on a multi-core system. I'm not sure how the Unbound package currently determines num-queries-per-thread; this may be a good candidate for a detuning option. I don't see how do-ip6 can be set to no. Unbound doesn't seem to pick up from elsewhere that IPv6 has been disabled, so either it should, or it needs to be added to the custom options. The do-tcp: no setting is now handled by a value of zero in outgoing-num-tcp and incoming-num-tcp. Private addresses, I believe, are handled in the GUI on the third tab, "ACL" (I'm doing this from memory as best I can). Forwards are already in the custom options. so-rcvbuf/so-sndbuf are commented out in the script and have no GUI entry; I'm not sure why they are put in the script and not used. So I offer the asterisked items for your consideration as the most needed ones.
*outgoing-range:
num-threads:
msg-cache-slabs:
rrset-cache-slabs:
infra-cache-slabs:
key-cache-slabs:
*num-queries-per-thread:
*do-ip4:
*do-ip6:
*do-udp:
*do-tcp:
private-address:
forward-zone:
name:
forward-addr:
*so-rcvbuf:
*so-sndbuf:
-
Mark you should pastebin your resolver.log so we can have a look. It should have all the unbound error information.
Bryan, the resolver.log currently only has a few lines of DNS Forwarder output in it. From what I've seen, the SNMP log I already pastebinned has everything the resolver log had in it. Otherwise, I'll have to get this to you after I try restarting Unbound, which may be after Wagonza updates the package unless it's needed sooner.
-
Could you give us the output of limits
Bryan, were you asking this of me? I tried logging into pfSense with WinSCP: no go. I tried logging in as root instead of admin with the same password: also no go. I was using SCP on port 22. FTP and TFTP didn't help either. In the end I copy/pasted from an SSH session.
The limits command reports:
Resource limits (current):
cputime infinity secs
filesize infinity kB
datasize 33554432 kB
stacksize 524288 kB
coredumpsize infinity kB
memoryuse infinity kB
memorylocked infinity kB
maxprocesses 5547
openfiles 11095
sbsize infinity bytes
vmemoryuse infinity kB
pseudo-terminals infinity
swapuse infinity kB
-
I get output like this in my resolver.log after Unbound is installed. It looks about the same, so I suppose you are right, but I was getting parse errors and other messages when my service wouldn't start.
Mar 4 22:12:59 unbound: [24172:0] info: service stopped (unbound 1.4.21).
Mar 4 22:12:59 unbound: [24172:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
Mar 4 22:12:59 unbound: [24172:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Mar 4 22:16:22 unbound: [44092:0] notice: init module 0: iterator
Mar 4 22:16:22 unbound: [44092:0] info: start of service (unbound 1.4.21).
Mar 4 22:18:20 unbound: [44092:0] info: service stopped (unbound 1.4.21).
Mar 4 22:18:20 unbound: [44092:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
Mar 4 22:18:20 unbound: [44092:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Mar 4 22:18:26 unbound: [88348:0] notice: init module 0: iterator
Mar 4 22:18:26 unbound: [88348:0] info: start of service (unbound 1.4.21).
Mar 4 22:19:10 unbound: [88348:0] info: service stopped (unbound 1.4.21).
Mar 4 22:19:10 unbound: [88348:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
Mar 4 22:19:10 unbound: [88348:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Mar 4 22:19:13 unbound: [12458:0] notice: init module 0: iterator
Mar 4 22:19:13 unbound: [12458:0] info: start of service (unbound 1.4.21).
Mar 4 22:19:13 unbound: [12458:0] info: 192.168.55.3 pfsense.localdomain. A IN
Mar 4 22:19:13 unbound: [12458:0] info: 192.168.55.3 pfsense.localdomain.dev.localdomain. A IN
Mar 4 22:19:17 unbound: [12458:0] info: 192.168.55.3 0.pfsense.pool.ntp.org. AAAA IN
Mar 4 22:19:54 unbound: [12458:0] info: service stopped (unbound 1.4.21).
Mar 4 22:19:54 unbound: [12458:0] info: server stats for thread 0: 3 queries, 0 answers from cache, 3 recursions, 0 prefetch
Mar 4 22:19:54 unbound: [12458:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Mar 4 22:19:54 unbound: [12458:0] info: average recursion processing time 0.241851 sec
Mar 4 22:19:54 unbound: [12458:0] info: histogram of recursion processing times
Mar 4 22:19:54 unbound: [12458:0] info: [25%]=0 median[50%]=0 [75%]=0
Mar 4 22:19:54 unbound: [12458:0] info: lower(secs) upper(secs) recursions
Mar 4 22:19:54 unbound: [12458:0] info: 0.065536 0.131072 1
Mar 4 22:19:54 unbound: [12458:0] info: 0.131072 0.262144 1
Mar 4 22:19:54 unbound: [12458:0] info: 0.262144 0.524288 1
Mar 4 22:20:01 unbound: [44628:0] notice: init module 0: iterator
Mar 4 22:20:01 unbound: [44628:0] info: start of service (unbound 1.4.21).
Mar 4 22:24:02 unbound: [44628:0] info: service stopped (unbound 1.4.21).
Mar 4 22:24:02 unbound: [44628:0] info: server stats for thread 0: 0 queries, 0 answers from cache, 0 recursions, 0 prefetch
Mar 4 22:24:02 unbound: [44628:0] info: server stats for thread 0: requestlist max 0 avg 0 exceeded 0 jostled 0
Mar 4 22:24:08 unbound: [15496:0] notice: init module 0: iterator
Mar 4 22:24:08 unbound: [15496:0] info: start of service (unbound 1.4.21).
WinSCP should work just like SSH. Just use the defaults, then log in as root with the password.
Yes, exactly: just SSH in and run the command. As you can see, Unbound is adjusting your openfiles limit. What is the output of this:
sysctl -a | grep file
I can't reproduce Unbound wanting to increase openfiles. Either way, it looks like normal behaviour if it thinks it is going to run out. Are you willing to paste your whole unbound.conf? I guess it's probably not there anymore?
Can you also give us the full output of:
ps aux