Netgate Discussion Forum
    • Categories
    • Recent
    • Tags
    • Popular
    • Users
    • Search
    • Register
    • Login

    Tough time with Unbound

    Scheduled Pinned Locked Moved DHCP and DNS
    23 Posts 3 Posters 1.7k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • S
      Stewart @Gertjan
      last edited by

      @Gertjan Since I can successfully ping and navigate to the IP then the VLANs are set up properly. In the firewall it just shows up as another interface. On the switches you just untag the access VLAN and TAG the rest on the trunks. There really isn't much to it at this level.

      1 Reply Last reply Reply Quote 0
      • S
        Stewart
        last edited by

        So I came out onsite and ran some tests. Now NONE of the dig commands work! Crazy!

        top -aSH
        last pid: 16121;  load averages:  3.07,  3.91,  3.87                                                                                                                                                                 up 3+01:27:52  14:38:09
        201 processes: 7 running, 157 sleeping, 37 waiting
        CPU: 26.3% user,  0.0% nice, 30.6% system,  5.1% interrupt, 38.0% idle
        Mem: 96M Active, 103M Inact, 12K Laundry, 672M Wired, 317M Buf, 3035M Free
        Swap: 4096M Total, 513M Used, 3583M Free, 12% Inuse
        
          PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
        23941 root          102    0  6392K  2140K CPU2    2  27:51  98.63% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
           11 root          155 ki31     0K    64K RUN     0  57.3H  50.41% [idle{idle: cpu0}]
           11 root          155 ki31     0K    64K CPU3    3  55.8H  37.81% [idle{idle: cpu3}]
           11 root          155 ki31     0K    64K RUN     1  53.7H  35.18% [idle{idle: cpu1}]
        39062 unbound        44    0   100M 81476K CPU1    1   0:14  33.34% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        39062 unbound        46    0   100M 81476K umtxn   3   0:14  33.27% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        39062 unbound        45    0   100M 81476K umtxn   3   0:14  33.24% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        39062 unbound        42    0   100M 81476K umtxn   1   5:35  32.93% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        
        

        Logs are empty in the GUI. Enabled advanced logging in the GUI which caused the service to restart I suppose. Now resolution works, but only on the second try. If I try to go to, say, arstechnica.com, it won't load the first time. Wireshark shows me:
        bbe1ac8e-809c-4ebf-8e2d-89e66af89c43-image.png If I reload the page it'll come up, but slowly, sometimes.

        At the moment unbound is taking up most of the CPU:

        last pid: 26202;  load averages:  4.30,  3.64,  3.12                                                                                                                                                                 up 3+01:57:31  15:07:48
        237 processes: 7 running, 193 sleeping, 37 waiting
        CPU: 39.6% user,  0.0% nice, 29.7% system,  6.5% interrupt, 24.2% idle
        Mem: 1437M Active, 209M Inact, 12K Laundry, 745M Wired, 370M Buf, 1516M Free
        Swap: 4096M Total, 511M Used, 3585M Free, 12% Inuse
        
          PID USERNAME      PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
        23941 root           99    0  6392K  2140K CPU0    0  46:36  89.78% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
        13662 unbound        81    0 63396K 51684K CPU2    2   1:31  31.61% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        13662 unbound        52    0 63396K 51684K umtxn   3   1:32  30.94% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        13662 unbound        52    0 63396K 51684K umtxn   2   4:55  30.82% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        13662 unbound        52    0 63396K 51684K umtxn   2   1:31  29.71% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
        
        

        running clog tells me some more:

        Feb 17 15:08:41 GateKeeper unbound: [13662:0] info: 63vRDCDd mod1  ns5-65.akam.net. AAAA IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info: mesh_run: end 153 recursion states (17 with reply, 84 detached),                                                                                                                         51 waiting replies, 28 recursion replies sent, 0 replies dropped, 0 states jostled out
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] debug: EDNS lookup known=0 vs=0
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info: average recursion processing time 45.800638 sec
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] debug: serviced query UDP timeout=376 msec
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info: histogram of recursion processing times
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] debug: inserted new pending reply id=89af
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info: [25%]=1e-06 median[50%]=44.8 [75%]=72
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] debug: Need to send query but have no outgoing interfaces of that famil                                                                                                                        y
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] info: error sending query to auth server ip6 2620:0:34::53 port 53 (len                                                                                                                         28)
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info:    1.000000    2.000000 1
        Feb 17 15:08:41 GateKeeper unbound: [13662:1] debug: servselect ip4 192.31.80.30 port 53 (len 16)
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] info: processQueryTargets: ns1.msft.net. A IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:0] info: 70vRDCD mod1  ns-1471.awsdns-55.org. AAAA IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info:   16.000000   32.000000 2
        Feb 17 15:08:41 GateKeeper unbound: [13662:1] debug:    rtt=68675
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] debug: processQueryTargets: targetqueries 0, currentqueries 0 sentcount                                                                                                                         16
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info:   32.000000   64.000000 10
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] info: DelegationPoint<msft.net.>: 4 names (0 missing), 8 addrs (4 resul                                                                                                                       t, 0 avail) cacheNS
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info:   64.000000  128.000000 8
        Feb 17 15:08:41 GateKeeper unbound: [13662:1] debug:    rtt=54538
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] info:   ns4.msft.net. * A AAAA
        Feb 17 15:08:41 GateKeeper unbound: [13662:0] info: 73vRDCD mod1  ns-1627.awsdns-11.co.uk. AAAA IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info: 0vRDCD mod1  ns.bahnhof.net. A IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:1] debug: servselect ip4 192.33.14.30 port 53 (len 16)
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] info:   ns2.msft.net. * A AAAA
        Feb 17 15:08:41 GateKeeper unbound: [13662:0] info: 74vRDCD mod1  ns-1655.awsdns-14.co.uk. AAAA IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info: 1vRDCD mod1  ns1.azprdmig.msft.net. A IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:1] debug:    rtt=43107
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] info:   ns1.msft.net. * A AAAA
        Feb 17 15:08:41 GateKeeper unbound: [13662:0] info: 75vRDCDd mod1  ns-1870.awsdns-41.co.uk. AAAA IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info: 2vRDCD mod1  ns2.bahnhof.net. A IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:1] debug: servselect ip4 192.5.6.30 port 53 (len 16)
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] info:   ns3.msft.net. * A AAAA
        Feb 17 15:08:41 GateKeeper unbound: [13662:0] info: 76vRDCD mod1  ns1-201.azure-dns.com. AAAA IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:3] info: 3vRDCD mod1  ns2.azprdmig.msft.net. A IN
        Feb 17 15:08:41 GateKeeper unbound: [13662:1] debug:    rtt=47876
        Feb 17 15:08:41 GateKeeper unbound: [13662:2] debug:    ip6 2620:0:34::53 port 53 (len 28)
        Feb 17 15:08:41 GateKeeper unbound: [13662:0] info: 77vRDCD mod1  ns1prod.6893.azuredns-prd.org. AAAA IN
        
        

        The "Need to send query but have no outgoing interfaces of that family" concerns me.
        Along with "error sending query to auth server"

        1 Reply Last reply Reply Quote 0
        • johnpozJ
          johnpoz LAYER 8 Global Moderator
          last edited by

          Why would unbound be using that much cpu... That is nuts..

          As to your auth error, that is ns3.msft.net, yeah its an authoritative ns.. for msft.net

          looks like you have some issue for sure.. Why don't you stop unbound.. And then restart it... you should not be using anywhere close to that amount of cpu.. All 4 threads?

          An intelligent man is sometimes forced to be drunk to spend time with his fools
          If you get confused: Listen to the Music Play
          Please don't Chat/PM me for help, unless mod related
          SG-4860 24.11 | Lab VMs 2.8, 24.11

          S 1 Reply Last reply Reply Quote 0
          • S
            Stewart @johnpoz
            last edited by Stewart

            @johnpoz I've restarted the whole firewall with no help. My laptop is currently the only thing on the network besides the 2 switches and the firewall. I don't even know where to look to troubleshoot. There's nothing in the logs. I'm currently disabling everything. Suricata and pfblockerNG are now both off.

            I do have a theory, though. Running sockstat -l | grep unbound showed 386 responses. This is a new laptop and I had uTorrent running in the background from when I was re-downloading all my updated ISOs. Once I closed it it dropped down to 15 responses and things are working normally, at least for now. Am I running out of resources?

            Edit: If that's the case then I'm concerned 1 laptop can take down the whole shebang. I've had clients with old ASAs and RV units that buckle under someone running a torrent. This is the first time I've seen it affect pfSense, if that's the case.

            1 Reply Last reply Reply Quote 0
            • johnpozJ
              johnpoz LAYER 8 Global Moderator
              last edited by johnpoz

              You had your utorrent client looking up every single IP in the swarm it was talking too - hehehe!! While yeah that can be a lot of queries... Shouldn't be too many to be honest..

              Unless your having issues with resolving - but then again PTRs for these IPs that most likely do not resolve will force time outs and retries, etc..

              Yeah trying to resolve the PTR for the hundreds if not 1000's of IPs you might talk to in p2p swarm - yeah that could be problematic...

              edit: its not the torrents.. its the DNS queries that don't resolve and then waiting for timeouts, and then retrying, etc. Not like you get back NX or something... Freaking p2p swarm is going to have a ton of shit ips everyone using vpns, etc. that don't have any ptrs set, etc. etc..

              Yeah doing 1000's of shit dns queries can be problematic ;)

              An intelligent man is sometimes forced to be drunk to spend time with his fools
              If you get confused: Listen to the Music Play
              Please don't Chat/PM me for help, unless mod related
              SG-4860 24.11 | Lab VMs 2.8, 24.11

              S 1 Reply Last reply Reply Quote 0
              • S
                Stewart @johnpoz
                last edited by

                @johnpoz If that's what it is, how do I combat it? We can't have 1 rogue laptop take down the network. I'm looking at the options in the Advanced tab and shifting them all higher, but which will actually help?

                1 Reply Last reply Reply Quote 0
                • johnpozJ
                  johnpoz LAYER 8 Global Moderator
                  last edited by johnpoz

                  No you just need to make sure your dns can handle it... Change the settings, more threads, etc.. Maybe not run dns on pfsense - run it on some other box if your pfsense box is not up to high levels of dns queries..

                  Look maybe you have some bad setup with ipv6 that your trying to use.. ??? Your average recursion time is was horrible..

                  Here is mine
                  Feb 17 07:56:26 unbound 72571:0 info: average recursion processing time 0.274979 sec

                  An intelligent man is sometimes forced to be drunk to spend time with his fools
                  If you get confused: Listen to the Music Play
                  Please don't Chat/PM me for help, unless mod related
                  SG-4860 24.11 | Lab VMs 2.8, 24.11

                  S 1 Reply Last reply Reply Quote 0
                  • S
                    Stewart @johnpoz
                    last edited by

                    @johnpoz I pushed up Advanced settings to:
                    Cache Size: 20MB
                    Outgoing TCP Buffers: 50
                    Incoming TCP Buffers: 50
                    Number of Queries per Thread: 2048
                    I then saved and opened up my uTorrent to see what happens. It's been running for a few minutes now and haven't seen any spikes.

                    clog -f /var/log/resolver.log | grep "average recursion"
                    Feb 17 16:22:37 GateKeeper unbound: [77568:0] info: average recursion processing time 0.133651 sec
                    Feb 17 16:22:37 GateKeeper unbound: [77568:0] info: average recursion processing time 0.140601 sec
                    Feb 17 16:22:37 GateKeeper unbound: [77568:0] info: average recursion processing time 0.163783 sec
                    Feb 17 16:22:37 GateKeeper unbound: [77568:0] info: average recursion processing time 0.149740 sec
                    
                    

                    Recursion time is also very quick now.

                    S 1 Reply Last reply Reply Quote 0
                    • S
                      Stewart @Stewart
                      last edited by Stewart

                      @Stewart While I thought that was the issue, it isn't. The started plugging things in and internet isn't working. I come out and it's the same thing. Unbound periodically high cpu even with nothing plugged in. Running dig from the CLI when it doesn't work shows:

                      /root: dig @127.0.0.1 google.com A
                      
                      ; <<>> DiG 9.12.2-P1 <<>> @127.0.0.1 google.com A
                      ; (1 server found)
                      ;; global options: +cmd
                      ;; connection timed out; no servers could be reached
                      
                      

                      It comes and goes.

                      Edit: When i run clog -f on the resolver.log file I can see it just scrolling through with all the lookups. Then it just stops and no DNS resolution can occur. Then it alll starts again.

                      1 Reply Last reply Reply Quote 0
                      • S
                        Stewart
                        last edited by

                        System log after a reboot shows

                        Mar 19 10:03:54 	php-fpm 	341 	/services_unbound_advanced.php: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1584626634] unbound[3232:0] error: bind: address already in use [1584626634] unbound[3232:0] fatal error: could not open ports'
                        Mar 19 10:03:54 	dhcpleases 		kqueue error: unknown
                        Mar 19 10:09:30 	dhcpleases 		Could not deliver signal HUP to process because its pidfile (/var/run/unbound.pid) does not exist, No such process. 
                        
                        1 Reply Last reply Reply Quote 0
                        • S
                          Stewart
                          last edited by

                          @Stewart said in Tough time with Unbound:

                          /usr/local/sbin/unbound -c /var/unbound/unbound.conf

                          Removed pfBlockerNG-devel and rebooted but that didn't help. Still the same errors. Went into services and restarted unbound and it started working. Rebooted and it's working. This is the first time I haven't seen error messages. Maybe some setting in pfBlockerNG-devel?

                          1 Reply Last reply Reply Quote 0
                          • First post
                            Last post
                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.