DNS Resolver saturated bandwidth causing no internet



  • I have recently run into an issue where, if I am downloading a file and that download is maxing out my internet speed, nothing else works at all. If I try to bring up any webpage it just times out as if the page doesn't exist; it's obviously not resolving the host name. I'm not using any traffic shaping, so I realize things will slow down if I'm maxing out my connection somewhere else, but I've never seen it completely kill the connection for everything but the download.
    If I disable the DNS Resolver and enable the DNS Forwarder, the problem goes away. Saturating my internet speed still allows other things to function, like loading a webpage; obviously a little slower, but it does still work. I have pfBlockerNG installed using DNSBL, but I disabled pfBlockerNG and the problem remains. I have been using the Resolver ever since I started using DNSBL, but I don't remember ever noticing this issue until recently. It must have started after an update, but I'm not sure which.



  • Disabling pfBlockerNG doesn't disable DNSBL; you have to disable DNSBL separately.

    What do the system logs show? The pfBlockerNG log? What did you configure in pfBlockerNG and/or DNSBL?



  • Nothing telling in the logs I can see.
    I completely uninstalled pfBlockerNG with the keep settings box unchecked. Still the same problem.
    I just completely wiped and reloaded 2.3.2 from scratch and made no config changes except for the interface IPs. The problem still happens.
    pfSense was showing very high latency on the WAN interface during heavy downloading. I moved the WAN to a different network adapter. Same problem.

    I'm currently downloading an older 2.2.6 version to test with. Why are there no links on the pfSense website anymore for older versions? I had to do some googling to find it on a different site.



  • I'm not sure what the issue is but it seems to be related to using DNS Resolver instead of Forwarder.
    Going back to 2.2.6 didn't help. Again, switching to the DNS Forwarder seems to fix it.
    I temporarily connected my PC directly to my cable modem, bypassing pfSense. With my download bandwidth saturated I was still able to load 5 webpages simultaneously.

    A continuous ping to Google averages ~30 ms; with the bandwidth saturated that jumps to 1500-2000 ms, but it still responds without any timeouts. With that much delay, is it possible the Resolver gives up before getting a response?
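
One way to reason about the question above is a rough latency budget. A forwarder needs a single exchange with the upstream server, while a recursive resolver with a cold cache typically has to query several servers in sequence. The sketch below is back-of-the-envelope arithmetic only: the round-trip counts and the 5 second client timeout are assumptions chosen for illustration, not measured Unbound behavior, and the reported ping times to Google merely stand in for DNS round trips.

```python
# Rough latency budget for one uncached lookup, comparing a forwarder with a
# recursive resolver. All constants are illustrative assumptions, not measured
# Unbound or DNS Forwarder behavior.

def lookup_seconds(rtt_s: float, round_trips: int) -> float:
    """Time for a lookup that needs `round_trips` sequential exchanges."""
    return rtt_s * round_trips

IDLE_RTT_S = 0.030       # ~30 ms ping reported on an idle link
SATURATED_RTT_S = 2.0    # upper end of the 1500-2000 ms reported under load
CLIENT_TIMEOUT_S = 5.0   # assumed client patience; varies by OS and browser

FORWARDER_TRIPS = 1      # one query to the upstream DNS server
RESOLVER_TRIPS = 3       # assumed cold cache: root, TLD, authoritative

for label, rtt in (("idle", IDLE_RTT_S), ("saturated", SATURATED_RTT_S)):
    for mode, trips in (("forwarder", FORWARDER_TRIPS), ("resolver", RESOLVER_TRIPS)):
        t = lookup_seconds(rtt, trips)
        verdict = "ok" if t <= CLIENT_TIMEOUT_S else "likely times out"
        print(f"{label:9} {mode:9} {t:5.2f} s  {verdict}")
```

With these assumed numbers, only the recursive lookup on the saturated link exceeds the budget, which would be consistent with the Forwarder continuing to work. DNSSEC validation, if enabled, would add further lookups on top of this.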



  • How did you configure the Resolver? Did you disable DHCP Registration and Static DHCP?
    Is there a custom options line like this?

    server:include: /var/unbound/pfb_dnsbl.conf

    Do you see anything in the Resolver log? You may have to restart unbound in order for it to log.

    Before restoring the config to a new install, you should edit it and disable pfBlockerNG and DNSBL.
    For unbound to start without the DNSBL modification, change the unbound section of the config from

    <unbound><active_interface>lan,lo0</active_interface>
    <outgoing_interface>wan</outgoing_interface>
    <custom_options>c2VydmVyOmluY2x1ZGU6IC92YXIvdW5ib3VuZC9wZmJfZG5zYmwuY29uZg==</custom_options></unbound>

    to

    <unbound><active_interface>lan,lo0</active_interface>
    <outgoing_interface>wan</outgoing_interface></unbound>

    or run this in a shell:

    touch /var/unbound/pfb_dnsbl.conf

    When you installed on 2.2.6, did it install pfBlockerNG? If /var is running low on space, the installation might fail when downloading the MaxMind database.
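
The `<custom_options>` value in the excerpt above looks like base64. As a quick check, the short Python sketch below decodes the exact string from the post; nothing else about the config format is assumed.

```python
import base64

# The <custom_options> value copied from the config excerpt in the post.
encoded = "c2VydmVyOmluY2x1ZGU6IC92YXIvdW5ib3VuZC9wZmJfZG5zYmwuY29uZg=="

# Decode the base64 text back into the plain custom options line.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # prints: server:include: /var/unbound/pfb_dnsbl.conf
```

The decoded text is the same `server:include:` line asked about earlier in the post, which is why removing the element, or creating the included file with `touch`, is suggested so that unbound does not reference a missing file.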


  • Thanks for replying.

    I didn't specify in the last post, but when I went back to 2.2.6 it was a totally stock install: no pfBlockerNG or any other packages beyond what the firewall installs itself, and no custom options in the Resolver. I had also done a completely fresh install of 2.3.2 without any packages installed. In both cases the Resolver was configured just as it comes by default. I have also tried it with DHCP registration and static DHCP registration both enabled and disabled. Should it be one way or the other?

    I tested it again, recreating the problem while watching the log, and as you mentioned nothing showed up during that time. I then went back to the Resolver and restarted the service. This is what showed up in the log. I'm not sure what it all means, but nothing really looks out of place.

    Aug 8 20:41:14 	unbound 	25924:0 	info: start of service (unbound 1.5.9).
    Aug 8 20:41:14 	unbound 	25924:0 	notice: init module 1: iterator
    Aug 8 20:41:14 	unbound 	25924:0 	notice: init module 0: validator
    Aug 8 20:41:13 	unbound 	32784:0 	info: 64.000000 128.000000 6
    Aug 8 20:41:13 	unbound 	32784:0 	info: 32.000000 64.000000 63
    Aug 8 20:41:13 	unbound 	32784:0 	info: 16.000000 32.000000 132
    Aug 8 20:41:13 	unbound 	32784:0 	info: 8.000000 16.000000 122
    Aug 8 20:41:13 	unbound 	32784:0 	info: 4.000000 8.000000 79
    Aug 8 20:41:13 	unbound 	32784:0 	info: 2.000000 4.000000 55
    Aug 8 20:41:13 	unbound 	32784:0 	info: 1.000000 2.000000 51
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.524288 1.000000 44
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.262144 0.524288 90
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.131072 0.262144 166
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.065536 0.131072 166
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.032768 0.065536 99
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.016384 0.032768 94
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.008192 0.016384 2
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.004096 0.008192 3
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.002048 0.004096 1
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.000256 0.000512 1
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.000032 0.000064 1
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.000000 0.000001 220
    Aug 8 20:41:13 	unbound 	32784:0 	info: lower(secs) upper(secs) recursions
    Aug 8 20:41:13 	unbound 	32784:0 	info: [25%]=0.041622 median[50%]=0.218322 [75%]=6.6962
    Aug 8 20:41:13 	unbound 	32784:0 	info: histogram of recursion processing times
    Aug 8 20:41:13 	unbound 	32784:0 	info: average recursion processing time 6.040819 sec
    Aug 8 20:41:13 	unbound 	32784:0 	info: server stats for thread 1: requestlist max 58 avg 11.2079 exceeded 0 jostled 0
    Aug 8 20:41:13 	unbound 	32784:0 	info: server stats for thread 1: 2333 queries, 938 answers from cache, 1395 recursions, 0 prefetch
    Aug 8 20:41:13 	unbound 	32784:0 	info: 32.000000 64.000000 13
    Aug 8 20:41:13 	unbound 	32784:0 	info: 16.000000 32.000000 24
    Aug 8 20:41:13 	unbound 	32784:0 	info: 8.000000 16.000000 26
    Aug 8 20:41:13 	unbound 	32784:0 	info: 4.000000 8.000000 20
    Aug 8 20:41:13 	unbound 	32784:0 	info: 2.000000 4.000000 9
    Aug 8 20:41:13 	unbound 	32784:0 	info: 1.000000 2.000000 9
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.524288 1.000000 12
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.262144 0.524288 46
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.131072 0.262144 72
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.065536 0.131072 53
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.032768 0.065536 20
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.016384 0.032768 25
    Aug 8 20:41:13 	unbound 	32784:0 	info: 0.000000 0.000001 15
    Aug 8 20:41:13 	unbound 	32784:0 	info: lower(secs) upper(secs) recursions
    Aug 8 20:41:13 	unbound 	32784:0 	info: [25%]=0.0976857 median[50%]=0.238478 [75%]=3.33333
    Aug 8 20:41:13 	unbound 	32784:0 	info: histogram of recursion processing times
    Aug 8 20:41:13 	unbound 	32784:0 	info: average recursion processing time 4.845544 sec
    Aug 8 20:41:13 	unbound 	32784:0 	info: server stats for thread 0: requestlist max 44 avg 4.40407 exceeded 0 jostled 0
    Aug 8 20:41:13 	unbound 	32784:0 	info: server stats for thread 0: 442 queries, 98 answers from cache, 344 recursions, 0 prefetch
    Aug 8 20:41:13 	unbound 	32784:0 	info: service stopped (unbound 1.5.9).
    Aug 8 19:07:37 	unbound 	32784:0 	info: start of service (unbound 1.5.9). 
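
The histogram in this log can be summarized with a few lines of Python. The sketch below uses the thread 0 bucket lines copied from the log above (timestamps trimmed) and counts how many recursions took one second or longer. The 1 second threshold is an arbitrary choice for illustration, and the statistics cover the whole period since the service started at 19:07:37, not only the time the link was saturated.

```python
import re

# Thread 0 histogram lines copied from the log above (timestamps trimmed).
LOG = """\
info: 32.000000 64.000000 13
info: 16.000000 32.000000 24
info: 8.000000 16.000000 26
info: 4.000000 8.000000 20
info: 2.000000 4.000000 9
info: 1.000000 2.000000 9
info: 0.524288 1.000000 12
info: 0.262144 0.524288 46
info: 0.131072 0.262144 72
info: 0.065536 0.131072 53
info: 0.032768 0.065536 20
info: 0.016384 0.032768 25
info: 0.000000 0.000001 15
"""

# Matches "info: <lower secs> <upper secs> <recursion count>" at end of line.
BUCKET = re.compile(r"info: (\d+\.\d+) (\d+\.\d+) (\d+)$")

def parse_buckets(text):
    """Return (lower, upper, count) tuples for every histogram line."""
    buckets = []
    for line in text.splitlines():
        match = BUCKET.search(line.strip())
        if match:
            lower, upper, count = match.groups()
            buckets.append((float(lower), float(upper), int(count)))
    return buckets

buckets = parse_buckets(LOG)
total = sum(count for _, _, count in buckets)
slow = sum(count for lower, _, count in buckets if lower >= 1.0)
print(f"{slow} of {total} recursions took 1 s or longer ({slow / total:.0%})")
```

The total matches the 344 recursions reported for thread 0, and 101 of them fall in buckets of one second or more, which fits the multi-second average recursion time shown in the same log.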
    


  • From this point on, unbound should log when it restarts (a DNSBL update will restart unbound).

    About the registration:
    @BBcan177:

    Some recommendations:

    • The DNS Resolver can also be used in 'Forwarding mode'; however, it's best not to use this 'Forwarding mode' and to keep it in 'resolver mode', as this will query the Root DNS servers for the DNS queries instead of relying on an ISP's DNS etc…

    • If you use the 'DNS Resolver Forwarder mode', only configure 'DNSSEC' if the configured DNS servers support DNSSEC. The enabling of 'DNSSEC' to harden your DNS security is highly recommended.

    • Disable the two "DHCP registrations" checkboxes, unless you really require those options.



  • So what is on the system now? No pfBlockerNG installation? Or did you install it and then remove it?
    What does Diagnostics / System Activity show? Is the system busy?
    How much RAM and disk space? What kind of CPU?
    Anything weird in the Firewall Logs?



  • At the moment I've restored my backup image of my original 2.3.2 install, with pfBlockerNG and all my NAT/rules, since even a completely raw install without any of that made no difference. When testing with 2.2.6 and clean 2.3.2, no rules or packages were added. I just installed pfSense from a flash drive, set the LAN/WAN interfaces, and then tested for the problem.

    Hardware is:
    Core2Duo 6420
    4GB RAM
    160GB HD

    Here is what System Activity shows while saturating my bandwidth:

    last pid: 31177;  load averages:  0.09,  0.04,  0.02  up 0+03:18:47    21:43:55
    158 processes: 3 running, 122 sleeping, 33 waiting
    
    Mem: 42M Active, 99M Inact, 196M Wired, 283M Buf, 3588M Free
    Swap: 8192M Total, 8192M Free
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
       11 root     155 ki31     0K    32K RUN     1 190:55  94.97% [idle{idle: cpu1}]
       11 root     155 ki31     0K    32K RUN     0 191:39  94.29% [idle{idle: cpu0}]
       12 root     -92    -     0K   528K WAIT    0   2:10   9.18% [intr{irq21: skc0 uhci}]
       12 root     -92    -     0K   528K WAIT    1   1:45   8.25% [intr{irq16: skc1 uhci}]
     7503 root      21    0   262M 31908K piperd  0   0:00   0.39% php-fpm: pool nginx (php-fpm)
       12 root     -60    -     0K   528K WAIT    0   0:43   0.10% [intr{swi4: clock}]
        0 root     -16    -     0K   192K swapin  0   0:37   0.00% [kernel{swapper}]
        5 root     -16    -     0K    16K pftm    0   0:05   0.00% [pf purge]
    74270 root      20    0   224M 33304K nanslp  0   0:03   0.00% /usr/local/bin/php -f /usr/local/pkg/pfblo
    74759 root      20    0   224M 33324K nanslp  1   0:03   0.00% /usr/local/bin/php -f /usr/local/pkg/pfblo
    25924 unbound   20    0 51036K 25616K kqread  0   0:01   0.00% /usr/local/sbin/unbound -c /var/unbound/un
       15 root     -16    -     0K    16K -       1   0:01   0.00% [rand_harvestq]
    43973 root      52   20 17000K  2560K wait    0   0:01   0.00% /bin/sh /var/db/rrd/updaterrd.sh
    85333 root      20    0 14516K  2316K select  1   0:01   0.00% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/v
    28401 root      20    0 39136K  7204K kqread  1   0:01   0.00% nginx: worker process (nginx)
     8130 root      20    0 19108K  2376K nanslp  1   0:01   0.00% [dpinger{dpinger}]
      271 root      22    0   262M 24928K kqread  0   0:01   0.00% php-fpm: master process (/usr/local/lib/ph
    28632 root      20    0 39136K  7164K kqread  0   0:01   0.00% nginx: worker process (nginx)
    


  • You have plenty of free memory

    From the look of it, the system looks idle except for these 2 processes
    9.18% [intr{irq21: skc0 uhci}]
    8.25% [intr{irq16: skc1 uhci}]

    That percentage looks very high to me. Maybe the slowdown is related to interrupt processing on your NIC?

    So this is with DNSBL running?

    In Diagnostics / Command Prompt execute

    ps -axwwwll | grep pfb
    

    This is what I get on my system:

       0 18599     1   0  20  0   12856   4224 kqread   S     -      0:12.46 /usr/local/sbin/lighttpd_pfb -f /var/unbound/pfb_dnsbl_lighty.conf
       0 53399     1   0  20  0   38376  10324 nanslp   S     -      6:07.15 /usr/local/bin/php -f /usr/local/pkg/pfblockerng/pfblockerng.inc dnsbl
       0 99209 79568   0  22  0   10460   2084 wait     S     -      0:00.00 sh -c ps -axwwwll | grep pfb 2>&1
       0 99758 99209   0  22  0   10264   1868 piperd   S     -      0:00.00 grep pfb
    


  • Yes that was with DNSBL running.

    Here's the result of 'ps -axwwwll | grep pfb'

       0  1768     1   0  20  0  40260  6164 kqread   S     -    0:00.20 /usr/local/sbin/lighttpd_pfb -f /var/unbound/pfb_dnsbl_lighty.conf
       0 60590 98301   0  21  0  17000  2508 wait     S     -    0:00.00 sh -c ps -axwwwll | grep pfb 2>&1
       0 60988 60590   0  21  0  18740  2244 piperd   S     -    0:00.00 grep pfb
       0 74270     1   0  20  0 229352 33308 nanslp   S     -    0:03.94 /usr/local/bin/php -f /usr/local/pkg/pfblockerng/pfblockerng.inc dnsbl
       0 74759     1   0  20  0 229352 33324 nanslp   S     -    0:03.91 /usr/local/bin/php -f /usr/local/pkg/pfblockerng/pfblockerng.inc dnsbl
    

    I may have a couple of old Linksys or Netgear cards lying around that I'll try putting in the system tomorrow to see how it responds. Right now they are gigabit D-Link cards.



  • Well, it's weird that you have 2 /usr/local/pkg/pfblockerng/pfblockerng.inc dnsbl processes running.
    Try disabling DNSBL; ps should then show no pfblockerng processes at all.
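
Spotting a doubled process in a longer listing can be done mechanically. The sketch below is only an illustration: it takes the command column from the `ps -axwwwll | grep pfb` output posted earlier in the thread, typed in by hand as a list, and reports any command that appears more than once.

```python
from collections import Counter

# Command column from the `ps -axwwwll | grep pfb` output posted above.
commands = [
    "/usr/local/sbin/lighttpd_pfb -f /var/unbound/pfb_dnsbl_lighty.conf",
    "sh -c ps -axwwwll | grep pfb 2>&1",
    "grep pfb",
    "/usr/local/bin/php -f /usr/local/pkg/pfblockerng/pfblockerng.inc dnsbl",
    "/usr/local/bin/php -f /usr/local/pkg/pfblockerng/pfblockerng.inc dnsbl",
]

# Keep only the commands that occur more than once.
duplicates = {cmd: n for cmd, n in Counter(commands).items() if n > 1}
for cmd, n in duplicates.items():
    print(f"{n}x {cmd}")
```

After DNSBL is disabled, the same check on fresh `ps` output should report nothing for pfblockerng.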



  • I've kind of given up on making this work.
    I've swapped network cards twice, once with old Linksys cards and once with old 3Com cards, and done a complete clean install of pfSense with no added packages. The results are still the same. I have to enable forwarding, otherwise DNS queries just don't work when my internet bandwidth is near saturation. With forwarding enabled, DNS works under saturation and pages will load; again slower, but they still work. Hopefully the next release will bring some improvement.

