DNS stopped working this morning?



  • Strangest thing ever. Everything was working fine last night; I woke up this morning and DNS was not working for any machines. Blamed it on Comcast, went to work, came back, rebooted pfSense, and same deal. The craziest thing is that DNS resolves fine when I do it from the pfSense box itself, but not from any of the DHCP clients.
    Any ideas, guys?



  • Anybody? This is getting really frustrating. I've unset and re-set pretty much every DNS and DHCP option, but for some reason none of my client machines are getting any DNS resolution whatsoever. This is getting ridiculous because it SHOULD work.



  • I have had the same issue since about two weeks ago, up to and including version 2.0-RC2 (i386) built on Tue Jun 7 20:52:21 EDT 2011, nanobsd on a Soekris.

    If the DNS forwarder is activated, everything is fine and DNS queries are resolved via the provider DNS servers.

    With unbound in recursive mode, after some time (anywhere from minutes to hours), queries get sent out but no replies come in (see the attachment for an example). I already asked my provider whether they are filtering/changing/throttling DNS requests; they assured me they aren't.

    Restarting unbound solves the problem nearly every time.

    When I configure unbound as a forwarder the behaviour doesn't change: unbound still resolves by itself, i.e. no forwarding happens at all. (The config file contains no forwarding zone, even though the option is ticked in the GUI; I don't know if that is a known issue.)

    Any ideas?

    [_pfsense dns tcpdump auswahl 2.txt](/public/imported_attachments/1/_pfsense dns tcpdump auswahl 2.txt)
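To catch the moment replies stop coming back (the symptom described above), a LAN client can send a plain DNS query over UDP on a timer and record whether an answer arrives. Below is a minimal Python sketch, assuming Python 3 and only the standard library; `build_query` and `probe` are hypothetical helper names, and the query follows the RFC 1035 wire format with the recursion-desired bit set.

```python
import random
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 1, qclass: int = 1,
                txid: Optional[int] = None) -> bytes:
    """Build a minimal DNS query (RFC 1035 wire format) with one question."""
    if txid is None:
        txid = random.randint(0, 0xFFFF)
    # ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.rstrip(".").split(".")) + b"\x00"
    # QTYPE/QCLASS default to A/IN
    return header + qname + struct.pack("!HH", qtype, qclass)


def probe(server: str, name: str, timeout: float = 3.0) -> bool:
    """Return True if `server` answers a UDP query for `name` within `timeout` seconds."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return False
    # A genuine reply echoes the transaction ID of the query.
    return reply[:2] == query[:2]
```

For example, calling `probe("192.168.100.1", "www.oneview.de")` once a minute from a client and logging the time of each `False` would show when the hiccups start. A cached name keeps answering even while recursion is stuck, so varying the name (or picking ones unlikely to be cached) exercises the upstream path better.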


  • Rebel Alliance Global Moderator

    OK, I'm a bit confused about where exactly you were doing those captures.

    
    18:23:06.810908 IP 192.168.0.10.47027 > 88.198.112.130.53: 50474% [1au] A? www.oneview.de. (43)
    18:23:06.811326 IP 192.168.0.10.10121 > 195.47.235.130.53: 28764% [1au] A? ns1.adminsystem.eu. (47)
    
    

    You show a private address talking to a public address.

    But if your pfSense box were doing the queries, that is not how it would look.

    So here I captured some DNS queries on the WAN interface of my pfSense box:

    
      1 0.000000000    24.13.xx.xx     192.42.93.30    DNS    Standard query A www.lsjdflsdf.com
      2 0.087617000    192.42.93.30    24.13.xx.xx     DNS    Standard query response, No such name
      3 6.287323000    24.13.xx.xx     192.12.94.30    DNS    Standard query A www.sljdfljsfs.net
      4 6.407314000    192.12.94.30    24.13.xx.xx     DNS    Standard query response, No such name
      5 16.452224000   24.13.xx.xx     192.35.51.30    DNS    Standard query A www.lsjdflsdfjs.com
      6 16.536029000   192.35.51.30    24.13.xx.xx     DNS    Standard query response, No such name
    
    

    I snipped out my public IP there, but notice it's public IP to public IP: my pfSense box doing the queries via unbound.

    Now I captured DNS on the LAN side of my pfSense box, and you'll notice it's the local box's IP and the pfSense box's local IP, 192.168.1.253:

    
      1 0.000000    192.168.1.4      192.168.1.253    DNS    Standard query PTR 118.127.201.41.in-addr.arpa
      2 0.332987    192.168.1.253    192.168.1.4      DNS    Standard query response, No such name
      3 1.690004    192.168.1.100    192.168.1.253    DNS    Standard query A www.billybob8.com
      4 1.690369    192.168.1.253    192.168.1.100    DNS    Standard query response, No such name
      5 4.837649    192.168.1.4      192.168.1.253    DNS    Standard query PTR 38.160.121.216.in-addr.arpa
      6 5.037194    192.168.1.253    192.168.1.4      DNS    Standard query response PTR d216-121-160-38.home3.cgocable.net
    
    

    How exactly is your pfSense box set up such that you would capture traffic going from a private to a public address? I would say no shit you're not going to get an answer ;)
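The reasoning about the captures can be checked mechanically: a capture on the WAN side of a pfSense box that does its own NAT shows public -> public pairs, while 192.168.0.10 -> 88.198.112.130 means the capture point sat behind some other NAT device. A minimal Python sketch, assuming tcpdump's default IPv4 line format as in the excerpt above; `classify` and `scope` are hypothetical names, and "private" here means whatever Python's `ipaddress` module reports via `is_private` (RFC 1918 plus a few other reserved ranges).

```python
import ipaddress
import re
from typing import Optional

# Matches the address part of tcpdump lines such as
# "18:23:06.810908 IP 192.168.0.10.47027 > 88.198.112.130.53: ..."
ADDR_PAIR = re.compile(r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+:")


def scope(addr: str) -> str:
    """Label an IPv4 address as 'private' or 'public' using the stdlib's registry."""
    return "private" if ipaddress.ip_address(addr).is_private else "public"


def classify(line: str) -> Optional[str]:
    """Return e.g. 'private -> public' for a tcpdump IPv4 line, or None if it doesn't match."""
    m = ADDR_PAIR.search(line)
    if m is None:
        return None
    return f"{scope(m.group(1))} -> {scope(m.group(2))}"
```

Run over the attached capture, every outgoing query would come back as `private -> public`, which is the pattern that points at the extra NAT hop.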

    I have been running unbound on my pfSense 2.0 box pretty much since I moved to the 2.0 betas and RCs, and other than having to tweak it a bit to get IPv6 working, I would have to say it's worked great. I normally update to a new build every couple of weeks or so; currently running

    2.0-RC2-IPv6 (i386)
    built on Fri May 27 18:34:57 EDT 2011

    And I do a gitsync every few days or so, whenever I notice a commit that I think is worthwhile or a merge with master, and to be honest there have been few issues with unbound. I just don't understand where you captured traffic showing private -> public like that, though. Very odd.



  • Johnpoz,

    thanks for answering.

    I forgot to mention that a cable modem sits in front of the Soekris. The modem (Thomson) does the NAT, and the Soekris is exposed as the DMZ host so that it receives all incoming packets.

    The 192.168.0.0/24 network is between the two boxes.


  • Rebel Alliance Global Moderator

    Cable "modems" don't do NAT ;)

    You mean a cable gateway? Why would you not just use it as an actual "modem" and let your pfSense box do the NAT?

    So is your pfSense box routing other private networks, or is it set up as a bridge?

    I could see you having issues getting the answers back via UDP through a NAT. Most DNS is UDP, so I guess that could be the problem.

    If you're going to run a recursive DNS server behind a NAT, I would forward both UDP and TCP 53 to that IP. Is it possible you had a forward set up on your "modem" and your pfSense WAN IP has since changed? That would be your issue.
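Why TCP 53 matters as well as UDP: when an answer does not fit in a UDP reply, the server sets the TC (truncated) bit in the header and the resolver retries over TCP, where each DNS message is prefixed with a two-byte length (RFC 1035, sections 4.1.1 and 4.2.2). A minimal Python sketch of both pieces; the function names are hypothetical.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def is_truncated(reply: bytes) -> bool:
    """True if a UDP reply has the TC bit set, i.e. the resolver should retry over TCP."""
    (flags,) = struct.unpack("!H", reply[2:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with its length as a 2-byte big-endian integer."""
    return struct.pack("!H", len(message)) + message


def unframe_tcp(stream: bytes) -> bytes:
    """Extract the first length-prefixed DNS message from a TCP byte stream."""
    (length,) = struct.unpack("!H", stream[:2])
    return stream[2:2 + length]
```

If only UDP 53 made it through the NAT, truncated answers would be the lookups that fail, which is one reason to pass both protocols.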



  • Yeah, I know it is more than a modem. :)

    Unfortunately I can't deactivate the NAT on the Thomson; the provider doesn't allow it.
    The IP address on the WAN side of the pfSense box is fixed, so no joy there either.

    The pfSense box is in a routing setup, and really everything is working, even IPsec.

    I'm completely baffled after looking at this problem for a week or so. ???


  • Rebel Alliance Global Moderator

    Can you forward 53 to your pfSense WAN IP (that 192.168.0.10 address) and see if that fixes it?

    If not, I would suggest you do some sniffing on the outside of your NAT device to verify that the DNS query is going out with the correct public IP as the source. That would also let you see whether answers are coming back and, for whatever reason, your NAT device is not passing them through.

    A hub, or a switch with a monitor/SPAN port feature, would allow you to sniff traffic on the public side of your NAT device.

    Other than that, you could just set up unbound to forward to your gateway for DNS, and let your "modem" either forward to your ISP's DNS or other public DNS, or look things up directly if it supports that feature.
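For reference, unbound's own syntax for sending every query to an upstream server is a `forward-zone` clause with `name: "."` and one or more `forward-addr` lines; the absence of such a clause is also what the poster noticed earlier when the GUI forwarding option was ticked. A minimal Python sketch that renders such a clause and checks whether a config text contains one. The gateway address 192.168.0.1 in the example is a hypothetical stand-in for the Thomson's LAN IP, which the thread never gives, and the check is a rough line scan, not a full unbound.conf parser.

```python
from typing import Iterable


def render_forward_zone(addrs: Iterable[str], zone: str = ".") -> str:
    """Render an unbound forward-zone clause sending `zone` to the given upstreams."""
    lines = ["forward-zone:", f'    name: "{zone}"']
    lines += [f"    forward-addr: {addr}" for addr in addrs]
    return "\n".join(lines) + "\n"


def has_forward_zone(conf_text: str, zone: str = ".") -> bool:
    """Rough line scan: is there a forward-zone clause whose name equals `zone`?"""
    in_forward_zone = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        if line.endswith(":") and " " not in line:
            # A bare "word:" line starts a new clause (server:, remote-control:, ...)
            in_forward_zone = (line == "forward-zone:")
        elif in_forward_zone and line.startswith("name:"):
            if line[len("name:"):].strip().strip('"') == zone:
                return True
    return False
```

For example, `render_forward_zone(["192.168.0.1"])` yields the three-line clause that would be appended after the `server:` section; `has_forward_zone` on the configs posted in this thread returns `False`.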

    The NAT outside your pfSense box does add another variable to what the issue could be. Since you're not seeing any answers to your queries, it makes sense to verify that the traffic is actually leaving your gateway and whether responses are coming back. If they're not, is it possible that your ISP only allows DNS to leave their network from their own DNS servers?

    But in general I have had good success with unbound, and I generate LOTS of DNS traffic due to PTR requests for P2P traffic :) And no issues. I wish pfSense just had built-in support for running BIND, but I can work with unbound; it's way better than that tinydns shit ;)



  • I can't sniff the traffic beyond the cable modem/gateway; the outgoing side is on a coax cable (DOCSIS 3.0).

    Also, the admin interface on the Thomson is severely restricted by the cable provider, so I can't really do any debugging.

    Although every port is forwarded to the pfSense box, nothing is listening on port 53 on the WAN side; packets destined there would be denied by the ruleset anyhow.

    I may be able to change to a real modem (i.e. one that assigns the public IP to the Soekris). We'll see what next week brings.

    I really appreciate you helping me out here and getting me to see the problem from another point of view. :)



  • The error occurred 3 times during the last hour. Restarting unbound fixed it every time.

    I'm beginning to think it is a problem either with my unbound config or with unbound itself.

    Anyone else got some pointers/ideas?


  • Rebel Alliance Global Moderator

    Well, I am not having any issues with unbound at all.

    And as you can see from my logs, it's seeing a decent number of queries:

    total.num.queries=197011

    Are you logging stats and extended stats? Feel free to post your config; here is mine.

    
    #########################
    # Unbound configuration #
    #########################
    
    ###
    # Server config
    ###
    server:
    num-threads: 1
    msg-cache-slabs: 4
    rrset-cache-slabs: 4
    infra-cache-slabs: 4
    key-cache-slabs: 4
    msg-cache-size: 10m
    rrset-cache-size: 20m
    outgoing-range: 974
    #so-rcvbuf: 4m
    num-queries-per-thread: 1024
    verbosity: 1
    port: 53
    do-ip4: yes
    do-ip6: yes
    do-udp: yes
    do-tcp: yes
    do-daemonize: yes
    statistics-interval: 300
    extended-statistics: yes
    statistics-cumulative: yes
    # Interface IP(s) to bind to
    interface: 192.168.1.253
    interface: 2001:470:snipped:b85::1
    
    chroot: ""
    username: "unbound"
    directory: "/usr/local/etc/unbound"
    pidfile: "/var/run/unbound.pid"
    root-hints: "root.hints"
    harden-dnssec-stripped: yes
    harden-referral-path: no
    prefetch: yes
    prefetch-key: yes
    use-syslog: yes
    module-config: "validator iterator"
    unwanted-reply-threshold: 10000000
    auto-trust-anchor-file: /usr/local/etc/unbound/root-trust-anchor
    #### Access Control ####
    # Local attached networks allowed to utilize service and any user added ACLs
    access-control: 127.0.0.0/8 allow
    access-control: 192.168.1.0/24 allow
    access-control: 2001:470:snipped:b85::/64 allow
    #allow
    access-control: 192.168.1.0/24 allow_snoop
    access-control: 10.0.200.0/24 allow_snoop
    
    # For DNS Rebinding prevention
    private-address: 10.0.0.0/8
    private-address: 172.16.0.0/12
    private-address: 192.168.0.0/16
    private-address: 192.254.0.0/16
    # private-address: fd00::/8
    # private-address: fe80::/10
    # Set private domains in case authorative name server returns a RFC1918 IP address
    
    # Host entries
    local-zone: "local.lan" transparent
    local-data-ptr: "127.0.0.1 localhost"
    local-data: "localhost A 127.0.0.1"
    local-data: "localhost.local.lan A 127.0.0.1"
    local-data-ptr: "192.168.1.253 pfsense.local.lan"
    local-data: "pfsense.local.lan A 192.168.1.253"
    local-data: "pfsense A 192.168.1.253"
    local-data-ptr: "192.168.1.97 dvr1.local.lan"
    local-data: "dvr1.local.lan IN A 192.168.1.97"
    local-data-ptr: "192.168.1.98 dvr2.local.lan"
    local-data: "dvr2.local.lan IN A 192.168.1.98"
    local-data-ptr: "192.168.1.4 p4-28g.local.lan"
    local-data: "p4-28g.local.lan IN A 192.168.1.4"
    local-data-ptr: "192.168.1.99 pch.local.lan"
    local-data: "pch.local.lan IN A 192.168.1.99"
    local-data-ptr: "192.168.1.128 qs108t.local.lan"
    local-data: "qs108t.local.lan IN A 192.168.1.128"
    local-data-ptr: "192.168.1.100 quad-w7.local.lan"
    local-data: "quad-w7.local.lan IN A 192.168.1.100"
    local-data: 'quad-w7.local.lan TXT "quad-w7"'
    local-data-ptr: "192.168.1.50 samsung.local.lan"
    local-data: "samsung.local.lan IN A 192.168.1.50"
    local-data: 'samsung.local.lan TXT "printer"'
    local-data-ptr: "192.168.1.7 ubuntu.local.lan"
    local-data: "ubuntu.local.lan IN A 192.168.1.7"
    local-data-ptr: "192.168.1.252 wrt54g.local.lan"
    local-data: "wrt54g.local.lan IN A 192.168.1.252"
    
    # Domain overrides
    
    ###
    # Remote Control Config
    ###
    remote-control:
    control-enable: yes
    control-interface: 127.0.0.1
    control-port: 953
    server-key-file: "/usr/local/etc/unbound/unbound_server.key"
    server-cert-file: "/usr/local/etc/unbound/unbound_server.pem"
    control-key-file: "/usr/local/etc/unbound/unbound_control.key"
    control-cert-file: "/usr/local/etc/unbound/unbound_control.pem"
    
    

    I snipped out part of my IPv6 address for privacy reasons, but other than that this is my full config.



  • Thanks for posting your stats & config.

    Some of my stats; the average recursion time is bothering me (it shows the 'hiccups'):

    
    total.num.queries=565
    total.recursion.time.avg=30.053324
    total.recursion.time.median=0.0239458
    
    

    My config looks like this; 192.168.100.1 is the LAN side of pfSense:

    
    #########################
    # Unbound configuration #
    #########################
    
    ###
    # Server config
    ###
    server:
    num-threads: 1
    msg-cache-slabs: 4
    rrset-cache-slabs: 4
    infra-cache-slabs: 4
    key-cache-slabs: 4
    msg-cache-size: 20m
    rrset-cache-size: 40m
    outgoing-range: 974
    #so-rcvbuf: 4m
    num-queries-per-thread: 1024
    verbosity: 1
    port: 53
    do-ip4: yes
    do-ip6: no
    do-udp: yes
    do-tcp: yes
    do-daemonize: yes
    statistics-interval: 0
    extended-statistics: no
    statistics-cumulative: no
    # Interface IP(s) to bind to
    interface: 192.168.100.1
    
    chroot: ""
    username: "unbound"
    directory: "/usr/local/etc/unbound"
    pidfile: "/var/run/unbound.pid"
    root-hints: "root.hints"
    harden-dnssec-stripped: yes
    harden-referral-path: no
    prefetch: yes
    prefetch-key: yes
    use-syslog: yes
    module-config: "validator iterator"
    unwanted-reply-threshold: 10000000
    auto-trust-anchor-file: /usr/local/etc/unbound/root-trust-anchor
    #### Access Control ####
    # Local attached networks allowed to utilize service and any user added ACLs
    access-control: 127.0.0.0/8 allow
    access-control: 192.168.100.0/24 allow
    #InterneClients
    access-control: 192.168.100.0/24 allow
    
    # For DNS Rebinding prevention
    private-address: 10.0.0.0/8
    private-address: 172.16.0.0/12
    private-address: 192.168.0.0/16
    private-address: 192.254.0.0/16
    # private-address: fd00::/8
    # private-address: fe80::/10
    # Set private domains in case authorative name server returns a RFC1918 IP address
    
    # Host entries
    local-zone: "mein-netz.at" transparent
    local-data-ptr: "127.0.0.1 localhost"
    local-data: "localhost A 127.0.0.1"
    local-data: "localhost.mein-netz.at A 127.0.0.1"
    local-data-ptr: "192.168.100.1 fw.mein-netz.at"
    local-data: "fw.mein-netz.at A 192.168.100.1"
    local-data: "fw A 192.168.100.1"
    local-data-ptr: "192.168.100.2 gk.mein-netz.at"
    local-data: "gk.mein-netz.at IN A 192.168.100.2"
    local-data: 'gk.mein-netz.at TXT "xxx"'
    local-data-ptr: "192.168.100.3 tsp.mein-netz.at"
    local-data: "tsp.mein-netz.at IN A 192.168.100.3"
    local-data: 'tsp.mein-netz.at TXT "xxx"'
    local-data-ptr: "192.168.100.49 DS-1.mein-netz.at"
    local-data: "DS-1.mein-netz.at IN A 192.168.100.49"
    local-data: 'DS-1.mein-netz.at TXT "xxx"'
    
    # Domain overrides
    
    ###
    # Remote Control Config
    ###
    remote-control:
    control-enable: yes
    control-interface: 127.0.0.1
    control-port: 953
    server-key-file: "/usr/local/etc/unbound/unbound_server.key"
    server-cert-file: "/usr/local/etc/unbound/unbound_server.pem"
    control-key-file: "/usr/local/etc/unbound/unbound_control.key"
    control-cert-file: "/usr/local/etc/unbound/unbound_control.pem"
    
    

    I just did a diff of our respective configs; only the IPs, IPv6, statistics and static entries differ.
    Maybe I'll go and update to the current snapshot; otherwise I'm all out of ideas/things to try.


  • Rebel Alliance Global Moderator

    Yeah, you're clearly having some major issues there, with an average of 30 seconds.

    Here are my stats:
    total.num.recursivereplies=179191
    total.recursion.time.avg=1.002341
    total.recursion.time.median=0.0679367
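The gap between the two stat sets is easiest to see as a ratio of mean to median recursion time: a mean far above the median means most lookups are fast and a small number are extremely slow, which fits queries that hang until they time out rather than uniformly slow resolution. A minimal Python sketch that parses the `key=value` lines posted here (the format `unbound-control stats` prints); the function names are hypothetical.

```python
from typing import Dict


def parse_stats(text: str) -> Dict[str, float]:
    """Parse unbound 'key=value' statistics lines, keeping numeric values only."""
    stats: Dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue
        try:
            stats[key] = float(value)
        except ValueError:
            pass  # skip non-numeric entries
    return stats


def mean_to_median(stats: Dict[str, float]) -> float:
    """Ratio of average to median recursion time; large values mean a slow tail."""
    avg = stats["total.recursion.time.avg"]
    median = stats["total.recursion.time.median"]
    return avg / median if median > 0 else float("inf")
```

With the numbers from this thread, the first set gives a ratio of roughly 1,255 and the second roughly 15, so both have a tail, but only the first one points at lookups stalling for many seconds.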

    To be honest, I don't see how it could be an issue with unbound, though, if you're showing the queries going out and just not getting responses back.

    Do you mind posting the pcap version of the capture instead of the text, so we can see whether something is malformed in the packets going out on the wire?

    So you say it works fine for a while and then just stops working?

    BTW, you are running unbound 1.4.8?

    ; <<>> DiG 9.7.3 <<>> @192.168.1.253 version.bind chaos txt
    ;; ANSWER SECTION:
    version.bind.           0       CH      TXT     "unbound 1.4.8"

    I show the package version as
    1.4.8_131
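The `dig @192.168.1.253 version.bind chaos txt` check above is just a TXT query in the CHAOS class. A minimal Python sketch of building that query and pulling the first TXT string out of a reply; it handles plain and compressed owner names but is not a general DNS parser, and the function names are hypothetical. The reply would be fetched the same way as any other UDP DNS query.

```python
import struct
from typing import Optional

TYPE_TXT = 16
CLASS_CH = 3  # CHAOS class, used for version.bind-style queries


def build_version_query(txid: int = 0x2A2A) -> bytes:
    """Build a 'version.bind CH TXT' query in DNS wire format."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)  # RD set, one question
    qname = b"\x07version\x04bind\x00"
    return header + qname + struct.pack("!HH", TYPE_TXT, CLASS_CH)


def _skip_name(msg: bytes, off: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if (length & 0xC0) == 0xC0:  # compression pointer ends the name (2 bytes)
            return off + 2
        off += 1 + length


def first_txt(reply: bytes) -> Optional[str]:
    """Return the first character-string of the first TXT answer, if any."""
    qdcount, ancount = struct.unpack("!HH", reply[4:8])
    off = 12
    for _ in range(qdcount):
        off = _skip_name(reply, off) + 4  # skip QTYPE and QCLASS
    for _ in range(ancount):
        off = _skip_name(reply, off)
        rtype, _rclass, _ttl, rdlength = struct.unpack("!HHIH", reply[off:off + 10])
        off += 10
        if rtype == TYPE_TXT:
            strlen = reply[off]
            return reply[off + 1:off + 1 + strlen].decode("ascii", "replace")
        off += rdlength
    return None
```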



  • Thanks again, johnpoz. Your stats look like what I'd like to have! :)

    I'm running the same version as you, both unbound and the unbound package.

    I started the capture again; hopefully it will happen again tomorrow and I'll have more data.


  • Rebel Alliance Global Moderator

    So it works for a while and then just stops, i.e. it's working now?

    You know, since you're behind a NAT router, could it be some form of flood protection? I.e., your "modem" sees a bunch of outbound traffic on port 53 all at the same time (or a specific number within a certain time frame), all from the same IP (your router's WAN IP), so it blocks that IP from sending any more traffic to that port?

    I have seen this before with some routers, where P2P causes a spike in DNS traffic and then nothing works because the user can no longer do DNS.

    What's the specific "modem" you have, so I can look up its manual to see if it's got some kind of feature like that?

    Are you running anything that would cause spikes in your DNS traffic? I know for sure that P2P can cause a huge amount of PTR traffic, depending on the client.
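The flood-protection theory can be pictured as a per-source rate limit on outbound port-53 traffic. The sketch below is a hypothetical model, not the Thomson's actual logic (which the thread never documents): a source that sends more than `limit` DNS packets within any `window`-second span gets blocked. Because pfSense itself does the recursion, every client lookup leaves the router from the one address 192.168.0.10, and a recursive resolver sends several upstream queries per lookup (root, TLD, authoritative servers), so a burst from one busy page could plausibly trip such a limit where a forwarder would not.

```python
from collections import deque
from typing import Deque, Dict, Set


class FloodGuard:
    """Hypothetical per-source limiter: block a source that sends more than
    `limit` packets to port 53 within any `window`-second span."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.history: Dict[str, Deque[float]] = {}
        self.blocked: Set[str] = set()

    def allow(self, src: str, now: float) -> bool:
        """Record a packet from `src` at time `now`; return False if it is dropped."""
        if src in self.blocked:
            return False
        times = self.history.setdefault(src, deque())
        times.append(now)
        # Forget packets that fell out of the sliding window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) > self.limit:
            self.blocked.add(src)  # this model never unblocks
            return False
        return True
```

This model never lifts a block; a real device presumably uses some timeout or other state, which the thread gives no details on.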



  • I don't do anything really DNS-query-heavy, no P2P. Browsing forums sometimes leads to a real storm of queries, though (inline images, etc.). Good idea to check for that!

    At the moment I have a Thomson TWG 870, and behold, there is flood protection on the "Web Filter" subpage. I turned it off; crossing my fingers now!


  • Rebel Alliance Global Moderator

    Well, it's been a few hours since your post. Did that fix it?



  • Couldn't really do much testing until now, but it seems that fixed it!

    Many thanks!


  • Rebel Alliance Global Moderator

    Well, that is good news. Looks like nothing was wrong with unbound after all ;) heheh

    I'm not a fan of most of these SOHO routers at all. To be honest, if I were you I would really push to just get a standard cable modem. Can't you just buy your own?

    I just bought an SB6120 about a year ago when they came out. Comcast had not updated mine in years, it was still DOCSIS 2, and they were charging me rent on the thing every month. So instead of asking them to update it, I just bought my own, and it has probably already paid for itself vs. the monthly rent.

    I just had my son buy one as well instead of having Comcast provide it; they were going to charge him $7 a month rent. WTF, you can just buy an SB6120 for like $70; in less than a year it's paid for itself.
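The rent-versus-buy arithmetic, spelled out: the break-even point is the purchase price divided by the monthly rent, rounded up to whole months. A trivial Python sketch using the $70 and $7/month figures from the post, ignoring taxes, resale value and any price changes; the function name is hypothetical.

```python
import math


def months_to_break_even(purchase_price: float, monthly_rent: float) -> int:
    """Whole months of rent after which buying would have cost the same or less."""
    return math.ceil(purchase_price / monthly_rent)
```

With $70 and $7 a month this gives 10 months, consistent with "less than a year".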

    Why will they not let you put it into bridge mode so it's not doing NAT? To be honest, that makes no freaking sense; what does the ISP care whether your box or their device has the public IP?



  • I'm definitely going to ask for the modem next week - the combo modem/gateway just sucks.

    Unfortunately we can't just supply our own cable modems; the provider has to have access to the 'cable part' of it to configure their own settings, which we customers can't change. That's life across the pond for you. ;)

    And you were right, no problem with unbound at all! :D


  • Rebel Alliance Global Moderator

    Just because you buy the modem does not mean your provider does not have access to it. Once you register it on their network, they have access. So do they charge you rent for the device?



  • No, they don't charge anything extra at all.



  • Hi… I had the same problem before. Try putting a DNS server IP in the DNS field of the DHCP server settings (pfSense). Maybe your LAN clients lost their automatic DNS setting when the PC restarted. Hope this helps.


  • Rebel Alliance Global Moderator

    The issue was resolved quite some time ago; it was flood-type protection on his router in front of the pfSense box. It helps to read a thread before posting in it ;)

