circuit bouncing and DNS



  • Hello. Now that everyone is home from school and working from home, Comcast is having real capacity issues in my area. When that happens it takes down my WAN link, and DNS traffic seems to bind up. I reset unbound and DHCP services, but I cannot get traffic to work by FQDN/URL until I do a full reboot.

    I have also cleared ARP, reset states, reset unbound.

    I can ping out the WAN by IP, but DNS seems to be bust. I have made a number of changes to WAN DNS servers and gateway monitor IPs (I wish you could monitor an FQDN instead of an IP).

    I have run Wireshark captures, and it looks like the DNS requests hit the firewall but are not forwarded out.

    Does anyone know of a less invasive way to get traffic back up?



  • Hi,

    Can you post the last 30 or 40 lines of the log under Status > System Logs > System > DNS Resolver?

    @ryno5514 said in circuit bouncing and DNS:

    I have made a number of changes to WAN DNS

    None are needed. Why use someone else's? A resolver like unbound can go straight to the original Internet sources, the root servers; the list is built into unbound.



  • This is what shows in the log, apart from my aliases attempting to load:
    Apr 9 07:23:46 filterdns with my host list all of them look normal
    Apr 9 07:23:37 filterdns Adding host
    Apr 9 07:23:42 unbound 1629:0 info: generate keytag query _ta-4f66. NULL IN
    Apr 9 07:23:40 unbound 1629:0 info: start of service (unbound 1.9.6).
    Apr 9 07:23:40 unbound 1629:0 notice: init module 1: iterator
    Apr 9 07:23:40 unbound 1629:0 notice: init module 0: validator
    Apr 9 07:23:37 filterdns merge_config: configuration reload
    As for the DNS changes, others suggested trying them, and it seemed like a normal troubleshooting step.



  • That's all?

    You said:

    @ryno5514 said in circuit bouncing and DNS:

    I reset unbound

    and nothing shows up in the unbound log ..... that's alarming.

    This is what I see when I restart Unbound :

    Apr 9 16:04:44 	unbound 	72602:0 	info: start of service (unbound 1.9.6).
    Apr 9 16:04:44 	unbound 	72602:0 	notice: init module 2: iterator
    Apr 9 16:04:44 	unbound 	72602:0 	notice: init module 1: validator
    Apr 9 16:04:44 	unbound 	72602:0 	info: pythonmod: aaaa script loaded
    Apr 9 16:04:44 	unbound 	72602:0 	notice: init module 0: python
    Apr 9 16:04:41 	unbound 	18315:0 	info: 1.000000 2.000000 7
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.524288 1.000000 47
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.262144 0.524288 200
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.131072 0.262144 231
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.065536 0.131072 206
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.032768 0.065536 109
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.016384 0.032768 78
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.008192 0.016384 2
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.004096 0.008192 1
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.002048 0.004096 1
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.000000 0.000001 193
    Apr 9 16:04:41 	unbound 	18315:0 	info: lower(secs) upper(secs) recursions
    Apr 9 16:04:41 	unbound 	18315:0 	info: [25%]=0.0314552 median[50%]=0.11437 [75%]=0.253775
    Apr 9 16:04:41 	unbound 	18315:0 	info: histogram of recursion processing times
    Apr 9 16:04:41 	unbound 	18315:0 	info: average recursion processing time 0.170475 sec
    Apr 9 16:04:41 	unbound 	18315:0 	info: server stats for thread 1: requestlist max 30 avg 1.69591 exceeded 0 jostled 0
    Apr 9 16:04:41 	unbound 	18315:0 	info: server stats for thread 1: 12834 queries, 11759 answers from cache, 1075 recursions, 2447 prefetch, 0 rejected by ip ratelimiting
    Apr 9 16:04:41 	unbound 	18315:0 	info: 16.000000 32.000000 3
    Apr 9 16:04:41 	unbound 	18315:0 	info: 8.000000 16.000000 3
    Apr 9 16:04:41 	unbound 	18315:0 	info: 4.000000 8.000000 1
    Apr 9 16:04:41 	unbound 	18315:0 	info: 2.000000 4.000000 2
    Apr 9 16:04:41 	unbound 	18315:0 	info: 1.000000 2.000000 11
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.524288 1.000000 74
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.262144 0.524288 437
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.131072 0.262144 576
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.065536 0.131072 504
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.032768 0.065536 408
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.016384 0.032768 196
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.008192 0.016384 14
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.004096 0.008192 11
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.002048 0.004096 2
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.001024 0.002048 1
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.000512 0.001024 2
    Apr 9 16:04:41 	unbound 	18315:0 	info: 0.000000 0.000001 869
    Apr 9 16:04:41 	unbound 	18315:0 	info: lower(secs) upper(secs) recursions
    Apr 9 16:04:41 	unbound 	18315:0 	info: [25%]=8.95857e-07 median[50%]=0.0725577 [75%]=0.205824
    Apr 9 16:04:41 	unbound 	18315:0 	info: histogram of recursion processing times
    Apr 9 16:04:41 	unbound 	18315:0 	info: average recursion processing time 0.163470 sec
    Apr 9 16:04:41 	unbound 	18315:0 	info: server stats for thread 0: requestlist max 44 avg 3.13454 exceeded 0 jostled 0
    Apr 9 16:04:41 	unbound 	18315:0 	info: server stats for thread 0: 41905 queries, 38791 answers from cache, 3114 recursions, 5530 prefetch, 0 rejected by ip ratelimiting
    Apr 9 16:04:41 	unbound 	18315:0 	info: service stopped (unbound 1.9.6).
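A minimal sketch for measuring how long a restart takes from a log like the one above. It pairs each "service stopped" line with the next "start of service" line. The sample lines are copied from this post; `ASSUMED_YEAR` is an assumption, because these log timestamps carry no year, and the function names are hypothetical.

```python
from datetime import datetime

# Sample lines copied from the log above (the GUI shows newest entries first).
SAMPLE_LOG = """\
Apr 9 16:04:44 unbound 72602:0 info: start of service (unbound 1.9.6).
Apr 9 16:04:44 unbound 72602:0 notice: init module 2: iterator
Apr 9 16:04:41 unbound 18315:0 info: server stats for thread 0: requestlist max 44 avg 3.13454 exceeded 0 jostled 0
Apr 9 16:04:41 unbound 18315:0 info: service stopped (unbound 1.9.6).
"""

ASSUMED_YEAR = 2020  # assumption: the log timestamps do not include a year


def parse_time(line):
    """Parse the leading 'Apr 9 16:04:44' timestamp of a log line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(
        f"{ASSUMED_YEAR} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
    )


def restart_gaps(log_text):
    """Return the seconds between each 'service stopped' and the next start."""
    # Sort oldest-first so the order of the pasted log does not matter.
    lines = sorted(
        (line for line in log_text.splitlines() if line.strip()), key=parse_time
    )
    gaps, stopped_at = [], None
    for line in lines:
        if "service stopped" in line:
            stopped_at = parse_time(line)
        elif "start of service" in line and stopped_at is not None:
            gaps.append((parse_time(line) - stopped_at).total_seconds())
            stopped_at = None
    return gaps


if __name__ == "__main__":
    print(restart_gaps(SAMPLE_LOG))  # [3.0]
```

For the sample, the stop is logged at 16:04:41 and the start at 16:04:44, so the gap is three seconds.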
    


  • Here are the details while it is in the failed state.

    0f238c0b-b21c-4d98-9d65-73ab9480ee85-image.png



  • So, it needs more than 80 seconds to start ...
    Have you been feeding unbound with pfBlockerNG food?



  • Nope, I am not using pfBlocker at all. I am only using aliases to route and filter traffic (this goes to VPN, this goes to the default route, and this does not get any route unless it's this time). I have only had issues since Comcast started having capacity problems and bouncing my connection.

    Everything works fine until the Comcast link bounces, and then it requires a reboot.

    And mind you, my IP phone and my Comcast Roku app start working again; I can route traffic via IP but not via URLs/FQDNs.



  • @ryno5514 said in circuit bouncing and DNS:

    And mind you, my IP phone and my Comcast Roku app start working again; I can route traffic via IP but not via URLs/FQDNs.

    Be aware: traffic always uses IP. But if you use a URL/FQDN somewhere and it isn't in the device's local DNS cache, the upstream DNS (pfSense) is asked for it. If pfSense doesn't know it either, it will look it up for you further upstream, by resolving or forwarding the request.
    If unbound (DNS on pfSense) isn't answering, the program or process on the device will wait and eventually time out.
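To tell "resolver not answering" apart from "no connectivity at all", one option is to send a single DNS query straight to the resolver with a short timeout. The following is only a sketch using the Python standard library; the function names and the resolver address `192.168.1.1` in the usage comment are assumptions, not values from this thread.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of `name`."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def check_resolver(server_ip, name="cnn.com", timeout=3.0):
    """Return 'answered', 'timeout', 'unreachable' or an error description."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server_ip, 53))
            reply, _ = sock.recvfrom(512)
        except socket.timeout:
            return "timeout"
        except OSError:
            return "unreachable"
    if len(reply) < 4:
        return "error short reply"
    # The low 4 bits of the flags word hold the response code (0 = no error).
    rcode = struct.unpack("!H", reply[2:4])[0] & 0x000F
    return "answered" if rcode == 0 else f"error rcode={rcode}"


# Usage (hypothetical LAN address of the pfSense box):
#   print(check_resolver("192.168.1.1"))
```

A "timeout" here while pings to public IPs still succeed would match the symptom described in this thread: the path works, but the resolver is not replying.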

    I advise you to figure out why it takes so long for unbound to start. On a Netgate SG1100, or on a device like mine with an Intel processor from 2007, it takes only one to three seconds.

    If the WAN isn't available, this will delay the startup of unbound.
    If unbound restarts while the WAN isn't available, all traffic will stop.
    Also, take one day of unbound logs. How many times does it start in a day? Once an hour? Once per day?
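Counting the restarts can be scripted. A rough sketch, assuming the resolver log has been saved as text in the format shown in this thread; the sample lines are taken from two different posts here purely for illustration, and `starts_per_hour` is a hypothetical helper name.

```python
from collections import Counter

# Illustrative lines copied from logs posted in this thread.
SAMPLE_LOG = """\
Apr 9 16:04:44 unbound 72602:0 info: start of service (unbound 1.9.6).
Apr 9 16:04:41 unbound 18315:0 info: service stopped (unbound 1.9.6).
Apr 9 07:23:42 unbound 1629:0 info: generate keytag query _ta-4f66. NULL IN
Apr 9 07:23:40 unbound 1629:0 info: start of service (unbound 1.9.6).
"""


def starts_per_hour(log_text):
    """Count 'start of service' lines, grouped by (month, day, hour)."""
    counts = Counter()
    for line in log_text.splitlines():
        if "unbound" in line and "start of service" in line:
            month, day, clock = line.split()[:3]
            counts[(month, day, clock.split(":")[0])] += 1
    return counts


if __name__ == "__main__":
    for (month, day, hour), n in sorted(starts_per_hour(SAMPLE_LOG).items()):
        print(f"{month} {day} {hour}:00  {n} start(s)")
```

Many starts within the same hour would point at something (DHCP registrations, interface resets) restarting the resolver repeatedly.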

    Keep in mind :
    595871fa-9626-49ef-a4ee-48a0ff70f95a-image.png

    Every new DHCP lease will restart unbound if the first checkbox is set. Prefer static DHCP leases wherever you can (the second checkbox).

    Also :

    @ryno5514 said in circuit bouncing and DNS:

    and it looks like the DNS requests hit the firewall

    What firewall? Incoming, on LAN-type interfaces? If so, these are your rules.
    Once DNS requests have made it 'into' pfSense, there is no firewall any more (except for floating rules, which are rarely used).

    Check your Status > Monitoring graphs.
    Select Quality for the left axis, and your WAN (or WAN_DHCP, or WAN_VPNxx). Can you see dropouts, which would indicate a bad Internet connection?



  • I am aware that in the end all traffic is IP traffic. My statement was simply meant to point out that something is wrong with the firewall's ability to do DNS.

    I have an CPU Type: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
    Current: 3200 MHz, Max: 3201 MHz
    4 CPUs: 1 package(s) x 4 core(s)
    AES-NI CPU Crypto: Yes (active)
    With 8GB of Ram

    I don't think it's a resource issue. Like I said, it only happens after the Comcast circuit bounces. The WAN interface will get large packet loss, go down, come back up, and some select traffic will route, but no other traffic until I hard reboot.

    I have made no changes to the unit's config; the only change is that Comcast is now having capacity issues in the area.

    My statement "and it looks like the DNS requests hit the firewall" was simply meant to say that the device seems not to know how to handle DNS requests.

    If I check that box and a host's IP changes in an alias, will that impact that route?



  • @ryno5514 said in circuit bouncing and DNS:

    The WAN interface will get large packet loss, go down come back u

    The WAN actually goes down ?

    When the dpinger test sees too many ping failures (a bad connection), it can take down the interface. Check your settings.
    That resets the interface,
    which restarts the resolver,
    which explains the DNS issues, on top of the connection issues.

    45b94d6c-6187-4224-b48c-6934f724b1f3-image.png



  • Yes, the interface goes down. It also sometimes bounces the modem so hard that it gives me a NAT IP instead of a bridged IP.

    Are you saying I should have those unchecked? I do have them as follows

    1cdff44d-8341-4fd4-80a0-ac6f4c9ac9f9-image.png

    The WAN does come back up, and my SIP phone and Comcast Roku app work. I can ping 151.101.129.67 but cannot ping cnn.com [151.101.129.67].



  • Keep in mind :
    595871fa-9626-49ef-a4ee-48a0ff70f95a-image.png

    I am not 100% sure, but this might have fixed it; I am waiting for another bounce. Will this cause issues with IPs changing when the aliases update?



  • @ryno5514 said in circuit bouncing and DNS:

    Keep in mind :
    595871fa-9626-49ef-a4ee-48a0ff70f95a-image.png

    I am not 100% sure, but this might have fixed it; I am waiting for another bounce. Will this cause issues with IPs changing when the aliases update?

    Nope still not working



  • @ryno5514 said in circuit bouncing and DNS:

    Keep in mind :
    595871fa-9626-49ef-a4ee-48a0ff70f95a-image.png

    I am not 100% sure, but this might have fixed it; I am waiting for another bounce. Will this cause issues with IPs changing when the aliases update?

    Unchecking "DHCP registration" results in unbound being restarted less often.

    In the system logs you can see for yourself whether dpinger restarts the WAN connection/interface. Normally this is a good thing, but it can also make things worse and turn pfSense into some sort of network on/off switch.
    What's happening, I guess, is this: upstream you have a lot of traffic congestion. The regular dpinger ping starts to notice this (it can be seen in the logs) and restarts the WAN, which in turn restarts other services like unbound.
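The mechanism described above can be illustrated with a toy model. This is not dpinger's actual algorithm; the class name, the window size and the 20% loss threshold are assumptions chosen only to show how sustained packet loss can flip a gateway to "down".

```python
from collections import deque


class LossMonitor:
    """Toy gateway monitor: tracks recent probes and flags high loss."""

    def __init__(self, window=10, loss_alarm_pct=20.0):
        # Only the most recent `window` probe results are kept.
        self.results = deque(maxlen=window)  # True = reply received
        self.loss_alarm_pct = loss_alarm_pct  # assumed threshold

    def record(self, got_reply):
        """Store the outcome of one probe."""
        self.results.append(bool(got_reply))

    def loss_pct(self):
        """Percentage of lost probes within the current window."""
        if not self.results:
            return 0.0
        lost = sum(1 for ok in self.results if not ok)
        return 100.0 * lost / len(self.results)

    def gateway_down(self):
        """True once loss reaches the alarm threshold."""
        return self.loss_pct() >= self.loss_alarm_pct


if __name__ == "__main__":
    monitor = LossMonitor()
    for ok in [True] * 9 + [False, False]:
        monitor.record(ok)
    print(monitor.loss_pct(), monitor.gateway_down())  # 20.0 True
```

On a congested link a few lost probes are enough to cross the threshold, and if an action is tied to that alarm, every crossing can trigger another round of interface and service restarts.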



  • @Gertjan That's correct. The link at the area's local hub is "well over 80%". This is my home internet, and it is causing so many problems that I am running off a Cradlepoint most of the day. That said, my company is putting in a business-class circuit so we can escalate the capacity issue.

    In the meantime, I really hate having to reboot my firewall 10+ times a day. Is there anything you can think of that can kick-start the DNS ("unbound") into working again without a reboot?

    I am really trying to avoid moving all my traffic over to my lab VeloCloud and FE60.



  • To be sure, check this option :

    b9aaa8df-b7dd-480e-8b01-fe8c259ce0e0-image.png

    If the WAN still goes bad, your issue is most probably upstream.



  • @Gertjan said in circuit bouncing and DNS:

    To be sure, check this option :

    b9aaa8df-b7dd-480e-8b01-fe8c259ce0e0-image.png

    If the WAN still goes bad, your issue is most probably upstream.

    Turning that off also.


  • Netgate Administrator

    Are you policy routing traffic from clients out of the WAN gateway? You mentioned you're using aliases to route traffic.

    If the default route from the firewall in System > Routing > Gateways is still set to auto, it may be choosing a bad gateway when the WAN goes down. Make sure it's set to WANGW or a valid failover group etc.

    In that situation Unbound will not be able to resolve, as it won't have a route, but clients that are hitting policy routing rules will still be able to connect by IP.
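A deliberately simplified model of that situation, not pfSense's real routing logic: client traffic matched by a policy rule goes to the rule's gateway, while traffic from the firewall itself (such as Unbound) follows the default route. The function and the gateway names `WAN_DHCP` and `VPN_GW` are hypothetical.

```python
def pick_gateway(source, policy_gateway, default_gateway, gateways_up):
    """Return the gateway a packet would use, or None if it has no usable route."""
    if source == "client" and policy_gateway is not None:
        # Client traffic matched by a policy-routing rule uses that rule's gateway.
        chosen = policy_gateway
    else:
        # Traffic from the firewall itself (e.g. Unbound) follows the default route.
        chosen = default_gateway
    return chosen if chosen in gateways_up else None


if __name__ == "__main__":
    up = {"WAN_DHCP"}  # assumed state: only the WAN gateway is usable
    # Policy-routed client traffic still gets out by IP.
    print(pick_gateway("client", "WAN_DHCP", "VPN_GW", up))   # WAN_DHCP
    # Unbound follows a default route that points at an unusable gateway.
    print(pick_gateway("firewall", None, "VPN_GW", up))       # None
```

In this model the clients keep working by IP while the resolver has no route, which is the same split in behaviour reported earlier in the thread.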

    Steve



  • @stephenw10

    The edits @Gertjan had me make seem to have helped a good amount. I now only have this issue when the Comcast link goes down for more than a few minutes. I updated the "Disable Gateway Monitoring Action" setting this morning, so I am waiting for a bounce to happen.

    Yesterday I only needed to hard reboot 2 times so this is much better.

    b8de5f71-ff6f-4d97-9f24-da3a57bc38d5-image.png


  • Netgate Administrator

    You should, obviously, not need to reboot at all.

    Was the default gateway already set to WAN_DHCP?

    Steve



  • @stephenw10 Yes it was.

    So after the edit to "Disable Gateway Monitoring Action" and the static DHCP change, everything seems much better now. The link bounced 7 times yesterday and all services recovered each time.

    My only worry is about static DHCP: is that going to mess any of my lookups up? Or is that only for the LAN side?



  • @ryno5514 said in circuit bouncing and DNS:

    My only worry is about static DHCP: is that going to mess any of my lookups up?

    All the "Static DHCP" lease details are written to the system hosts file, so they are known 'for life'.
    Lookups for a device will work even if the device isn't present on the network and its most recent lease has expired.

    @ryno5514 said in circuit bouncing and DNS:

    is that only for the LAN side?

    "Static DHCP" leases are leases that the DHCP server hands out to devices on LANs.
    They have nothing to do with the WAN side, where a DHCP client might be set up so it can request an IP/gateway/DNS/etc. from the upstream DHCP server, probably your ISP router, which has a DHCP server on board.



  • @Gertjan Fantastic sir. Looks a lot better, I have my second circuit being installed tomorrow and might put a Velo for a 3rd WAN link to be safe.

