Intermittent connection issue
-
Right, that was the thing that I couldn't figure out as well. Even if unbound was restarting and DNS was not working, I would still expect pings to 8.8.8.8 to work.
Do you have "Register DHCP static mappings" enabled in the DNS Resolver settings? I would expect that to cause unbound to restart as well. To be clear, I don't think the DHCP/static IP requests are the cause of the problem, but they do seem to be triggering it. You pretty much have it isolated down to unbound. Enabling forwarding bypasses unbound.
Bypassing unbound = working.
Not bypassing unbound = not working.
Is that correct? It sounds like you haven't actually checked whether connecting the PC was causing unbound to restart. Check the DNS Resolver log under System Logs. It could be that unbound is not behaving as expected, so look into that. Is unbound actually restarting quickly, as expected? Or is it hanging and causing pfSense to hang as well? pfSense itself will use unbound if that's the only option available for DNS resolution. There is even a setting in General Setup to disable that default behavior.
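One way to answer "is unbound actually restarting, and how often?" is to scan a saved copy of the DNS Resolver log for unbound's own start, restart and stop messages. The Python sketch below assumes the log text uses the same format as the unbound entries quoted later in this thread (for example `unbound 28470:0 info: start of service (unbound 1.9.1).`, where the number before the colon is treated as the process ID); the helper names are hypothetical, not part of pfSense. It also collects the distinct PIDs, since more than one in the same window could hint at overlapping instances.

```python
import re
from typing import List, Set, Tuple

# Lifecycle messages unbound writes to the DNS Resolver log, e.g.
#   "Nov 6 06:46:22 unbound 28470:0 notice: Restart of unbound 1.9.1."
#   "Nov 6 06:46:23 unbound 28470:0 info: start of service (unbound 1.9.1)."
#   "Nov 6 07:43:46 unbound 28470:0 info: service stopped (unbound 1.9.1)."
LIFECYCLE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+unbound\s+"
    r"(?P<pid>\d+):\d+\s+\w+:\s+"
    r"(?P<event>Restart of unbound|start of service|service stopped)"
)

Event = Tuple[str, str, str]  # (timestamp, pid, event)


def unbound_lifecycle_events(lines: List[str]) -> List[Event]:
    """Return every start, restart and stop message found in the log lines."""
    events: List[Event] = []
    for line in lines:
        match = LIFECYCLE.match(line.strip())
        if match:
            events.append((match["ts"], match["pid"], match["event"]))
    return events


def distinct_pids(events: List[Event]) -> Set[str]:
    """Several PIDs active in the same window may hint at overlapping instances."""
    return {pid for _, pid, _ in events}


if __name__ == "__main__":
    sample = [
        "Nov 6 06:46:22 unbound 28470:0 notice: Restart of unbound 1.9.1.",
        "Nov 6 06:46:22 unbound 28470:0 notice: init module 0: validator",
        "Nov 6 06:46:23 unbound 28470:0 info: start of service (unbound 1.9.1).",
        "Nov 6 06:53:21 filterdns merge_config: configuration reload",
        "Nov 6 07:43:46 unbound 28470:0 info: service stopped (unbound 1.9.1).",
    ]
    found = unbound_lifecycle_events(sample)
    for ts, pid, event in found:
        print(ts, pid, event)
    print("PIDs seen:", sorted(distinct_pids(found)))
```

In practice you would feed it the lines of an exported log instead of the inline sample; frequent restarts clustered around the moments clients connect would support the theory in this post.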
I'm not an expert on unbound, but I would still suggest looking into it. Make sure that unbound is running and that there aren't multiple instances of it or something crazy like that. I'm not being sarcastic here; I honestly do not know how pfSense would behave if the DNS server it relies on to do its job was not working.
-
@Raffi_ said in Intermittent connection issue:
Bypassing unbound = working.
Not bypassing unbound = not working
Is that correct?
Yes, that's pretty much correct. Since your last post, I decided to keep forwarding enabled in unbound, and things have been as smooth as they were when I used my Asus router. Then I turned on my desktop today (11/6/2019) at 6:51 AM and, surprisingly, the monitoring graph caught the issue:
These were the system logs at that time:
Nov 6 06:53:13 php-fpm 77800 /index.php: Successful login for user 'kevindd992002' from: 192.168.20.21 (Local Database)
Nov 6 06:53:18 rc.gateway_alarm 54269 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:21.129ms RTTsd:.260ms Loss:21%)
Nov 6 06:53:18 check_reload_status updating dyndns WAN_DHCP
Nov 6 06:53:18 check_reload_status Restarting ipsec tunnels
Nov 6 06:53:18 check_reload_status Restarting OpenVPN tunnels/interfaces
Nov 6 06:53:18 check_reload_status Reloading filter
Nov 6 06:53:20 php-fpm 380 /rc.dyndns.update: MONITOR: WAN_DHCP is down, omitting from routing group Failover 8.8.8.8|192.168.100.2|WAN_DHCP|21.139ms|0.274ms|21%|down
Nov 6 06:53:20 php-fpm 77800 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Nov 6 06:53:20 php-fpm 77800 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
Nov 6 06:53:20 php-fpm 380 /rc.dyndns.update: 380MONITOR: WAN_DHCP is available now, adding to routing group Failover 8.8.8.8|192.168.100.2|WAN_DHCP|21.141ms|0.276ms|20%|loss
Nov 6 06:53:20 php-fpm 73145 /rc.filter_configure_sync: MONITOR: WAN_DHCP is down, omitting from routing group Failover 8.8.8.8|192.168.100.2|WAN_DHCP|21.139ms|0.3ms|21%|down
Nov 6 06:53:21 php-fpm 380 /rc.dyndns.update: 380MONITOR: WAN_DHCP is available now, adding to routing group Failover 8.8.8.8|192.168.100.2|WAN_DHCP|21.142ms|0.296ms|20%|loss
Nov 6 06:53:22 php-fpm 380 /rc.dyndns.update: phpDynDNS (Condo): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Nov 6 06:53:30 rc.gateway_alarm 83271 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:0 RTT:21.140ms RTTsd:.302ms Loss:20%)
Nov 6 06:53:30 check_reload_status updating dyndns WAN_DHCP
Nov 6 06:53:30 check_reload_status Restarting ipsec tunnels
Nov 6 06:53:30 check_reload_status Restarting OpenVPN tunnels/interfaces
Nov 6 06:53:30 check_reload_status Reloading filter
Nov 6 06:53:31 rc.gateway_alarm 87686 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:21.144ms RTTsd:.318ms Loss:21%)
Nov 6 06:53:31 check_reload_status updating dyndns WAN_DHCP
Nov 6 06:53:31 check_reload_status Restarting ipsec tunnels
Nov 6 06:53:31 check_reload_status Restarting OpenVPN tunnels/interfaces
Nov 6 06:53:31 check_reload_status Reloading filter
Nov 6 06:53:31 php-fpm 73145 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Nov 6 06:53:31 php-fpm 73145 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
Nov 6 06:53:31 php-fpm 380 /rc.filter_configure_sync: MONITOR: WAN_DHCP is down, omitting from routing group Failover 8.8.8.8|192.168.100.2|WAN_DHCP|21.144ms|0.318ms|21%|down
Nov 6 06:53:32 php-fpm 380 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Nov 6 06:53:32 php-fpm 380 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
Nov 6 06:53:33 php-cgi notify_monitor.php: Message sent to kevindd992002@yahoo.com OK
Nov 6 06:53:36 php-fpm 381 /rc.dyndns.update: phpDynDNS (Condo): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Nov 6 06:53:40 php-fpm 77800 /rc.dyndns.update: phpDynDNS (Condo): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Nov 6 06:53:57 php-cgi notify_monitor.php: Message sent to kevindd992002@yahoo.com OK
Nov 6 06:54:11 rc.gateway_alarm 30595 >>> Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:0 RTT:21.204ms RTTsd:.290ms Loss:20%)
Nov 6 06:54:11 check_reload_status updating dyndns WAN_DHCP
Nov 6 06:54:11 check_reload_status Restarting ipsec tunnels
Nov 6 06:54:11 check_reload_status Restarting OpenVPN tunnels/interfaces
Nov 6 06:54:11 check_reload_status Reloading filter
Nov 6 06:54:12 php-fpm 33202 /rc.openvpn: Gateway, none 'available' for inet6, use the first one configured. ''
Nov 6 06:54:12 php-fpm 33202 /rc.openvpn: OpenVPN: One or more OpenVPN tunnel endpoints may have changed its IP. Reloading endpoints that may use WAN_DHCP.
Nov 6 06:54:12 php-fpm 380 /rc.dyndns.update: 380MONITOR: WAN_DHCP is available now, adding to routing group Failover 8.8.8.8|192.168.100.2|WAN_DHCP|21.195ms|0.296ms|18%|loss
Nov 6 06:54:14 php-fpm 380 /rc.dyndns.update: phpDynDNS (Condo): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry.
Nov 6 06:54:32 php-cgi notify_monitor.php: Message sent to kevindd992002@yahoo.com OK
I don't see anything in the DNS Resolver logs either. The unbound service did restart as expected, but that only took about a second, so it shouldn't be the issue:
Nov 6 06:46:22 unbound 28470:0 notice: Restart of unbound 1.9.1.
Nov 6 06:46:22 unbound 28470:0 notice: init module 0: validator
Nov 6 06:46:22 unbound 28470:0 notice: init module 1: iterator
Nov 6 06:46:23 unbound 28470:0 info: start of service (unbound 1.9.1).
Nov 6 06:46:29 unbound 28470:1 info: generate keytag query _ta-4f66. NULL IN
Nov 6 06:53:21 filterdns merge_config: configuration reload
Nov 6 06:53:21 filterdns Adding Action: pf table: HostsToTunnel host: plex.tv
Nov 6 06:53:31 filterdns merge_config: configuration reload
Nov 6 06:53:31 filterdns Adding Action: pf table: HostsToTunnel host: plex.tv
Nov 6 06:53:33 filterdns merge_config: configuration reload
Nov 6 06:53:33 filterdns Adding Action: pf table: HostsToTunnel host: plex.tv
Nov 6 06:54:13 filterdns merge_config: configuration reload
Nov 6 06:54:13 filterdns Adding Action: pf table: HostsToTunnel host: plex.tv
Nov 6 07:43:46 unbound 28470:0 info: service stopped (unbound 1.9.1).
Nov 6 07:43:46 unbound 28470:0 info: server stats for thread 0: 740 queries, 216 answers from cache, 524 recursions, 0 prefetch, 0 rejected by ip ratelimiting
Nov 6 07:43:46 unbound 28470:0 info: server stats for thread 0: requestlist max 2 avg 0.0877863 exceeded 0 jostled 0
Nov 6 07:43:46 unbound 28470:0 info: average recursion processing time 0.065975 sec
Nov 6 07:43:46 unbound 28470:0 info: histogram of recursion processing times
Nov 6 07:43:46 unbound 28470:0 info: [25%]=0.0055808 median[50%]=0.0104123 [75%]=0.0318903
Nov 6 07:43:46 unbound 28470:0 info: lower(secs) upper(secs) recursions
I also don't see anything unusual in the DHCP logs:
Nov 6 06:51:31 dhcpd DHCPDISCOVER from <MAC> via igb1
Nov 6 06:51:31 dhcpd DHCPOFFER on 192.168.20.21 to <MAC> via igb1
Nov 6 06:51:34 dhcpd DHCPREQUEST for 192.168.20.21 (192.168.20.1) from <MAC> via igb1
Nov 6 06:51:34 dhcpd DHCPACK on 192.168.20.21 to <MAC> via igb1
Nov 6 07:00:41 dhcpd DHCPREQUEST for 192.168.20.21 from <MAC> via igb1
Nov 6 07:00:41 dhcpd DHCPACK on 192.168.20.21 to <MAC> via igb1
Nov 6 07:00:42 dhcpd DHCPREQUEST for 192.168.20.21 from <MAC> via igb1
Nov 6 07:00:42 dhcpd DHCPACK on 192.168.20.21 to <MAC> via igb1
So at this point, is it a mix of unbound and the desktop causing the problem? I'm scratching my head hard here. I can definitely say that with forwarding enabled, it was stable for 4 straight days. Turning on my laptop didn't do anything to the network, whereas it also caused issues back when I was using just unbound.
-
@kevindd992002 said in Intermittent connection issue:
Loss:21%)
That triggered a bunch of different things, since you probably have it set to reset on loss of the gateway.
Why are you monitoring 8.8.8.8 and not the pfSense gateway?
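To see how often the monitor actually crossed its alarm threshold, the `rc.gateway_alarm` lines can be pulled out of the system log instead of read by eye. A minimal Python sketch, assuming the exact line format shown in the logs above (`Gateway alarm: WAN_DHCP (Addr:8.8.8.8 Alarm:1 RTT:21.129ms RTTsd:.260ms Loss:21%)`); `parse_gateway_alarms` is a hypothetical helper, not a pfSense tool:

```python
import re
from dataclasses import dataclass
from typing import List

# Line format copied from the rc.gateway_alarm entries in the log above.
ALARM = re.compile(
    r"Gateway alarm: (?P<gw>\S+) \(Addr:(?P<addr>\S+) Alarm:(?P<alarm>[01]) "
    r"RTT:(?P<rtt>[\d.]+)ms RTTsd:(?P<rttsd>[\d.]+)ms Loss:(?P<loss>\d+)%\)"
)


@dataclass
class GatewayAlarm:
    gateway: str
    monitor_ip: str
    raised: bool      # Alarm:1 = alarm raised, Alarm:0 = alarm cleared
    rtt_ms: float
    loss_pct: int


def parse_gateway_alarms(text: str) -> List[GatewayAlarm]:
    """Extract every gateway alarm transition from a block of log text."""
    return [
        GatewayAlarm(m["gw"], m["addr"], m["alarm"] == "1", float(m["rtt"]), int(m["loss"]))
        for m in ALARM.finditer(text)
    ]


if __name__ == "__main__":
    log = (
        "Nov 6 06:53:18 rc.gateway_alarm 54269 >>> Gateway alarm: WAN_DHCP "
        "(Addr:8.8.8.8 Alarm:1 RTT:21.129ms RTTsd:.260ms Loss:21%)\n"
        "Nov 6 06:53:30 rc.gateway_alarm 83271 >>> Gateway alarm: WAN_DHCP "
        "(Addr:8.8.8.8 Alarm:0 RTT:21.140ms RTTsd:.302ms Loss:20%)\n"
    )
    for alarm in parse_gateway_alarms(log):
        state = "raised" if alarm.raised else "cleared"
        print(f"{alarm.gateway} {state} at {alarm.loss_pct}% loss, RTT {alarm.rtt_ms} ms")
```

Because it uses `finditer`, it works on a whole pasted log even when the entries run together on one line.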
-
@johnpoz said in Intermittent connection issue:
@kevindd992002 said in Intermittent connection issue:
Loss:21%)
That triggered a bunch of different things, since you probably have it set to reset on loss of the gateway.
Why are you monitoring 8.8.8.8 and not the pfSense gateway?
Right, but is it just a big coincidence that the loss happened right after I turned on the desktop?
I'm monitoring 8.8.8.8 because pfSense's gateway is 192.168.100.1 (the modem IP, because of CGNAT). So if the Internet is down, that private IP will still be up, which obviously would not make for accurate gateway monitoring, right?
-
That is really weird. Now it would seem like unbound isn't the issue after all, maybe? This doesn't make a lot of sense.
8.8.8.8 should be fine for a monitor IP. I use it and have not had issues with it so far. I have other issues with my network but that isn't one :)
I have no idea why pfSense is unable to ping 8.8.8.8 when that PC is plugged in. It doesn't make a lot of sense. This may be a band-aid, but if you get around 21% loss when that PC is connected, you could increase the packet loss threshold.
This is how mine is currently set up, because I didn't want my main WAN taken down by packet loss that was able to correct itself after several seconds.
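The effect of raising the threshold can be reasoned about with a small model. The sketch assumes the gateway monitor compares measured loss against a low (warning) and a high (down) percentage, with 10% and 20% as assumed defaults; check the real values in your own gateway's advanced settings. Under that assumption 21% counts as down and 20% only as loss, which lines up with the `21%|down` and `20%|loss` entries in kevindd992002's logs earlier in the thread.

```python
def gateway_status(loss_pct: float, low: float = 10.0, high: float = 20.0) -> str:
    """Classify a gateway from its measured packet loss.

    Assumed rule: loss strictly above `high` marks the gateway down, loss
    strictly above `low` is only a warning. The 10/20 defaults are assumptions.
    """
    if not 0.0 <= low < high <= 100.0:
        raise ValueError("expected 0 <= low < high <= 100")
    if loss_pct > high:
        return "down"
    if loss_pct > low:
        return "loss"
    return "online"


if __name__ == "__main__":
    for loss in (5, 20, 21):
        # Compare the assumed default against a raised high threshold of 30%.
        print(loss, gateway_status(loss), gateway_status(loss, high=30))
```

In this model, moving the high threshold to 30% keeps a 21% blip from marking WAN_DHCP down, which is why it is a band-aid: the loss is still there, it just no longer triggers the reload actions.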
Sorry if you already mentioned this, but is this issue ONLY happening when that specific PC is plugged in? If so, I would suggest double-checking the settings on that PC. For example, does the network interface on the PC have any additional IP aliases that could conflict when it is first plugged in? Windows does a pretty good job of hiding these settings in deeper menus. I may be reaching here, but everything I'm reading does seem to boil down to that PC being connected to the network.
-
@Raffi_ said in Intermittent connection issue:
That is really weird. Now it would seem like unbound isn't the issue after all, maybe? This doesn't make a lot of sense.
8.8.8.8 should be fine for a monitor IP. I use it and have not had issues with it so far. I have other issues with my network but that isn't one :)
I have no idea why pfSense is unable to ping 8.8.8.8 when that PC is plugged in. It doesn't make a lot of sense. This may be a band-aid, but if you get around 21% loss when that PC is connected, you could increase the packet loss threshold.
This is how mine is currently set up, because I didn't want my main WAN taken down by packet loss that was able to correct itself after several seconds.
Sorry if you already mentioned this, but is this issue ONLY happening when that specific PC is plugged in? If so, I would suggest double-checking the settings on that PC. For example, does the network interface on the PC have any additional IP aliases that could conflict when it is first plugged in? Windows does a pretty good job of hiding these settings in deeper menus. I may be reaching here, but everything I'm reading does seem to boil down to that PC being connected to the network.
It really doesn't make any sense. But like I said in my last reply to johnpoz, it could just be a coincidence that there really was packet loss to 8.8.8.8, taking the gateway down, right when I turned on my desktop. If you remember, in my earlier tests I didn't get any gateway-down indications while the issue was happening. I turned on the desktop again just now and it didn't cause any issues. I'm still leaning towards unbound being the issue. I'll test more later.
That's indeed a band-aid and would not work for me.
When pfsense was still set to use unbound, I thought it was only this desktop causing the issue but after a few days my laptop also caused the same issue. This is what made me try and use forwarding with unbound. So I don't think this desktop PC is the issue.
-
Your issue is that you had 21% packet loss, which triggered a gateway-down event. So no shit unbound would not be able to resolve during that period.
-
@johnpoz said in Intermittent connection issue:
Your issue is that you had 21% packet loss, which triggered a gateway-down event. So no shit unbound would not be able to resolve during that period.
Yes, I agree 100%. Like I said, this could just be a coincidence with the power-up event of my desktop PC.
Do you have any comments on my observation about using unbound with forwarding vs. without? I have not experienced a single occurrence of the issue (except when the gateway went down) since I enabled forwarding. That tells us that unbound without forwarding is the issue here, but I can't pinpoint why, because I have another pfSense box on the same ISP that runs unbound without forwarding flawlessly.
-
As I have already gone over: if your line is having packet loss, then yes, you are more likely to have issues resolving something than with, say, a forward. To resolve, you have to talk to multiple servers all over the internet. With a forward you're just asking that one guy what the answer is.
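A rough way to see why recursion suffers more than forwarding on a lossy line: if each query/response exchange is lost independently with probability p and a step is tried up to r times, one step succeeds with probability 1 - p^r, and a lookup that needs k sequential steps succeeds with (1 - p^r)^k. A forward is one step; a cold-cache recursive lookup can need several (root, TLD, then the domain's own name servers). The Python sketch assumes independent loss and a fixed step count, which real networks and unbound's retry logic do not strictly follow; it only shows the trend.

```python
def lookup_success(p_loss: float, steps: int, tries: int = 1) -> float:
    """Chance that `steps` sequential query/response exchanges all succeed.

    Simplified model: each attempt is lost independently with probability
    `p_loss`, and each step may be attempted up to `tries` times.
    """
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("p_loss must be between 0 and 1")
    if steps < 1 or tries < 1:
        raise ValueError("steps and tries must be at least 1")
    step_ok = 1.0 - p_loss ** tries
    return step_ok ** steps


if __name__ == "__main__":
    p = 0.21  # the loss figure from the gateway alarm, used only as an example
    for tries in (1, 2):
        forward = lookup_success(p, steps=1, tries=tries)
        recurse = lookup_success(p, steps=3, tries=tries)
        print(f"tries={tries}: forward {forward:.3f}, 3-step recursion {recurse:.3f}")
```

With 21% loss and a single try, the model gives about 0.79 for the forward and about 0.49 for three sequential steps; retries narrow the gap but cost extra round trips.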
Raising the log level in unbound (in the DNS Resolver's advanced settings) and also logging queries and answers will give you some insight into what might be the problem with resolving specific sites.
In the options box, add:
server:
  log-queries: yes
  log-replies: yes
Look at unbound's cache for any problem sites that are not resolving. If unbound is restarting, you lose your cache. Just because you haven't seen packet loss in the past doesn't mean you're not having it. Your path to 8.8.8.8 is not the whole internet; it's an anycast address, and there are MULTIPLE paths to get to that address.
If you're having issues with unbound resolving something, you have to troubleshoot that resolving issue, which is why you should set up your log to record more info, including the queries and the answers.
On a problematic connection, forwarding can be less likely to see problems than resolving, especially if you're restarting unbound and losing your local cache, and especially if you have issues talking to specific name servers. unbound keeps track of those in its infra info and avoids using them, but when the cache is lost on a restart, all of that info is lost as well.
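The infra info point can be illustrated with a toy model: remember a smoothed round-trip time and recent timeouts per upstream server, prefer servers that have been answering, and forget all of it on restart. This is a simplified Python sketch, not unbound's actual server-selection algorithm; `InfraCache`, its methods, the smoothing factor and the 376 ms starting estimate are all illustrative assumptions, and the addresses are documentation ranges.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

UNKNOWN_RTT_MS = 376.0  # assumed starting estimate for a server never contacted


@dataclass
class ServerStats:
    rtt_ms: float = UNKNOWN_RTT_MS
    timeouts: int = 0


@dataclass
class InfraCache:
    """Toy per-server memory, loosely inspired by a resolver's infra cache."""
    servers: Dict[str, ServerStats] = field(default_factory=dict)

    def record_answer(self, ip: str, rtt_ms: float) -> None:
        stats = self.servers.setdefault(ip, ServerStats())
        stats.rtt_ms = 0.75 * stats.rtt_ms + 0.25 * rtt_ms  # smoothed RTT
        stats.timeouts = 0                                   # it answered again

    def record_timeout(self, ip: str) -> None:
        self.servers.setdefault(ip, ServerStats()).timeouts += 1

    def pick(self, candidates: List[str]) -> Optional[str]:
        """Prefer servers with fewer recent timeouts, then lower smoothed RTT."""
        def score(ip: str) -> Tuple[int, float]:
            stats = self.servers.get(ip, ServerStats())
            return (stats.timeouts, stats.rtt_ms)
        return min(candidates, key=score, default=None)

    def restart(self) -> None:
        """A restart forgets everything learned about upstream servers."""
        self.servers.clear()


if __name__ == "__main__":
    ns = ["198.51.100.1", "198.51.100.2"]  # documentation addresses, not real servers
    cache = InfraCache()
    cache.record_timeout(ns[0])
    cache.record_answer(ns[1], 40.0)
    print("before restart:", cache.pick(ns))  # avoids the server that timed out
    cache.restart()
    print("after restart:", cache.pick(ns))   # no history, so the flaky one can be picked again
```

After the restart both servers look identical, so the one that was timing out is tried again and the resolver has to rediscover the bad path the slow way.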
-
@johnpoz said in Intermittent connection issue:
As I have already gone over: if your line is having packet loss, then yes, you are more likely to have issues resolving something than with, say, a forward. To resolve, you have to talk to multiple servers all over the internet. With a forward you're just asking that one guy what the answer is.
Raising the log level in unbound (in the DNS Resolver's advanced settings) and also logging queries and answers will give you some insight into what might be the problem with resolving specific sites.
On a problematic connection, forwarding can be less likely to see problems than resolving, especially if you're restarting unbound and losing your local cache, and especially if you have issues talking to specific name servers. unbound keeps track of those in its infra info and avoids using them, but when the cache is lost on a restart, all of that info is lost as well.
Like I said though, it's not even really the resolving part that's the issue. When I'm using unbound and the issue is present, I don't receive responses when I ping the IP address of www.google.com (so no resolution involved here, but unbound is the one causing it). When I switch to a forwarder, I don't encounter this issue.
So the problem is that when using unbound, pfSense somehow cannot reach the various name servers it's trying to query. But yeah, it wouldn't hurt for me to try raising the logging level.
Is there such a thing as an ISP that doesn't work well with unbound running on its customers' premises? Maybe bad routes to some name servers or something? I'm just trying to think outside the box here.
Also, is it recommended to disable all these options so that unbound will not restart? If so, how can I resolve my local clients by their FQDN?
-
@kevindd992002 said in Intermittent connection issue:
(so no resolution involved here, but unbound is the one causing it)
unbound has ZERO!!! Let me repeat that: ZERO!!! to do with you pinging some IP. 8.8.8.8 is not resolved, so your resolver has ZERO to do with it. If you cannot ping 8.8.8.8, then you have a connectivity issue, and it has ZERO!!! Again, ZERO to do with any forwarder or resolver you are running.
Yes, I would recommend you turn off registering DHCP leases or OpenVPN clients in unbound, since that causes it to restart. Static mappings are fine.
-
@johnpoz said in Intermittent connection issue:
@kevindd992002 said in Intermittent connection issue:
(so no resolution involved here, but unbound is the one causing it)
unbound has ZERO!!! Let me repeat that: ZERO!!! to do with you pinging some IP. 8.8.8.8 is not resolved, so your resolver has ZERO to do with it. If you cannot ping 8.8.8.8, then you have a connectivity issue, and it has ZERO!!! Again, ZERO to do with any forwarder or resolver you are running.
Yes, I would recommend you turn off registering DHCP leases or OpenVPN clients in unbound, since that causes it to restart. Static mappings are fine.
Again, I know that pinging an IP address DOES NOT involve DNS resolution; I'm not a beginner here. I don't know how else to describe what I'm observing, but like I said, when I switch to unbound the issue randomly shows up, and when I use forwarding I don't experience it. So as I see it, unbound is somehow affecting the whole pfSense box so that it intermittently fails to reach external servers.
If I turn off DHCP registration, how do I resolve my internal clients?
-
So I switched to unbound again while using only my laptop and NOTHING ELSE. It worked for around 30 minutes before I experienced the issue again. Here's what I see in my cache:
When it was working, I had 0 Timeout A's for all those servers. And then it just started happening.
As for the DNS Resolver logging level, what level should I set it to? Go straight to level 5? I've noticed that when I do that, I can't see all of the logs under System Logs, even if I increase the log filter quantity to an insane amount. That just means there's too much data.
-
See those timeouts? That means you're having issues talking to those servers. That is going to cause you problems with resolution!! Those should pretty much all be ZEROS.
Looking at mine, with 417 entries, I have 1 entry with 2 and 1 other with 1:
Server          Zone                TTL  Ping  Var  RTT  RTO  Timeout A  Timeout AAAA  Timeout Other
157.55.133.11   o365filtering.com.  449  0     94   376  752  2          2             0
156.154.64.10   amazonaws.com.      441  0     94   376  752  1          0             0
As for too much data: the data is there, it has just rolled over. Send it to a syslog server if you want to be able to parse it more easily. But yeah, having that many timeouts is going to be problematic.
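Checking the timeout columns by eye gets tedious with hundreds of entries. A minimal Python sketch, assuming the rows are copied out as whitespace-separated text with ten fields each in the order server, zone, TTL, ping, var, RTT, RTO, timeout A, timeout AAAA, timeout other (the order the two example rows above appear to follow); `parse_rows` and `servers_with_timeouts` are hypothetical helpers:

```python
from typing import List, NamedTuple

FIELDS_PER_ROW = 10  # server, zone, ttl, ping, var, rtt, rto, timeout A, AAAA, other (assumed)


class InfraRow(NamedTuple):
    server: str
    zone: str
    rtt: int
    rto: int
    timeout_a: int
    timeout_aaaa: int
    timeout_other: int

    @property
    def total_timeouts(self) -> int:
        return self.timeout_a + self.timeout_aaaa + self.timeout_other


def parse_rows(text: str) -> List[InfraRow]:
    """Split whitespace-separated text into rows of FIELDS_PER_ROW fields."""
    tokens = text.split()
    if len(tokens) % FIELDS_PER_ROW:
        raise ValueError(f"expected a multiple of {FIELDS_PER_ROW} fields")
    rows = []
    for i in range(0, len(tokens), FIELDS_PER_ROW):
        server, zone, _ttl, _ping, _var, rtt, rto, t_a, t_aaaa, t_other = tokens[i:i + FIELDS_PER_ROW]
        rows.append(InfraRow(server, zone, int(rtt), int(rto), int(t_a), int(t_aaaa), int(t_other)))
    return rows


def servers_with_timeouts(rows: List[InfraRow]) -> List[InfraRow]:
    """Rows with any timeouts, worst first."""
    return sorted((r for r in rows if r.total_timeouts), key=lambda r: r.total_timeouts, reverse=True)


if __name__ == "__main__":
    dump = ("157.55.133.11 o365filtering.com. 449 0 94 376 752 2 2 0 "
            "156.154.64.10 amazonaws.com. 441 0 94 376 752 1 0 0")
    for row in servers_with_timeouts(parse_rows(dump)):
        print(row.server, row.zone, row.total_timeouts)
```

Because it only counts fields, it does not care whether the rows arrive on separate lines or run together on one line.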
-
@johnpoz said in Intermittent connection issue:
See those timeouts? That means you're having issues talking to those servers. That is going to cause you problems with resolution!! Those should pretty much all be ZEROS.
Exactly! So it looks like my ISP's network is dropping packets when I'm using unbound but everything works properly when I'm forwarding to their DNS servers. Does that make sense? Or is there something wrong with the unbound service on my pfsense box?
-
Yeah, it's quite possible for your ISP to have issues with sending port 53 traffic, or just plain bad peering, etc.
How do you know you're not having to retransmit to 8.8.8.8 for your DNS to work? And again, 8.8.8.8 is anycast, so you could be getting answers from any of the servers on that anycast network, which could make it less noticeable. Same goes for their DNS.
If you're having issues with unbound, use dnsmasq and forward.
Not sure what pfSense is supposed to do about your shitty ISP?
-
@johnpoz said in Intermittent connection issue:
Yeah, it's quite possible for your ISP to have issues with sending port 53 traffic, or just plain bad peering, etc.
How do you know you're not having to retransmit to 8.8.8.8 for your DNS to work? And again, 8.8.8.8 is anycast, so you could be getting answers from any of the servers on that anycast network, which could make it less noticeable. Same goes for their DNS.
If you're having issues with unbound, use dnsmasq and forward.
Not sure what pfSense is supposed to do about your shitty ISP?
Problem is, how do I tell them this? Or is this even a valid concern? All they will tell me is that I need to use their own DNS servers.
What do you mean? I'm not using 8.8.8.8 for DNS. 8.8.8.8 is simply the monitor IP for my modem's gateway on my pfSense box. How is that significant here? I'm curious.
What is the difference between using unbound with forwarding and dnsmasq with forwarding?
-
OK, so 4 days ago I disabled DHCP registration and OpenVPN client registration in unbound, and everything seems to be working perfectly now! I tried turning on my desktop and no issues at all! The weird thing is that when I had this issue I didn't see a lot of unbound restarts, as shown in the logs above, so I still don't know the cause. There could have been a device on my network (while DHCP registration was enabled) constantly renewing its lease, I don't know, but then I should have seen the restarts in the unbound logs. As for OpenVPN client registration, I think that only applies to server-to-client OpenVPN connections, which I don't use because my OpenVPN connection is server-to-server (site-to-site).
But yeah, so far so good. Now I'm thinking of setting up a Pi-hole DNS server (using the container on Docker Hub) in my environment. What advantages would I get from using it? Is it just the nice interface/graphs it offers? From what I've read so far, the Pi-hole would be the DNS server for the clients (assigned via DHCP), and it just forwards the requests to pfSense's unbound, right?
-
The difference between dnsmasq and unbound for forwarding is that dnsmasq, out of the box, forwards to ALL of the forwarders you have set at the same time and just uses the first answer. I don't think you can tell unbound to do that; I would have to check the unbound docs, but there is no way to set it in the pfSense GUI. This can be good when you have an ISP with bad peering, or issues talking via some paths, etc.
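The "ask every forwarder at once and keep the first answer" behavior described above can be sketched with asyncio. The upstream servers here are simulated coroutines with made-up delays and failures, not real DNS queries; `race_forwarders` and `simulated` are hypothetical names, and this illustrates the idea rather than dnsmasq's implementation.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Resolver = Callable[[str], Awaitable[str]]


async def race_forwarders(name: str, forwarders: Dict[str, Resolver]) -> str:
    """Query every forwarder at once and return the first successful answer."""
    tasks = {asyncio.ensure_future(query(name)): label for label, query in forwarders.items()}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:  # skip forwarders whose reply was lost
                    return f"{task.result()} (via {tasks[task]})"
        raise RuntimeError(f"all forwarders failed for {name}")
    finally:
        for task in pending:  # stop waiting on the slower forwarders
            task.cancel()


def simulated(delay: float, answer: str = "", fail: bool = False) -> Resolver:
    """Fake upstream: replies after `delay` seconds, or raises to mimic a lost reply."""
    async def query(name: str) -> str:
        await asyncio.sleep(delay)
        if fail:
            raise TimeoutError(f"no reply for {name}")
        return answer
    return query


async def main() -> None:
    forwarders = {
        "dns-a": simulated(0.30, "192.0.2.10"),
        "dns-b": simulated(0.05, fail=True),  # fastest path, but its reply is lost
        "dns-c": simulated(0.10, "192.0.2.10"),
    }
    print(await race_forwarders("www.example.com", forwarders))


if __name__ == "__main__":
    asyncio.run(main())
```

Here the fastest forwarder's reply is lost, yet the lookup still succeeds as soon as the next one answers; a resolver that waited on one server at a time would instead sit out that server's timeout first.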
Well, restarting unbound never helps, because it flushes your cache. But you also showed a lot of timeouts, so something isn't great with your ISP either, and that could be causing problems.
For all we know, your ISP connection has gotten better, and it has zero to do with unbound restarting.
Again, when you have issues with DNS, you cannot just assume the problem is X; you need to troubleshoot the exact issue you're seeing, not just "DNS not working sometimes". Pick something that didn't work and find out why. Are you still seeing a lot of timeouts in your infra info?
You can also make sure you set up prefetch in unbound. This can help on a problematic connection because it looks things up in the background before the TTL expires and the entry is flushed out of the cache. Setting serve 0 TTL (serve expired) can really help as well, since even if a cache entry has expired, unbound will serve up the last answer when a client asks for it and then look it up again in the background.
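Prefetch and serve-expired can be modelled with a tiny cache and a fake clock. The Python sketch assumes prefetch refreshes an entry when it is queried during the last 10% of its TTL (a commonly cited figure for unbound; treat it as an assumption) and that serve-expired answers with the stale record first and refreshes it afterwards. `TinyCache` and the upstream callback are hypothetical, and the "background" refresh runs inline to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Upstream = Callable[[str], Tuple[str, float]]  # returns (answer, ttl); raises OSError on loss


@dataclass
class Entry:
    value: str
    stored_at: float
    ttl: float

    def remaining(self, now: float) -> float:
        return self.stored_at + self.ttl - now


class TinyCache:
    """Toy DNS cache illustrating prefetch and serve-expired (not unbound's code)."""

    def __init__(self, upstream: Upstream, prefetch: bool = False,
                 serve_expired: bool = False, prefetch_fraction: float = 0.1):
        self.upstream = upstream
        self.prefetch = prefetch
        self.serve_expired = serve_expired
        self.prefetch_fraction = prefetch_fraction  # assumed: refresh in the last 10% of TTL
        self.entries: Dict[str, Entry] = {}
        self.log: List[str] = []

    def _refresh(self, name: str, now: float) -> Optional[str]:
        try:
            value, ttl = self.upstream(name)
        except OSError:
            self.log.append(f"{name}: upstream failed")
            return None
        self.entries[name] = Entry(value, now, ttl)
        return value

    def lookup(self, name: str, now: float) -> Optional[str]:
        entry = self.entries.get(name)
        if entry is None:
            self.log.append(f"{name}: miss")
            return self._refresh(name, now)
        left = entry.remaining(now)
        if left > 0:
            if self.prefetch and left < entry.ttl * self.prefetch_fraction:
                self.log.append(f"{name}: hit, prefetching")
                self._refresh(name, now)  # a real resolver does this in the background
            else:
                self.log.append(f"{name}: hit")
            return entry.value
        if self.serve_expired:
            self.log.append(f"{name}: expired, serving stale answer")
            self._refresh(name, now)  # refresh after answering the client
            return entry.value
        self.log.append(f"{name}: expired, refetching")
        return self._refresh(name, now)


if __name__ == "__main__":
    attempts = {"n": 0}

    def lossy_upstream(name: str) -> Tuple[str, float]:
        attempts["n"] += 1
        if attempts["n"] > 1:  # every query after the first is lost
            raise OSError("simulated timeout")
        return "192.0.2.10", 300.0

    for serve_expired in (False, True):
        attempts["n"] = 0
        cache = TinyCache(lossy_upstream, serve_expired=serve_expired)
        cache.lookup("www.example.com", now=0)
        print(f"serve_expired={serve_expired}:", cache.lookup("www.example.com", now=400))
```

With a lossy upstream, the expired lookup returns nothing (the client would see a failure) unless serve-expired is on, in which case it gets the last known answer. The cost is that a record which really changed upstream is answered once with its old value; only a later query sees the refreshed one.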
-
@johnpoz said in Intermittent connection issue:
The difference between dnsmasq and unbound for forwarding is that dnsmasq, out of the box, forwards to ALL of the forwarders you have set at the same time and just uses the first answer. I don't think you can tell unbound to do that; I would have to check the unbound docs, but there is no way to set it in the pfSense GUI. This can be good when you have an ISP with bad peering, or issues talking via some paths, etc.
Well, restarting unbound never helps, because it flushes your cache. But you also showed a lot of timeouts, so something isn't great with your ISP either, and that could be causing problems.
For all we know, your ISP connection has gotten better, and it has zero to do with unbound restarting.
Again, when you have issues with DNS, you cannot just assume the problem is X; you need to troubleshoot the exact issue you're seeing, not just "DNS not working sometimes". Pick something that didn't work and find out why. Are you still seeing a lot of timeouts in your infra info?
You can also make sure you set up prefetch in unbound. This can help on a problematic connection because it looks things up in the background before the TTL expires and the entry is flushed out of the cache. Setting serve 0 TTL (serve expired) can really help as well, since even if a cache entry has expired, unbound will serve up the last answer when a client asks for it and then look it up again in the background.
I see. I thought both dnsmasq and unbound forwarding worked that way.
For some reason, I have 0 timeouts now. The one I showed you with a lot of timeouts was when I had DHCP and OpenVPN client registrations in unbound enabled.
Yeah, that's possible. The ISP connection could have gotten better and this could be another coincidence. I really don't know. I was trying to pin the issue down to one specific component of my whole setup, but I still can't isolate it.
Ok, so I'll enable these three, I guess:
How does the Serve Expired setting help, though? So if the record already has a TTL of 0, it will still serve it to the client and update the cache in the background. What if the record is really no longer valid? How would serving an invalid record to the client help? I'm trying to understand how that setting works.
-
Never mind! I enabled those three options on both of my setups, and this problematic one started getting DNS timeouts again! I tried to ping 8.8.8.8 from Diagnostics > Ping using the WAN interface and got 100% packet loss. Whenever I think one isolation step has solved the problem, it bites me in the back after a few days. This is insane.
-
And let's go over this again: restarting unbound to change a feature has ZERO to do with you pinging anything by IP. But if you're having packet loss, then resolving anything, or even forwarding, is going to be problematic at best.
-
@johnpoz said in Intermittent connection issue:
And let's go over this again: restarting unbound to change a feature has ZERO to do with you pinging anything by IP. But if you're having packet loss, then resolving anything, or even forwarding, is going to be problematic at best.
Yes, I completely agree. I just stated what I did and observed, I didn't say that the unbound restart directly caused me having packet losses.
I still don't get it though. I don't know what to tell these ISP people because all along I still believe that the issue is with their network. I just can't prove it.
-
@kevindd992002, this is a very interesting thread. I've had a very similar experience with my pfSense box. Some of my testing could provide input for further investigation.
The problem is almost identical to what you have described. Occasionally I cannot access webpages. This typically happens when I first start browsing the Internet. It can be any type of website, whether a frequently accessed site or a one-time visit. But it also occurs in the middle of browsing, e.g. after a break.
But if I open several tabs, try different websites, and keep refreshing them, I eventually connect to one or several of them. However, one of them may still not load. For example, when watching Netflix I can have an issue at the beginning, when loading the first movie or even when connecting to Netflix. However, once the movie has started, I cannot recall ever experiencing loading or connection issues. The same is true for Citrix when logging on to a remote desktop: I may have issues connecting at first, but once connected I cannot recall any loading or connection issues.
I replaced my old Netgear SRX5308 with pfSense on a Protectli box (amd64): Intel Celeron CPU J3160 @ 1.60GHz, 4 CPUs: 1 package x 4 cores, AES-NI CPU Crypto: Yes (active).
The old SRX5308 never had any connection issues, but it was slow and capped my fiber connection. I have installed pfSense 3 times to make sure I have a default installation. For the last 3 months I have been searching the Internet for troubleshooting tips. I have tested several of them and none have solved the issue so far.
-
I had some spare time and reinstalled pfSense from scratch, and since then I've had 0 problems with unbound and no packet loss, so far. I'm not entirely sure whether the issue was caused by some bug in pfSense (doubtful); like @johnpoz said, it could just be my ISP fixing something on the backend, coinciding with my reinstall. So far so good though. I'll definitely post back if I encounter the same issue (which I certainly hope I won't).
-
@johnpoz said in Intermittent connection issue:
The difference between dnsmasq and unbound for forwarding is that dnsmasq out of the box forwards to ALL of the forwarders you have set at the same time, and just uses the first answer. I don't think you can tell unbound to do that.. Would have to check the unbound docs - but there is no way to set that in the pfSense GUI. This can be good when you have an ISP with bad peering or issues talking via some paths, etc.
Well, restarting unbound never helps, because it flushes your cache. But you do have a lot of timeouts, as you showed, so something isn't great with your ISP either, which could be causing problems.
For all we know your ISP connection has gotten better, and it has zero to do with unbound restarting.
Again, when you have issues with DNS you cannot just assume the problem is X; you need to troubleshoot the exact issue you're seeing, not just "DNS not working sometimes". Pick something that didn't work and find out why. Are you still seeing a lot of timeouts in your infra info?
You can also make sure you set up prefetch with unbound. This can help with problematic issues because it looks entries up in the background before their TTL expires and they are flushed out of the cache. Setting serve 0 TTL can really help as well, since even if the cache entry has expired, unbound will serve the last entry when a client asks for it and then look it up again in the background.
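To illustrate the prefetch and "serve expired" (serve 0 TTL) behaviour described above, here is a toy Python cache. It is a simplified model, not unbound's code: the prefetch window is assumed to be roughly the last 10% of an entry's TTL, the `lookup` callable stands in for a real upstream query, and the class and parameter names are hypothetical. "Background" refreshes are queued and performed by `run_background()` rather than on a separate thread.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Entry:
    answer: str
    ttl: float        # original TTL in seconds
    stored_at: float  # clock reading when the answer was cached


class ToyCache:
    """Toy DNS cache sketching prefetch and serve-expired (not unbound's code)."""

    def __init__(self, lookup: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic,
                 prefetch: bool = True, serve_expired: bool = True,
                 prefetch_fraction: float = 0.1):
        self.lookup = lookup                  # hypothetical upstream query: name -> (answer, ttl)
        self.clock = clock
        self.prefetch = prefetch
        self.serve_expired = serve_expired
        self.prefetch_fraction = prefetch_fraction  # assumed ~10% window
        self.entries: Dict[str, Entry] = {}
        self.pending: List[str] = []          # refreshes queued for the "background"

    def _fetch(self, name: str) -> str:
        answer, ttl = self.lookup(name)
        self.entries[name] = Entry(answer, ttl, self.clock())
        return answer

    def get(self, name: str) -> str:
        entry = self.entries.get(name)
        if entry is None:
            return self._fetch(name)          # miss: the client waits on upstream
        remaining = entry.ttl - (self.clock() - entry.stored_at)
        if remaining <= 0:
            if not self.serve_expired:
                return self._fetch(name)      # expired: the client waits again
            self.pending.append(name)         # answer with stale data, refresh later
            return entry.answer
        if self.prefetch and remaining < entry.ttl * self.prefetch_fraction:
            self.pending.append(name)         # still valid, but refresh before expiry
        return entry.answer

    def run_background(self) -> None:
        """Perform the queued refreshes (a real resolver does this asynchronously)."""
        while self.pending:
            self._fetch(self.pending.pop(0))
```

With both options switched off, every miss or expiry makes the client wait on the upstream, which is exactly when a flaky upstream path is most noticeable.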
Does this mean that unbound, by default, when set to forwarding mode queries the DNS servers you set in General sequentially and that you cannot change this behavior?
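A rough way to picture the difference johnpoz describes is the sketch below. It contrasts asking forwarders one at a time (moving on only after a failure) with asking all of them at once and keeping the first good reply. The forwarders are stub functions with artificial delays, so nothing here talks to a real DNS server. `make_stub`, `query_sequential` and `query_all_first_answer` are hypothetical names, and neither function claims to be how dnsmasq or unbound is actually implemented.

```python
import concurrent.futures
import time
from typing import Callable, Optional, Sequence

# Hypothetical stand-in for "send one query to one forwarder":
# returns an address string, or raises if the forwarder fails or times out.
Forwarder = Callable[[str], str]


def make_stub(answer: str, delay: float, fail: bool = False) -> Forwarder:
    """Fake forwarder that answers (or fails) after `delay` seconds."""
    def stub(name: str) -> str:
        time.sleep(delay)
        if fail:
            raise TimeoutError(f"no answer for {name}")
        return answer
    return stub


def query_sequential(name: str, forwarders: Sequence[Forwarder]) -> Optional[str]:
    """Try forwarders in order; move to the next one only after a failure."""
    for forwarder in forwarders:
        try:
            return forwarder(name)
        except Exception:
            continue
    return None


def query_all_first_answer(name: str, forwarders: Sequence[Forwarder],
                           timeout: float = 2.0) -> Optional[str]:
    """Ask every forwarder at once and keep the first successful answer."""
    if not forwarders:
        return None
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(forwarders))
    futures = [pool.submit(forwarder, name) for forwarder in forwarders]
    try:
        for future in concurrent.futures.as_completed(futures, timeout=timeout):
            if future.exception() is None:
                return future.result()
            # this forwarder failed; keep waiting for the others
    except concurrent.futures.TimeoutError:
        pass  # nobody answered in time
    finally:
        pool.shutdown(wait=False, cancel_futures=True)  # don't wait for slow ones
    return None
```

With a slow forwarder listed first, `query_sequential` has to wait for it, while `query_all_first_answer` returns whichever stub replies first.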
-
@kevindd992002 and @johnpoz
This is my understanding from the documentation: "unbound will use the system DNS servers from System > General Setup or those received from a dynamic WAN, rather than using the root servers directly".
When I changed my DNS Resolver setting to "Enable Forwarding Mode", most of my intermittent connection issues disappeared. Sometimes I still have to wait a second or two before a webpage loads completely, but in general pages load immediately and there are no long periods of total non-connection.
I have not tried using the DNS Forwarder (dnsmasq). The documentation on "DNS Query Forwarding" mentions that DNSSEC needs to be disabled, but I have not done this. Maybe it works for me because I checked that the DNS servers I chose support DNSSEC.
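One quick heuristic for the "check that your upstream supports DNSSEC" step is to send a query with the DNSSEC OK (DO) bit set and see whether the reply comes back with the AD (authenticated data) flag, which a validating resolver sets on answers it has validated. The sketch below builds that query by hand using only the Python standard library. The default test name `example.com` is assumed to be DNSSEC-signed, the function names are hypothetical, and the result should be treated as a hint, not proof.

```python
import random
import socket
import struct
from typing import Optional


def encode_name(name: str) -> bytes:
    """Encode a domain name as DNS wire-format labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_do_query(name: str, query_id: int) -> bytes:
    """A-record query with RD and AD set, plus an EDNS0 OPT record with DO=1."""
    flags = 0x0100 | 0x0020  # RD (recursion desired) + AD
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 1)  # ARCOUNT=1 for OPT
    question = encode_name(name) + struct.pack("!HH", 1, 1)       # QTYPE A, QCLASS IN
    # OPT pseudo-RR: root name, TYPE 41, CLASS = UDP payload size, TTL carries DO bit
    opt = b"\x00" + struct.pack("!HHIH", 41, 4096, 0x00008000, 0)
    return header + question + opt


def upstream_validates(server: str, name: str = "example.com",
                       timeout: float = 2.0) -> Optional[bool]:
    """True if the reply has the AD bit set, False if not, None if no usable reply."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_do_query(name, qid), (server, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # includes socket timeouts
            return None
    if len(data) < 12:
        return None
    rid, flags = struct.unpack("!HH", data[:4])
    if rid != qid or not flags & 0x8000:  # wrong ID or not a response
        return None
    return bool(flags & 0x0020)
```

For example, `upstream_validates("192.0.2.53")`, where the address is a placeholder for the forwarder you picked in General Setup.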
-
I don't see any GUI setting for DNSSEC in dnsmasq, so I'm not sure if it's supported. But yes, if my upstream DNS server is just the ISP modem anyway (double NAT config), I wouldn't benefit from dnsmasq's ability to query several DNS servers simultaneously. In that case, would staying with unbound in forwarding mode be advisable, since it supports DNSSEC and has other settings you can customize?
-
@johnpoz Do you still have any ideas on my question?
-
Just an update to this issue:
My ISP migrated my account to their VLAN for static (public) IP configurations for two weeks of testing, and I had zero problems with unbound (resolver). As soon as they moved me back to dynamic IP with CGNAT, the issue came back right away when using unbound (resolver). I had to go back to using unbound (forwarder) as a workaround.
They wanted me to switch over to using static IP but to do that I would have to upgrade my subscription with them and pay the extra static IP subscription. That won't happen. I'm trying to fight with them now and convince them that the issue is with their dynamic IP network.
-
Not being able to use the default resolver, unbound, in resolving mode points to a serious connection problem.
Plain, standard DNS traffic is one of the most basic requirements of an Internet connection.
-
Exactly. That's what I've been telling them. DNS traffic is nothing special. It's like they're forcing their customers to only forward DNS requests.
-
@kevindd992002 said in Intermittent connection issue:
It's like they're forcing their customers
They discovered, like many before them, that this information is worth a lot. People are already massively doing the "let's send everything to 8.8.8.8" thing themselves.
And let's face it: if you are an ISP and you have to make a deal with a nearby Google data centre to invest in a very costly 50/50 fiber POP between the Google data centre and the ISP's centre (ISP users consume a LOT of YouTube traffic!), what should the ISP do? Either they pay out cash, or they 'make another kind of deal'.
So yes, DNS is mainly 'visible' so they can grab it and do what they want with it.
Btw: it's the technical point of view that interests me here. I don't care what Google does, nor my ISP. They can have it, I don't care.
-
Get a different ISP? You've clearly pointed out to them that their carrier-grade NAT is broken.
-
@Gertjan said in Intermittent connection issue:
@kevindd992002 said in Intermittent connection issue:
It's like they're forcing their customers
They discovered, like many before them, that this information is worth a lot. People are already massively doing the "let's send everything to 8.8.8.8" thing themselves.
And let's face it: if you are an ISP and you have to make a deal with a nearby Google data centre to invest in a very costly 50/50 fiber POP between the Google data centre and the ISP's centre (ISP users consume a LOT of YouTube traffic!), what should the ISP do? Either they pay out cash, or they 'make another kind of deal'.
So yes, DNS is mainly 'visible' so they can grab it and do what they want with it.
Btw: it's the technical point of view that interests me here. I don't care what Google does, nor my ISP. They can have it, I don't care.
But I don't forward to 8.8.8.8. I'm trying to resolve. Sorry, not sure what you mean?
@johnpoz said in Intermittent connection issue:
Get a different ISP? You've clearly pointed out to them that their carrier-grade NAT is broken.
I can't, I'm still locked in with them (contract) and I have a point-to-point VPN connection with my other house that's using the same ISP (so best connection quality).
-
@kevindd992002 said in Intermittent connection issue:
But I don't forward to 8.8.8.8
8.8.8.8 is just an example. I was talking about 8.8.8.8 as the DNS server chosen by your ISP, i.e. the one they forward to.
@kevindd992002 said in Intermittent connection issue:
I had to go back to using unbound (forwarder) again as a workaround.
To whom?
Sending DNS requests to
a.root-servers.net: 198.41.0.4, 2001:503:ba3e::2:30 (VeriSign, Inc.)
b.root-servers.net: 199.9.14.201, 2001:500:200::b (University of Southern California (ISI))
c.root-servers.net: 192.33.4.12, 2001:500:2::c (Cogent Communications)
d.root-servers.net: 199.7.91.13, 2001:500:2d::d (University of Maryland)
e.root-servers.net: 192.203.230.10, 2001:500:a8::e (NASA)
f.root-servers.net: 192.5.5.241, 2001:500:2f::f (Internet Systems Consortium, Inc.)
g.root-servers.net: 192.112.36.4, 2001:500:12::d0d (US Department of Defense (NIC))
h.root-servers.net: 198.97.190.53, 2001:500:1::53 (US Army (Research Lab))
i.root-servers.net: 192.36.148.17, 2001:7fe::53 (Netnod)
j.root-servers.net: 192.58.128.30, 2001:503:c27::2:30 (VeriSign, Inc.)
k.root-servers.net: 193.0.14.129, 2001:7fd::1 (RIPE NCC)
l.root-servers.net: 199.7.83.42, 2001:500:9f::42 (ICANN)
m.root-servers.net: 202.12.27.33, 2001:dc3::35 (WIDE Project)
or to the DNS server you choose to forward to: what is the difference? Yet you said that the 13 root servers don't work for you.
Read, for example, https://securitytrails.com/blog/dns-root-servers and you'll understand that something is very wrong.
Reset pfSense to default, and see if it works. If not, take a look at your ISP contract.
@kevindd992002 said in Intermittent connection issue:
I have a point-to-point VPN connection
Aha. That changes a lot.
Use pfSense with default settings (again!) and the resolver will work.
Add a VPN and suddenly it stops. That makes the solution rather simple: remove the things that break things.
Or set up the new things correctly.
-
@Gertjan said in Intermittent connection issue:
@kevindd992002 said in Intermittent connection issue:
But I don't forward to 8.8.8.8
8.8.8.8 is just an example. I was talking about 8.8.8.8 as the DNS server chosen by your ISP, i.e. the one they forward to.
@kevindd992002 said in Intermittent connection issue:
I had to go back to using unbound (forwarder) again as a workaround.
To whom?
Sending DNS requests to
a.root-servers.net: 198.41.0.4, 2001:503:ba3e::2:30 (VeriSign, Inc.)
b.root-servers.net: 199.9.14.201, 2001:500:200::b (University of Southern California (ISI))
c.root-servers.net: 192.33.4.12, 2001:500:2::c (Cogent Communications)
d.root-servers.net: 199.7.91.13, 2001:500:2d::d (University of Maryland)
e.root-servers.net: 192.203.230.10, 2001:500:a8::e (NASA)
f.root-servers.net: 192.5.5.241, 2001:500:2f::f (Internet Systems Consortium, Inc.)
g.root-servers.net: 192.112.36.4, 2001:500:12::d0d (US Department of Defense (NIC))
h.root-servers.net: 198.97.190.53, 2001:500:1::53 (US Army (Research Lab))
i.root-servers.net: 192.36.148.17, 2001:7fe::53 (Netnod)
j.root-servers.net: 192.58.128.30, 2001:503:c27::2:30 (VeriSign, Inc.)
k.root-servers.net: 193.0.14.129, 2001:7fd::1 (RIPE NCC)
l.root-servers.net: 199.7.83.42, 2001:500:9f::42 (ICANN)
m.root-servers.net: 202.12.27.33, 2001:dc3::35 (WIDE Project)
or to the DNS server you choose to forward to: what is the difference? Yet you said that the 13 root servers don't work for you.
Read, for example, https://securitytrails.com/blog/dns-root-servers and you'll understand that something is very wrong.
Reset pfSense to default, and see if it works. If not, take a look at your ISP contract.
@kevindd992002 said in Intermittent connection issue:
I have a point-to-point VPN connection
Aha. That changes a lot.
Use pfSense with default settings (again!) and the resolver will work.
Add a VPN and suddenly it stops. That makes the solution rather simple: remove the things that break things.
Or set up the new things correctly.
I'm currently forwarding to another local ISP's DNS servers, which are known to be more stable. This works fine. As soon as I use the resolver (querying the root hints servers), I get random drops. Are you saying ISPs themselves just forward to Google, for example? I was under the impression that they act as resolvers.
The link you gave is not found.
If you remember, we've been over the resetting of things to defaults :) It doesn't work. What I've been able to deduce/conclude is what I've explained just recently (resolver vs forwarder).
Not sure how a point to point VPN connection affects this? DNS traffic isn't routing through the tunnel. Also, like I said I did try pfsense with default settings already, to no avail. With a static IP from the ISP, everything works as expected. So there's really something wrong with their dynamic IP VLAN.
-
@kevindd992002 said in Intermittent connection issue:
I was under the impression that they act as a resolver.
You have no idea what they do.. They could resolve, they could forward... You threw your dns over the fence to them - what they do with it is out of your control... You just hope they give you back an answer, and you trust them to give you good info... They could give you whatever they want..
This is one of the big advantages to resolving - you control the dns.. You ask the authoritative NS directly... Not just trust someone else to have the right answer.
For all you know they forward, and the person they forward to forwards ;) Yes at some point there has to be a resolver.. But it could be a couple of forwarders in there for sure..
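The resolver-versus-forwarder point can be made concrete with a toy model. In resolver mode the client itself walks the delegation chain (root, then TLD, then the domain's authoritative server); in forwarder mode it only ever sees the first hop and has to trust whatever comes back. The in-memory "network", the server names and the 192.0.2.10 address below are invented for illustration, and real resolution also involves caching, glue records and several servers per zone.

```python
from typing import Dict, List, Sequence, Tuple

# Invented toy "internet": server name -> {zone or name: (kind, value)}.
# kind is "referral" (value = next server to ask) or "answer" (value = address).
ToyNet = Dict[str, Dict[str, Tuple[str, str]]]

TOY_NET: ToyNet = {
    "root": {"com.": ("referral", "tld-com")},
    "tld-com": {"example.com.": ("referral", "ns1.example.com")},
    "ns1.example.com": {"www.example.com.": ("answer", "192.0.2.10")},
}


def resolve_iteratively(name: str, net: ToyNet,
                        start: str = "root") -> Tuple[str, List[str]]:
    """Resolver mode: start at the root and follow referrals ourselves."""
    path: List[str] = []
    server = start
    while True:
        path.append(server)
        zone = net[server]
        # the most specific entry that equals `name` or is a parent zone of it
        matches = [key for key in zone if name == key or name.endswith("." + key)]
        if not matches:
            raise LookupError(f"{server} has no data for {name}")
        kind, value = zone[max(matches, key=len)]
        if kind == "answer":
            return value, path
        server = value


def resolve_via_forwarder(name: str, chain: Sequence[str],
                          net: ToyNet) -> Tuple[str, List[str]]:
    """Forwarder mode: we only talk to chain[0]. Somewhere behind it (possibly
    after more forwarders) something resolves, but we never see that part."""
    answer, _hidden_path = resolve_iteratively(name, net)
    return answer, [chain[0]]
```

Both calls return the same address here, but only the iterative one tells you which servers were actually asked.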
-
@johnpoz said in Intermittent connection issue:
@kevindd992002 said in Intermittent connection issue:
I was under the impression that they act as a resolver.
You have no idea what they do.. They could resolve, they could forward... You threw your dns over the fence to them - what they do with it is out of your control... You just hope they give you back an answer, and you trust them to give you good info... They could give you whatever they want..
This is one of the big advantages to resolving - you control the dns.. You ask the authoritative NS directly... Not just trust someone else to have the right answer.
For all you know they forward, and the person they forward to forwards ;) Yes at some point there has to be a resolver.. But it could be a couple of forwarders in there for sure..
Exactly! I want to have my own resolver. I told them that I don't want to rely on their DNS servers.
-
@kevindd992002 said in Intermittent connection issue:
Are you saying ISP themselves just forward to Google for example?
Well. Yes.
I don't know what they do, as they don't tell me, and I did not ask them.
But what would you do if you were an ISP and had to pay for the POPs?
Resolve? Or use Google DNS and be billed thousands every month? Or avoid paying that POP thousands a month?
Again, you should be able to contact the 13 root DNS servers. If not, something is very wrong.
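Gertjan's test can be scripted. The sketch below sends a minimal query for the root zone's NS records to each root server's IPv4 address (copied from the list earlier in the thread; these can change over time, so compare against the current official list) and counts how many attempts time out. It uses only the Python standard library, assumes outbound UDP port 53 is not blocked, and is meant to be run from a LAN machine behind the pfSense box. The function names and the attempt/timeout values are arbitrary choices.

```python
import random
import socket
import struct
import time
from typing import Dict, Optional

# IPv4 addresses as listed in the thread (may be out of date; check current root hints).
ROOT_SERVERS: Dict[str, str] = {
    "a": "198.41.0.4", "b": "199.9.14.201", "c": "192.33.4.12",
    "d": "199.7.91.13", "e": "192.203.230.10", "f": "192.5.5.241",
    "g": "192.112.36.4", "h": "198.97.190.53", "i": "192.36.148.17",
    "j": "192.58.128.30", "k": "193.0.14.129", "l": "199.7.83.42",
    "m": "202.12.27.33",
}


def build_root_ns_query(query_id: int) -> bytes:
    """Minimal DNS query for the root zone's NS records ('.' IN NS)."""
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)  # RD=0: iterative
    question = b"\x00" + struct.pack("!HH", 2, 1)  # QNAME '.', QTYPE NS, QCLASS IN
    return header + question


def probe(ip: str, timeout: float = 2.0) -> Optional[float]:
    """Send one query over UDP port 53; return RTT in ms, or None on failure."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_root_ns_query(qid), (ip, 53))
            data, _ = sock.recvfrom(4096)
        except OSError:  # includes socket timeouts
            return None
    if len(data) < 12:
        return None
    rid, flags = struct.unpack("!HH", data[:4])
    if rid != qid or not flags & 0x8000:  # must be a response to our query
        return None
    return (time.monotonic() - start) * 1000.0


def probe_all(servers: Dict[str, str] = ROOT_SERVERS, attempts: int = 5,
              timeout: float = 2.0) -> Dict[str, int]:
    """Return {root server letter: number of failed attempts out of `attempts`}."""
    return {letter: sum(probe(ip, timeout) is None for _ in range(attempts))
            for letter, ip in servers.items()}
```

Calling `print(probe_all())` prints a failure count per root server letter. Repeated timeouts to many root servers while forwarding works fine would support the idea that something upstream (such as the CGNAT path) is dropping traffic, but a single run is not conclusive.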