Gateway monitor down
-
I recently switched from one ISP to another because my former ISP does not allow a high number of outbound connections, so I could not use unbound. With the new ISP, I have a bigger problem. I have 8.8.4.4 as a WAN monitor IP and it's very unstable.
I have a public IP assigned to my pfsense's WAN interface, so my ONU is bridged. Aside from the gateway monitor being unstable, there's a couple of times a day that it goes totally offline (100% packet loss) and to fix it I either release/renew a DHCP lease in pfsense or restart the ONU (which has the same effect of releasing and renewing the DHCP lease).
It's very hard to communicate this issue to the ISP here in the Philippines as they're really incompetent. They already replaced my ONU without any success. I've provided them graphs and showed them live how pinging 8.8.4.4 and 8.8.8.8 is very unstable. Even the landline phone connected to the ONU (VOIP) is getting "choppy" signal.
At this point, what should my next step be? Their default setup is CGNAT, so they told me that their next step would be to put the ONU back in route mode, and that I can still have a public dynamic IP after that. I'm assuming they'll do 1:1 NAT so all traffic from external to internal goes to my ONU interface. I can then set up a DMZ IP on my ONU to forward everything to my pfsense WAN interface so I can manage all port forwarding from there. At least this is how I assume they will do it, but I'm still taking this with a grain of salt because of their incompetence.
-
The worst part is when people in my household are in work meetings and suddenly get disconnected. At that exact point, I see that the gateway goes down.
Why do you think renewing the DHCP lease solves the problem temporarily? Doesn't DHCP automatically inform my pfsense's WAN interface to get a new lease when the lease expires?
-
@kevindd992002 said in Gateway monitor down:
I recently switched from one ISP to another because my former ISP does not allow high number of outbound connections so I cannot use unbound. With the new ISP, I have a bigger problem. I have 8.8.4.4 as a WAN monitor IP and it's very unstable:
You mean 'unbound', the DNS resolver, creates too many connections???
Do yourself a favour and look for others reporting the same thing, on this forum or wherever on the Internet.
True, if all your LAN clients try to resolve a huge number of host names in a short time span, unbound will try to keep up. But it resolves, so it will use its favourite root DNS server (one of the 13), then a favourite (close by) TLD server, and then the domain's own name server.
That process never crippled a WAN connection, not that I know of.
Using 8.8.4.4, yeah, spot on: just don't... because why would you need to use them?? Just use the 'resolver' mode, do not forward, and you'll be fine.
@kevindd992002 said in Gateway monitor down:
I have a public IP assigned to my pfsense's WAN interface, so my ONU is bridged. Aside from the gateway monitor being unstable, there's a couple of times a day that it goes totally offline (100% packet loss) and to fix it I either release/renew a DHCP lease in pfsense or restart the ONU (which has the same effect of releasing and renewing the DHCP lease).
Spot-on.
IP renewal doesn't work well. Resolve that issue, and you'll be fine.
@kevindd992002 said in Gateway monitor down:
It's very hard to communicate this issue to the ISP here in the Philippines as they're really incompetent.
Same here in France: with a classic 'home' type Internet connection, you are expected to use their box. They know it will work, they replace it for free if needed, and they don't lose any time on questions they can never answer: the help desk people don't know anything about DHCP or protocols. People who know this stuff are expensive, which means you would pay for it.
The rule is global: you get what you pay for.
@kevindd992002 said in Gateway monitor down:
Doesn't DHCP automatically inform my pfsense's WAN interface to get a new lease when the lease expires?
This subject is very often discussed.
You should 'wireshark' your WAN interface to see what happens when half the lease time is over. The pfSense DHCP client should start sending DHCP renewal packets, and the ISP's DHCP server should reply.
-
@gertjan said in Gateway monitor down:
@kevindd992002 said in Gateway monitor down:
I recently switched from one ISP to another because my former ISP does not allow high number of outbound connections so I cannot use unbound. With the new ISP, I have a bigger problem. I have 8.8.4.4 as a WAN monitor IP and it's very unstable:
You mean 'unbound', the DNS resolver, creates too many connections???
Do yourself a favour and look for others reporting the same thing, on this forum or wherever on the Internet.
True, if all your LAN clients try to resolve a huge number of host names in a short time span, unbound will try to keep up. But it resolves, so it will use its favourite root DNS server (one of the 13), then a favourite (close by) TLD server, and then the domain's own name server.
That process never crippled a WAN connection, not that I know of.
Using 8.8.4.4, yeah, spot on: just don't... because why would you need to use them?? Just use the 'resolver' mode, do not forward, and you'll be fine.
Yes, unbound the DNS resolver. We have local forums here specific to each ISP. This is a very common issue with my past ISP. They somehow block DNS requests when using unbound and limit TCP connections. I know DNS is UDP so I don't really know what is being limited. The common workaround for this issue is to use DoT or any encrypted DNS. But that means I can't use unbound so I decided to switch. This doesn't cripple a WAN connection anyway, I know. The limit is just there, for some unknown reason to me. I've been troubleshooting this stupid issue for more than a year until I realized that it's been an issue for others too.
For my current ISP, I'm using 8.8.4.4 just as a gateway monitor. I'm not forwarding to them. I'm able to use unbound also.
@kevindd992002 said in Gateway monitor down:
I have a public IP assigned to my pfsense's WAN interface, so my ONU is bridged. Aside from the gateway monitor being unstable, there's a couple of times a day that it goes totally offline (100% packet loss) and to fix it I either release/renew a DHCP lease in pfsense or restart the ONU (which has the same effect of releasing and renewing the DHCP lease).
Spot-on.
IP renewal doesn't work well. Resolve that issue, and you'll be fine.
Right. Is this usually a problem on my side or the ISP's? Or can only wireshark tell?
Also, this is one problem. The other problem is the unstable ping to external servers which I don't think is related with the DHCP renewal issue.
@kevindd992002 said in Gateway monitor down:
It's very hard to communicate this issue to the ISP here in the Philippines as they're really incompetent.
Same here in France: with a classic 'home' type Internet connection, you are expected to use their box. They know it will work, they replace it for free if needed, and they don't lose any time on questions they can never answer: the help desk people don't know anything about DHCP or protocols. People who know this stuff are expensive, which means you would pay for it.
The rule is global: you get what you pay for.
Ahh, you're right. Oh well. You would think that they would at least try to escalate the case to higher-tier engineers when they don't know what's going on. I agree, they don't even know how basic networking works.
@kevindd992002 said in Gateway monitor down:
Doesn't DHCP automatically inform my pfsense's WAN interface to get a new lease when the lease expires?
This subject is very often discussed.
You should 'wireshark' your WAN interface to see what happens when half the lease time is over. The pfSense DHCP client should start sending DHCP renewal packets, and the ISP's DHCP server should reply.
Can I just use the built-in packet capture in pfsense for this?
-
Yes, just run a pcap on WAN when it fails and see what's happening.
If the DHCP lease has expired you should see pfSense requesting a new lease.
Steve
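To know when to look in the capture, it helps to work out when the client is expected to renew. Here is a minimal sketch, assuming the default RFC 2131 timers (renew at 50% of the lease, rebind at 87.5%); the ISP's DHCP server may send different values, and the function name and sample lease below are made up for illustration.

```python
from datetime import datetime, timedelta


def dhcp_timers(lease_start: datetime, lease_seconds: int) -> dict:
    """Return the default RFC 2131 milestones for a lease.

    Assumption: the server did not override T1/T2 (DHCP options 58 and 59).
    """
    lease = timedelta(seconds=lease_seconds)
    return {
        # T1: client unicasts a DHCPREQUEST to the server that gave the lease
        "t1_renew": lease_start + lease * 0.5,
        # T2: client broadcasts a DHCPREQUEST to any server
        "t2_rebind": lease_start + lease * 0.875,
        # Expiry: client must stop using the address and start over
        "expiry": lease_start + lease,
    }


if __name__ == "__main__":
    # Hypothetical lease: obtained at 08:00:00 with a one-hour lease time.
    start = datetime(2021, 12, 3, 8, 0, 0)
    for name, when in dhcp_timers(start, 3600).items():
        print(f"{name:10s} {when:%H:%M:%S}")
```

If the WAN drops at roughly the T1 or expiry mark of the real lease, that points at the renewal exchange; if it drops at unrelated times, the lease is probably not the cause.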
-
@kevindd992002 said in Gateway monitor down:
I know DNS is UDP
Get ready to know something more. When do DNS queries use TCP instead of UDP?
And because we are in the 2020s, DNS, which dates from the 1980s, has evolved: queries and answers have become bigger, often too big for a single UDP packet, and that traffic shifts over to TCP.
A good example is DNSSEC: signed answers frequently don't fit into one UDP packet any more, so the resolver retries over TCP.
Btw: it's a known issue: pfSense is installed and all outgoing traffic is blocked, with some exceptions: they allowed DNS ... over UDP only. Suddenly it looks like 'something' failed. It's of course 'the firewall' - or, no, wait; the ISP!! Or no, it's someone, but not me!!
That's a classic newbie fail: one should understand things before using them.
-
@gertjan said in Gateway monitor down:
@kevindd992002 said in Gateway monitor down:
I know DNS is UDP
Get ready to know something more. When do DNS queries use TCP instead of UDP?
And because we are in the 2020s, DNS, which dates from the 1980s, has evolved: queries and answers have become bigger, often too big for a single UDP packet, and that traffic shifts over to TCP.
A good example is DNSSEC: signed answers frequently don't fit into one UDP packet any more, so the resolver retries over TCP.
Right, that's as much as I know: it mainly uses UDP and sometimes uses TCP. Thanks for the link that explains when it uses TCP.
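For anyone following along, the UDP-to-TCP switch is driven by the TC (truncation) bit in the DNS header: when a server sets it in a UDP reply, the resolver repeats the query over TCP. A minimal sketch of building a query and checking that bit, with no network access; `build_query` and `is_truncated` are names made up for this example, not part of unbound or pfSense.

```python
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: ID, flags (only RD set), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
        if label
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def is_truncated(message: bytes) -> bool:
    """Return True if the TC bit (0x0200 in the flags word) is set."""
    if len(message) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & 0x0200)


if __name__ == "__main__":
    query = build_query("example.com")
    print(len(query), is_truncated(query))
```

A firewall or ISP that only passes DNS over UDP port 53 breaks exactly the case where `is_truncated` would return True for a reply, because the follow-up TCP query never gets through.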
Btw: it's a known issue: pfSense is installed and all outgoing traffic is blocked, with some exceptions: they allowed DNS ... over UDP only. Suddenly it looks like 'something' failed. It's of course 'the firewall' - or, no, wait; the ISP!! Or no, it's someone, but not me!!
That's a classic newbie fail: one should understand things before using them.
Not sure what your point is here. The only things being blocked were DNS resolving and things like torrenting with multiple TCP connections. DNS forwarding was not blocked. For reference, here's my past thread: https://forum.netgate.com/topic/159232/dns-resolver-timeouts
It's pointless to discuss that here though, as it is off-topic. My main issue here is the packet loss I experience with my current ISP, not the DNS resolving issues I had with my past ISP. It's not like I'll be going back to my past ISP anytime soon because of the lock-in period I have with my current one.
-
@stephenw10 said in Gateway monitor down:
Yes, just run a pcap on WAN when it fails and see what's happening.
If the DHCP lease has expired you should see pfSense requesting a new lease.
Steve
Gotcha, I'll make sure to get a pcap when it happens again. Do you have any ideas regarding the unstable ping WAN monitor results though?
-
Not really. Is anything logged?
Do you see the same loss if you choose a different IP?
If you use the ISP gateway IP directly?
Steve
-
@stephenw10 said in Gateway monitor down:
Not really. Is anything logged?
Do you see the same loss if you choose a different IP?
If you use the ISP gateway IP directly?
Steve
Yes, so far I tried 8.8.8.8, 8.8.4.4, 1.1.1.1, and my ISP gateway IP. It's less prevalent on the ISP gateway IP but it still fluctuates. With my past ISP, here's how it looks with 8.8.4.4:
You can just see the difference with the graph in my original post.
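To compare monitor targets with numbers rather than by eyeballing graphs, the same three figures the gateway monitor shows (loss, average latency, standard deviation) can be computed from raw ping results. A minimal sketch, assuming the round-trip times have already been collected into a list with `None` for each lost probe; the sample values are invented.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def summarize(samples_ms: Sequence[Optional[float]]) -> dict:
    """Summarize probe results; None marks a probe that got no reply."""
    if not samples_ms:
        raise ValueError("no samples")
    replies = [s for s in samples_ms if s is not None]
    loss_pct = 100.0 * (len(samples_ms) - len(replies)) / len(samples_ms)
    return {
        "loss_pct": loss_pct,
        "avg_ms": mean(replies) if replies else None,
        "stddev_ms": pstdev(replies) if replies else None,
    }


if __name__ == "__main__":
    # Invented samples for two targets, e.g. a public resolver and the ISP gateway.
    print("target A:", summarize([21.0, 250.0, None, 19.5, 180.0, None]))
    print("target B:", summarize([3.1, 3.4, 2.9, 3.0, 40.0, 3.2]))
```

If the ISP gateway shows the same pattern as the public addresses, just less pronounced, the problem is already present on the first hop.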
-
Mmm, that sounds like an actual WAN issue. Especially since it hits your VoIP too and that's not going through pfSense as I understand it.
Steve
-
@stephenw10 said in Gateway monitor down:
Mmm, that sounds like an actual WAN issue. Especially since it hits your VoIP too and that's not going through pfSense as I understand it.
Steve
Yeah, probably. What I noticed is that those peaks in the standard deviation mostly recur at a 14-15 minute interval, which is kinda weird.
For the DHCP lease issue, could this setting in the WAN interface potentially be what's causing the issue?
I'm just thinking out loud as their DHCP server can be a private IP address.
-
@kevindd992002 said in Gateway monitor down:
I'm just thinking out loud as their DHCP server can be a private IP address.
"pcap"ing will tell you that.
(during testing, remove that block rule on the WAN).
-
@kevindd992002 said in Gateway monitor down:
could this setting in the WAN interface potentially be what's causing the issue?
No, that will only prevent incoming connections from a private IP on WAN. The DHCP client initiates the connections outbound.
If it was exactly 15min or some other exact interval I'd be looking at some ARP problem perhaps. But I'm not seeing a pattern that matches that.
Steve
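Whether the spikes really come at an exact interval can be checked from their timestamps instead of from the graph. A minimal sketch, assuming the spike times have been read off the monitoring data as seconds; the 5-second tolerance and the sample values are arbitrary choices for illustration.

```python
from statistics import pstdev
from typing import List, Sequence


def spike_intervals(spike_times_s: Sequence[float]) -> List[float]:
    """Return the gaps, in seconds, between consecutive spikes."""
    ordered = sorted(spike_times_s)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(spike_times_s: Sequence[float], tolerance_s: float = 5.0) -> bool:
    """True if the gaps between spikes vary by no more than tolerance_s (std dev)."""
    gaps = spike_intervals(spike_times_s)
    if len(gaps) < 2:
        return False  # not enough spikes to judge
    return pstdev(gaps) <= tolerance_s


if __name__ == "__main__":
    exact = [0, 900, 1800, 2700]   # hypothetical: a spike every 15 minutes sharp
    loose = [0, 840, 1800, 2650]   # hypothetical: "14-15 minutes", but drifting
    print(looks_periodic(exact), looks_periodic(loose))
```

An exact period suggests a timer somewhere (ARP, a lease, a scheduled task); drifting gaps fit congestion or line trouble better.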
-
@stephenw10 said in Gateway monitor down:
@kevindd992002 said in Gateway monitor down:
could this setting in the WAN interface potentially be what's causing the issue?
No, that will only prevent incoming connections from a private IP on WAN. The DHCP client initiates the connections outbound.
If it was exactly 15min or some other exact interval I'd be looking at some ARP problem perhaps. But I'm not seeing a pattern that matches that.
Steve
That's my exact hunch. I know dhclient on pfsense is initiating the connection to the ISP DHCP server, so it's got to be an outbound connection.
The ISP is still troubleshooting the issue from their end, and even though I told them not to touch the "bridge" mode setting on my ONU, they did. Surprise surprise. So now I'm in route mode, where my pfsense WAN interface gets a private IP from the ONU's DHCP server (NAT).
What I notice is that I don't get the DHCP lease issue when on route mode. The problem now is that unbound as a resolver won't work. All it gives my clients are THROWAWAY/SERVFAIL results. When I turn it to a forwarder, it works. So there's got to be something with route mode that's hijacking DNS for some reason. I'm still pushing them to put back everything to bridge mode as that is very important for me and told them about the issue being potentially caused by the DHCP lease.
-
If they are hijacking DNS it would still fail in forwarding mode, unless perhaps you're forwarding to their DNS servers. Or maybe you are disabling DNSSEC in forwarding mode, which would allow them to.
-
Or maybe they have well-known DNS servers (like Google, Cloudflare, etc.) whitelisted or something? I tried disabling DNSSEC for both cases and it just doesn't work in resolver mode. Do you have any other ideas? What I know is that this started at exactly the time they put my ONU in route mode.
Snippet of unbound logs:
Dec 3 21:41:55 unbound 6032 [6032:0] info: control cmd: dump_infra
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 202.12.27.33#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:1::53 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 202.12.27.33#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:2d::d port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 192.112.36.4#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:503:c27::2:30 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 199.9.14.201#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:200::b port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:7fe::53 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 198.97.190.53#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 198.41.0.4#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:503:c27::2:30 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 192.36.148.17#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:2::c port 53
-
Mmm, well you seem to have v6 servers configured that are not responding or that you cannot reach, so the first thing I would do is disable those.
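The split between IPv4 and IPv6 is easy to confirm by tallying the log lines per address family. A rough sketch, assuming the two message formats seen in the snippet above ("error sending query to auth server ..." and "reply from <.> ..."); the regular expressions are written for those lines only and are not an official unbound log grammar.

```python
import ipaddress
import re
from collections import Counter

# Patterns matched against the two line types in the posted log excerpt.
SEND_ERROR = re.compile(r"error sending query to auth server (\S+) port \d+")
REPLY_FROM = re.compile(r"reply from <[^>]*> ([0-9A-Fa-f:.]+)#\d+")


def tally(log_lines):
    """Count send errors and replies, keyed by (event, IP version)."""
    counts = Counter()
    for line in log_lines:
        match = SEND_ERROR.search(line)
        if match:
            version = ipaddress.ip_address(match.group(1)).version
            counts[("send_error", version)] += 1
            continue
        match = REPLY_FROM.search(line)
        if match:
            version = ipaddress.ip_address(match.group(1)).version
            counts[("reply", version)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 202.12.27.33#53",
        "Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:1::53 port 53",
    ]
    for key, count in sorted(tally(sample).items()):
        print(key, count)
```

In the excerpt posted here, every "error sending query" line targets an IPv6 root server and every reply comes from an IPv4 one.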
-
@stephenw10 said in Gateway monitor down:
Mmm, well you seem to have v6 servers configured that are not responding or that you cannot reach, so the first thing I would do is disable those.
How do I disable those? I tried removing the IPv6 link-local in "Network Interfaces" of the DNS Resolver settings and nothing changed.
That screenshot above is when DNS Resolver is NOT FORWARDING. So not sure how to instruct unbound to not query ipv6 DNS servers.
-
Do you actually have a routable IPv6 address on that firewall?
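A quick way to answer that is to classify the IPv6 addresses listed on the interface: a link-local address (fe80::/10) cannot reach the root servers, only a global one can. A minimal sketch using Python's standard `ipaddress` module; the interface name `em0` and the sample addresses are placeholders, not taken from this firewall.

```python
import ipaddress


def classify_ipv6(addr: str) -> str:
    """Classify an IPv6 address as 'link-local', 'global', or 'other'."""
    # Drop a zone suffix such as "%em0" that ifconfig prints on link-local addresses.
    ip = ipaddress.IPv6Address(addr.split("%")[0])
    if ip.is_link_local:
        return "link-local"
    if ip.is_global:
        return "global"
    return "other"  # e.g. unique local (fc00::/7), loopback, documentation ranges


if __name__ == "__main__":
    for candidate in ["fe80::1%em0", "2001:4860:4860::8888", "fd00::1"]:
        print(candidate, "->", classify_ipv6(candidate))
```

If only link-local addresses show up, the firewall has no usable IPv6 route and the "error sending query" lines for the v6 root servers are expected.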