Gateway monitor down
-
@stephenw10 said in Gateway monitor down:
Not really. Is anything logged?
Do you see the same loss if you choose a different IP?
If you use the ISP gateway IP directly?
Steve
Yes, so far I tried 8.8.8.8, 8.8.4.4, 1.1.1.1, and my ISP gateway IP. It's less prevalent on the ISP gateway IP but it still fluctuates. With my past ISP, here's how it looks with 8.8.4.4:
You can just see the difference with the graph in my original post.
-
Mmm, that sounds like an actual WAN issue. Especially since it hits your VoIP too and that's not going through pfSense as I understand it.
Steve
-
@stephenw10 said in Gateway monitor down:
Mmm, that sounds like an actual WAN issue. Especially since it hits your VoIP too and that's not going through pfSense as I understand it.
Steve
Yeah, probably. What I noticed is that those peaks in the standard deviation mostly recur at a 14-15 minute interval, which is kinda weird.
For the DHCP lease issue, could this setting in the WAN interface potentially be what's causing the issue?
I'm just thinking out loud as their DHCP server can be a private IP address.
-
@kevindd992002 said in Gateway monitor down:
I'm just thinking out loud as their DHCP server can be a private IP address.
"pcap"ing will tell you that.
(During testing, remove that block rule on the WAN.)
-
@kevindd992002 said in Gateway monitor down:
could this setting in the WAN interface potentially be what's causing the issue?
No, that will only prevent incoming connections from a private IP on WAN. The DHCP client initiates the connections outbound.
If it was exactly 15min or some other exact interval I'd be looking at some ARP problem perhaps. But I'm not seeing a pattern that matches that.
Steve
-
@stephenw10 said in Gateway monitor down:
@kevindd992002 said in Gateway monitor down:
could this setting in the WAN interface potentially be what's causing the issue?
No, that will only prevent incoming connections from a private IP on WAN. The DHCP client initiates the connections outbound.
If it was exactly 15min or some other exact interval I'd be looking at some ARP problem perhaps. But I'm not seeing a pattern that matches that.
Steve
That's my exact hunch. I know dhclient on pfSense is the one initiating the connection to the ISP DHCP server, so it's got to be an outbound connection.
The ISP is still troubleshooting the issue from their end and even though I told them to not touch anything on my ONU about it being in "bridge" mode, they did. Surprise surprise. And now I'm currently at route mode where my pfsense WAN interface is getting a private IP from the ONU DHCP server (NAT).
What I notice is that I don't get the DHCP lease issue when on route mode. The problem now is that unbound as a resolver won't work. All it gives my clients are THROWAWAY/SERVFAIL results. When I turn it to a forwarder, it works. So there's got to be something with route mode that's hijacking DNS for some reason. I'm still pushing them to put back everything to bridge mode as that is very important for me and told them about the issue being potentially caused by the DHCP lease.
-
If they are hijacking DNS it would still fail in forwarding mode unless you're forwarding to their DNS servers perhaps. Or maybe you are disabling DNSSec in forwarding mode allowing them to.
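A rough way to check whether direct queries to the root servers get through at all is to send one by hand and look at the reply header. The sketch below is only an illustration under assumptions: the function names are made up for this example, and 198.41.0.4 is simply one of the root server addresses that shows up in the unbound log in this thread. A genuine root server would normally be expected to answer a ". NS" query authoritatively (AA flag set, rcode 0, 13 answers); a timeout or a non-authoritative reply would hint that something in the path is interfering.

```python
import socket
import struct


def build_root_ns_query(query_id):
    """Build a non-recursive DNS query for '. NS IN'."""
    # Header: ID, flags (all zero, so RD=0), 1 question, no other records.
    header = struct.pack(">HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # Question: root name (single zero byte), QTYPE=2 (NS), QCLASS=1 (IN).
    question = b"\x00" + struct.pack(">HH", 2, 1)
    return header + question


def parse_header(reply):
    """Return (id, authoritative flag, rcode, answer count) from a DNS reply."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack(">HHHHHH", reply[:12])
    return query_id, bool(flags & 0x0400), flags & 0x000F, ancount


def probe(server="198.41.0.4", timeout=3.0):
    """Send the query over UDP port 53; return None if nothing comes back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_root_ns_query(0x1234), (server, 53))
        try:
            reply, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return None
    return parse_header(reply)
```

Running `print(probe())` from a LAN client in both bridge and route mode and comparing the results would be one way to narrow it down. It only inspects the header, so it cannot prove hijacking on its own.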
-
Or maybe they have well-known DNS servers (like Google, Cloudflare, etc.) whitelisted or something? I tried disabling DNSSEC for both cases and it just doesn't work in resolver mode. Do you have any other ideas? What I know is that this started at exactly the same time they put my ONU in route mode.
Snippet of unbound logs:
Dec 3 21:41:55 unbound 6032 [6032:0] info: control cmd: dump_infra
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 202.12.27.33#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:1::53 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 202.12.27.33#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:2d::d port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 192.112.36.4#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:503:c27::2:30 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 199.9.14.201#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:200::b port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:7fe::53 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 198.97.190.53#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 198.41.0.4#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:503:c27::2:30 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 192.36.148.17#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:2::c port 53
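To make the pattern in that snippet easier to see, here is a small sketch that tallies the unbound messages by type and address family. It is only an illustration: the `tally` function and the `PATTERNS` table are names invented for this example, and the sample text is a short contiguous excerpt of the log above, not the full log.

```python
import ipaddress
import re
from collections import Counter

# A few lines excerpted from the unbound log above.
LOG = """\
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 202.12.27.33#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:1::53 port 53
Dec 3 21:41:54 unbound 6032 [6032:3] info: query response was THROWAWAY
Dec 3 21:41:54 unbound 6032 [6032:3] info: reply from <.> 202.12.27.33#53
Dec 3 21:41:54 unbound 6032 [6032:3] info: response for . NS IN
Dec 3 21:41:54 unbound 6032 [6032:3] info: error sending query to auth server 2001:500:2d::d port 53
"""

# Message fragments to look for, each capturing the server address.
PATTERNS = {
    "send_error": re.compile(r"error sending query to auth server (\S+) port \d+"),
    "reply": re.compile(r"reply from <\.> (\S+)#\d+"),
}


def tally(log_text):
    """Count THROWAWAY results, plus replies and send errors per IP version."""
    counts = Counter()
    for line in log_text.splitlines():
        if "query response was THROWAWAY" in line:
            counts["throwaway"] += 1
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                version = ipaddress.ip_address(match.group(1)).version
                counts[f"{name}_v{version}"] += 1
    return counts


for key, value in sorted(tally(LOG).items()):
    print(key, value)
```

On this excerpt every send error targets an IPv6 address, while every reply comes from an IPv4 root server and is then marked THROWAWAY. So the IPv4 queries do get answers, but unbound discards them.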
-
Mmm, well you seem to have v6 servers configured that are not responding or that you cannot reach, so the first thing I would do is disable those.
-
@stephenw10 said in Gateway monitor down:
Mmm, well you seem to have v6 servers configured that are not responding or that you cannot reach, so the first thing I would do is disable those.
How do I disable those? I tried removing the IPv6 link-local in "Network Interfaces" of the DNS Resolver settings and nothing changed.
That screenshot above is when DNS Resolver is NOT FORWARDING. So not sure how to instruct unbound to not query ipv6 DNS servers.
-
Do you actually have a routable IPv6 address on that firewall?
-
@stephenw10 said in Gateway monitor down:
Do you actually have a routable IPv6 address on that firewall?
I don't. I even have ipv6 disabled:
-
Checking that box allows IPv6. But it doesn't matter. Unbound is trying to reach an external v6 server and obviously cannot if you don't have a public v6 IP it can use.
That in itself should not be an issue as long as it's not only using IPv6 servers, which it isn't.
Steve
-
@stephenw10 said in Gateway monitor down:
Checking that box allows IPv6. But it doesn't matter. Unbound is trying to reach an external v6 server and obviously cannot if you don't have a public v6 IP it can use.
That in itself should not be an issue as long as it's not only using IPv6 servers, which it isn't.
Steve
Oops, you're right. I had this unchecked before but checked it just recently. Correct, it shouldn't be an issue since it still tries ipv4. That's why I think all DNS queries from unbound are being blocked somehow. As to how the ISP detects this type of traffic vs when just forwarding to a few DNS servers is what I don't understand. And how when in bridge mode it all works just fine.
In this day and age, is it recommended to just allow all IPv6? I know that for Windows the official recommendation from MS is not to disable IPv6.
-
Either enable it completely if your ISP supports it or disable it completely.
The worst case is where you have some IPv6 but no actual connectivity and clients try to use it over v4.
Steve
-
@stephenw10 said in Gateway monitor down:
Either enable it completely if your ISP supports it or disable it completely.
The worst case is where you have some IPv6 but no actual connectivity and clients try to use it over v4.
Steve
I know my ISP supports it but I haven't gotten to setting it up yet because of the main issue I'm having. Do clients prefer v6 over v4 if they're both setup?
-
Yes, most OSes will prefer v6 if they think they have a valid IP, and that can introduce lengthy delays whilst it times out.
-
@stephenw10 They put it back to bridge mode now and DNS resolving is working properly again. However, my DHCP lease issue is back. So to sum it all up:
Bridge mode: no DNS resolving issue but DHCP lease issue is present
Route mode: No DHCP lease issue but DNS resolving issue is present
I need to be in bridge mode so that's where I'll focus my troubleshooting. When my gateway went down, this is what I saw:
https://pastebin.com/tP1wm3Uf
However, a new IP was given to my WAN interface (gateway went up) after around 3 minutes of downtime. I'm seeing DHCPNAK's. What does that tell us? Also, why am I seeing frequent "renewal in 1800 seconds" messages? Does that mean the DHCP lease is just every 30 minutes?
I also got a packet capture while this was happening. Since that contains public IP addresses, do you want me to send it to you?
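On the "renewal in 1800 seconds" question, a bit of arithmetic, with assumptions stated: RFC 2131 defaults the renewal timer (T1) to 0.5 x the lease time and the rebinding timer (T2) to 0.875 x the lease time. I'm assuming here that dhclient's "renewal in" value is T1 and that the ISP's server does not send its own renewal time, neither of which I've confirmed. The function names below are just for illustration.

```python
def default_timers(lease_seconds):
    """Default renewal (T1) and rebinding (T2) times per RFC 2131."""
    return 0.5 * lease_seconds, 0.875 * lease_seconds


def implied_lease(t1_seconds):
    """Lease length implied by T1, assuming the 0.5 default ratio applies."""
    return t1_seconds / 0.5


print(implied_lease(1800))    # lease length in seconds, if T1 is the default
print(default_timers(3600))   # (T1, T2) for a one-hour lease
```

Under those assumptions, a 1800 second renewal would correspond to a 3600 second (one hour) lease, with the client trying to renew at the halfway point, so every 30 minutes.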
-
Could it be similar to this issue?
https://forum.netgate.com/topic/112869/dhclient-on-wan-occasionally-fails-to-renew-lease-with-cable-isp
-
It happened again at 4:10PM. Here's a clearer view of what's happening (I filtered the dhclient process only):
Dec 8 16:44:57 dhclient 26504 bound to {New WAN IP} -- renewal in 1800 seconds.
Dec 8 16:44:57 dhclient 17600 Creating resolv.conf
Dec 8 16:44:57 dhclient 16975 RENEW
Dec 8 16:44:57 dhclient 26504 unknown dhcp option value 0x52
Dec 8 16:44:57 dhclient 26504 DHCPACK from {DHCP Server/WAN interface Gateway}
Dec 8 16:44:57 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:14:57 dhclient 26504 bound to {New WAN IP} -- renewal in 1800 seconds.
Dec 8 16:14:57 dhclient 83905 Creating resolv.conf
Dec 8 16:14:57 dhclient 83772 /sbin/route add default {DHCP Server/WAN interface Gateway}
Dec 8 16:14:57 dhclient 83557 /sbin/route add -host {DHCP Server/WAN interface Gateway} -iface igb0
Dec 8 16:14:57 dhclient 82650 Adding new routes to interface: igb0
Dec 8 16:14:57 dhclient 82454 New Routers (igb0): {DHCP Server/WAN interface Gateway}
Dec 8 16:14:57 dhclient 82184 New Broadcast Address (igb0): {New WAN Broadcast IP}
Dec 8 16:14:57 dhclient 81938 New Subnet Mask (igb0): 255.255.224.0
Dec 8 16:14:57 dhclient 81784 New IP Address (igb0): {New WAN IP}
Dec 8 16:14:57 dhclient 81144 ifconfig igb0 inet {New WAN IP} netmask 255.255.224.0 broadcast {New WAN Broadcast IP}
Dec 8 16:14:57 dhclient 80989 Starting add_new_address()
Dec 8 16:14:57 dhclient 80666 BOUND
Dec 8 16:14:57 dhclient 26504 unknown dhcp option value 0x52
Dec 8 16:14:57 dhclient 26504 DHCPACK from {DHCP Server/WAN interface Gateway}
Dec 8 16:14:56 dhclient 26504 DHCPREQUEST on igb0 to 255.255.255.255 port 67
Dec 8 16:14:56 dhclient 80354 ARPCHECK
Dec 8 16:14:54 dhclient 79636 ARPSEND
Dec 8 16:14:54 dhclient 26504 unknown dhcp option value 0x52
Dec 8 16:14:54 dhclient 26504 DHCPOFFER from {DHCP Server/WAN interface Gateway}
Dec 8 16:14:54 dhclient 26504 DHCPDISCOVER on igb0 to 255.255.255.255 port 67 interval 1
Dec 8 16:14:54 dhclient 26504 DHCPNAK from {DHCP Server/WAN interface Gateway}
Dec 8 16:14:54 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:14:26 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:13:46 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:13:32 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:13:20 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:13:09 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:12:37 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:12:22 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:12:09 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:11:57 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:11:52 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:11:50 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:11:49 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:11:48 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
Dec 8 16:11:47 dhclient 26504 DHCPREQUEST on igb0 to {DHCP Server/WAN interface Gateway} port 67
So the client retries its DHCPREQUEST many times until it finally receives a DHCPNAK from the server, which makes it start the whole DORA process again. At 4:44PM, it does the same thing but the server sends a DHCPACK after the first DHCPREQUEST from the client.
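To put numbers on that, here is a rough sketch that groups each run of DHCPREQUEST retries with the ACK or NAK that ends it. The `exchanges` function is a name made up for this example, the sample contains only the DHCP message lines from the log above with the gateway placeholder shortened to `{gw}`, and the year is a placeholder because the syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# DHCP message lines from the dhclient log above (newest first),
# with the gateway placeholder shortened to {gw}.
LOG = """\
Dec 8 16:44:57 dhclient 26504 DHCPACK from {gw}
Dec 8 16:44:57 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:14:57 dhclient 26504 DHCPACK from {gw}
Dec 8 16:14:56 dhclient 26504 DHCPREQUEST on igb0 to 255.255.255.255 port 67
Dec 8 16:14:54 dhclient 26504 DHCPOFFER from {gw}
Dec 8 16:14:54 dhclient 26504 DHCPDISCOVER on igb0 to 255.255.255.255 port 67 interval 1
Dec 8 16:14:54 dhclient 26504 DHCPNAK from {gw}
Dec 8 16:14:54 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:14:26 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:13:46 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:13:32 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:13:20 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:13:09 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:12:37 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:12:22 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:12:09 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:11:57 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:11:52 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:11:50 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:11:49 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:11:48 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
Dec 8 16:11:47 dhclient 26504 DHCPREQUEST on igb0 to {gw} port 67
"""

LINE = re.compile(r"^(\w+ +\d+ [\d:]+) dhclient \d+ (DHCP[A-Z]+)")


def exchanges(log_text, year=2000):
    """Return (reply type, request count, seconds from first request to reply)."""
    results, first, count = [], None, 0
    # The log is newest-first, so walk it in reverse for chronological order.
    for line in reversed(log_text.strip().splitlines()):
        match = LINE.match(line)
        if not match:
            continue
        # `year` is a placeholder; only time differences are used.
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        kind = match.group(2)
        if kind == "DHCPREQUEST":
            first = first or when
            count += 1
        elif kind in ("DHCPACK", "DHCPNAK") and first is not None:
            results.append((kind, count, int((when - first).total_seconds())))
            first, count = None, 0
    return results


for kind, count, seconds in exchanges(LOG):
    print(f"{kind} after {count} DHCPREQUEST(s), {seconds} s")
```

For this log it reports a DHCPNAK after 15 requests spread over 187 seconds, followed by two DHCPACKs that each arrive after a single request. That matches the roughly 3 minutes of downtime.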