Gateway monitor down
-
It just happened again, and the logs show exactly the same thing.
-
Does it eventually switch back to broadcast and then get a reply from a different server?
I have seen ISPs with badly configured redundant DHCP servers that can behave like that.
You can set the WAN DHCP client to request a different lease time. The server can just ignore that, though.
Steve
-
No, it doesn't, though I've read that it should fall back to broadcast after several tries. I'm not sure whether a pfSense update has changed this behavior. And according to the logs, it's always talking to the same DHCP server IP.
What happens is that the client sends several (no fixed number) unicast DHCPREQUESTs to the ISP's DHCP server, and eventually the server responds with a DHCPNAK. As expected, when the client receives a NAK, it starts the whole DORA process over. At that point the DISCOVER is broadcast, and the exchange completes once the client gets an ACK from the server.
But then, like I said, the usual unicast renewal works "most of the time". So that tells me it's not a unicast-versus-broadcast problem, but I don't know what's causing it.
And yes, the server would probably ignore a request for a different lease time. I think that's one of the most basic security mechanisms of DHCP.
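For reference, RFC 2131 (section 4.4.5) defines the client timers behind this behavior. At T1 (by default half the lease), the client enters RENEWING and unicasts DHCPREQUESTs to the server that issued the lease. At T2 (by default 87.5% of the lease), it enters REBINDING and broadcasts them instead. A DHCPNAK at any point sends it back to INIT and a broadcast DHCPDISCOVER. The Python sketch below models only those transitions and assumes the server sends no explicit T1/T2 options (58/59). It is a simplified illustration, not pfSense's actual DHCP client code.

```python
from dataclasses import dataclass

# Default timer fractions from RFC 2131, section 4.4.5. A server can override
# them with options 58 (T1) and 59 (T2); that case is not modelled here.
T1_FRACTION = 0.5
T2_FRACTION = 0.875


@dataclass
class LeaseTimers:
    lease: float  # lease duration in seconds
    t1: float     # when unicast renewal (RENEWING) starts
    t2: float     # when broadcast rebinding (REBINDING) starts


def default_timers(lease_seconds: float) -> LeaseTimers:
    return LeaseTimers(lease_seconds,
                       lease_seconds * T1_FRACTION,
                       lease_seconds * T2_FRACTION)


def client_state(elapsed: float, timers: LeaseTimers, got_nak: bool = False) -> str:
    """Simplified client state `elapsed` seconds after the lease was bound."""
    if got_nak or elapsed >= timers.lease:
        # A NAK (or an expired lease) restarts full DORA with a broadcast DISCOVER.
        return "INIT (broadcast DISCOVER)"
    if elapsed >= timers.t2:
        return "REBINDING (broadcast REQUEST)"
    if elapsed >= timers.t1:
        return "RENEWING (unicast REQUEST to leasing server)"
    return "BOUND"


if __name__ == "__main__":
    timers = default_timers(1800)  # the 1800 s lease mentioned in this thread
    print(timers)
    for t in (0, 900, 1575):
        print(t, client_state(t, timers))
    print(1000, client_state(1000, timers, got_nak=True))
```

With an 1800 s lease, the client would start unicast renewals at 900 s and only begin broadcasting at 1575 s. A NAK received before then goes straight back to a broadcast DISCOVER, which matches what the logs show.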
-
The DHCP server may have limits set: it ignores requests outside them, but it may well accept requests inside them. I have seen similar situations, where the DHCP server was handing out a lease that was far too long, resolved by doing that. That doesn't fit what you're seeing here exactly, though.
Steve
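To make that concrete, here is one plausible server-side policy matching the description above. A requested lease time (DHCP option 51) inside the configured bounds is granted, and anything outside them falls back to the server default. The function name and bound values are hypothetical. Real servers differ, and some clamp an out-of-range request to the nearest bound instead of ignoring it.

```python
from typing import Optional


def granted_lease(requested: Optional[int], default: int, minimum: int, maximum: int) -> int:
    """Hypothetical policy: honour a requested lease only inside [minimum, maximum]."""
    if requested is None:
        # The client did not send option 51, so the server uses its default.
        return default
    if minimum <= requested <= maximum:
        return requested
    # An out-of-range request is ignored rather than clamped.
    return default


if __name__ == "__main__":
    # Illustrative bounds only; the ISP's real values are unknown.
    print(granted_lease(None, default=1800, minimum=600, maximum=86400))
    print(granted_lease(3600, default=1800, minimum=600, maximum=86400))
    print(granted_lease(604800, default=1800, minimum=600, maximum=86400))
```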
-
I see. But what would increasing the lease time do, though?
-
For example, something may be rejecting requests that arrive too frequently, though that seems unlikely at 1800s.
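As a rough illustration of that kind of rate limiting, here is a per-client minimum-interval check. This is a hypothetical mechanism: nothing in this thread confirms that the ISP runs one. With an 1800 s lease and the RFC 2131 default T1 of half the lease, a well-behaved client renews about every 900 s, so only retries that land very close together would trip a limit like this. The 60 s limit and the client ID are made up for the example.

```python
from typing import Dict

LEASE_SECONDS = 1800
RENEW_EVERY = LEASE_SECONDS * 0.5  # default T1 (RFC 2131): 900 s


class MinIntervalLimiter:
    """Hypothetical limiter: reject a REQUEST arriving too soon after the last accepted one."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self.last_accepted: Dict[str, float] = {}

    def allow(self, client_id: str, now: float) -> bool:
        last = self.last_accepted.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; the last accepted time is left unchanged
        self.last_accepted[client_id] = now
        return True


if __name__ == "__main__":
    limiter = MinIntervalLimiter(min_interval=60)  # hypothetical 60 s limit
    # Three normal renewals, then a retry 10 s after the third one.
    for t in (0, RENEW_EVERY, 2 * RENEW_EVERY, 2 * RENEW_EVERY + 10):
        print(t, limiter.allow("wan-client", t))
```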
-
So it looks like they fixed the DHCP lease issue. However, I'm still having issues with gateway monitoring and ping latency in general.
Look how crappy my gateway monitoring graph is. Latency has been increasing since Dec. 16:
When I try pinging even just the WAN gateway (a public router IP on my ISP's network), it's very unstable too. It's very hard to explain this to the ISP support agents because they simply don't understand.
-
Looks like the graph didn't upload.
-
Sorry. I edited my post above to fix this. Here's another tracert result that also shows the problem:
-
Here's the latency problem, which is evident even when I have my router ping the WAN gateway IP (the first hop from my router):
PING 112.205.32.1 (112.205.32.1) from {my router's WAN interface IP}: 56 data bytes
64 bytes from 112.205.32.1: icmp_seq=0 ttl=255 time=1242.815 ms
64 bytes from 112.205.32.1: icmp_seq=1 ttl=255 time=1310.078 ms
64 bytes from 112.205.32.1: icmp_seq=2 ttl=255 time=1457.912 ms
64 bytes from 112.205.32.1: icmp_seq=3 ttl=255 time=473.654 ms
64 bytes from 112.205.32.1: icmp_seq=4 ttl=255 time=2.773 ms
64 bytes from 112.205.32.1: icmp_seq=5 ttl=255 time=2.146 ms
64 bytes from 112.205.32.1: icmp_seq=6 ttl=255 time=1.822 ms
64 bytes from 112.205.32.1: icmp_seq=7 ttl=255 time=4.379 ms
64 bytes from 112.205.32.1: icmp_seq=8 ttl=255 time=455.918 ms
64 bytes from 112.205.32.1: icmp_seq=9 ttl=255 time=424.541 ms
--- 112.205.32.1 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.822/537.604/1457.912/557.557 ms
That's a ping within the same WAN subnet, so it should be under 1 ms, or 2-3 ms at most.
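You can recompute the summary line in that output from the individual replies, which makes it easy to compare runs consistently. This is a minimal Python sketch that assumes the FreeBSD `ping` reply format shown above. The stddev FreeBSD prints appears to be the population standard deviation, which `statistics.pstdev` computes. Recomputing it from the ten replies above gives the same 557.557 ms.

```python
import re
import statistics

# The ten reply lines from the pfSense ping to 112.205.32.1 above.
OUTPUT = """\
64 bytes from 112.205.32.1: icmp_seq=0 ttl=255 time=1242.815 ms
64 bytes from 112.205.32.1: icmp_seq=1 ttl=255 time=1310.078 ms
64 bytes from 112.205.32.1: icmp_seq=2 ttl=255 time=1457.912 ms
64 bytes from 112.205.32.1: icmp_seq=3 ttl=255 time=473.654 ms
64 bytes from 112.205.32.1: icmp_seq=4 ttl=255 time=2.773 ms
64 bytes from 112.205.32.1: icmp_seq=5 ttl=255 time=2.146 ms
64 bytes from 112.205.32.1: icmp_seq=6 ttl=255 time=1.822 ms
64 bytes from 112.205.32.1: icmp_seq=7 ttl=255 time=4.379 ms
64 bytes from 112.205.32.1: icmp_seq=8 ttl=255 time=455.918 ms
64 bytes from 112.205.32.1: icmp_seq=9 ttl=255 time=424.541 ms
"""


def ping_summary(output: str) -> dict:
    """Recompute count/min/avg/max/stddev (ms) from FreeBSD ping reply lines."""
    rtts = [float(t) for t in re.findall(r"time=([\d.]+) ms", output)]
    return {
        "count": len(rtts),
        "min": min(rtts),
        "avg": statistics.fmean(rtts),
        "max": max(rtts),
        # Population standard deviation, which matches ping's stddev column here.
        "stddev": statistics.pstdev(rtts),
    }


if __name__ == "__main__":
    print({k: round(v, 3) for k, v in ping_summary(OUTPUT).items()})
```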
-
Mmm, that's catastrophically bad!
If that's the first hop, and it's not just the gateway not responding to ping, there's not much that pfSense can do about it. I assume you were not saturating the link at that time?
-
Exactly! Yes, the gateway/first hop does respond to ping, but the latency is very unstable, as you can see in my last post. No saturation at all.
That's also why I'm very convinced that it's an ISP issue. I just don't know how to dumb it down for them. They all base their "knowledge" on the results from www.speedtest.net. When I run a test there, I do see 2ms latency, which I think is just one of the occasional normal ping results. What matters more is the average of continuous ping results, which is what pfSense measures.
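This sketch shows why the continuous average says more than a single speed-test sample. A gateway monitor in the style of dpinger (the daemon pfSense uses for gateway monitoring) judges a whole window of probes by its average latency and loss. The thresholds below are assumptions, loosely modelled on what pfSense's defaults are believed to be (200/500 ms latency, 10/20 % loss). Check System > Routing > Gateways for the values actually in use. The status names are illustrative.

```python
import statistics
from typing import List, Optional

# Assumed thresholds resembling pfSense's gateway defaults; verify on your system.
LATENCY_WARN_MS, LATENCY_DOWN_MS = 200.0, 500.0
LOSS_WARN_PCT, LOSS_DOWN_PCT = 10.0, 20.0

# RTTs (ms) from the pfSense ping to 112.205.32.1 posted earlier in this thread.
WINDOW = [1242.815, 1310.078, 1457.912, 473.654, 2.773,
          2.146, 1.822, 4.379, 455.918, 424.541]


def evaluate_window(samples: List[Optional[float]]) -> str:
    """Judge a window of probes; each sample is an RTT in ms, or None if lost."""
    if not samples:
        raise ValueError("need at least one probe")
    replies = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(replies)) / len(samples)
    # With no replies at all, latency is effectively unbounded.
    avg_ms = statistics.fmean(replies) if replies else float("inf")
    if avg_ms >= LATENCY_DOWN_MS or loss_pct >= LOSS_DOWN_PCT:
        return "down"
    if avg_ms >= LATENCY_WARN_MS or loss_pct >= LOSS_WARN_PCT:
        return "warning"
    return "online"


if __name__ == "__main__":
    print(evaluate_window(WINDOW))       # whole window, average about 538 ms
    print(evaluate_window([WINDOW[4]]))  # one lucky 2.773 ms sample on its own
```

The same link looks fine if you judge it by one good reply and looks down if you judge it by the window as a whole.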
-
The gateway itself does not have to respond to ping, so results against it directly are not necessarily indicative of an issue.
I would try running smokeping or MTR against a number of external targets and see how that varies.
Steve
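The idea behind testing several external targets is comparison. If every target shows the same spikes, the problem is on the shared part of the path (router, first hop, access link). If only one target does, the problem is that destination. The sketch below summarizes probes per target roughly the way SmokePing's graphs do, with median RTT plus loss and spread. The sample values are copied from outputs posted in this thread and mixed from different runs, purely for illustration. Collecting the samples (for example with `ping` or `mtr`) is left out, and the 100 ms limit is an arbitrary choice.

```python
import statistics
from typing import Dict, List, Optional

Samples = List[Optional[float]]  # RTTs in ms; None marks a lost probe


def summarize(samples: Samples) -> dict:
    """Median RTT, min-to-max spread and loss percentage for one target."""
    if not samples:
        raise ValueError("need at least one probe")
    replies = sorted(s for s in samples if s is not None)
    return {
        "median_ms": statistics.median(replies) if replies else None,
        "spread_ms": (replies[-1] - replies[0]) if replies else None,
        "loss_pct": 100.0 * (len(samples) - len(replies)) / len(samples),
    }


def all_targets_slow(per_target: Dict[str, Samples], limit_ms: float) -> bool:
    """Heuristic: True if every target's median exceeds limit_ms (or all probes were lost)."""
    for samples in per_target.values():
        median = summarize(samples)["median_ms"]
        if median is not None and median <= limit_ms:
            return False
    return True


THROUGH_PFSENSE = {
    "112.205.32.1": [1242.815, 1310.078, 1457.912, 473.654, 2.773,
                     2.146, 1.822, 4.379, 455.918, 424.541],
    "8.8.8.8": [1570.0, 1371.0, 70.0],  # last tracert hop via pfSense
}
DIRECT = {
    "8.8.8.8": [27.0, 27.0, 28.0, 49.0, 27.0],  # a few replies from the direct run
}

if __name__ == "__main__":
    for name, data in (("through pfSense", THROUGH_PFSENSE), ("direct", DIRECT)):
        print(name, {t: summarize(s) for t, s in data.items()},
              "all slow:", all_targets_slow(data, 100.0))
```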
-
Yeah, it doesn't need to respond to ping, but shouldn't that be a clear-cut yes-or-no scenario? Since we're seeing that it responds to ping, doesn't that tell us it's set up to respond to ping?
I do run smokeping and it's also seeing the issue:
-
Ouch, yeah that's pretty bad!
The gateway might respond to ping but it may not be prioritised at all. When the gateway is loaded it might drop ping packets or respond with high latency and that's acceptable if traffic through it is passed as expected. That's not happening here though.
-
@stephenw10 said in Gateway monitor down:
Ouch, yeah that's pretty bad!
Right? And now I'm getting intermittent packet loss on my pfSense gateway monitor. I've disabled the gateway monitoring action in pfSense for now, but because of the packet loss I can definitely "feel" how sluggish my Internet browsing is. I should probably ask my neighbors whether they see the same thing.
The gateway might respond to ping but it may not be prioritised at all. When the gateway is loaded it might drop ping packets or respond with high latency and that's acceptable if traffic through it is passed as expected. That's not happening here though.
Ahh, that makes sense.
-
I did a bit more troubleshooting and got interesting results:
- Tests from my wired desktop client directly connected to the ISP ONU:
C:\Users\Kevin>ping 112.204.224.1 -t
Pinging 112.204.224.1 with 32 bytes of data:
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=1ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=13ms TTL=255
Reply from 112.204.224.1: bytes=32 time=17ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Ping statistics for 112.204.224.1:
    Packets: Sent = 14, Received = 14, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 17ms, Average = 4ms
Control-C
^C
C:\Users\Kevin>tracert 112.204.224.1
Tracing route to 112.204.224.1.pldt.net [112.204.224.1]
over a maximum of 30 hops:
  1     2 ms     3 ms     2 ms  112.204.224.1.pldt.net [112.204.224.1]
Trace complete.
C:\Users\Kevin>ping 8.8.8.8 -t
Pinging 8.8.8.8 with 32 bytes of data:
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=28ms TTL=58
Reply from 8.8.8.8: bytes=32 time=49ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=28ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=28ms TTL=58
Reply from 8.8.8.8: bytes=32 time=28ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Reply from 8.8.8.8: bytes=32 time=28ms TTL=58
Reply from 8.8.8.8: bytes=32 time=27ms TTL=58
Ping statistics for 8.8.8.8:
    Packets: Sent = 31, Received = 31, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 27ms, Maximum = 49ms, Average = 27ms
Control-C
^C
C:\Users\Kevin>tracert 8.8.8.8
Tracing route to dns.google [8.8.8.8]
over a maximum of 30 hops:
  1     3 ms     3 ms     3 ms  112.204.224.1.pldt.net [112.204.224.1]
  2     4 ms     3 ms     3 ms  122.2.187.142.static.pldt.net [122.2.187.142]
  3     *        *        *     Request timed out.
  4    27 ms    27 ms    28 ms  210.213.130.103.static.pldt.net [210.213.130.103]
  5    29 ms    28 ms    29 ms  74.125.118.24
  6    28 ms    28 ms    28 ms  209.85.244.25
  7    24 ms    23 ms    23 ms  216.239.42.89
  8    27 ms    27 ms    27 ms  dns.google [8.8.8.8]
Trace complete.
- Tests from the same wired desktop client connected through my ASUS RT-AC66U (acting as a switch), with that switch connected to my pfSense box and the ISP ONU connected to the same pfSense box:
C:\Users\Kevin>ping 112.205.32.1 -t
Pinging 112.205.32.1 with 32 bytes of data:
Reply from 112.205.32.1: bytes=32 time=2ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1ms TTL=254
Reply from 112.205.32.1: bytes=32 time=6ms TTL=254
Reply from 112.205.32.1: bytes=32 time=2180ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1571ms TTL=254
Reply from 112.205.32.1: bytes=32 time=52ms TTL=254
Reply from 112.205.32.1: bytes=32 time=448ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1ms TTL=254
Reply from 112.205.32.1: bytes=32 time=3ms TTL=254
Reply from 112.205.32.1: bytes=32 time=2ms TTL=254
Reply from 112.205.32.1: bytes=32 time=740ms TTL=254
Reply from 112.205.32.1: bytes=32 time=523ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1275ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1318ms TTL=254
Reply from 112.205.32.1: bytes=32 time=17ms TTL=254
Reply from 112.205.32.1: bytes=32 time=88ms TTL=254
Reply from 112.205.32.1: bytes=32 time=4ms TTL=254
Reply from 112.205.32.1: bytes=32 time=3ms TTL=254
Reply from 112.205.32.1: bytes=32 time=3ms TTL=254
Reply from 112.205.32.1: bytes=32 time=523ms TTL=254
Ping statistics for 112.205.32.1:
    Packets: Sent = 20, Received = 20, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 1ms, Maximum = 2180ms, Average = 438ms
Control-C
^C
C:\Users\Kevin>tracert 112.205.32.1
Tracing route to 112.205.32.1.pldt.net [112.205.32.1]
over a maximum of 30 hops:
  1    <1 ms    <1 ms    <1 ms  pfSense.condo.arpa [192.168.20.1]
  2  1955 ms   235 ms    94 ms  112.205.32.1.pldt.net [112.205.32.1]
Trace complete.
C:\Users\Kevin>tracert 8.8.8.8
Tracing route to dns.google [8.8.8.8]
over a maximum of 30 hops:
  1    <1 ms    <1 ms     1 ms  pfSense.condo.arpa [192.168.20.1]
  2  1193 ms  1072 ms  1043 ms  112.205.32.1.pldt.net [112.205.32.1]
  3    11 ms     4 ms     3 ms  122.2.187.146.static.pldt.net [122.2.187.146]
  4     *        *        *     Request timed out.
  5   615 ms  1429 ms   122 ms  210.213.130.103.static.pldt.net [210.213.130.103]
  6  1646 ms   847 ms   769 ms  72.14.195.168
  7    24 ms    23 ms    21 ms  108.170.231.19
  8   245 ms   917 ms    46 ms  209.85.143.37
  9  1570 ms  1371 ms    70 ms  dns.google [8.8.8.8]
Trace complete.
- Tests from the pfSense diagnostic tools with the ISP ONU connected to it. I made sure traceroute uses ICMP (instead of the default UDP) so that it mirrors exactly what tracert in Windows does.
PING 112.205.32.1 (112.205.32.1): 56 data bytes
64 bytes from 112.205.32.1: icmp_seq=0 ttl=255 time=789.747 ms
64 bytes from 112.205.32.1: icmp_seq=1 ttl=255 time=3276.583 ms
64 bytes from 112.205.32.1: icmp_seq=2 ttl=255 time=2379.589 ms
64 bytes from 112.205.32.1: icmp_seq=3 ttl=255 time=1405.570 ms
64 bytes from 112.205.32.1: icmp_seq=4 ttl=255 time=407.121 ms
64 bytes from 112.205.32.1: icmp_seq=5 ttl=255 time=2.525 ms
64 bytes from 112.205.32.1: icmp_seq=6 ttl=255 time=3.033 ms
64 bytes from 112.205.32.1: icmp_seq=7 ttl=255 time=1.394 ms
64 bytes from 112.205.32.1: icmp_seq=8 ttl=255 time=314.137 ms
64 bytes from 112.205.32.1: icmp_seq=9 ttl=255 time=594.438 ms
--- 112.205.32.1 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.394/917.414/3276.583/1058.258 ms
1 112.205.32.1.pldt.net (112.205.32.1) 500.894 ms 66.996 ms 942.851 ms
 1  112.205.32.1.pldt.net (112.205.32.1)  1.776 ms  2.692 ms  3.052 ms
 2  122.2.187.146.static.pldt.net (122.2.187.146)  2.664 ms  1.913 ms  3.742 ms
 3  * * *
 4  210.213.130.103.static.pldt.net (210.213.130.103)  534.225 ms  804.088 ms  139.527 ms
 5  72.14.195.168 (72.14.195.168)  24.507 ms  22.051 ms  23.182 ms
 6  108.170.231.19 (108.170.231.19)  21.379 ms  21.898 ms  21.930 ms
 7  209.85.143.37 (209.85.143.37)  29.757 ms  46.183 ms  52.547 ms
 8  dns.google (8.8.8.8)  53.694 ms  25.930 ms  25.661 ms
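Steve's earlier point, that a router may answer ping slowly while still forwarding traffic normally, can be checked against these traceroutes. If one hop is slow but the hops after it are fast again, that router is probably just slow to answer ICMP itself. If the delay carries through to the destination, forwarded traffic really is being held up at or before that hop. Below is a rough Python heuristic using the per-hop RTTs from the two traceroutes to 8.8.8.8 above. Windows `<1 ms` entries are recorded as 1.0, and the 100 ms jump threshold is an arbitrary assumption.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Hop = Sequence[Optional[float]]  # per-probe RTTs (ms) for one hop; None for "*"


def hop_median(hop: Hop) -> Optional[float]:
    replies = [r for r in hop if r is not None]
    return statistics.median(replies) if replies else None


def classify_hops(hops: List[Hop], jump_ms: float) -> Tuple[Optional[int], List[int]]:
    """Return (first hop whose latency jump persists to the last hop,
    hops whose jump disappears again further along the path)."""
    medians = [hop_median(h) for h in hops]
    answered = [m for m in medians if m is not None]
    if not answered:
        return None, []
    final = answered[-1]
    first_persistent: Optional[int] = None
    transient: List[int] = []
    baseline = 0.0  # median of the previous hop that answered
    for number, median in enumerate(medians, start=1):
        if median is None:
            continue
        if median - baseline >= jump_ms:
            if final - baseline >= jump_ms:
                if first_persistent is None:
                    first_persistent = number
            else:
                transient.append(number)
        baseline = median
    return first_persistent, transient


PFSENSE_TRACE = [  # pfSense ICMP traceroute to 8.8.8.8
    [1.776, 2.692, 3.052], [2.664, 1.913, 3.742], [None, None, None],
    [534.225, 804.088, 139.527], [24.507, 22.051, 23.182],
    [21.379, 21.898, 21.930], [29.757, 46.183, 52.547], [53.694, 25.930, 25.661],
]
WINDOWS_TRACE = [  # Windows tracert to 8.8.8.8 through pfSense
    [1.0, 1.0, 1.0], [1193.0, 1072.0, 1043.0], [11.0, 4.0, 3.0],
    [None, None, None], [615.0, 1429.0, 122.0], [1646.0, 847.0, 769.0],
    [24.0, 23.0, 21.0], [245.0, 917.0, 46.0], [1570.0, 1371.0, 70.0],
]

if __name__ == "__main__":
    print("pfSense traceroute:", classify_hops(PFSENSE_TRACE, 100.0))
    print("Windows tracert:", classify_hops(WINDOWS_TRACE, 100.0))
```

On the pfSense traceroute, only hop 4 stands out, and it recovers at the next hop. On the Windows run, the jump starts at hop 2 (112.205.32.1) and is still there at the destination.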
With these tests, I think we can safely say that there's something going on with pfsense that's causing the problem. The ISP is not at fault since everything is working properly when I connect the ONU directly to the desktop client, bypassing pfsense.
We also know that the problem isn't the AC66U switch (or any other wired/wireless client connected to it) because the problem also exists using the pfsense diagnostic tools.
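You can put the two Windows `ping -t` runs above side by side with a small parser. This is a minimal sketch that assumes the English-locale reply format shown (`time=3ms`, or `time<1ms` for sub-millisecond replies, counted here as 1 ms). Windows appears to truncate its reported average to whole milliseconds, so the sketch computes both the exact mean and a truncated figure. The direct run pinged 112.204.224.1, while the run through pfSense pinged 112.205.32.1.

```python
import re
import statistics
from typing import List

# Matches "time=3ms" and the sub-millisecond form "time<1ms".
REPLY_TIME = re.compile(r"time[=<](\d+)ms")

DIRECT = """\
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=1ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=13ms TTL=255
Reply from 112.204.224.1: bytes=32 time=17ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
Reply from 112.204.224.1: bytes=32 time=3ms TTL=255
"""

VIA_PFSENSE = """\
Reply from 112.205.32.1: bytes=32 time=2ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1ms TTL=254
Reply from 112.205.32.1: bytes=32 time=6ms TTL=254
Reply from 112.205.32.1: bytes=32 time=2180ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1571ms TTL=254
Reply from 112.205.32.1: bytes=32 time=52ms TTL=254
Reply from 112.205.32.1: bytes=32 time=448ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1ms TTL=254
Reply from 112.205.32.1: bytes=32 time=3ms TTL=254
Reply from 112.205.32.1: bytes=32 time=2ms TTL=254
Reply from 112.205.32.1: bytes=32 time=740ms TTL=254
Reply from 112.205.32.1: bytes=32 time=523ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1275ms TTL=254
Reply from 112.205.32.1: bytes=32 time=1318ms TTL=254
Reply from 112.205.32.1: bytes=32 time=17ms TTL=254
Reply from 112.205.32.1: bytes=32 time=88ms TTL=254
Reply from 112.205.32.1: bytes=32 time=4ms TTL=254
Reply from 112.205.32.1: bytes=32 time=3ms TTL=254
Reply from 112.205.32.1: bytes=32 time=3ms TTL=254
Reply from 112.205.32.1: bytes=32 time=523ms TTL=254
"""


def reply_times(output: str) -> List[int]:
    """Extract whole-millisecond RTTs from Windows ping output ("<1ms" counts as 1)."""
    return [int(ms) for ms in REPLY_TIME.findall(output)]


def windows_summary(output: str) -> dict:
    times = reply_times(output)
    return {
        "replies": len(times),
        "min": min(times),
        "max": max(times),
        "mean": statistics.fmean(times),
        # Integer division mirrors the whole-millisecond average Windows prints.
        "windows_avg": sum(times) // len(times),
    }


if __name__ == "__main__":
    print("direct:", windows_summary(DIRECT))
    print("via pfSense:", windows_summary(VIA_PFSENSE))
```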
Given these results, do you have any ideas where I should start looking?
-
@kevindd992002, you can post a screenshot of your dashboard.
-
Mmm, there are only two things I'm aware of that can produce behaviour like that in pfSense:
- Active traffic shaping.
- A bug in 21.05 that affected the SG-3100.
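To put rough numbers on the first item: a shaper or limiter that drains a queue more slowly than traffic arrives makes every new packet wait behind the ones already queued. The added delay is simply the queued bits divided by the shaped rate. The queue depth, packet size and rates below are hypothetical, not taken from this setup.

```python
def queueing_delay_ms(queued_packets: int, packet_bytes: int, rate_bps: float) -> float:
    """Delay for a packet joining the back of a FIFO queue drained at rate_bps."""
    # Every bit already queued must be sent first: bits / (bits per second) = seconds.
    return queued_packets * packet_bytes * 8 * 1000.0 / rate_bps


if __name__ == "__main__":
    # Hypothetical: 50 full-size (1500-byte) packets queued behind a shaper.
    for rate in (1e6, 10e6, 100e6):
        print(f"{rate / 1e6:g} Mbit/s -> {queueing_delay_ms(50, 1500, rate):.1f} ms")
```

At 1 Mbit/s, 50 full-size packets already add 600 ms, which is the same order of magnitude as the spikes posted above. At 100 Mbit/s, the same queue adds only 6 ms.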
-
@silence said in Gateway monitor down:
@kevindd992002, you can post a screenshot of your dashboard.
Yes. Here you go: