Losing WAN connection intermittently
-
You can play around with the advanced DHCP options in the WAN interface settings.
For instance, try:
Timeout = 10
Retry = 10
-
Hello!
Saving Gertjan the typing... :)
"Monitoring a gateway by using 1.1.1.1 or 8.8.8.8 is a bad idea."
https://forum.netgate.com/topic/154815/ntp-server-pools-can-t-be-resolved-solved-2-problems-in-1-post/30
John
-
@serbus
Well, everybody's got their own opinion
But, I've never had a problem using public DNS servers on over a hundred sites. I have had sites down because the ISP gateway was pingable, but could not reach the Internet and the line never failed over. YMMV, use your own judgment and take your chances.
-
Hello!
More info...
- Google services, including Google Public DNS, are not designed as ICMP network testing services
- Many large networks, including Google, rate limit ICMP
- ICMP ping or traceroute traffic can be discarded or delayed en-route to Google
https://peering.google.com/#/learn-more/faq
John
-
@dotdash said in Losing WAN connection intermittently:
@serbus
Well, everybody's got their own opinion
But, I've never had a problem using public DNS servers on over a hundred sites. I have had sites down because the ISP gateway was pingable, but could not reach the Internet and the line never failed over. YMMV, use your own judgment and take your chances.
I agree with this. This is probably an age-old debate. No, Google's DNS service is not a ping tool, but I've been using 8.8.8.8 for some time and haven't had issues. I've had occasional latency or no response, but nothing minor tweaks couldn't fix yet. I've had issues pinging my ISP's 1st/2nd hop router because it was down or traffic was rerouted through another path. The gateway went down even though my connection to the web was fine. Monitor what works for you.
@comatose_tortoise you might have to go into System > Routing > Gateways and play around with the Advanced settings in there. Maybe increase the latency and packet loss thresholds. It will take a little longer for your gateway to be marked down, but it could help avoid the gateway being falsely marked down. You don't have to copy my settings, but this has been working for me when pinging 8.8.8.8.
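To illustrate why raising the thresholds avoids false "down" markings, here is a deliberately simplified model of the decision. It is not dpinger's actual implementation; the function name and all threshold values are made up for the example.

```python
def gateway_state(avg_latency_ms: float, loss_pct: float,
                  latency_high_ms: float, loss_high_pct: float) -> str:
    """Simplified model: the gateway counts as down once either the
    averaged latency or the packet loss exceeds its high threshold."""
    if loss_pct > loss_high_pct or avg_latency_ms > latency_high_ms:
        return "down"
    return "up"


# A short burst of loss on an otherwise healthy line (hypothetical numbers).
burst = {"avg_latency_ms": 30.0, "loss_pct": 25.0}

# Tighter thresholds flag the burst; looser ones ride it out.
print(gateway_state(**burst, latency_high_ms=500.0, loss_high_pct=20.0))
print(gateway_state(**burst, latency_high_ms=500.0, loss_high_pct=40.0))
```

The trade-off is the one described above: looser thresholds tolerate brief loss, but a real outage also takes longer to be detected.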
-
@dotdash said in Losing WAN connection intermittently:
@serbus
Well, everybody's got their own opinion
But, I've never had a problem using public DNS servers on over a hundred sites. I have had sites down because the ISP gateway was pingable, but could not reach the Internet and the line never failed over. YMMV, use your own judgment and take your chances.
There is no easy answer to this. Pinging something like Google Public DNS works until it doesn't. What I mean by that is at any time some particular Google DNS node around the world can decide to stop responding to ICMP for whatever reason. The blackout might be temporary or it might be much longer. See the link provided by @serbus.
Sometimes the default settings for dpinger can be a bit "aggressive" in my view by pinging too often. There are really two different worlds to think about. If you are a commercial entity with a multiple WAN failover setup, then you need to ping something past the immediate upstream gateway. You need to ping something truly out on the Internet. However, if you are a home user, or if you are a commercial entity with just a single WAN, then just ping your immediate upstream gateway since if you can't get to it, then you aren't getting to the Internet anyway. When millions of users around the world all decide that pinging Google Public DNS is a good idea, then that's when Google is likely to shut that door. After all, the point of 8.8.8.8 is to serve up DNS records and not to answer ICMP pings.
-
@bmeeks said in Losing WAN connection intermittently:
There is no easy answer to this. Pinging something like Google Public DNS works until it doesn't. What I mean by that is at any time some particular Google DNS node around the world can decide to stop responding to ICMP for whatever reason. The blackout might be temporary or it might be much longer. See the link provided by @serbus.
Sometimes the default settings for dpinger can be a bit "aggressive" in my view by pinging too often. There are really two different worlds to think about. If you are a commercial entity with a multiple WAN failover setup, then you need to ping something past the immediate upstream gateway. You need to ping something truly out on the Internet. However, if you are a home user, or if you are a commercial entity with just a single WAN, then just ping your immediate upstream gateway since if you can't get to it, then you aren't getting to the Internet anyway. When millions of users around the world all decide that pinging Google Public DNS is a good idea, then that's when Google is likely to shut that door. After all, the point of 8.8.8.8 is to serve up DNS records and not to answer ICMP pings.
Agree with this.
Good point on dpinger being aggressive. Twice per second does seem excessive. We probably should back off on that value as well.
-
Thanks for the suggestions. I changed to "Timeout = 10" and "Retry = 10", as well as implemented the same settings that Raffi_ has. Also, I changed the server to ping from 8.8.8.8 to an off site server that I control, and stepped down the aggressiveness of dpinger to not overload it.
Unfortunately, none of this had any effect on the problem.
I don't get it, recently pfsense appears to be getting a new IP through DHCP, and even before the lease time is out, the connection drops. If I renew the IP, everything works again, but the offer always comes from a different IP than the one it was asking to renew from. Is this normal for ISPs or is it related to my problem you think?
Also: Is there any way of automating the "Release + Renew" of the WAN IP in case the gateway is marked as down?
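One way to think about automating this is a small watchdog loop: probe a monitor address, count consecutive failures, and trigger a renew once the count reaches a limit. The sketch below only shows that logic. The function names are invented for illustration, `ping_once` assumes a `ping` binary that accepts `-c 1`, and the `renew` callback is left to the reader because the thread does not say which command performs the release and renew on pfSense.

```python
import subprocess
from typing import Callable


def ping_once(host: str) -> bool:
    """Return True if a single ICMP echo to host succeeds."""
    result = subprocess.run(
        ["ping", "-c", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch_gateway(probe: Callable[[], bool],
                  renew: Callable[[], None],
                  max_failures: int = 5,
                  checks: int = 100) -> int:
    """Call renew() each time probe() fails max_failures times in a row.

    Returns how many times renew() was invoked.
    """
    failures = 0
    renewals = 0
    for _ in range(checks):
        if probe():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                renew()
                renewals += 1
                failures = 0  # start counting again after a renew attempt
    return renewals
```

A real deployment would sleep between checks and pass something like `lambda: ping_once("<monitor IP>")` as the probe. Requiring several consecutive failures keeps a single lost ping from triggering a renew.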
-
Hi,
We had a similar problem with a Sagemcom DOCSIS modem.
The solution was to set the interface speed negotiation from auto to fixed.
This was caused by different Ethernet controller chips on the ports.
I agree with the others regarding the use of the monitor IP.
Try pinging the ISP gateway by default. If it does not respond to ICMP, then you have to choose another option, but not 8.8.8.8 (used a lot).
-
@DaddyGo said in Losing WAN connection intermittently:
Try pinging the ISP gateway by default. If it does not respond to ICMP, then you have to choose another option, but not 8.8.8.8 (used a lot).
I think I have to start a new thread on the topic of which IP to monitor.
-
@Raffi_ said in Losing WAN connection intermittently:
I think I have to start a new thread on the topic of which IP to monitor.
imaginable
Monitoring the ISP gateway may be the best thing to do in this case.
It gives you accurate measurements of your internet connection.
Google DNS servers return ICMP with different delays depending on the area, so the information is not relevant.
(It often also depends on their load, as they were not designed for this purpose.)
Unfortunately, a situation may arise where the approach described above is not workable.
For example, ExpressVPN gateways do not respond to ping.
Therefore, we tend to use Cloudflare DNS for this purpose, but we must not forget that our pfSense device, the ExpressVPN server and the Cloudflare device are, fortunately, in the same data center.
ICMP responses arrive in 1 ms (or even less), but this is a special case, because many of our devices are located in larger data centers.
-
Using something close or something far away is a debatable topic.
Run these from some client on your LAN: 'traceroute -n -I google.com' and, if needed, 'traceroute6 -n -I google.com'. These commands work from a Mac client; you may need some variant for your client device. Using '-I' sends ICMP instead of UDP, so limiters and the like aren't an issue in most networks.
Use the first host that isn't yours as your monitoring address. Typically the second one, the first being your LAN address.
This means you can reach your ISP. For me these don't change even when my public ip or ipv6 prefix changes (Spectrum/Time Warner), YMMV.
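The "first host that isn't yours" rule can be sketched as a small parser over `traceroute -n` output. This is only an illustration: the function name is invented, the sample output uses documentation addresses rather than real hops, and it assumes you know your own LAN subnet.

```python
import ipaddress
from typing import Optional

# Made-up sample in the usual "hop  address  timings" layout.
SAMPLE = """traceroute to google.com (198.51.100.7), 64 hops max, 72 byte packets
 1  192.168.1.1  1.123 ms  0.987 ms  0.911 ms
 2  * * *
 3  203.0.113.1  9.5 ms  9.1 ms  8.8 ms
"""


def first_upstream_hop(traceroute_output: str, lan_cidr: str) -> Optional[str]:
    """Return the first responding hop that is outside the given LAN subnet."""
    lan = ipaddress.ip_network(lan_cidr)
    for line in traceroute_output.splitlines():
        fields = line.split()
        # Hop lines start with a hop number; this skips the header line.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        try:
            hop = ipaddress.ip_address(fields[1])
        except ValueError:
            continue  # "*" means the hop did not answer
        if hop.version == lan.version and hop in lan:
            continue  # still inside our own network
        return str(hop)
    return None


print(first_upstream_hop(SAMPLE, "192.168.1.0/24"))
```

Filtering on your own subnet, rather than on "private versus public" addresses, matters because an ISP's first hop can itself sit in private address space.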
-
@jwj said in Losing WAN connection intermittently:
Using something close or something far away is a debatable topic.
Very debatable indeed, which is why I wanted to open that discussion here,
https://forum.netgate.com/topic/155243/monitor-ip-discussion
Everyone has a different solution and it's not always a one size fits all situation.
-
@Raffi_ I've used both. Can't say I thought one was better than the other. In terms of latency, high latency is relative not absolute so no advantage one way or the other to monitoring latency.
-
Regarding the IP-ping discussion, I've tried using the gateway, Google's DNS, and a third private machine on the internet. The problem occurs regardless of which of them I use, and regardless of the settings for considering a gateway down.
Also changed the interface speed as @DaddyGo said, but it didn't have any effect.
Looking at the logs as this happens, it appears as if pfsense gets a lease normally from my ISP:
Jul 15 01:45:36 dhclient 22506 DHCPACK from 10.0.173.50
Jul 15 01:45:36 dhclient RENEW
Jul 15 01:45:36 dhclient Creating resolv.conf
Jul 15 01:45:36 dhclient 22506 bound to 83.252.76.59 -- renewal in 5400 seconds.
Then, the connection is lost (reason unknown), and the gateway goes down (which is correct, internet connection is lost at this point):
Jul 15 01:52:35 dpinger WAN_DHCP 68.66.241.199: Alarm latency 29510us stddev 0us loss 75%
Then, when the renewal time arrives (5400 seconds after the DHCPACK), pfsense tries to renew the lease, but there is no response:
Jul 15 03:15:36 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:15:42 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:15:58 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:16:24 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:16:40 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:16:47 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:16:57 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:17:17 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:17:27 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:17:41 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:17:53 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:18:09 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:18:52 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:19:45 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:20:42 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:21:23 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:21:51 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:22:27 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:23:36 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:08 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:20 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:33 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:51 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:58 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:25:12 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:25:25 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:25:32 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:25:48 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:26:17 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:27:32 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:28:45 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:31:10 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:31:43 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:32:26 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:34:00 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:37:52 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:38:12 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:38:40 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:38:55 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:39:16 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:39:29 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:40:05 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:41:15 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:42:22 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:42:40 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:43:11 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:43:33 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:44:13 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:44:23 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:44:36 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:44:51 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:45:05 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:45:38 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:46:09 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:46:49 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:48:35 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:53:42 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:55:06 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:56:11 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:57:53 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:00:21 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:01:05 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:01:40 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:02:22 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:03:29 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:04:04 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:05:33 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:08:07 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:08:24 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:08:51 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:09:00 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:09:11 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:09:29 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:10:00 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:11:10 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:13:39 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:16:45 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:20:24 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
At this point, for unknown reasons, pfsense instead asks 255.255.255.255 for an IP, which works, but the DHCPACK comes from a different IP than the one pfsense was sending its requests to before:
Jul 15 04:24:19 dhclient 22506 DHCPREQUEST on em1 to 255.255.255.255 port 67
Jul 15 04:28:24 dhclient 22506 DHCPREQUEST on em1 to 255.255.255.255 port 67
Jul 15 04:28:24 dhclient 22506 ip length 336 disagrees with bytes received 362.
Jul 15 04:28:24 dhclient 22506 accepting packet with data after udp payload.
Jul 15 04:28:24 dhclient 22506 DHCPACK from 10.190.1.3
Jul 15 04:28:24 dhclient RENEW
Jul 15 04:28:24 dhclient Creating resolv.conf
Jul 15 04:28:24 dhclient 22506 bound to 83.252.76.59 -- renewal in 5400 seconds.
After this the gateway goes up and functionality is restored. As I said at the start of the thread, if I manually release and renew the IP on WAN when I notice that I've lost internet connection, it does this last part immediately instead of waiting a long time before asking 255.255.255.255.
Any ideas on why this is? If I could just make it renew the IP via 255.255.255.255 as soon as the WAN gateway goes down, the problem would be, if not solved, radically less severe.
EDIT: Additionally, the ISP provided router/modem has internet connection throughout this process, so there's no real "outage", so to speak.
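The switch from unicast requests to 255.255.255.255 looks like the standard DHCP timer behavior in RFC 2131: at T1 the client unicasts renewals to the server that granted the lease, and at T2 it starts broadcasting to any server. The sketch below estimates those times from the log. It assumes the RFC default ratios (T1 = 0.5 × lease, T2 = 0.875 × lease); the actual lease length is not in the log, and a server may send different timer values. The function name and the year in the timestamp are placeholders.

```python
from datetime import datetime, timedelta


def dhcp_timers(bound_at: datetime, t1_seconds: int) -> dict:
    """Estimate renew (T1), rebind (T2) and expiry times from the logged T1.

    Assumes RFC 2131 defaults: T1 = 0.5 * lease, T2 = 0.875 * lease.
    """
    lease_seconds = t1_seconds * 2
    t2_seconds = lease_seconds * 0.875
    return {
        "renew_unicast": bound_at + timedelta(seconds=t1_seconds),
        "rebind_broadcast": bound_at + timedelta(seconds=t2_seconds),
        "lease_expiry": bound_at + timedelta(seconds=lease_seconds),
    }


# "bound to 83.252.76.59 -- renewal in 5400 seconds" at 01:45:36.
bound = datetime(2020, 7, 15, 1, 45, 36)  # the year is a placeholder
timers = dhcp_timers(bound, 5400)
for name, when in timers.items():
    print(name, when.strftime("%H:%M:%S"))
# renew_unicast 03:15:36
# rebind_broadcast 04:23:06
# lease_expiry 04:45:36
```

Under these assumptions the estimate lines up with the log: the first unicast DHCPREQUEST is at 03:15:36, and the first broadcast appears at 04:24:19, shortly after the estimated T2.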
-
It will not be easy...
The key issue on the subject is this:
It is clear that this is not a pfSense problem, as pfSense does its part and broadcasts its requests.
So the ISP router / modem is also UP.
The next question is: what kind of pfSense hardware do you have?
The NIC in particular may be of interest.
-
@DaddyGo said in Losing WAN connection intermittently:
The solution was to set the interface speed negotiation from auto to fixed.
I trust you set it to fixed at both ends. You shouldn't set it to fixed at one end only. Also, this sounds a bit strange. The connection is negotiated only when the cable is plugged in or a device is turned on. If it happens at any other time, it would indicate a problem somewhere.
-
Well, it might very well be something the ISP is doing with the dynamic (but public) IP that I get from them, and that this does not work with pfsense for some reason. The ISP router/modem (in bridge mode) works, so whatever they are doing, their own hardware handles it fine.
My pfsense hardware is a dedicated machine only running pfsense, mini-itx board (MP-T3460-D2500CC), two ethernet ports I believe are Intel NICs. 4GB ram, 60 GB SSD, an additional PCI card with dual Intel NICs. This machine has been in use since 2013, and these problems started to occur this year.
Why does pfsense first make DHCPREQUEST to one address, but after a while changes to another?
-
@JKnott said in Losing WAN connection intermittently:
I trust you set it to fixed at both ends.
of course at both ends...
This is basically caused by a minor incompatibility between the Ethernet controllers.
Think of the great, super, frenetic Realtek miracles...
OK then:
We use a Raisecom GPON system to serve many of our customers.
https://www.raisecom.com/product/gpon-sfu-ont-0
This ONT miracle is equipped with a Realtek Ethernet controller and has compatibility issues with the Intel i210-AT controller
(so we either replace hundreds of ONTs or solve this issue as best we can).
BTW:
Anyway, the ISCOM system is damn good, it is just bad on the ONT side.
(Of course, not all endpoints are Intel versus ONT Realtek Ethernet.) The above was tested together with the manufacturer...
-
One would think that an Ethernet negotiation issue would cause a solid problem, not intermittent. Is there something triggering a disconnect/reconnect? Does the link show down? Also, an intermittent failure shouldn't cause DHCP problems, unless it's long enough for the lease to expire.