Losing WAN connection intermittently
-
@DaddyGo said in Losing WAN connection intermittently:
Try pinging the ISP gateway by default; if it does not respond to ping (i.e. does not answer ICMP), then you have to choose another target, but not 8.8.8.8 (it is used a lot).
I think I have to start a new thread on the topic of which IP to monitor.
-
@Raffi_ said in Losing WAN connection intermittently:
I think I have to start a new thread on the topic of which IP to monitor.
imaginable
Monitoring the ISP gateway may be the best thing to do in this case.
It gives you accurate measurements of your internet connection.
Google DNS servers return ICMP with different delays depending on the area, so the information is not relevant.
(it often also depends on their load, as they were not invented for this purpose)
Unfortunately, a situation may arise where what is described above is not sustainable.
For example, Express VPN gateways do not respond to ping.
Therefore, we tend to use Cloudflare DNS for this purpose, but we must not forget that our pfSense device, the ExpVPN server and the CloudFlare device are fortunately in the same data center.
ICMP responses arrive in 1 ms (or even less), but this is a special case, because many of our devices are located in larger data centers.
++++edit:
-
Using something close or something far away is a debatable topic.
Run these from some client on your LAN: 'traceroute -n -I google.com' and, if needed, 'traceroute6 -n -I google.com'. These commands work from a Mac client; you may need some variant for your client device. Using '-I' sends ICMP echo instead of UDP, so in most networks limiters and the like aren't an issue.
Use the first host that isn't yours as your monitoring address. Typically the second one, the first being your LAN address.
This means you can reach your ISP. For me these don't change even when my public ip or ipv6 prefix changes (Spectrum/Time Warner), YMMV.
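To make the "first host that isn't yours" rule concrete, here is a minimal Python sketch that picks that hop out of `traceroute -n` output. The function name `first_upstream_hop`, the sample output and the 192.168.1.0/24 LAN are all made up for illustration (the addresses are documentation ranges); feed it your own output and your own networks.

```python
import ipaddress
import re
from typing import Iterable, Optional


def first_upstream_hop(traceroute_output: str, own_networks: Iterable[str]) -> Optional[str]:
    """Return the first hop address that is not inside any of your own networks."""
    nets = [ipaddress.ip_network(n) for n in own_networks]
    for line in traceroute_output.splitlines():
        # Hop lines look like " 2  198.51.100.1  8.120 ms ..."; the header line does not match.
        match = re.match(r"\s*\d+\s+(\S+)", line)
        if not match:
            continue
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # "*" means the hop did not answer
        if not any(addr in net for net in nets):
            return str(addr)
    return None


# Hypothetical sample output, not taken from a real network.
SAMPLE = """\
traceroute to google.com (203.0.113.10), 64 hops max, 72 byte packets
 1  192.168.1.1  0.412 ms  0.301 ms  0.287 ms
 2  198.51.100.1  8.120 ms  7.954 ms  8.033 ms
 3  * * *
"""

print(first_upstream_hop(SAMPLE, ["192.168.1.0/24"]))
```

With the sample above, hop 1 is the LAN router and is skipped, so hop 2 would be the candidate monitoring address.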
-
@jwj said in Losing WAN connection intermittently:
Using something close or something far away is a debatable topic.
Very debatable indeed, which is why I wanted to open that discussion here,
https://forum.netgate.com/topic/155243/monitor-ip-discussion
Everyone has a different solution and it's not always a one size fits all situation.
-
@Raffi_ I've used both. Can't say I thought one was better than the other. In terms of latency, high latency is relative not absolute so no advantage one way or the other to monitoring latency.
-
Regarding the IP-ping discussion, I've tried using the gateway, Google's DNS, and a third private machine on the internet. The problem occurs regardless of which of them I use, and regardless of the settings for considering a gateway down.
Also changed the interface speed as @DaddyGo said, but it didn't have any effect.
Looking at the logs as this happens, it appears as if pfsense gets a lease normally from my ISP:
Jul 15 01:45:36 dhclient 22506 DHCPACK from 10.0.173.50
Jul 15 01:45:36 dhclient RENEW
Jul 15 01:45:36 dhclient Creating resolv.conf
Jul 15 01:45:36 dhclient 22506 bound to 83.252.76.59 -- renewal in 5400 seconds.
Then, the connection is lost (reason unknown), and the gateway goes down (which is correct, internet connection is lost at this point):
Jul 15 01:52:35 dpinger WAN_DHCP 68.66.241.199: Alarm latency 29510us stddev 0us loss 75%
Then, when the lease runs out, pfsense tries to get a new one, but there is no response:
Jul 15 03:15:36 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:15:42 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:15:58 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:16:24 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:16:40 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:16:47 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:16:57 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:17:17 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:17:27 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:17:41 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:17:53 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:18:09 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:18:52 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:19:45 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:20:42 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:21:23 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:21:51 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:22:27 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:23:36 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:08 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:20 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:33 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:51 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:24:58 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:25:12 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:25:25 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:25:32 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:25:48 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:26:17 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:27:32 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:28:45 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:31:10 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:31:43 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:32:26 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:34:00 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:37:52 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:38:12 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:38:40 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:38:55 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:39:16 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:39:29 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:40:05 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:41:15 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:42:22 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:42:40 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:43:11 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:43:33 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:44:13 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:44:23 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:44:36 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:44:51 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:45:05 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:45:38 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:46:09 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:46:49 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:48:35 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:53:42 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:55:06 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:56:11 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 03:57:53 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:00:21 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:01:05 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:01:40 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:02:22 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:03:29 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:04:04 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:05:33 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:08:07 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:08:24 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:08:51 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:09:00 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:09:11 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:09:29 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:10:00 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:11:10 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:13:39 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:16:45 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
Jul 15 04:20:24 dhclient 22506 DHCPREQUEST on em1 to 10.0.173.50 port 67
At this time, for unknown reasons, pfsense then instead asks 255.255.255.255 for an IP, which works, but the DHCPACK comes from a different IP than pfsense was requesting on before:
Jul 15 04:24:19 dhclient 22506 DHCPREQUEST on em1 to 255.255.255.255 port 67
Jul 15 04:28:24 dhclient 22506 DHCPREQUEST on em1 to 255.255.255.255 port 67
Jul 15 04:28:24 dhclient 22506 ip length 336 disagrees with bytes received 362.
Jul 15 04:28:24 dhclient 22506 accepting packet with data after udp payload.
Jul 15 04:28:24 dhclient 22506 DHCPACK from 10.190.1.3
Jul 15 04:28:24 dhclient RENEW
Jul 15 04:28:24 dhclient Creating resolv.conf
Jul 15 04:28:24 dhclient 22506 bound to 83.252.76.59 -- renewal in 5400 seconds.
After this the gateway goes up and functionality is restored. As I said in the thread start, if I manually release and renew IP on WAN when I notice that I've lost internet connection, it does this last part immediately instead of waiting a long time before asking 255.255.255.255.
Any ideas on why this is? If I could just make it renew the IP via 255.255.255.255 as soon as the WAN gateway goes down, the problem would be, if not solved, radically less severe.
EDIT: Additionally, the ISP provided router/modem has internet connection throughout this process, so there's no real "outage", so to speak.
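One possible reading of the timeline above: under RFC 2131 a DHCP client unicasts its renewal requests to the server that issued the lease from timer T1 until timer T2, and only then falls back to broadcasting to 255.255.255.255. The small Python sketch below checks the timestamps from the logs against those timers. It rests on assumptions that are not in the logs: that the lease is twice the logged 5400-second renewal time (the default T1 of 50%), that T2 is the default 87.5% of the lease, and the year is arbitrary because syslog lines carry none.

```python
from datetime import datetime, timedelta

ASSUMED_YEAR = 2020  # assumption: syslog timestamps have no year, any fixed year works


def parse_ts(stamp: str) -> datetime:
    """Parse a syslog-style 'Jul 15 01:45:36' timestamp."""
    return datetime.strptime(f"{ASSUMED_YEAR} {stamp}", "%Y %b %d %H:%M:%S")


# Values taken from the log lines above.
bound_at = parse_ts("Jul 15 01:45:36")         # "bound to 83.252.76.59 -- renewal in 5400 seconds"
first_unicast = parse_ts("Jul 15 03:15:36")    # first DHCPREQUEST to 10.0.173.50
last_unicast = parse_ts("Jul 15 04:20:24")     # last DHCPREQUEST to 10.0.173.50
first_broadcast = parse_ts("Jul 15 04:24:19")  # first DHCPREQUEST to 255.255.255.255

t1 = timedelta(seconds=5400)  # renewal time printed by dhclient
lease = 2 * t1                # assumption: T1 is the default 50% of the lease
t2 = lease * 0.875            # assumption: T2 is the default 87.5% of the lease

renew_at = bound_at + t1      # expected start of unicast renewals
rebind_at = bound_at + t2     # expected switch to broadcast

print("expected T1:", renew_at.time(), "first unicast request:", first_unicast.time())
print("expected T2:", rebind_at.time(), "first broadcast request:", first_broadcast.time())
print("seconds between T2 and first broadcast:", (first_broadcast - rebind_at).total_seconds())
```

Under these assumptions T1 lands exactly on the first unicast request (03:15:36) and T2 lands at 04:23:06, between the last unicast request and the first broadcast. If that holds, the long wait is the normal renew window rather than a fault in pfsense, and the open question becomes why 10.0.173.50 stops answering unicast requests.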
-
it will not be easy....
the key issue on the subject is this:
It is clear that this is not a pfSense problem as pfSense does the thing and broadcast its requests.
so the ISP router / modem is also UP
the next question is what kind of pfSense hardware do you have?
the NIC in particular may be of interest
-
@DaddyGo said in Losing WAN connection intermittently:
The solution was to set the interface speed negotiation from auto to fixed.
I trust you set it to fixed at both ends. You shouldn't set it to fixed at one end only. Also, this sounds a bit strange. The connection is negotiated only when the cable is plugged in or a device is turned on. If it happens at any other time, it would indicate a problem somewhere.
-
Well, it might very well be something the ISP is doing with the dynamic (but public) IP that I get from them, and that this does not work with pfsense for some reason. The ISP router/modem (in bridge mode) works, so whatever they are doing, their own hardware handles it fine.
My pfsense hardware is a dedicated machine only running pfsense, mini-itx board (MP-T3460-D2500CC), two ethernet ports I believe are Intel NICs. 4GB ram, 60 GB SSD, an additional PCI card with dual Intel NICs. This machine has been in use since 2013, and these problems started to occur this year.
Why does pfsense first make DHCPREQUEST to one address, but after a while changes to another?
-
@JKnott said in Losing WAN connection intermittently:
I trust you set it to fixed at both ends.
of course at both ends...
this is basically caused by the minimal incompatibility of the ethernet controllers
think of the great, super, frenetic Realtek miracles....
ok then:
so, we use a Raisecom GPON system to serve many of our customers
https://www.raisecom.com/product/gpon-sfu-ont-0
this ONT miracle is equipped with a Realtek ethernet controller and has compatibility issues with the Intel i210-AT controller
(so we either replace hundreds of ONTs or solve this issue as we can)
BTW:
anyway the ISCOM system is damn good, just shit on the ONT theme
(of course, not all endpoints are Intel versus ONT Realtek eth.)
the above was tested together with the manufacturer...
-
One would think that an Ethernet negotiation issue would cause a solid problem, not intermittent. Is there something triggering a disconnect/reconnect? Does the link show down? Also, an intermittent failure shouldn't cause DHCP problems, unless it's long enough for the lease to expire.
-
@comatose_tortoise said in Losing WAN connection intermittently:
Why does pfsense first make DHCPREQUEST to one address, but after a while changes to another?
Do you have a packet capture of the full DHCP sequence?
-
Ok, so now I changed the NIC for my WAN to one I know for sure is an Intel NIC.
@JKnott said in Losing WAN connection intermittently:
@comatose_tortoise said in Losing WAN connection intermittently:
Why does pfsense first make DHCPREQUEST to one address, but after a while changes to another?
Do you have a packet capture of the full DHCP sequence?
Nope, never done that. I'll try to look into it. But I guess I would need to do it at the exact moment the error occurs? Could be problematic due to the intermittent behavior.
-
I didn't see this mentioned or suggested yet, so I'll be that guy... you did check the cable between pfSense and the modem? If you have no way of checking it, replace it with a known good cable.
-
That depends on how you do it. If you use Packet Capture, it would be difficult to catch the first half, though you might be able to if you do a release/renew. The other way is to use a data tap, as I mentioned above, then reboot pfSense to get the initial sequence and as many renews as you want. One advantage of this method, using Wireshark, is you can watch what's happening, without stopping the capture. On the negative side, if it really is a NIC negotiation issue, this might mask it.
-
@JKnott said in Losing WAN connection intermittently:
Is there something triggering a disconnect/reconnect?
Who knows Realtek's inner world deeply?
The fact is, it causes intermittent problems; we are already past this examination, and I have read about similar problems on other forums.
What do you think of a periodic heat run?
(since it starts with a significant packet loss and not basically with a DHCP problem)
something more came to mind, because the ISP is not a god:
our other typical case is with Telecom (HU) ISP DOCSIS Cisco CMTS and edgeQAM, using the DHCP allocation method...
the problem is caused by the tightly configured Cisco IOS - Prerequisites for Cable DHCP Leasequery + DHCP MAC Address Exclusion List
the error phenomenon is very similar to the OP's issue
BTW:
Raisecom replaced the ONT ethernet controller with i211 and all problems went away.
-
@comatose_tortoise said in Losing WAN connection intermittently:
After this the gateway goes up and functionality is restored. As I said in the thread start, if I manually release and renew IP on WAN when I notice that I've lost internet connection, it does this last part immediately instead of waiting a long time before asking 255.255.255.255.
Any ideas on why this is? If I could just make it renew the IP via 255.255.255.255 as soon as the WAN gateway goes down, the problem would be, if not solved, radically less severe.
Hello!
Are the 10.x.x.x DHCP servers relays? Maybe they won't accept unicast DHCP requests and require broadcasts?
https://forum.netgate.com/topic/112869/dhclient-on-wan-occasionally-fails-to-renew-lease-with-cable-isp
https://social.technet.microsoft.com/Forums/windows/en-US/69a3a8f6-8199-4f24-8d4a-a4b5a083176b/why-cant-windows-7-be-forced-to-use-dhcp-broadcast-lease-renewal
John
-
@serbus said in Losing WAN connection intermittently:
Maybe they wont accept unicast dhcp request and require broadcasts?
the problem is only intermittent, so it is not relevant
-
@DaddyGo said in Losing WAN connection intermittently:
What do you think of a periodic heat run?
That could be a possibility. I've seen stuff fail when it gets warm. Many years ago, I learned about some stuff called "Freeze Mist", which was handy for locating thermal problems. It was also useful for putting frost on a penny.
-
@JKnott said in Losing WAN connection intermittently:
@DaddyGo said in Losing WAN connection intermittently:
What do you think of a periodic heat run?
That could be a possibility. I've seen stuff fail when it gets warm. Many years ago, I learned about some stuff called "Freeze Mist", which was handy for locating thermal problems. It was also useful for putting frost on a penny.
A can of compressed air held upside down does the same thing. I think the "freeze mist" and compressed air are the same product with the can and labels flipped :)