Losing WAN connection intermittently



  • After many years of functioning without problem, my pfsense router has started to lose WAN connection intermittently (varies from 1 to 10 times per week).

    WAN is connected to a cable modem (ISP provided router, used in bridge mode).

    The problem manifests as a dpinger alarm:

    Jul 8 12:25:53	dpinger	WAN_DHCP 8.8.8.8: Alarm latency 11392us stddev 679us loss 25%
    

    and nothing gets through on WAN; it's just dead. I know the modem/ISP router itself works fine and still has an internet connection, since I can connect to its Wi-Fi, immediately get a public IP and use the internet fine.
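    For reference, dpinger logs latency and its standard deviation in microseconds and loss as a percentage, so 11392us is only about 11.4 ms. The minimal sketch below (Alarm, parse_alarm and tripped are hypothetical helpers, not part of pfSense) parses such a line and checks it against "high" thresholds. The 500 ms latency and 20% loss values are assumptions based on what I understand pfSense's defaults to be (see System > Routing > Gateways), as is the strict comparison. Under those assumptions, this alarm was raised by packet loss, not latency.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches e.g. "WAN_DHCP 8.8.8.8: Alarm latency 11392us stddev 679us loss 25%"
ALARM_RE = re.compile(
    r"(?P<gw>\S+) (?P<target>\S+): Alarm latency (?P<lat>\d+)us "
    r"stddev (?P<sd>\d+)us loss (?P<loss>\d+)%"
)


@dataclass
class Alarm:
    gateway: str
    target: str
    latency_ms: float
    stddev_ms: float
    loss_pct: int


def parse_alarm(line: str) -> Alarm:
    """Extract the fields of a dpinger alarm log line."""
    m = ALARM_RE.search(line)
    if m is None:
        raise ValueError(f"not a dpinger alarm line: {line!r}")
    return Alarm(
        gateway=m["gw"],
        target=m["target"],
        latency_ms=int(m["lat"]) / 1000,  # dpinger logs microseconds
        stddev_ms=int(m["sd"]) / 1000,
        loss_pct=int(m["loss"]),
    )


def tripped(alarm: Alarm, latency_high_ms: float = 500.0,
            loss_high_pct: float = 20.0) -> list[str]:
    """Names of the (assumed default) high thresholds this alarm exceeds."""
    reasons = []
    if alarm.latency_ms > latency_high_ms:
        reasons.append("latency")
    if alarm.loss_pct > loss_high_pct:
        reasons.append("loss")
    return reasons


alarm = parse_alarm("WAN_DHCP 8.8.8.8: Alarm latency 11392us stddev 679us loss 25%")
print(alarm.latency_ms, alarm.loss_pct, tripped(alarm))  # 11.392 25 ['loss']
```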

    If I go into the Interfaces status page and choose "Release WAN" (with Relinquish Lease selected), and then "Renew WAN", pfsense gets a new IP from the ISP and everything works again.

    The only thing I notice in the logs is that when this happens, pfsense appears to be asking for a new IP from the ISP, but there is no answer:

    Jul 8 10:37:22	dhclient	4870	bound to 83.252.73.22 -- renewal in 5400 seconds.
    Jul 8 12:07:22	dhclient	4870	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 8 12:07:22	dhclient	4870	DHCPACK from 10.0.173.50
    Jul 8 12:07:22	dhclient		RENEW
    Jul 8 12:07:22	dhclient		Creating resolv.conf
    Jul 8 12:07:22	dhclient	4870	bound to 83.252.73.22 -- renewal in 5400 seconds.
    Jul 8 13:37:22	dhclient	4870	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 8 13:37:23	dhclient	4870	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 8 13:37:24	dhclient	4870	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 8 13:37:25	dhclient	4870	DHCPREQUEST on em1 to 10.0.173.50 port 67
    

    After the "Release WAN" and "Renew WAN", the answer seems to come from a different address than the one pfsense was asking when it stopped working:

    Jul 8 12:38:25	dhclient		Internet Systems Consortium DHCP Client 4.4.1
    Jul 8 12:38:25	dhclient		Copyright 2004-2018 Internet Systems Consortium.
    Jul 8 12:38:25	dhclient		All rights reserved.
    Jul 8 12:38:25	dhclient		For info, please visit https://www.isc.org/software/dhcp/
    Jul 8 12:38:25	dhclient		Listening on BPF/em1/00:22:4d:9b:65:b2
    Jul 8 12:38:25	dhclient		Sending on BPF/em1/00:22:4d:9b:65:b2
    Jul 8 12:38:25	dhclient		Can't attach interface {} to bpf device /dev/bpf0: Device not configured
    Jul 8 12:38:25	dhclient		If you think you have received this message due to a bug rather
    Jul 8 12:38:25	dhclient		than a configuration issue please read the section on submitting
    Jul 8 12:38:25	dhclient		bugs on either our web page at www.isc.org or in the README file
    Jul 8 12:38:25	dhclient		before submitting a bug. These pages explain the proper
    Jul 8 12:38:25	dhclient		process and the information we find helpful for debugging.
    Jul 8 12:38:25	dhclient		exiting.
    Jul 8 12:38:25	dhclient	81916	connection closed
    Jul 8 12:38:25	dhclient	81916	exiting.
    Jul 8 12:38:38	dhclient		PREINIT
    Jul 8 12:38:38	dhclient	94413	DHCPREQUEST on em1 to 255.255.255.255 port 67
    Jul 8 12:38:39	dhclient	94413	DHCPREQUEST on em1 to 255.255.255.255 port 67
    Jul 8 12:38:41	dhclient	94413	DHCPREQUEST on em1 to 255.255.255.255 port 67
    Jul 8 12:38:45	dhclient	94413	DHCPREQUEST on em1 to 255.255.255.255 port 67
    Jul 8 12:38:54	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 1
    Jul 8 12:38:55	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 1
    Jul 8 12:38:56	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 2
    Jul 8 12:38:58	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 4
    Jul 8 12:39:02	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 6
    Jul 8 12:39:08	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 14
    Jul 8 12:39:22	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 7
    Jul 8 12:39:29	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 13
    Jul 8 12:39:42	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 9
    Jul 8 12:39:51	dhclient	94413	DHCPDISCOVER on em1 to 255.255.255.255 port 67 interval 4
    Jul 8 12:39:51	dhclient	94413	DHCPOFFER from 10.190.1.3
    Jul 8 12:39:51	dhclient		ARPSEND
    Jul 8 12:39:51	dhclient	94413	DHCPOFFER from 10.190.1.2
    Jul 8 12:39:51	dhclient	94413	DHCPOFFER already seen.
    Jul 8 12:39:53	dhclient		ARPCHECK
    Jul 8 12:39:53	dhclient	94413	DHCPREQUEST on em1 to 255.255.255.255 port 67
    Jul 8 12:39:53	dhclient	94413	DHCPACK from 10.190.1.3
    Jul 8 12:39:53	dhclient		BOUND
    Jul 8 12:39:53	dhclient		Deleting old routes
    Jul 8 12:39:53	dhclient		Starting add_new_address()
    Jul 8 12:39:53	dhclient		ifconfig em1 inet 83.252.79.47 netmask 255.255.224.0 broadcast 83.252.95.255
    Jul 8 12:39:53	dhclient		New IP Address (em1): 83.252.79.47
    Jul 8 12:39:53	dhclient		New Subnet Mask (em1): 255.255.224.0
    Jul 8 12:39:53	dhclient		New Broadcast Address (em1): 83.252.95.255
    Jul 8 12:39:53	dhclient		New Routers (em1): 83.252.64.1
    Jul 8 12:39:53	dhclient		Adding new routes to interface: em1
    Jul 8 12:39:53	dhclient		/sbin/route add default 83.252.64.1
    Jul 8 12:39:53	dhclient		Creating resolv.conf
    Jul 8 12:39:54	dhclient	94413	bound to 83.252.79.47 -- renewal in 5083 seconds.
    

    Questions:

    1. Any ideas why this is? Any setting I could try to prevent it from happening? It's as if pfsense is asking for IP renewal, but the server at the ISP has changed IP and won't answer.
    2. Is there a way of making pfsense automatically do the "Release WAN" + "Renew WAN" thing as soon as dpinger craps out? Because that's what I do manually now, and it works every time. Would increase reliability when I'm travelling and need to connect home with VPN.
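    On question 2, the general pattern is a small watchdog: probe the monitor target, count consecutive failures, and once a threshold is reached run the recovery step (here, release + renew), with a cooldown so it cannot loop. The sketch below is hypothetical: WanWatchdog and its parameters are illustrative names, and both the probe (for example, a wrapper around ping) and the recover callback (whatever triggers a WAN release/renew on your system) are left for you to supply. No specific pfSense command is implied.

```python
import time
from typing import Callable, Optional


class WanWatchdog:
    """Hypothetical watchdog: after `max_failures` consecutive failed probes,
    call `recover` once, then wait at least `cooldown_s` before calling it again."""

    def __init__(self, probe: Callable[[], bool], recover: Callable[[], None],
                 max_failures: int = 5, cooldown_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.probe = probe            # returns True if the monitor target answered
        self.recover = recover        # e.g. release + renew the WAN lease
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.last_recovery: Optional[float] = None

    def step(self) -> bool:
        """Run one probe; return True if this step triggered a recovery."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.max_failures:
            return False
        now = self.clock()
        if self.last_recovery is not None and now - self.last_recovery < self.cooldown_s:
            return False  # recovered recently; give the new lease time to settle
        self.recover()
        self.last_recovery = now
        self.failures = 0
        return True


if __name__ == "__main__":
    # Demo with a scripted probe: one success, then the line goes dead.
    replies = iter([True, False, False, False])
    wd = WanWatchdog(lambda: next(replies), lambda: print("release + renew"),
                     max_failures=3)
    for _ in range(4):
        wd.step()
```

    Counting consecutive failures (rather than reacting to a single lost ping) and enforcing a cooldown keeps a flaky monitor target from triggering a release/renew storm.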


  • You can play around with the advanced DHCP options in the WAN interface settings.
    For instance try
    Timeout = 10
    Retry = 10
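    As I understand the dhclient.conf(5) manual page, Timeout is how long the client keeps trying to reach a DHCP server before giving up, and Retry is how long it then waits before trying again; the documented defaults are 60 and 300 seconds. Treat both that mapping of the pfSense fields and those defaults as assumptions to check against your version. The hypothetical attempt_windows() below is a deliberately simplified model of that give-up/retry cycle (it ignores the backoff inside each attempt and the renew/rebind timers) showing why smaller values make the client come back much sooner.

```python
from __future__ import annotations


def attempt_windows(timeout_s: int, retry_s: int, horizon_s: int) -> list[tuple[int, int]]:
    """Simplified model (an assumption, not dhclient's exact logic): try for
    `timeout_s` seconds, give up, sleep `retry_s` seconds, start over.
    Returns the (start, end) of each attempt window before `horizon_s`."""
    if timeout_s <= 0 or retry_s < 0:
        raise ValueError("timeout must be positive and retry non-negative")
    windows = []
    start = 0
    while start < horizon_s:
        windows.append((start, min(start + timeout_s, horizon_s)))
        start += timeout_s + retry_s
    return windows


# Assumed defaults: silent for 5 minutes after each failed minute of attempts.
print(attempt_windows(60, 300, 900))  # [(0, 60), (360, 420), (720, 780)]
# Suggested Timeout = 10, Retry = 10: back on the wire every 20 seconds.
print(attempt_windows(10, 10, 60))    # [(0, 10), (20, 30), (40, 50)]
```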



  • Hello!

    Saving Gertjan the typing... :)

    "Monitoring a gateway by using 1.1.1.1 or 8.8.8.8 is a bad idea."

    https://forum.netgate.com/topic/154815/ntp-server-pools-can-t-be-resolved-solved-2-problems-in-1-post/30

    John



  • @serbus
    Well, everybody's got their own opinion
    But, I've never had a problem using public DNS servers on over a hundred sites. I have had sites down because the ISP gateway was pingable, but could not reach the Internet and the line never failed over. YMMV, use your own judgment and take your chances.



  • Hello!

    More info...

    - Google services, including Google Public DNS, are not designed as ICMP network testing services
    - Many large networks, including Google, rate limit ICMP
    - ICMP ping or traceroute traffic can be discarded or delayed en-route to Google
    

    https://peering.google.com/#/learn-more/faq

    John



  • @dotdash said in Losing WAN connection intermittently:

    @serbus
    Well, everybody's got their own opinion
    But, I've never had a problem using public DNS servers on over a hundred sites. I have had sites down because the ISP gateway was pingable, but could not reach the Internet and the line never failed over. YMMV, use your own judgment and take your chances.

    I agree with this. This is probably an age-old debate. No, Google's DNS service is not a ping tool, but I've been using 8.8.8.8 for some time and haven't had issues. I've had occasional latency or no response, but nothing minor tweaks couldn't fix yet. I've had issues pinging my ISP's 1st/2nd hop router because it was down or traffic was rerouted through another path; the gateway went down even though my connection to the web was fine. Monitor what works for you.

    @comatose_tortoise you might have to go into System > Routing > Gateways and play around with the Advanced settings there. Maybe increase the latency and packet loss thresholds. It will take a little longer for your gateway to be marked down, but it could help avoid false "down" events. You don't have to copy my settings, but this has been working for me when pinging 8.8.8.8.

    [Screenshot: Raffi_'s gateway monitoring Advanced settings]
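    To see why raising the thresholds (or averaging over a longer period) reduces false "down" events, here is a simplified sliding-window model in the spirit of dpinger. It is not dpinger's actual algorithm; GatewayMonitor, the window sizes and the thresholds are illustrative assumptions. The same history of steady replies followed by three lost pings trips a 20% loss threshold over a 10-probe window, but not a 40% threshold, and not a 20% threshold over a 40-probe window.

```python
from collections import deque
from statistics import mean
from typing import Optional


class GatewayMonitor:
    """Hypothetical monitor: keep the last `window` probe results (RTT in ms,
    or None for a lost probe) and report the gateway down if the loss or the
    average latency over that window exceeds the high threshold."""

    def __init__(self, window: int, latency_high_ms: float, loss_high_pct: float) -> None:
        self.samples = deque(maxlen=window)  # oldest results fall off automatically
        self.latency_high_ms = latency_high_ms
        self.loss_high_pct = loss_high_pct

    def add(self, rtt_ms: Optional[float]) -> None:
        self.samples.append(rtt_ms)

    def loss_pct(self) -> float:
        if not self.samples:
            return 0.0
        lost = sum(1 for s in self.samples if s is None)
        return 100.0 * lost / len(self.samples)

    def avg_latency_ms(self) -> Optional[float]:
        replies = [s for s in self.samples if s is not None]
        return mean(replies) if replies else None

    def is_down(self) -> bool:
        if not self.samples:
            return False
        latency = self.avg_latency_ms()
        too_slow = latency is not None and latency > self.latency_high_ms
        return too_slow or self.loss_pct() > self.loss_high_pct


history = [12.0] * 37 + [None] * 3  # steady 12 ms replies, then 3 lost probes
for window, loss_high in [(10, 20.0), (10, 40.0), (40, 20.0)]:
    mon = GatewayMonitor(window, latency_high_ms=500.0, loss_high_pct=loss_high)
    for rtt in history:
        mon.add(rtt)
    print(window, loss_high, mon.loss_pct(), mon.is_down())
# 10 20.0 30.0 True
# 10 40.0 30.0 False
# 40 20.0 7.5 False
```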



  • @dotdash said in Losing WAN connection intermittently:

    @serbus
    Well, everybody's got their own opinion
    But, I've never had a problem using public DNS servers on over a hundred sites. I have had sites down because the ISP gateway was pingable, but could not reach the Internet and the line never failed over. YMMV, use your own judgment and take your chances.

    There is no easy answer to this. Pinging something like Google Public DNS works until it doesn't. What I mean by that is at any time some particular Google DNS node around the world can decide to stop responding to ICMP for whatever reason. The blackout might be temporary or it might be much longer. See the link provided by @serbus.

    Sometimes the default settings for dpinger can be a bit "aggressive" in my view by pinging too often. There are really two different worlds to think about. If you are a commercial entity with a multiple WAN failover setup, then you need to ping something past the immediate upstream gateway. You need to ping something truly out on the Internet. However, if you are a home user, or if you are a commercial entity with just a single WAN, then just ping your immediate upstream gateway since if you can't get to it, then you aren't getting to the Internet anyway. When millions of users around the world all decide that pinging Google Public DNS is a good idea, then that's when Google is likely to shut that door. After all, the point of 8.8.8.8 is to serve up DNS records and not to answer ICMP pings.



  • @bmeeks said in Losing WAN connection intermittently:

    There is no easy answer to this. Pinging something like Google Public DNS works until it doesn't. What I mean by that is at any time some particular Google DNS node around the world can decide to stop responding to ICMP for whatever reason. The blackout might be temporary or it might be much longer. See the link provided by @serbus.

    Sometimes the default settings for dpinger can be a bit "aggressive" in my view by pinging too often. There are really two different worlds to think about. If you are a commercial entity with a multiple WAN failover setup, then you need to ping something past the immediate upstream gateway. You need to ping something truly out on the Internet. However, if you are a home user, or if you are a commercial entity with just a single WAN, then just ping your immediate upstream gateway since if you can't get to it, then you aren't getting to the Internet anyway. When millions of users around the world all decide that pinging Google Public DNS is a good idea, then that's when Google is likely to shut that door. After all, the point of 8.8.8.8 is to serve up DNS records and not to answer ICMP pings.

    Agree with this.

    Good point on dpinger being aggressive. Twice per second does seem excessive. We probably should back off on that value as well.
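    To put "twice per second" in numbers: taking a 500 ms probe interval (an assumption to verify in your gateway's Advanced settings), a single gateway sends 172,800 pings a day to its monitor IP, and a million installations doing the same add up to two million pings per second at the target. The helpers below are hypothetical and only do that arithmetic.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def probes_per_day(probe_interval_ms: int) -> int:
    """ICMP echo requests one gateway monitor sends in 24 hours."""
    return MS_PER_DAY // probe_interval_ms


def aggregate_rate_per_s(installations: int, probe_interval_ms: int) -> float:
    """Pings per second arriving at a shared target from many monitors."""
    return installations * 1000 / probe_interval_ms


print(probes_per_day(500))                   # 172800
print(probes_per_day(2000))                  # 43200
print(aggregate_rate_per_s(1_000_000, 500))  # 2000000.0
```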



  • Thanks for the suggestions. I changed to "Timeout = 10" and "Retry = 10", as well as implemented the same settings that Raffi_ has. Also, I changed the server to ping from 8.8.8.8 to an off site server that I control, and stepped down the aggressiveness of dpinger to not overload it.

    Unfortunately, none of this had any effect on the problem.

    I don't get it. Recently pfsense appears to get a new IP through DHCP, and even before the lease time is out, the connection drops. If I renew the IP, everything works again, but the offer always comes from a different IP than the one it was trying to renew with. Is this normal for ISPs, or do you think it is related to my problem?

    Also: is there any way of automating the "Release + Renew" of the WAN IP when the gateway is marked as down?



  • @comatose_tortoise

    Hi,

    We had a similar problem with a Sagemcom DOCSIS modem.

    The solution was to change the interface speed negotiation from auto to fixed.
    The cause was the different Ethernet controller chips on the two ports.

    For example:
    [Screenshot: interface speed and duplex setting]

    I agree with the others regarding the use of the monitor IP.
    Try pinging the ISP gateway by default; if it does not respond to ICMP, then you have to choose another option, but not 8.8.8.8 (it's used a lot).



  • @DaddyGo said in Losing WAN connection intermittently:

    Try pinging the ISP gateway by default; if it does not respond to ICMP, then you have to choose another option, but not 8.8.8.8 (it's used a lot).

    I think I have to start a new thread on the topic of which IP to monitor.



  • @Raffi_ said in Losing WAN connection intermittently:

    I think I have to start a new thread on the topic of which IP to monitor.

    imaginable 😉

    Monitoring the ISP gateway may be the best thing to do in this case.
    It gives you accurate measurements of your internet connection.

    Google DNS servers return ICMP replies with different delays depending on the region, so the measurement is not meaningful.
    (It often also depends on their load, since they were not built for this purpose.)

    Unfortunately, situations can arise where monitoring the ISP gateway is not workable.

    For example, ExpressVPN gateways do not respond to ping.
    Therefore, we tend to use Cloudflare DNS for this purpose, but keep in mind that, fortunately, our pfSense device, the ExpressVPN server and the Cloudflare node are all in the same data center.

    ICMP responses arrive in 1 ms (or even less), but this is a special case, because many of our devices are located in larger data centers.




  • Using something close or something far away is a debatable topic.

    Run these from a client on your LAN: 'traceroute -n -I google.com', and 'traceroute6 -n -I google.com' if needed. These commands work from a Mac client in most networks; other client devices may need a slightly different variant. The '-I' option sends ICMP echo requests instead of UDP, so limiters and the like aren't an issue.

    Use the first host that isn't yours as your monitoring address. Typically that is the second hop, the first being your own router's LAN address.

    Monitoring it tells you whether you can reach your ISP. For me these addresses don't change even when my public IP or IPv6 prefix changes (Spectrum/Time Warner), YMMV.
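    The "first hop that isn't yours" rule can be sketched as follows: walk the hops of `traceroute -n` output and return the first address outside your own networks. first_external_hop() is a hypothetical helper and the sample output is made up (documentation addresses). It filters on networks you list rather than on private ranges, because on some ISPs the first hop past your router is itself a private address. The hop you pick must also answer ICMP echo to be usable as a monitor IP.

```python
from __future__ import annotations

import ipaddress


def first_external_hop(traceroute_output: str, own_networks: list[str]) -> str | None:
    """Return the first hop in `traceroute -n` output that is not inside any
    of `own_networks` (e.g. your LAN subnets), or None if there is none."""
    nets = [ipaddress.ip_network(n, strict=False) for n in own_networks]
    for line in traceroute_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the "traceroute to ..." header and blank lines
        for token in fields[1:]:
            try:
                ip = ipaddress.ip_address(token)
            except ValueError:
                continue  # "*", round-trip times and "ms" are not addresses
            if not any(ip in net for net in nets):
                return str(ip)
            break  # this hop is one of ours; move on to the next hop
    return None


sample = """traceroute to example.net (203.0.113.10), 64 hops max, 72 byte packets
 1  192.168.1.1  0.412 ms  0.298 ms  0.275 ms
 2  198.51.100.1  9.871 ms  8.902 ms  9.114 ms
 3  * * *
"""
print(first_external_hop(sample, ["192.168.1.0/24"]))  # 198.51.100.1
```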



  • @jwj said in Losing WAN connection intermittently:

    Using something close or something far away is a debatable topic.

    Very debatable indeed, which is why I wanted to open that discussion here:

    https://forum.netgate.com/topic/155243/monitor-ip-discussion

    Everyone has a different solution and it's not always a one size fits all situation.



    @Raffi_ I've used both. Can't say I thought one was better than the other. As for latency, "high" is relative rather than absolute, so neither choice has an advantage for latency monitoring.



    Regarding the ping-target discussion, I've tried using the gateway, Google's DNS, and a third private machine on the internet. The problem occurs regardless of which of them I use, and regardless of the thresholds for marking the gateway down.

    Also changed the interface speed as @DaddyGo said, but it didn't have any effect.

    Looking at the logs when this happens, it appears that pfsense gets a lease normally from my ISP:

    Jul 15 01:45:36	dhclient	22506	DHCPACK from 10.0.173.50
    Jul 15 01:45:36	dhclient		RENEW
    Jul 15 01:45:36	dhclient		Creating resolv.conf
    Jul 15 01:45:36	dhclient	22506	bound to 83.252.76.59 -- renewal in 5400 seconds.
    

    Then, the connection is lost (reason unknown), and the gateway goes down (which is correct, internet connection is lost at this point):

    Jul 15 01:52:35	dpinger	WAN_DHCP 68.66.241.199: Alarm latency 29510us stddev 0us loss 75%
    

    Then, when the renewal time comes, pfsense tries to renew the lease, but there is no response:

    Jul 15 03:15:36	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:15:42	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:15:58	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:16:24	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:16:40	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:16:47	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:16:57	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:17:17	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:17:27	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:17:41	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:17:53	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:18:09	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:18:52	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:19:45	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:20:42	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:21:23	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:21:51	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:22:27	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:23:36	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:24:08	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:24:20	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:24:33	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:24:51	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:24:58	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:25:12	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:25:25	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:25:32	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:25:48	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:26:17	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:27:32	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:28:45	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:31:10	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:31:43	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:32:26	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:34:00	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:37:52	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:38:12	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:38:40	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:38:55	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:39:16	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:39:29	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:40:05	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:41:15	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:42:22	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:42:40	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:43:11	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:43:33	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:44:13	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:44:23	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:44:36	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:44:51	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:45:05	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:45:38	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:46:09	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:46:49	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:48:35	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:53:42	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:55:06	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:56:11	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 03:57:53	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:00:21	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:01:05	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:01:40	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:02:22	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:03:29	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:04:04	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:05:33	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:08:07	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:08:24	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:08:51	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:09:00	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:09:11	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:09:29	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:10:00	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:11:10	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:13:39	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:16:45	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    Jul 15 04:20:24	dhclient	22506	DHCPREQUEST on em1 to 10.0.173.50 port 67
    

    At this point, for unknown reasons, pfsense instead sends its request to 255.255.255.255, which works, but the DHCPACK comes from a different IP than the one pfsense was sending its requests to before:

    Jul 15 04:24:19	dhclient	22506	DHCPREQUEST on em1 to 255.255.255.255 port 67
    Jul 15 04:28:24	dhclient	22506	DHCPREQUEST on em1 to 255.255.255.255 port 67
    Jul 15 04:28:24	dhclient	22506	ip length 336 disagrees with bytes received 362.
    Jul 15 04:28:24	dhclient	22506	accepting packet with data after udp payload.
    Jul 15 04:28:24	dhclient	22506	DHCPACK from 10.190.1.3
    Jul 15 04:28:24	dhclient		RENEW
    Jul 15 04:28:24	dhclient		Creating resolv.conf
    Jul 15 04:28:24	dhclient	22506	bound to 83.252.76.59 -- renewal in 5400 seconds.
    

    After this the gateway goes up and functionality is restored. As I said in the thread start, if I manually release and renew IP on WAN when I notice that I've lost internet connection, it does this last part immediately instead of waiting a long time before asking 255.255.255.255.

    Any ideas on why this is? If I could just make it renew the IP via 255.255.255.255 as soon as the WAN gateway goes down, the problem would be, if not solved, radically less severe.

    EDIT: Additionally, the ISP-provided router/modem keeps its internet connection throughout this process, so there's no real "outage", so to speak.
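    A likely explanation for the switch to 255.255.255.255 is the standard DHCP renew/rebind sequence from RFC 2131: at T1 (by default half the lease) the client unicasts DHCPREQUEST to the server that granted the lease, and only at T2 (by default 87.5% of the lease) does it fall back to broadcasting, at which point any server, such as 10.190.1.3 here, may answer. The hypothetical dhcp_timers() below computes those times. It assumes the default T1/T2 fractions and a 10800 s lease (twice the logged 5400 s renewal interval); the server may send its own lease, T1 or T2 values, so check them against a capture. Under those assumptions it puts T1 at 03:15:36, matching the first unicast DHCPREQUEST above, and T2 at 04:23:06, just before the first broadcast at 04:24:19. It does not explain why 10.0.173.50 stops answering the unicast requests.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def dhcp_timers(bound_at: datetime, lease_s: int, t1_frac: float = 0.5,
                t2_frac: float = 0.875) -> dict[str, datetime]:
    """RFC 2131 default client timers for a lease bound at `bound_at`.

    renew:  T1, client unicasts DHCPREQUEST to the server that issued the lease
    rebind: T2, client broadcasts DHCPREQUEST so any server may answer
    expire: lease end; without an ACK the client must stop using the address
    """
    return {
        "renew": bound_at + timedelta(seconds=lease_s * t1_frac),
        "rebind": bound_at + timedelta(seconds=lease_s * t2_frac),
        "expire": bound_at + timedelta(seconds=lease_s),
    }


# Bound at 01:45:36 per the log above; the year and the 10800 s lease are assumptions.
for state, when in dhcp_timers(datetime(2020, 7, 15, 1, 45, 36), 10800).items():
    print(f"{state:6} {when:%H:%M:%S}")
# renew  03:15:36
# rebind 04:23:06
# expire 04:45:36
```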



  • @comatose_tortoise

    it will not be easy....

    the key issue on the subject is this:
    [Screenshot]

    It is clear that this is not a pfSense problem, as pfSense does its job and broadcasts its requests.

    so the ISP router / modem is also UP

    The next question is: what kind of pfSense hardware do you have?
    The NIC in particular may be of interest.



  • @DaddyGo said in Losing WAN connection intermittently:

    The solution was to set the interface speed negotiation from auto to fixed.

    I trust you set it to fixed at both ends. You shouldn't set it to fixed at one end only. Also, this sounds a bit strange. The connection is negotiated only when the cable is plugged in or a device is turned on. If it happens at any other time, it would indicate a problem somewhere.



  • @DaddyGo

    Well, it might very well be something the ISP is doing with the dynamic (but public) IP that I get from them, and that this does not work with pfsense for some reason. The ISP router/modem (in bridge mode) works, so whatever they are doing, their own hardware handles it fine.

    My pfsense hardware is a dedicated machine only running pfsense: a mini-ITX board (MP-T3460-D2500CC) with two Ethernet ports that I believe are Intel NICs, 4 GB RAM, a 60 GB SSD, and an additional PCI card with dual Intel NICs. This machine has been in use since 2013, and these problems started to occur this year.

    Why does pfsense first make DHCPREQUEST to one address, but after a while changes to another?



  • @JKnott said in Losing WAN connection intermittently:

    I trust you set it to fixed at both ends.

    of course at both ends... 😉

    this is basically caused by a subtle incompatibility between the Ethernet controllers

    think of the great, super, frenetic Realtek miracles....

    ok then:
    so, we use a Raisecom GPON system to serve many of our customers
    https://www.raisecom.com/product/gpon-sfu-ont-0

    This ONT miracle is equipped with a Realtek Ethernet controller and has compatibility issues with the Intel i210-AT controller.

    (so we either replace hundreds of ONTs or solve this issue as we can)

    BTW:
    anyway, the ISCOM system is damn good, just shit on the ONT side
    (of course, not all of our endpoints are Intel NICs facing a Realtek ONT).

    we tested all of the above together with the manufacturer...



  • @DaddyGo

    One would think that an Ethernet negotiation issue would cause a solid problem, not intermittent. Is there something triggering a disconnect/reconnect? Does the link show down? Also, an intermittent failure shouldn't cause DHCP problems, unless it's long enough for the lease to expire.



  • @comatose_tortoise said in Losing WAN connection intermittently:

    Why does pfsense first make DHCPREQUEST to one address, but after a while changes to another?

    Do you have a packet capture of the full DHCP sequence?



  • Ok, so now I changed the NIC for my WAN to one I know for sure is an Intel NIC.

    @JKnott said in Losing WAN connection intermittently:

    @comatose_tortoise said in Losing WAN connection intermittently:

    Why does pfsense first make DHCPREQUEST to one address, but after a while changes to another?

    Do you have a packet capture of the full DHCP sequence?

    Nope, never done that. I'll try to look into it. But I guess I would need to do it at the exact moment the error occurs? Could be problematic due to the intermittent behavior.



  • I didn't see this mentioned or suggested yet, so I'll be that guy... you did check the cable between pfSense and the modem? If you have no way of checking it, replace it with a known good cable.



  • @comatose_tortoise

    That depends on how you do it. If you use Packet Capture, it would be difficult to catch the first half, though you might be able to if you do a release/renew. The other way is to use a data tap, as I mentioned above, then reboot pfSense to get the initial sequence and as many renews as you want. One advantage of this method, using Wireshark, is you can watch what's happening, without stopping the capture. On the negative side, if it really is a NIC negotiation issue, this might mask it.
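    Whichever capture method you use (a capture filter such as `port 67 or port 68` keeps only DHCP traffic), the fields worth reading in each packet are the message type (option 53), the Server Identifier (option 54, the server the client unicasts its renewals to), the lease, T1 and T2 times (options 51, 58 and 59) and the giaddr header field, which a relay agent fills in. The hypothetical parse_dhcp() below is a minimal decoder for a raw DHCP UDP payload, for example one exported from Wireshark as bytes. It only understands those fields and is not a complete DHCP parser.

```python
import ipaddress
import struct

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options
MSG_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
             5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM"}
TIMER_OPTIONS = {51: "lease_s", 58: "t1_s", 59: "t2_s"}


def parse_dhcp(payload: bytes) -> dict:
    """Decode a few fields of a DHCP (BOOTP) UDP payload."""
    if len(payload) < 240 or payload[236:240] != MAGIC_COOKIE:
        raise ValueError("not a DHCP payload")
    info = {
        "op": "request" if payload[0] == 1 else "reply",       # BOOTREQUEST / BOOTREPLY
        "yiaddr": str(ipaddress.IPv4Address(payload[16:20])),  # address offered to the client
        "giaddr": str(ipaddress.IPv4Address(payload[24:28])),  # relay agent, 0.0.0.0 if none
    }
    i = 240
    while i < len(payload):
        code = payload[i]
        if code == 255:  # End option
            break
        if code == 0:    # Pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break        # truncated option
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            info["type"] = MSG_TYPES.get(value[0], str(value[0]))
        elif code == 54 and len(value) == 4:
            info["server_id"] = str(ipaddress.IPv4Address(value))
        elif code in TIMER_OPTIONS and len(value) == 4:
            info[TIMER_OPTIONS[code]] = struct.unpack("!I", value)[0]
        i += 2 + length
    return info


# Build a synthetic DHCPACK to show the output (values are illustrative).
header = bytearray(236)
header[0] = 2                                          # BOOTREPLY
header[16:20] = bytes([83, 252, 76, 59])               # yiaddr
options = (bytes([53, 1, 5])                           # message type: ACK
           + bytes([54, 4, 10, 190, 1, 3])             # server identifier
           + bytes([51, 4]) + struct.pack("!I", 10800) # lease time
           + bytes([255]))
print(parse_dhcp(bytes(header) + MAGIC_COOKIE + options))
```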



  • @JKnott said in Losing WAN connection intermittently:

    Is there something triggering a disconnect/reconnect?

    Who really knows Realtek's inner world?
    The fact is, it does cause intermittent problems: we have already been through this investigation, and I have read about similar problems on other forums.

    What do you think of a periodic heat run?
    (since it starts with significant packet loss, not with a DHCP problem)

    Something else came to mind, because the ISP is not a god either:

    our other typical case is with Telecom (HU) ISP DOCSIS Cisco CMTS and edgeQAM, using the DHCP allocation method...

    the problem is caused by the tightly configured Cisco IOS - Prerequisites for Cable DHCP Leasequery + DHCP MAC Address Exclusion List

    the error phenomenon is very similar to the OP's issue

    BTW:
    - Raisecom replaced the ONT Ethernet controller with an i211, and all the problems went away.



  • @comatose_tortoise said in Losing WAN connection intermittently:

    After this the gateway goes up and functionality is restored. As I said in the thread start, if I manually release and renew IP on WAN when I notice that I've lost internet connection, it does this last part immediately instead of waiting a long time before asking 255.255.255.255.
    Any ideas on why this is? If I could just make renew IP on 255.255.255.255 as soon as WAN gateway goes down, the problem would be, if not solved, radically less severe.

    Hello!

    Are the 10.x.x.x DHCP servers relays? Maybe they won't accept unicast DHCP requests and require broadcasts?

    https://forum.netgate.com/topic/112869/dhclient-on-wan-occasionally-fails-to-renew-lease-with-cable-isp

    https://social.technet.microsoft.com/Forums/windows/en-US/69a3a8f6-8199-4f24-8d4a-a4b5a083176b/why-cant-windows-7-be-forced-to-use-dhcp-broadcast-lease-renewal

    John



  • @serbus said in Losing WAN connection intermittently:

    Maybe they wont accept unicast dhcp request and require broadcasts?

    the problem is only intermittent, so it is not relevant 😉



  • @DaddyGo said in Losing WAN connection intermittently:

    What do you think of a periodic heat run?

    That could be a possibility. I've seen stuff fail when it gets warm. Many years ago, I learned about some stuff called "Freeze Mist", which was handy for locating thermal problems. It was also useful for putting frost on a penny. 😉



  • @JKnott said in Losing WAN connection intermittently:

    @DaddyGo said in Losing WAN connection intermittently:

    What do you think of a periodic heat run?

    That could be a possibility. I've seen stuff fail when it gets warm. Many years ago, I learned about some stuff called "Freeze Mist", which was handy for locating thermal problems. It was also useful for putting frost on a penny. 😉

    A can of compressed air held upside down does the same thing. I think the "freeze mist" and compressed air are the same product with the can and labels flipped :)



  • @Raffi_ said in Losing WAN connection intermittently:

    A can of compressed air held upside down does the same thing.

    óóóó, the blessed physics and the expanding gases 😉

