PFsense random loss of WAN gateway
-
I believe this is more of an ISP issue than my home networking solution, but I'll elaborate.
Network = FIBER(metronet) > ISP-Modem > Protectli Vault > Switch > PC.
Basically, at random times every day or two my home network gateway(modem) is lost and the pfsense device thinks there is no internet connection. Indeed, if I reset either the modem -OR- the Protectli everything works fine.
I have disabled gateway monitoring, but even with that disabled - internet went down a few minutes before this post. Once again, solution was simply to restart the pfsense device.
Rather than trying to teach my wife how to restart the device, or constantly checking it myself ... I'd love to just run some sort of rule or command to restart the WAN gateway/firewall every time it thinks it's down, and continue "gateway monitoring".
I have the Cron package installed. I was going to go with a command to reboot the device/pfSense every day, but this could still leave the connection down for hours. If that happens, my battery-powered cameras outside lose 6 months of battery life in like 1 hour lol.
Does anyone know a simple way to just use the gateway monitoring and run a reboot on my device whenever it's down? Thanks in advance, I am indeed not a software engineer so much of this is new to me.
-
The problem with a command or script like that is that it can very easily get stuck in a loop continually restarting the WAN. It would be better to determine exactly what is being lost and see if we can see why it's happening.
If you simply unplug and reattach the WAN cable does that also restore the connection?
When it fails I assume you see pfSense report the WAN gateway as down? What IP address is that monitoring? The gateway IP directly? That's the default, but often an ISP's gateway may not reliably respond to ping.
Does the gateway IP still appear as a current ARP entry in Diag > ARP?
Steve
-
Thanks for the reply. I think I may have come to that better solution. I also posted on Reddit and someone had me go through logs. I have since turned back on gateway monitoring. They suggested the following:
"Turn gateway monitoring back on. Your issue is not with that. It's with Metronet DHCP relays not responding to unicast renewals, the logs just confirm my suspicions. Perform the following. Go to Interfaces > WAN and under DHCP client configuration check the box "Advanced configuration" and under presets select FreeBSD default. Then further down, under Lease requirements and requests, in the box "Option Modifiers" enter the following: supersede dhcp-server-identifier 255.255.255.255
This will ensure you are using the proper renewal timing requests and not an older saved config and also forces PFsense to request WAN lease renewal on broadcast. Since you are behind CGNAT you request your external IP from a DHCP relay that controls your assigned subnet and sends the request to their central server that manages entire groups of subnets. This is a requirement for CGNAT to work with the topology they have setup and they just don't know what they are doing to properly route a unicast request up to the DHCP servers, they just have really basic routing for broadcasts on port 67 I think. "
-
Oh that is fun! Yeah that would have taken a while to pin down if you've never seen it before. I'll have to make a note of it.
Let us know if that solved it for you.
Steve
-
Just wanted to let you know that this was indeed the solution. Thanks for being responsive, regardless!! Now I just wish I understood some of the terminology lol.
-
I'm on a different ISP than yourself (lilaconnect) but am also behind CGNAT and was having exactly the same symptoms. I made the changes which you suggested and I haven't had a disconnect for over 48 hours now.
I don't understand what this does or why it works but thank you so much, this has been driving me crazy.
-
Usually when pfSense, or any rational DHCP client, renews the lease it sends the renewal query to the DHCP server directly (unicast). It knows the DHCP server address because it already has a lease from there. The DHCP client will only send a broadcast looking for a DHCP server if it has no lease (when it's initially connected) or if it eventually sees no response from an existing server.
In this case the upstream DHCP server only responds to broadcast queries, so the standard renewals fail. Setting that option modifier causes dhclient to use the broadcast address for renewals.
I agree with the poster on Reddit, this seems like a badly configured (broken) DHCP relay.
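If it helps to see that decision spelled out, here is a minimal Python sketch of where a client sends its request at each point in the lease. It is only a model of the behaviour described above, not dhclient's actual code. The names Lease and request_destination and the 192.0.2.1 server address are made up for illustration. The T1 and T2 fractions are the RFC 2131 defaults (a server can override them), and the supersede_server_id parameter stands in for the option modifier from the Reddit post.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BROADCAST = "255.255.255.255"


@dataclass
class Lease:
    server_id: str      # DHCP server identifier (option 54) from the ACK
    lease_seconds: int  # lease time (option 51)

    @property
    def t1(self) -> float:
        # Renewal timer: RFC 2131 default is half the lease time.
        return 0.5 * self.lease_seconds

    @property
    def t2(self) -> float:
        # Rebinding timer: RFC 2131 default is 87.5% of the lease time.
        return 0.875 * self.lease_seconds


def request_destination(
    lease: Optional[Lease],
    elapsed: float,
    supersede_server_id: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (state, destination IP) for a client `elapsed` seconds into a lease.

    A destination of None means no request is due yet.
    """
    if lease is None or elapsed >= lease.lease_seconds:
        # No lease, or it expired: start over with a broadcast.
        return ("INIT", BROADCAST)
    if elapsed >= lease.t2:
        # Renewals went unanswered: fall back to broadcast.
        return ("REBINDING", BROADCAST)
    if elapsed >= lease.t1:
        # Normal renewal: unicast to the known server, unless the
        # server identifier has been superseded in the client config.
        return ("RENEWING", supersede_server_id or lease.server_id)
    return ("BOUND", None)


lease = Lease(server_id="192.0.2.1", lease_seconds=3600)
print(request_destination(lease, 1800))             # unicast renewal
print(request_destination(lease, 1800, BROADCAST))  # renewal forced to broadcast
```

With the modifier in place the renewal at T1 already goes to the broadcast address, so the client no longer depends on the relay answering a unicast.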
-
Thanks for this explanation, it makes a lot more sense now. I'm going to give this feedback to my ISP, not that I expect them to change anything.
-
Well Holy Crap @johnnyf1ve! I have been dealing with this same problem with Metronet for over 2 years. The cron script you initially wanted to implement is here. I've been using it for a long time to work around the issue.
I too have a Protectli appliance right after Metronet's Nokia modem. Bet you got one of those modems too. I can say it's definitely not a hardware issue on my pfSense appliance because I've used the Protectli and also tested with an old Dell with a dual Intel NIC. Same problems with dropped WAN.
I have long thought this is a unique problem with Metronet. They have cheap Fiber and you gotta cut costs somewhere. I have other Protectli boxes in other environments running with Metronet with the same result of losing the WAN connection. If you have a static IP from Metronet the problem of course goes away and that was the ultimate fix.
I just made the changes that you found from the Reddit user and I'll see how it goes. I'll probably keep my Cron pingtest.sh scheduled just in case. I can check the log files to see if it's actually resetting the interface after your suggestion. Thanks man!
Lastly, here are my last few pingtest.sh logs with the dropped connection from the past few weeks. When the ping test fails it turns the WAN interface off for a few seconds and then right back on which fixes the problem 99% of the time. As you can see, this happens a lot.
20220917.181601 All pings failed. Resetting interface igb0.
20220917.181635 All pings failed. Resetting interface igb0.
20220919.071749 All pings failed. Resetting interface igb0.
20220920.211301 All pings failed. Resetting interface igb0.
20220920.211335 All pings failed. Resetting interface igb0.
20220922.180104 All pings failed. Resetting interface igb0.
20220922.180138 All pings failed. Resetting interface igb0.
20220924.160249 All pings failed. Resetting interface igb0.
20220924.160323 All pings failed. Resetting interface igb0.
20220927.105456 All pings failed. Resetting interface igb0.
20220927.105530 All pings failed. Resetting interface igb0.
Thanks MEtroNeT!
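For anyone wondering what a check like that looks like, below is a rough Python sketch of the same idea: ping a few hosts and bounce the interface only if every ping fails. This is not the pingtest.sh script, just an illustration. The function names, the 300 second cooldown and the example hosts are assumptions, and whether Python is available on a given pfSense install is something to check first. The cooldown is there because of the loop risk raised earlier in the thread; the log above also shows resets arriving in pairs 34 seconds apart, which a cooldown would collapse into one.

```python
import subprocess
import time
from typing import Callable, Iterable


def ping_once(host: str) -> bool:
    """Send one ICMP echo and report success.

    Flags follow FreeBSD's ping: -c 1 sends one packet, -t 2 gives up after 2 s.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-t", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def bounce_interface(iface: str, pause: float = 5.0) -> None:
    """Take the interface down, wait a few seconds, bring it back up."""
    subprocess.run(["ifconfig", iface, "down"], check=True)
    time.sleep(pause)
    subprocess.run(["ifconfig", iface, "up"], check=True)


def check_wan(
    hosts: Iterable[str],
    iface: str,
    last_reset: float,
    now: float,
    cooldown: float = 300.0,  # assumed value; tune to taste
    ping: Callable[[str], bool] = ping_once,
    reset: Callable[[str], None] = bounce_interface,
) -> float:
    """Reset `iface` if all pings fail; return the time of the latest reset."""
    if any(ping(host) for host in hosts):
        return last_reset  # at least one host answered, WAN is fine
    if now - last_reset < cooldown:
        return last_reset  # reset too recently, avoid a restart loop
    reset(iface)
    return now
```

A cron job would call check_wan with something like ["1.1.1.1", "8.8.8.8"] and "igb0", and would need to store the returned timestamp somewhere (a small file, for example) so the cooldown survives between runs.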
-
I just wanted to add my thanks!
I have a Telia Fiber connection and it would lose WAN every six hours. Turns out that the Telia DHCP server only allows a limited number of renewals after which it demands a broadcast again.
The above option to always broadcast works fine.
It took me several months to find this solution! Thanks again!