Gateway drops and never comes back
-
Hi there -
I'm running into a new problem with 2.5.x. One of my gateways (Comcast Internet, DHCP-acquired address) seems to disappear momentarily from dpinger's perspective. My Gateway / Gateway Group configuration isn't super interesting, but I can provide it if that would help. The Internet connection does come back, but pfSense keeps the gateway marked down, thinking there is packet loss. It's as if it forgot to check again. I checked, and dpinger for that interface was still running. My other gateway is a manually configured line and doesn't have these issues.
In the Gateway logs I can see the following (a few entries like this at failure time, and then no more):
WAN2ONOPT1_DHCP 1.1.1.1: sendto error: 50
However, if I log in to pfSense and run dpinger manually, I can verify that pfSense can send packets through this network successfully.
For example, the following succeeds:
dpinger -f -B 10.1.10.229 1.1.1.1
Finally, if I force the Gateway down and then bring it back up, that seems to bring things back to normal.
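For anyone trying to confirm they are hitting the same thing, here is a minimal sketch that tallies the "sendto error" entries per gateway from log text on stdin. It assumes each line looks like the excerpt above (gateway name, monitor IP, then "sendto error: N"); the function name is made up for illustration, and real log lines may carry a syslog prefix, which the sketch simply skips. On FreeBSD, which pfSense is built on, errno 50 is ENETDOWN ("Network is down").

```shell
# Count dpinger "sendto error" lines per gateway and error code.
# Reads log text on stdin. Assumes each matching line contains
#   <gateway name> <monitor IP>: sendto error: <errno>
# as in the excerpt above; anything before the gateway name is ignored.
count_sendto_errors() {
    awk '
        /sendto error:/ {
            for (i = 3; i <= NF; i++) {
                if ($i == "sendto" && $(i + 1) == "error:") {
                    gw = $(i - 2)      # gateway name, e.g. WAN2ONOPT1_DHCP
                    code = $(i + 2)    # numeric errno, e.g. 50
                    count[gw " " code]++
                }
            }
        }
        END {
            # One line per gateway/errno pair: <gateway> <errno> <count>
            for (k in count) print k, count[k]
        }
    ' | sort
}
```

Pipe a copy of the gateway log into `count_sendto_errors`. A short burst of errno 50 followed by silence for one gateway would match the pattern described here.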
Would appreciate any help. Happy to provide more configuration details.
Thanks,
Scott -
I have the same problem with my customer but with a different version of pfSense, and our Comcast is not on DHCP. We have a monitor up to let us know when Comcast has issues, so when it stops having issues we can manually down/up the gateway as you are doing.
Is this a very active community? The first few posts I've looked at don't have any responses.
Mainly commenting to boost your post's visibility,
George -
-
@scottmsilver A more correct version.
@scottmsilver @SypsG Ok, I think I worked this out. My theories were wrong.
Basically, pfSense forgets to reset the gateway monitor when the Comcast interface comes back up, because the interface comes back with the same IP address it had before.
Here are the details:
- The Comcast interface goes away, so pfSense loses one of its WANs.
- When Comcast comes back pfSense requests a new IP via DHCP.
- Subsequently there is code that is supposed to run when a WAN interface gets a new IP.
- This code is guarded by roughly "isSameAddress()" and since Comcast issues the same address, pfSense does not run this code.
- This code, in particular, resets the gateway monitor. Since pfSense does not reset it, the old instance of the gateway monitor (dpinger) continues to run. However, it can never send out any new ICMP/ping messages, because its socket refers to the dead interface and not the new one, so no pings come back.
- Thus dpinger never thinks the interface comes back.
- So why does running dpinger from the command line work, even when the gateway monitor instance doesn't? When you run dpinger from the command line, it gets a working socket for the new interface.
- The "quick but wrong" fix is to make this code on line 204 always run. See that I OR'd in 1 into the conditional below.
    if (/*added*/ 1 || !is_ipaddr($oldip) || ($curwanip != $oldip) || (!is_ipaddrv4($config['interfaces'][$interface]['ipaddr']) && ($config['interfaces'][$interface]['ipaddr'] != 'dhcp'))) {
        /*
         * Some services (e.g. dyndns, see ticket #4066) depend on
         * filter_configure() to be called before, otherwise pass out
         * route-to rules have the old ip set in 'from' and connections
         * do not go through the correct link
         */
        filter_configure_sync();

        /* reconfigure our gateway monitor, dpinger results need to be
         * available when configuring the default gateway */
        setup_gateways_monitor();
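To make the failure mode concrete, here is a toy model of the guard in shell. It is not pfSense code: the function names are hypothetical, the addresses are documentation placeholders, and the guard is simplified to the address comparison only (the real conditional has additional clauses).

```shell
# Hypothetical model of the guard described above; not pfSense code.
# Returns 0 (true) when the gateway monitor would be reconfigured.
guard_as_described() {   # args: old_ip new_ip
    [ "$1" != "$2" ]
}

# Variant matching the "quick but wrong" fix: always reconfigure.
guard_always() {
    true
}

simulate_reconnect() {   # args: guard_function old_ip new_ip
    if "$1" "$2" "$3"; then
        echo "monitor restarted"
    else
        echo "stale monitor kept"
    fi
}

# Same address after the outage: the guard skips the restart.
simulate_reconnect guard_as_described 203.0.113.10 203.0.113.10
# Different address: the restart happens.
simulate_reconnect guard_as_described 203.0.113.10 203.0.113.25
# Forced guard: the restart happens even with the same address.
simulate_reconnect guard_always 203.0.113.10 203.0.113.10
```

The first call is the Comcast case: the lease comes back with the same address, so the old dpinger instance is left running against a socket that no longer works.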
-
@scottmsilver If you can reproduce you can enter a bug report at redmine.pfsense.org.
A while ago we had trouble with a client with multi-WAN which wouldn't fail back, and we had to call the gateway page (eventually, via cron but I think we could manually go to the System/Gateways page) and it would realize it was up again. That was resolved I want to say about a year ago?
An alternate workaround would be to disable gateway monitoring, so that pfSense assumes the connection is always up (it's a checkbox when editing a gateway).
-
@steveits I think you were likely running into the bug I found. I agree your workarounds are reasonable (though I'm not excited about turning off gateway monitoring...:-)). The fix I suggested does resolve this problem, I think at the root cause. There may be other problems with my solution though, so I'll figure out how to file a bug.
-
@scottmsilver said in Gateway drops and never comes back:
not excited about turning off gateway monitoring
Yeah, it's not ideal, but if there's only one WAN it kinda doesn't matter. It's not a great workaround with multiple WANs. See if opening the Gateway page lets pfSense rediscover that the gateway is up. IIRC I didn't even have to edit anything, just view the page. (which is why we ended up
On many DHCP connections the IP isn't going to change for short disconnections so it sounds like the logic is faulty.
I see you found the existing redmine I just found. I tried finding my old forum topic but couldn't in a quick search.
-
@steveits Thanks. I also found the bug that they were trying to fix that created this (https://redmine.pfsense.org/issues/11142?tab=history)
-
I might also be running into this problem.
I have 4 WANS, with one dynamic IP and the other 3 of them having a fixed IP. Once in a while I notice that pfsense thinks one of them is down ("Offline, Packetloss"). It is not; I also monitor them from the outside with Uptime Robot, so I do know when they have been up/down in case I need to do something about it.
So I go to System -> Routing -> Gateways, edit the gateway, remove the monitor IP, save changes, and it comes back up. Edit the gateway again, configure the same monitor IP it had, and now it will stay up.
I would say that this does not happen to the WAN that has the dynamic IP, but I am not so sure. I will keep an eye on this.
Next time I will go the disable/enable gateway route to see if it also works.
-
That was me above ^^
-
So I had this happen to me again tonight. On two different pfSense boxes, both with 2 WANs, where the second WAN has a fixed IP and is on the same ISP. They both went down tonight at the same time, and they both came back 13 minutes later. But on pfSense they remained offline.
I tried to disable/enable the gateway as @scottmsilver did above, but I was unable to do so, since they are part of a gateway group.
So I did the usual: removed the monitor IP, so it would use my own router as the monitor IP. A few seconds later the gateway was back up. Then I reconfigured the same monitor IP I had before.
-
I am having a similar (same?) issue on 22.01 as well as 21.05.1 and 21.05.2 on a SG-2100. I have two gateways, a bog standard configuration composed of a DHCP WAN interface gateway, as well as an OVPN Gateway on a virtual interface. The WAN interface represents a remote cellular connection, and as you might expect, it isn't that stable. I have a gateway monitor applied pinging Cloudflare DNS Servers, and this works until the first time the Gateway goes down. At that point, the Gateway sticks in "Pending, Gathering Data" in the Gateway Group. Just as @scottmsilver points out, in the logs there'll be entries showing sendto errors for a couple of tries, and then nothing more. The Gateway is forever in pending.
For me, the fix is simpler. If I go to System -> Routing -> Edit the WAN_DHCP Gateway, then simply scroll to the bottom and click Save without changing anything on the page, and finally Apply Changes, the Gateway immediately comes Online. In the logs, dpinger immediately logs the configuration of the monitor:
send_interval 500ms loss_interval 2000ms time_period 60000ms report_interval 0ms data_len 1 alert_interval 1000ms latency_alarm 500ms loss_alarm 20% dest_addr 1.1.1.1 bind_addr <ipaddress> identifier "WAN_DHCP "
From there, the Gateway will stay up again until the cellular link is lost, then it will go back to pending. It just seems like dpinger gives up monitoring the new WAN interface every time DHCP applies a new lease on link restoration.
-
I've got the same issue here: unstable cellular connection, gateway goes to pending, and saving the gateway again makes it work again. Now I've changed it from DHCP to static; I'll see if this is any better...
-
@scottmsilver thanks for that. I had seen it while making my post and I can confirm it does the job on my side too. Was searching to see if you had opened a bug on redmine for this (I couldn't find one). If you had not, I was going to so that there's at least a chance this can get fixed in a future revision.
-
@jimp
I have some systems with multiple WANs on unstable cellular connections, with GW groups. Is there a chance to get this fix or changeset
https://redmine.pfsense.org/projects/pfsense/repository/1/revisions/ec73bb89489d830ec21c4e04ffa3ec401791b55d/diff
for 2.7 as a patch for 2.6?
-
@pete35 In System Patches, Add New Patch and use the ID on that diff page (ec73bb89489d830ec21c4e04ffa3ec401791b55d). The patches just apply the diff to the files on disk.
-
@steveits That's pretty cool. I didn't know about that!