Gateway monitor IPs are being put into the routing table

Derelict

It's the way it has always been done. Put in a feature request (it is not a bug) at https://redmine.pfsense.org/.

A Former User

Whatever you may call it, it's certainly not correct behaviour of the firewall.

Again, let's please look at the behaviour with the following fairly standard configuration that I (and a lot of other people) use:

two WAN interfaces
the gateways of those WAN interfaces configured with monitor IPs 1.1.1.1 and 8.8.8.8
a gateway group configured for failover
that gateway group set as default gateway
WAN1 interface with monitor IP 1.1.1.1 is the primary (Tier 1) internet connection
WAN2 interface with monitor IP 8.8.8.8 is the secondary (Tier 2) internet connection

Scenario 1: Everything is running normally, i.e. both WAN interfaces are up, the monitor IPs are both reachable and WAN1 is active as configured via the Gateway Group Tier.

Although WAN1 is the primary and active WAN interface, all traffic destined to 8.8.8.8 (be it from pfSense itself, in case that nameserver is configured somewhere, or from clients behind the firewall that may use that DNS server) will actually go through WAN2. Many people probably don't even notice this, yet it is not correct behaviour, as the firewall is configured to use the primary WAN1 interface for all outgoing traffic as long as WAN1 is not down.

Scenario 2: WAN1 goes down (or, to be precise, the monitor IP 1.1.1.1 does not respond to pings anymore)

pfSense will detect that 1.1.1.1 is not pingable anymore and switch over to WAN2. But the route to 1.1.1.1 via WAN1 is still there, and thus 1.1.1.1 becomes completely unreachable for both the firewall and the clients behind it, which is certainly not correct behaviour.

Scenario 3: WAN2 goes down (or, to be precise, the monitor IP 8.8.8.8 does not respond to pings anymore)

pfSense will detect that 8.8.8.8 is not pingable, but since WAN2 is the secondary (Tier 2) WAN interface, it won't switch over or do anything. But the route to 8.8.8.8 via WAN2 is still there, and thus 8.8.8.8 becomes completely unreachable for both the firewall and the clients behind it, although the primary WAN1 internet connection is still working fine. That is certainly not correct behaviour either.
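
To make the three scenarios concrete, here is a minimal Python sketch of longest-prefix-match routing. It is a toy model and not pfSense code: the `lookup` and `build_table` helpers, the interface names and the table layout are made up for illustration. Only the monitor IPs and the failover setup come from the configuration described above.

```python
import ipaddress

def lookup(table, dst):
    """Return the interface of the most specific route that matches dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, iface) for net, iface in table if addr in net]
    # Longest prefix wins, so a /32 host route always beats the default route.
    net, iface = max(matches, key=lambda m: m[0].prefixlen)
    return iface

def build_table(active_wan):
    """Toy routing table (assumption): default route via the active WAN plus
    the static host routes that pin each monitor IP to its own WAN."""
    return [
        (ipaddress.ip_network("0.0.0.0/0"), active_wan),
        (ipaddress.ip_network("1.1.1.1/32"), "WAN1"),  # monitor IP of WAN1
        (ipaddress.ip_network("8.8.8.8/32"), "WAN2"),  # monitor IP of WAN2
    ]

# Scenario 1: both WANs up, WAN1 is active.
normal = build_table("WAN1")
print(lookup(normal, "9.9.9.9"))    # WAN1: ordinary traffic follows the default route
print(lookup(normal, "8.8.8.8"))    # WAN2: the host route overrides the default route

# Scenario 2: WAN1 is down, the default route has failed over to WAN2.
failover = build_table("WAN2")
print(lookup(failover, "1.1.1.1"))  # WAN1: still pinned to the dead link
```

Scenario 3 is the same lookup as scenario 1: the default route stays on WAN1, but 8.8.8.8 keeps resolving to WAN2 even when WAN2 is dead.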

Derelict

@Derelict said in Gateway monitor IPs are being put into the routing table:

Since dpinger is binding to the source address itself it might be time to revisit whether that host route is still necessary.

@Derelict said in Gateway monitor IPs are being put into the routing table:

It's the way it has always been done. Put in a feature request (it is not a bug) at https://redmine.pfsense.org/.

@GustavG It is not a bug as it is working as designed. Not sure what else you want from me. Not really interested in arguing about it when we both essentially agree.

A Former User

Well, I'd like to have this wrong behaviour fixed, simple as that. That's not a feature request in my book.

In scenario 1, the firewall is sending traffic out of an interface it's not supposed to use

In scenario 2, the monitor IP becomes completely unreachable although it's reachable just fine over the other WAN interface

In scenario 3, the monitor IP becomes completely unreachable, just because the currently inactive WAN connection goes down

All of the above three issues are not documented anywhere and looking at the forum posts regarding Multi-WAN failover, I can tell other people are experiencing these issues also.

Let me ask you a direct question: How can you call the above three issues "working as designed"? What would you name that feature that fixes the three above mentioned scenarios?

johnpoz

Your use case is, let's call it, ODD to say the least..

When monitoring your WAN.. The NORMAL scenario is to ping the gateway of that WAN interface.. It's not some IP out on the public internet that needs to be gotten to.

You are choosing to use some IP that you want to also use, like 8.8.8.8 - pick something else for starters.

So yeah, I would call what you're asking for a "feature"..

A Former User

Your use case is, let's call it, ODD to say the least..

When monitoring your WAN.. The NORMAL scenario is to ping the gateway of that WAN interface.. It's not some IP out on the public internet that needs to be gotten to.

No, it's not odd to do that, not at all. It's necessary, otherwise many outages are not detected as the gateway often stays up because something further upstream breaks.

In fact, you guys explain it in the documentation and recommend not using the WAN gateway:

By default pfSense will ping the gateway to determine the quality of the WAN. In some cases, that is not an accurate measure. For instance, if the WAN gateway is actually a device that is local and not on the other side of the ISP circuit, then the actual WAN link could be down and pinging the gateway would never show it. Also, if the ISP gateway is up but the ISP experiences upstream failures, those cannot be detected by pinging only the gateway.
A custom IP address can be entered to monitor here that will be used to determine the WAN quality. A public website, Google public DNS, or any IP on the Internet that responds to pings can be used. The downside is that should that IP ever go offline, or suffer a failure of its own, the WAN could be marked down when it’s really up.
(https://docs.netgate.com/pfsense/en/latest/routing/multi-wan.html)

You are choosing to use some IP that you want to also use, like 8.8.8.8 - pick something else for starters.

See, that's the hard part. I've already mentioned it in the first post of this thread. There are not that many reliable anycast IPs out there where I can be sure that they are never used by my users. I know for example they use Google DNS, so 8.8.8.8 and 8.8.4.4 are not possible anymore. I also know they use Cloudflare DNS, so 1.1.1.1 is also not possible anymore.

I'd need two IPs that are:

reliable
never used by my users

That's not easy to find.

johnpoz

No, it is ODD.. you pay your ISP for connectivity... If something upstream breaks, like the site you're monitoring goes down.. Your internet connection still works.

They mention that scenario for when, say, your gateway is local to pfSense, say you're behind a double NAT. Or an upstream provider router, for example, that is on site.
"if the WAN gateway is actually a device that is local and not on the other side of the ISP circuit"

Also, where did they say you should do that in a multi wan scenario?

Pick an IP just upstream of your gateway.. Do a traceroute, what's the next hop past your gateway? Use that!

The monitoring of your WAN connection is meant to monitor the connection, not whether the path to some X IP out on the internet is broken.. Or that the path is much longer because some peer connection went down somewhere upstream in the internet. So now you're asking to use the monitoring of the connection as best path selection..

So yeah, what you're asking for is a "feature". And the current system works as designed... Just because things don't work how you want them to work doesn't mean it's broken in any way.

While you're at it requesting this feature you want, you should also ask to measure latency to every destination IP gone to, and if ISP A is faster than B, use A vs B ;) And if at some point later B becomes faster, switch over to B for that destination.

There is nothing saying that the monitor IP you pick has to be anycast. Monitoring is not meant to monitor the internet and some specific ISP's connectivity to it.. It's meant to monitor if your ISP connection is up and working.

Pretty much any major DNS provider's IPs are going to be anycast.. Pick one you're not using if that is what you want. Why are you letting users use DNS other than your local DNS in the first place ;)

A Former User

They mention that scenario for when, say, your gateway is local to pfSense, say you're behind a double NAT. Or an upstream provider router, for example, that is on site.
"if the WAN gateway is actually a device that is local and not on the other side of the ISP circuit"

Yes, that is exactly the case for me, the gateway is local. It's a pretty standard internet connection as done by many ISPs around the world: a /29 network with one of those IPs configured on a router in the same building. And even if the gateway weren't local, there is still a lot of stuff between that gateway and "the internet", so no, it doesn't make sense to monitor only the gateway. Nor something slightly further upstream. Not at all. And actually, there are not many cases where that makes sense.

Also, where did they say you should do that in a multi wan scenario?

It says "Using Multiple IPv4 WAN Connections" in the headline. Did you actually ever read the documentation? Or at least look at the URL, which also contains the words "multi-wan"?

Pick an IP just upstream of your gateway.. Do a traceroute, what's the next hop past your gateway? Use that!

Bad idea, for several reasons:

sometimes ISPs re-arrange their network and routing which could lead to that IP becoming unavailable
those IPs belong to routers whose job is to forward traffic, not answer ICMP requests, often they don't answer reliably
sometimes, ISPs just decide to not have their routers answer ICMP echo requests anymore

So yeah, what you're asking for is a "feature". And the current system works as designed... Just because things don't work how you want them to work doesn't mean it's broken in any way.

Let me ask you the same direct question I asked Derelict: How can you call the above three error scenarios "working as designed"? What would you name that feature that fixes the three above mentioned scenarios?

The monitoring of your WAN connection is meant to monitor the connection, not whether the path to some X IP out on the internet is broken.. Or that the path is much longer because some peer connection went down somewhere upstream in the internet. So now you're asking to use the monitoring of the connection as best path selection..

The monitoring is meant to make sure that "the internet connection" works. Meaning I can actually reach "the internet". For that, I have to ping an IP that is actually on "the internet". It's completely pointless knowing that some IP inside my ISP's core network is still pingable when the internet doesn't work. In that case, the automatic failover won't occur and users will stay offline.

There is nothing saying that the monitor IP you pick has to be anycast. Monitoring is not meant to monitor the internet and some specific ISP's connectivity to it.. It's meant to monitor if your ISP connection is up and working.

Of course it doesn't need to be anycast per se, it only needs to be reliable. Which anycast usually is, because, well, it's anycast, i.e. not dependent on a single host or path. Like I explained above, the monitor IPs need to be reliable, otherwise people will suffer from unnecessary false-alarm WAN failovers caused by the monitor IPs being down while the path to the internet is fine.

But whatever, I see it's pointless, you guys declare this working as designed and that's it. No, I'm not going to request a "don't route monitor IP traffic the wrong way" feature.

Thanks for re-assuring me to never go for a support contract or buy hardware from Netgate.

Derelict

As was specified in the Redmine issue, it is not a bug and is necessary to ensure the packets go out the gateway being monitored. So there's your answer.

dynach

Hi A Former User

I had the same problem, but you can do a policy route for the monitoring IPs on the LAN interface with the gateway group as gateway.

Derelict

Yes, if the pings are sourcing from something on the LAN and not the firewall itself, that should work fine.
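
Why this helps LAN clients but not the firewall itself can be sketched roughly as follows. This is a toy model in Python, not how pf or pfSense is implemented: the `egress` function, its parameters and the interface names are hypothetical. The assumption is that a policy-routing rule on the LAN interface is evaluated before the routing table, while traffic originating on the firewall never passes through that LAN rule.

```python
# Assumed static host routes that pin each monitor IP to its own WAN.
HOST_ROUTES = {"1.1.1.1": "WAN1", "8.8.8.8": "WAN2"}
MONITOR_IPS = set(HOST_ROUTES)

def egress(dst, arrived_on, group_active_wan):
    """Pick the outgoing WAN for a packet to dst.

    arrived_on is "LAN" for client traffic and None for traffic that
    originates on the firewall itself (hypothetical convention).
    """
    # Policy route on LAN: monitor IPs are sent to the gateway group,
    # so the routing table is never consulted for them.
    if arrived_on == "LAN" and dst in MONITOR_IPS:
        return group_active_wan
    # Everything else falls through to the routing table:
    # host route first, otherwise the default route (the active WAN).
    return HOST_ROUTES.get(dst, group_active_wan)

# WAN1 is down and the gateway group has failed over to WAN2.
print(egress("1.1.1.1", "LAN", "WAN2"))  # WAN2: LAN clients can reach 1.1.1.1 again
print(egress("1.1.1.1", None, "WAN2"))   # WAN1: the firewall itself still uses the host route
```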

Gabri.91

@dynach do you think policy routing could be useful in this case as well?
Basically the issue is the same as the one "A Former User" described.

https://forum.netgate.com/category/32/ha-carp-vips

follysuperscript

@johnpoz

Necro'ing this thread to point out that Comcast (at least in my area) is no longer responding to pings on the DHCP-assigned gateway, which might push this scenario a little further towards normal on the normal-to-odd continuum. This has forced me to put a public IP in dpinger, which is not pfSense's fault, and may still be more ODD than not.

luckman212

This may help:

https://github.com/pfsense/pfsense/pull/4551

pete35

@luckman212

I tried your patch but it doesn't seem to be correct. Any ideas?

/usr/bin/patch --directory=/ -f -p2 -i /var/patches/6214fbf0c578a.patch --check --reverse --ignore-whitespace

Hmm... Looks like a unified diff to me...
The text leading up to this was:

+++ b/src/etc/inc/gwlb.inc
Patching file etc/inc/gwlb.inc using Plan A...
Hunk #1 failed at 239.
Hunk #2 failed at 282.
Hunk #3 failed at 2078.
3 out of 3 hunks failed while patching etc/inc/gwlb.inc
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:

|diff --git a/src/usr/local/pfSense/include/www/system_advanced_misc.inc b/src/usr/local/pfSense/include/www/system_advanced_misc.inc
|index 6cb826693cb..aa8c24d1c37 100644
|--- a/src/usr/local/pfSense/include/www/system_advanced_misc.inc

+++ b/src/usr/local/pfSense/include/www/system_advanced_misc.inc
Patching file usr/local/pfSense/include/www/system_advanced_misc.inc using Plan A...
Hunk #1 failed at 56.
1 out of 1 hunks failed while patching usr/local/pfSense/include/www/system_advanced_misc.inc
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:

|diff --git a/src/usr/local/www/system_advanced_misc.php b/src/usr/local/www/system_advanced_misc.php
|index 9806aac040e..4f676b58feb 100644
|--- a/src/usr/local/www/system_advanced_misc.php

+++ b/src/usr/local/www/system_advanced_misc.php
Patching file usr/local/www/system_advanced_misc.php using Plan A...
Hunk #1 failed at 304.
1 out of 1 hunks failed while patching usr/local/www/system_advanced_misc.php
Hmm... The next patch looks like a unified diff to me...
The text leading up to this was:

|diff --git a/src/usr/local/www/system_gateways_edit.php b/src/usr/local/www/system_gateways_edit.php
|index 96b80171790..57afa7ce7f7 100644
|--- a/src/usr/local/www/system_gateways_edit.php

+++ b/src/usr/local/www/system_gateways_edit.php
Patching file usr/local/www/system_gateways_edit.php using Plan A...
Hunk #1 failed at 72.
Hunk #2 failed at 223.
2 out of 2 hunks failed while patching usr/local/www/system_gateways_edit.php
done
