Gateway monitor IPs are being put into the routing table

Setup (pfSense 2.4.4-p3):

two WAN interfaces
the gateways of those WAN interfaces configured with monitor IPs 1.1.1.1 and 8.8.8.8
a gateway group configured for failover
that gateway group set as default gateway

While testing with traceroute to find out which WAN interface is being used (and being in the habit of using the above monitor IPs for testing), I found that "traceroute 1.1.1.1" always exits through the interface/gateway that has that monitor IP, and "traceroute 8.8.8.8" always exits through the other interface/gateway, regardless of which interface/gateway is actually the 'active' one according to the tier logic.
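The same check can be done from the pfSense shell without traceroute, by asking the kernel which route it would pick for each destination. A minimal sketch, assuming FreeBSD's `route -n get` output format (the `interface:` and `flags:` lines); the script name `route-check.sh` is hypothetical:

```shell
#!/bin/sh
# Sketch: for each destination, show which interface the kernel routes it out
# of, whether that comes from a host route, and whether it matches the
# interface of the current default route. Parsing assumes FreeBSD's
# "route -n get" output ("interface:" and "flags:" lines).

# Read "route -n get" output on stdin; print "<interface> <host|net>".
route_summary() {
    awk '
        $1 == "interface:" { iface = $2 }
        $1 == "flags:"     { kind = ($2 ~ /HOST/) ? "host" : "net" }
        END                { print iface, kind }
    '
}

main() {
    default_if=$(route -n get default | route_summary)
    default_if=${default_if% *}              # keep only the interface name
    for dst in "$@"; do
        summary=$(route -n get "$dst" | route_summary)
        iface=${summary% *}
        kind=${summary#* }
        if [ "$iface" = "$default_if" ]; then
            note="same as default route"
        else
            note="differs from default route ($default_if)"
        fi
        printf '%s -> %s (%s route), %s\n' "$dst" "$iface" "$kind" "$note"
    done
}

# Run only when destinations are given, e.g.: sh route-check.sh 1.1.1.1 8.8.8.8
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Comparing against the default route is a simplification: traffic matched by policy-routing firewall rules can still leave through a different gateway, so this only reflects the kernel routing table.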

The first problem is obvious: traffic to those IPs bypasses the correct exit point. When people ping those IPs (or use them as DNS servers), it's not the 'active' gateway that is used, but the one that has that monitor IP configured.

The second and much bigger problem is that when the interface/gateway associated with a monitor IP stops working, that monitor IP becomes unreachable too, because the route for that IP is fixed and doesn't fail over to the other interface/gateway.

What to do about it? Use different monitor IPs that are not being used by users, but which ones?

Since pfSense cannot use multiple monitor IPs, the monitor IPs must be very reliable. Cloudflare and Google DNS IPs are very reliable because they use anycast. However, users also use them.

Another option would be to monitor the gateway IP itself (i.e. not configure a monitor IP). The obvious problem with that is that it won't trigger if the ISP has problems further into its network, which is the usual case with our ISPs. Also, when trying to debug things by pinging those gateways, things quickly become confusing because of the fixed routing.

Yet another option would be to use a monitor IP further into the ISP's network. However, this is not reliable either: a core router's priority is forwarding traffic, not answering ICMP, so it may show packet loss that isn't 'real'. Those IPs also get changed or reassigned, and some routers suddenly stop answering ICMP altogether.

As I see it, to properly fix this:

pfSense must not set routes to those IPs ~~and instead make the dpinger process bind to the appropriate interface and send out the packets directly without looking at the routing table~~ (correction: dpinger already binds itself to the correct interface; only the routes must be removed)
pfSense needs support for multiple monitor IPs (yes, I understand this is not trivial; I have seen the discussions about it)
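To illustrate what multiple monitor IPs would buy, here is a hypothetical decision rule, not something dpinger or pfSense 2.4.4 provides: declare a WAN down only when every one of its targets is losing more packets than a threshold. The function name `wan_down`, the threshold and the "all targets" rule are assumptions; a majority vote would be another reasonable choice.

```shell
#!/bin/sh
# Hypothetical multi-target rule (not a pfSense/dpinger feature): a WAN counts
# as down only if EVERY monitor target exceeds the loss threshold, so a single
# target going offline cannot trigger a failover by itself.

# Usage: wan_down MAX_LOSS_PCT LOSS_PCT_1 [LOSS_PCT_2 ...]
# Exit status 0 = declare WAN down, 1 = keep WAN up.
wan_down() {
    max=$1
    shift
    [ "$#" -gt 0 ] || return 1        # no measurements: do not fail over
    for loss in "$@"; do
        # awk does the decimal comparison that sh's [ ] cannot
        if awk -v l="$loss" -v m="$max" 'BEGIN { exit !(l <= m) }'; then
            return 1                  # at least one target still answers
        fi
    done
    return 0
}
```

For example, `wan_down 20 100 0` (first target at 100% loss, second at 0%) returns 1, so the WAN stays up, while `wan_down 20 100 100` returns 0.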

How are you guys handling this? Currently, I just don't see a way to run a Multi-WAN setup reliably because of the above issues.

The only "workaround" I currently see is to find some monitor IPs that are very reliable and never used for anything else. Any hints?

Did some more digging. It appears that dpinger actually binds to the respective interface (using dpinger's -B option). "ps auxww" output from the FreeBSD shell (real interface IPs replaced with "<interface ip>" for privacy reasons):

```
root    41537   0.0  0.0  8948  2488  -  Is   10:31      0:00.04 /usr/local/bin/dpinger -S -r 0 -i GW_WANUM -B <interface ip> -p /var/run/dpinger_GW_WANUM~<interface ip>~1.1.1.1.pid -u /var/run/dpinger_GW_WANUM~<interface ip>~1.1.1.1.sock -C /etc/rc.gateway_alarm -d 0 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 20 1.1.1.1
root    41993   0.0  0.0  6900  2444  -  Is   10:31      0:00.04 /usr/local/bin/dpinger -S -r 0 -i GW_WANCOLT -B <interface ip> -p /var/run/dpinger_GW_WANCOLT~<interface ip>~8.8.8.8.pid -u /var/run/dpinger_GW_WANCOLT~<interface ip>~8.8.8.8.sock -C /etc/rc.gateway_alarm -d 0 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 20 8.8.8.8
```
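The gateway name, bound source address and monitor target can be pulled out of those command lines with a small filter. A sketch, assuming the argument layout shown above (`-i` gateway identifier, `-B` source address, target as the last argument); the function name `dpinger_targets` is made up:

```shell
#!/bin/sh
# Sketch: print "<gateway> <source address> <monitor target>" for every
# dpinger process in "ps auxww" output read from stdin. Assumes the layout
# shown above: -i <gateway name>, -B <source address>, target as last argument.

dpinger_targets() {
    awk '
        {
            cmd = 0
            # locate the dpinger executable among the fields
            for (i = 1; i <= NF; i++)
                if ($i == "dpinger" || $i ~ /\/dpinger$/) { cmd = i; break }
            if (!cmd) next

            gw = ""; src = ""
            for (i = cmd + 1; i < NF; i++) {
                if ($i == "-i") gw = $(i + 1)
                if ($i == "-B") src = $(i + 1)
            }
            if (gw == "") next        # e.g. a "grep dpinger" line
            print gw, src, $NF
        }
    '
}

# Usage: ps auxww | dpinger_targets
```

This lists the monitor IPs actually in use, which can be handier than hard-coding them.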

So there doesn't seem to be any need to keep those static routes. I tried it out by manually deleting those two routes on the FreeBSD shell with "route delete 8.8.8.8" and "route delete 1.1.1.1".

After that, everything works as expected. WAN failover works fine and connectivity to the monitor IPs is also not affected anymore.

I did some more searching for the issue. It seems that other people have noticed it too, but it hasn't been correctly understood, because it's not easy to tell what's going on, especially when DNS servers are used as monitor IPs (when they become unreachable, name resolution stops working, which may look like "no connectivity at all" unless tested thoroughly). Please see the following forum topics:

This seems to be the same issue:
https://forum.netgate.com/topic/76682/gateway-monitoring-ip-set-results-in-all-traffic-going-to-that-ip-from-that-gw/2

This looks related (I'm not sure, though; I've never used VPN with pfSense):
https://forum.netgate.com/topic/139015/weird-gateway-monitoring-ip-issue/9

This looks to be the same issue also:
https://forum.netgate.com/topic/127191/setting-host-as-monitored-ip-makes-it-unreachable

I also found related issues in the bugtracker:

In this bug, https://redmine.pfsense.org/issues/2514, the static routes to those monitor IPs had already been removed. However, at that time "apinger" was being used, which did not bind to the respective interface. That of course led to this bug, https://redmine.pfsense.org/issues/3179: without binding to the correct interface, the monitor IP would be reachable via the other WAN connection, so the broken WAN connection would not be detected.

However, pfSense now uses dpinger with the -B option to bind itself to the correct interface, so those static routes are no longer needed.

I added a script to /usr/local/etc/rc.d to delete those routes after boot:

```
cat /usr/local/etc/rc.d/delete-dpinger-routes.sh
route delete 1.1.1.1
route delete 8.8.8.8
```

I'm not sure, though, whether the routes get re-created when something is changed in the web interface.
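Since it's unclear whether the routes come back after changes in the web interface, a slightly more defensive version of the script could check for the host route first, so it is safe to run repeatedly. A sketch: `MONITOR_IPS` must be kept in sync with the configured monitor IPs by hand, and the `start` check assumes the script is called with that argument, the usual rc.d convention (drop the check if it is not).

```shell
#!/bin/sh
# Hypothetical defensive variant of delete-dpinger-routes.sh: delete the
# monitor host routes only if they exist, so it can safely be run again,
# e.g. by hand after changing gateway settings in the web interface.
# MONITOR_IPS is an assumption; keep it in sync with the gateway settings.

MONITOR_IPS="1.1.1.1 8.8.8.8"

# Read "netstat -rn -f inet" output on stdin; succeed if $1 is listed as a
# destination with the H (host route) flag in the Flags column.
has_host_route() {
    awk -v ip="$1" '$1 == ip && $3 ~ /H/ { found = 1 } END { exit !found }'
}

main() {
    table=$(netstat -rn -f inet)
    for ip in $MONITOR_IPS; do
        if printf '%s\n' "$table" | has_host_route "$ip"; then
            /sbin/route delete "$ip" &&
                logger -t delete-dpinger-routes "removed host route to $ip"
        fi
    done
}

# rc.d-style entry point; assumes the script is called with "start".
if [ "${1:-}" = "start" ]; then
    main
fi
```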

Could the developers please look into fixing this (i.e. not creating those static routes)?

I just happened to stumble across this; it seems that user has the same issue.

https://www.reddit.com/r/PFSENSE/comments/cdyq83/force_a_client_out_2nd_wan_link/

Derelict

That is by design. It is how it works. If you do not want a host route on a particular interface, do not use that host address as a gateway monitor address.

It also does the same for DNS servers if you set a gateway in System > General.

Since dpinger is binding to the source address itself it might be time to revisit whether that host route is still necessary.

> That is by design. It is how it works. If you do not want a host route on a particular interface, do not use that host address as a gateway monitor address.

If this is really by design, it's broken by design. But I don't think it's broken by design, just broken by accidental mistake. Quoting myself from above:

> In this bug, https://redmine.pfsense.org/issues/2514, the static routes to those monitor IPs had already been removed. However, at that time "apinger" was being used, which did not bind to the respective interface. That of course led to this bug, https://redmine.pfsense.org/issues/3179: without binding to the correct interface, the monitor IP would be reachable via the other WAN connection, so the broken WAN connection would not be detected.
>
> However, pfSense now uses dpinger with the -B option to bind itself to the correct interface, so those static routes are no longer needed.

It seems that there was just a misconception in the past regarding apinger/dpinger and their "bind to IP address" functionality.

> Since dpinger is binding to the source address itself it might be time to revisit whether that host route is still necessary.

It is not only unnecessary, it actually breaks functionality and causes a lot of confusion for people who do not have the time and means to analyze the issue.

Derelict

It's the way it has always been done. Put in a feature request (it is not a bug) at https://redmine.pfsense.org/.

Whatever you may call it, it's certainly not correct behaviour of the firewall.

Again, please let's look at the behavior with the below pretty-much-standard configuration I (and a lot of other people) use:

two WAN interfaces
the gateways of those WAN interfaces configured with monitor IPs 1.1.1.1 and 8.8.8.8
a gateway group configured for failover
that gateway group set as default gateway
WAN1 interface with monitor IP 1.1.1.1 is the primary (Tier 1) internet connection
WAN2 interface with monitor IP 8.8.8.8 is the secondary (Tier 2) internet connection

Scenario 1: Everything is running normally, i.e. both WAN interfaces are up, the monitor IPs are both reachable and WAN1 is active as configured via the Gateway Group Tier.

Although WAN1 is the primary and active WAN interface, all traffic destined for 8.8.8.8 (whether from pfSense itself, if that name server is configured somewhere, or from clients behind the firewall that use that DNS server) will actually go through WAN2. Many people probably don't even notice this, yet it is not correct behaviour, as the firewall is configured to use the primary WAN1 interface for all outgoing traffic as long as WAN1 is not down.

Scenario 2: WAN1 goes down (or, to be precise, the monitor IP 1.1.1.1 does not respond to pings anymore)

pfSense will detect that 1.1.1.1 is no longer pingable and switch over to WAN2. But the route to 1.1.1.1 is still there, so 1.1.1.1 becomes completely unreachable for both the firewall and the clients behind it, which is certainly not correct behaviour.

Scenario 3: WAN2 goes down (or, to be precise, the monitor IP 8.8.8.8 does not respond to pings anymore)

pfSense will detect that 8.8.8.8 is not pingable, but since WAN2 is the secondary (Tier 2) WAN interface, it won't switch over or do anything. But the route to 8.8.8.8 is still there, so 8.8.8.8 becomes completely unreachable for both the firewall and the clients behind it, although the primary WAN1 internet connection is still working fine, which is certainly not correct behaviour.

Derelict

@Derelict said in Gateway monitor IPs are being put into the routing table:

> Since dpinger is binding to the source address itself it might be time to revisit whether that host route is still necessary.

@Derelict said in Gateway monitor IPs are being put into the routing table:

> It's the way it has always been done. Put in a feature request (it is not a bug) at https://redmine.pfsense.org/.

@GustavG It is not a bug as it is working as designed. Not sure what else you want from me. Not really interested in arguing about it when we both essentially agree.

Well, I'd like to have this wrong behaviour fixed, simple as that. That's not a feature request in my book.

In scenario 1, the firewall sends traffic out through interfaces it's not supposed to use

In scenario 2, the monitor IP becomes completely unreachable although it's reachable just fine over the other WAN interface

In scenario 3, the monitor IP becomes completely unreachable, just because the currently inactive WAN connection goes down

None of the above three issues is documented anywhere, and looking at the forum posts regarding Multi-WAN failover, I can tell other people are experiencing these issues too.

Let me ask you a direct question: How can you call the above three issues "working as designed"? What would you name that feature that fixes the three above mentioned scenarios?

johnpoz

Your use case is, let's call it, ODD to say the least.

When monitoring your WAN, the NORMAL scenario is to ping the gateway of that WAN interface. It's not some IP out on the public internet that needs to be reached.

You are choosing to use some IP that you also want to use, like 8.8.8.8. Pick something else, for starters.

So yeah, I would call what you're asking for a "feature".

> Your use case is, let's call it, ODD to say the least.
>
> When monitoring your WAN, the NORMAL scenario is to ping the gateway of that WAN interface. It's not some IP out on the public internet that needs to be reached.

No, it's not odd to do that, not at all. It's necessary; otherwise many outages go undetected, because the gateway often stays up while something further upstream breaks.

In fact, you guys explain it in the documentation and recommend not using the WAN gateway:

> By default pfSense will ping the gateway to determine the quality of the WAN. In some cases, that is not an accurate measure. For instance, if the WAN gateway is actually a device that is local and not on the other side of the ISP circuit, then the actual WAN link could be down and pinging the gateway would never show it. Also, if the ISP gateway is up but the ISP experiences upstream failures, those cannot be detected by pinging only the gateway.
>
> A custom IP address can be entered to monitor here that will be used to determine the WAN quality. A public website, Google public DNS, or any IP on the Internet that responds to pings can be used. The downside is that should that IP ever go offline, or suffer a failure of its own, the WAN could be marked down when it’s really up.
>
> (https://docs.netgate.com/pfsense/en/latest/routing/multi-wan.html)

> You are choosing to use some IP that you also want to use, like 8.8.8.8. Pick something else, for starters.

See, that's the hard part. I've already mentioned it in the first post of this thread. There are not that many reliable anycast IPs out there where I can be sure that they are never used by my users. I know for example they use Google DNS, so 8.8.8.8 and 8.8.4.4 are not possible anymore. I also know they use Cloudflare DNS, so 1.1.1.1 is also not possible anymore.

I'd need two IPs that are:

reliable
never used by my users

That's not easy to find.
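One rough way to vet candidates for the "reliable" criterion (the "never used by my users" one cannot be checked this way) is to ping each one from the firewall for a while and look at the loss. A sketch, assuming FreeBSD ping's summary line format; `COUNT` and the script name `vet-monitor.sh` are placeholders, and a meaningful test would run far longer than 20 pings. These pings follow the firewall's routing table, including any monitor host routes, so they show the path the firewall currently uses rather than a chosen WAN.

```shell
#!/bin/sh
# Sketch for vetting candidate monitor IPs: ping each candidate from the
# firewall and report the packet loss. Parsing assumes FreeBSD ping's summary,
# e.g. "20 packets transmitted, 20 packets received, 0.0% packet loss".
# COUNT is a placeholder; a meaningful test should run far longer.

COUNT=${COUNT:-20}

# Read ping output on stdin; print the loss percentage (e.g. "0.0").
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

main() {
    for ip in "$@"; do
        loss=$(ping -q -c "$COUNT" "$ip" 2>/dev/null | loss_pct)
        if [ -n "$loss" ]; then
            printf '%s\t%s%% loss\n' "$ip" "$loss"
        else
            printf '%s\tno ping summary\n' "$ip"
        fi
    done
}

# Run only when candidates are given, e.g.: sh vet-monitor.sh <ip1> <ip2>
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```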

johnpoz

No, it is ODD. You pay your ISP for connectivity. If something upstream breaks, like the site you're monitoring going down, your internet connection still works.

They mention that scenario for when, say, your gateway is local to pfSense, e.g. you're behind a double NAT, or there's an upstream provider router on site.
"if the WAN gateway is actually a device that is local and not on the other side of the ISP circuit"

Also, where did they say you should do that in a multi wan scenario?

Pick an IP just upstream of your gateway. Do a traceroute: what's the next hop past your gateway? Use that!
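That suggestion can be scripted by taking the first responding address at hop 2 (hop 1 normally being your own gateway) from `traceroute -n`. A sketch; the hop number, the `-m 3` limit, the script name and the function name `hop_addr` are assumptions, and since ISPs can re-arrange their networks, the result would need rechecking from time to time.

```shell
#!/bin/sh
# Sketch: print the first responding address at a given hop of
# "traceroute -n" output read from stdin, skipping "*" timeouts.
# Hop 1 is normally your own gateway, so hop 2 is the next hop past it.

hop_addr() {
    awk -v n="$1" '
        $1 == n {
            for (i = 2; i <= NF; i++)
                if ($i != "*") { print $i; exit }
        }
    '
}

# Run only when a target is given, e.g.: sh next-hop.sh 1.1.1.1
if [ "$#" -gt 0 ]; then
    traceroute -n -m 3 "$1" | hop_addr 2
fi
```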

The monitoring of your WAN connection is meant to monitor the connection, not whether the path across the internet to some IP X is broken, or whether the path is much longer because some peering connection went down somewhere upstream. So now you're asking to use the connection monitoring for best-path selection.

So yeah, what you're asking for is a "feature", and the current system works as designed. Just because it doesn't work the way you want it to doesn't mean it's broken in any way.

While you're at it requesting this feature, you should also ask for latency to be measured to every destination IP, and if ISP A is faster than B, use A rather than B ;) And if at some point later B becomes faster, switch over to B for that destination.

Nothing says the monitor IP you pick has to be anycast. Monitoring is not meant to monitor the internet and a specific ISP's connectivity to it. It's meant to monitor whether your ISP connection is up and working.

Pretty much any major DNS provider's IPs are going to be anycast. Pick one you're not using, if that is what you want. Why are you letting users use DNS other than your local DNS in the first place? ;)

> They mention that scenario for when, say, your gateway is local to pfSense, e.g. you're behind a double NAT, or there's an upstream provider router on site.
>
> "if the WAN gateway is actually a device that is local and not on the other side of the ISP circuit"

Yes, that is exactly the case for me: the gateway is local. It's a pretty standard internet connection as provided by many ISPs around the world: a /29 network, with one of its IPs configured on a router in the same building. And even if the gateway weren't local, there is still a lot of equipment between that gateway and "the internet", so no, it doesn't make sense to monitor only the gateway, nor something a bit further upstream. Not at all. And actually, there are not many cases where that makes sense.

> Also, where did they say you should do that in a multi wan scenario?

It says "Using Multiple IPv4 WAN Connections" in the headline. Have you actually read the documentation? Or at least looked at the URL, which also contains the words "multi-wan"?

> Pick an IP just upstream of your gateway. Do a traceroute: what's the next hop past your gateway? Use that!

Bad idea, for several reasons:

sometimes ISPs re-arrange their network and routing which could lead to that IP becoming unavailable
those IPs belong to routers whose job is to forward traffic, not to answer ICMP requests, so they often don't answer reliably
sometimes, ISPs just decide to not have their routers answer ICMP echo requests anymore

> So yeah, what you're asking for is a "feature", and the current system works as designed. Just because it doesn't work the way you want it to doesn't mean it's broken in any way.

Let me ask you the same direct question I asked Derelict: How can you call the above three error scenarios "working as designed"? What would you name that feature that fixes the three above mentioned scenarios?

> The monitoring of your WAN connection is meant to monitor the connection, not whether the path across the internet to some IP X is broken, or whether the path is much longer because some peering connection went down somewhere upstream. So now you're asking to use the connection monitoring for best-path selection.

The monitoring is meant to make sure that "the internet connection" works, meaning I can actually reach "the internet". For that, I have to ping an IP that is actually on "the internet". It's completely pointless to know that some IP inside my ISP's core network is still pingable while the internet doesn't work. In that case, the automatic failover won't happen and users will stay offline.

> Nothing says the monitor IP you pick has to be anycast. Monitoring is not meant to monitor the internet and a specific ISP's connectivity to it. It's meant to monitor whether your ISP connection is up and working.

Of course it doesn't need to be anycast per se; it only needs to be reliable, which anycast usually is, because it doesn't depend on a single host or path. As I explained above, the monitor IPs need to be reliable; otherwise people will suffer from unnecessary false-alarm WAN failovers caused by the monitor IPs being down while the path to the internet is fine.

But whatever, I see it's pointless, you guys declare this working as designed and that's it. No, I'm not going to request a "don't route monitor IP traffic the wrong way" feature.

Thanks for reassuring me never to go for a support contract or buy hardware from Netgate.

Derelict

As was specified in the Redmine issue, it is not a bug and is necessary to ensure the packets go out the gateway being monitored. So there's your answer.

dynach

Hi A Former User

I had the same problem, but you can create a policy route for the monitoring IPs on the LAN interface, with the gateway group as the gateway.

Derelict

Yes, if the pings are sourcing from something on the LAN and not the firewall itself, that should work fine.

Gabri.91

@dynach, do you think policy routing could be useful in this case as well?
Basically, it's the same issue "A Former User" had.

https://forum.netgate.com/category/32/ha-carp-vips

follysuperscript

@johnpoz

necro'ing this thread to point out that Comcast (at least in my area) no longer responds to pings on the DHCP-assigned gateway, which might push this scenario a little further toward normal on the normal-to-ODD continuum. This has forced me to put a public IP in dpinger, which is not pfSense's fault, and may still be more ODD than not.

luckman212

This may help:

https://github.com/pfsense/pfsense/pull/4551