Gateway groups: will pfSense take both gateways out of service?

mclaborn

In a multi-WAN setup using Gateway Groups, the group I use has two gateways in it:

A faster link at Tier 1
A slower link at Tier 2

The "trigger level" (when a gateway is taken out of service) is set to "packet loss or high latency".

Question: suppose the Tier 1 gateway is taken out for packet loss while the Tier 2 gateway is experiencing high latency. Will pfSense also take the Tier 2 gateway out of service, leaving me with no active gateways?

TheNarc

@mclaborn It will, yes. The idea behind gateway monitoring is to set thresholds for packet loss and/or latency beyond which a gateway is considered unusable. Given that, it wouldn't make sense to leave one active once both have crossed those thresholds. You can, of course, disable gateway monitoring for one or more gateways within a group so that they are always considered active. But I'm not aware of any way to disable gateway monitoring "dynamically", which seems to be what you'd like: never take all gateways within a group offline, and if they are all misbehaving, keep only the one that is misbehaving least online.
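To make that behavior concrete, here is a rough Python model of how a tiered gateway group picks its active members. It is a simplified sketch, not pfSense's implementation: the class, field, and function names are hypothetical, monitoring results are reduced to true/false flags instead of the real loss and latency thresholds, and treating "Disable Gateway Monitoring Action" as "always considered up" is an assumption based on how it is described in this thread.

```python
from dataclasses import dataclass

# Trigger levels as labelled in the gateway group settings (illustrative strings).
MEMBER_DOWN = "Member down"
PACKET_LOSS = "Packet Loss"
HIGH_LATENCY = "High Latency"
LOSS_OR_LATENCY = "Packet Loss or High Latency"


@dataclass
class Gateway:
    name: str
    tier: int
    offline: bool = False            # monitor gets no replies at all
    high_loss: bool = False          # loss above the gateway's high threshold
    high_latency: bool = False       # latency above the gateway's high threshold
    ignore_monitoring: bool = False  # models "Disable Gateway Monitoring Action"


def is_out_of_service(gw: Gateway, trigger: str) -> bool:
    """Simplified rule: is this gateway dropped from the group?"""
    if gw.ignore_monitoring:
        return False  # assumed: the gateway is always treated as up
    if gw.offline:
        return True   # a dead gateway is dropped at every trigger level
    if trigger == PACKET_LOSS:
        return gw.high_loss
    if trigger == HIGH_LATENCY:
        return gw.high_latency
    if trigger == LOSS_OR_LATENCY:
        return gw.high_loss or gw.high_latency
    return False      # MEMBER_DOWN: only an offline gateway is dropped


def active_gateways(group: list[Gateway], trigger: str) -> list[str]:
    """Names of the gateways in the lowest tier that still has a usable member."""
    usable = [gw for gw in group if not is_out_of_service(gw, trigger)]
    if not usable:
        return []  # every member was taken out: the group has no active gateway
    best_tier = min(gw.tier for gw in usable)
    return [gw.name for gw in usable if gw.tier == best_tier]


fast = Gateway("WAN_FAST", tier=1, high_loss=True)
slow = Gateway("WAN_SLOW", tier=2, high_latency=True)
print(active_gateways([fast, slow], LOSS_OR_LATENCY))  # []

slow.ignore_monitoring = True
print(active_gateways([fast, slow], LOSS_OR_LATENCY))  # ['WAN_SLOW']
```

In the first call both members have crossed a threshold the trigger level cares about, so the group is left with nothing, which is exactly the situation in the question. Marking the slower gateway to ignore monitoring corresponds to the workaround of disabling monitoring on one member.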

mclaborn

This poses something of a dilemma. Ideally, I'd like to configure it so that if monitoring takes out all the gateways, routing switches to a defined set of "fallback" gateways that have no monitoring. This would let us operate on the principle that "a bad connection is better than no connection". It's rare, but it has happened to us.

Here are some thoughts on how this might work. I'm assuming each of these would require some development work within pfSense. Comments welcome.

Allow a gateway to be specified more than once within a gateway group, and move the "trigger level" setting from the group to the individual gateway entries within the group, adding a "don't monitor" option. This would let me, within a single group, define the normal gateways at (for example) Tier 1 and Tier 2 with monitoring enabled, then define one or both of the same gateways again at Tier 3 without monitoring as the fallback gateways.
Allow a gateway group to define a "fallback" gateway group that is used when all members of the primary group are down. This would also require adding a "don't monitor" option to the "trigger level" setting. Switching to the fallback group would need to be transparent, so that the firewall rules that send traffic through the gateway group keep working.
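Both proposals can be sketched in a few lines of Python. This is purely illustrative: neither a per-member "don't monitor" flag nor a `fallback` group field exists in pfSense as far as this thread indicates, all names here are hypothetical, and the selection rule (the lowest tier with a usable member wins) is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    gateway: str
    tier: int
    failing: bool           # monitoring says the gateway crossed its thresholds
    monitored: bool = True  # hypothetical per-member "don't monitor" option


@dataclass
class Group:
    name: str
    members: list[Member]
    fallback: Optional["Group"] = None  # hypothetical fallback group


def active_members(group: Group, seen: Optional[set] = None) -> list[str]:
    """Lowest usable tier of the group, else whatever the fallback group yields."""
    seen = set() if seen is None else seen
    if group.name in seen:
        return []  # guard against a fallback chain that loops back on itself
    seen.add(group.name)
    # Unmonitored members are usable no matter what monitoring reports.
    usable = [m for m in group.members if not m.monitored or not m.failing]
    if usable:
        best_tier = min(m.tier for m in usable)
        return [m.gateway for m in usable if m.tier == best_tier]
    if group.fallback is not None:
        return active_members(group.fallback, seen)
    return []


# Idea 1: the same gateways listed again at Tier 3 with monitoring off.
single = Group("WANGROUP", [
    Member("WAN_FAST", 1, failing=True),
    Member("WAN_SLOW", 2, failing=True),
    Member("WAN_FAST", 3, failing=True, monitored=False),
    Member("WAN_SLOW", 3, failing=True, monitored=False),
])
print(active_members(single))  # ['WAN_FAST', 'WAN_SLOW']

# Idea 2: a separate, unmonitored fallback group.
backup = Group("WANGROUP_FALLBACK", [
    Member("WAN_FAST", 1, failing=True, monitored=False),
    Member("WAN_SLOW", 2, failing=True, monitored=False),
])
primary = Group("WANGROUP", [
    Member("WAN_FAST", 1, failing=True),
    Member("WAN_SLOW", 2, failing=True),
], fallback=backup)
print(active_members(primary))  # ['WAN_FAST']
```

Either way, traffic keeps flowing over a degraded link instead of having no gateway at all. The second idea keeps the primary group unchanged, which is closer to the "transparent switch" requirement for firewall rules.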

mclaborn

I have (hopefully) solved my immediate problem by enabling "Disable Gateway Monitoring Action" on the gateway that sits at Tier 2 in the group we use most, so that if the Tier 1 gateway is down, pfSense will never take the Tier 2 gateway down as well. This should be fine for our most-used gateway group, but it is inappropriate for the other groups we occasionally use. If and when we switch to another gateway group, I'll have to remember to change that setting on that gateway.

It seems to me that the various monitoring and threshold settings should be definable on the gateway group, overriding the gateway's own settings whenever the gateway is used as part of that group. That would let me configure each group as it makes sense and then switch between them easily.
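The override rule being proposed could look something like this minimal sketch. It assumes a hypothetical per-group override in which an unset value means "inherit from the gateway"; this is a proposal, not an existing pfSense option as far as this thread indicates, and the names and threshold numbers are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonitorSettings:
    loss_high_pct: float
    latency_high_ms: float
    ignore_monitoring: bool = False  # "Disable Gateway Monitoring Action"


@dataclass(frozen=True)
class GroupOverride:
    """Hypothetical per-group override; None means 'inherit from the gateway'."""
    loss_high_pct: Optional[float] = None
    latency_high_ms: Optional[float] = None
    ignore_monitoring: Optional[bool] = None


def pick(group_value, gateway_value):
    """Group value wins when set; otherwise fall back to the gateway's value."""
    return gateway_value if group_value is None else group_value


def effective_settings(gateway: MonitorSettings,
                       override: Optional[GroupOverride]) -> MonitorSettings:
    """Settings the gateway would use while acting as a member of a group."""
    if override is None:
        return gateway
    return MonitorSettings(
        loss_high_pct=pick(override.loss_high_pct, gateway.loss_high_pct),
        latency_high_ms=pick(override.latency_high_ms, gateway.latency_high_ms),
        ignore_monitoring=pick(override.ignore_monitoring, gateway.ignore_monitoring),
    )


wan_slow = MonitorSettings(loss_high_pct=20.0, latency_high_ms=500.0)
# In the most-used group the Tier 2 link is never taken down...
main_group = GroupOverride(ignore_monitoring=True)
# ...while an occasionally used group keeps the gateway's own settings.
other_group = None

print(effective_settings(wan_slow, main_group).ignore_monitoring)   # True
print(effective_settings(wan_slow, other_group).ignore_monitoring)  # False
```

Because the override lives on the group, switching the firewall rules from one group to another would carry the right monitoring behavior with it, with no need to remember to flip a per-gateway setting.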