A high latency monitor IP causes abnormal latency on all interfaces.

ryan87

I've been experiencing an issue lately that seems to be caused by the IP I'm pinging for gateway monitoring responding slowly. I ping my ISP gateway and a couple weeks ago the replies started bouncing all over the place (2ms to 2000ms).

The issue is only with the ICMP responses from the ISP gateway. The actual connection seems to be OK. Obviously the ISP needs to resolve that issue, but I've noticed a strange side effect in pfSense that's causing me problems.

Whenever the responses from the monitored IP get high I experience abnormal latency on all interfaces, even to the LAN interface which surprised me.

For example, I have a dual-WAN gateway where I'm monitoring the problematic gateway on WAN1 and 1.1.1.1 on WAN2. There's no failover or anything complicated. The latency to 1.1.1.1 via WAN2 looks like this:

When I swap WAN1 to monitor 8.8.8.8 the latency on WAN2 immediately reverts to what I'd expect:

I monitored the pfSense LAN IP from a workstation for a few days and with the high latency monitor IP I was seeing latency of about 1.3ms and standard deviation of roughly 20ms. However, I could see responses getting above 100ms on occasion when I was watching it manually.

As soon as I set the WAN monitor IP to something reliable like 8.8.8.8 the latency from the workstation to the pfSense LAN IP went back to normal; .3ms with a standard deviation of .22ms.
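
For anyone wanting to reproduce that kind of measurement, here is a minimal sketch that pulls the round-trip times out of saved `ping` output and reports the mean, standard deviation, and maximum. The function name, the sample text, and the `time=... ms` line format are assumptions for illustration; this is not the exact method used for the numbers above.

```python
import re
import statistics

def summarize_rtts(ping_output: str) -> dict:
    """Summarize round-trip times found in ping output (values in ms)."""
    # Assumes lines contain "time=0.3 ms" (or "time<1ms" on some platforms).
    rtts = [float(v) for v in re.findall(r"time[=<]([\d.]+)\s*ms", ping_output)]
    if len(rtts) < 2:
        raise ValueError("need at least two replies to compute a deviation")
    return {
        "count": len(rtts),
        "mean": statistics.mean(rtts),
        "stdev": statistics.pstdev(rtts),  # population standard deviation
        "max": max(rtts),
    }

# Hypothetical sample: three normal replies and one spike.
SAMPLE_PING = """\
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.3 ms
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.3 ms
64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=100 ms
64 bytes from 192.168.1.1: icmp_seq=4 ttl=64 time=0.3 ms
"""

summary = summarize_rtts(SAMPLE_PING)
```

A few large spikes among many fast replies give a low mean with a much larger standard deviation, which is the pattern described above.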

That's on 2.5.2. I also tested on a spare firewall that was still running 2.4.5 and it seemed like whenever the gateway gets flagged offline the LAN IP would become unreachable for up to 5s. I don't know what would cause that, but I noticed the same thing seems to happen when I apply firewall rules on that one (v2.4.5).

I updated the spare v2.4.5 firewall to 2.5.2 and now it behaves the same as my regular 2.5.2 firewall. It's a simple single WAN config IIRC.

I think the issue seems to coincide with the gateway coming back online after being flagged as offline, but I'm not positive. Is there something that happens like a reload of firewall rules that could cause LAN side latency issues in that scenario?

I've been using 1.1.1.1 as a monitor IP for 2 days and even though my ISP gateway is still responding erratically my connection is back to feeling 100%.

If anyone knows what's going on there, I'd appreciate any insight you can give.

I'm also curious if Disable Gateway Monitoring Action should be used for a single WAN connection. The docs say "This is useful if the administrator wants to monitor a gateway without the monitoring causing additional disruptions." Does anyone know what those disruptions could be?

stephenw10

Hmm, that does seem like an issue with loading on pfSense if pings against the LAN are affected.

Yes, I would suggest disabling gateway monitoring action except that will prevent failover in a dual WAN setup. So instead I suggest tuning your monitoring so that those latency values do not trigger the action.

Check the gateway log for dpinger trigger events.

Steve

ryan87

@stephenw10 Thanks for the reply. I disabled the gateway monitoring action and watched it for a couple days and it definitely makes a difference. My gateway logs show alarm latency and clear latency over and over, so I suspect my hunch about something happening as the gateway is flagged offline or online causing local latency may be correct.
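
To put a number on how often the gateway is flapping, a rough sketch like the following could count the alarm and clear events per gateway in an exported copy of the gateway log. The log line shape, the gateway name `WAN1_GW`, and the sample entries are assumptions made up for illustration; adjust the pattern to match what your own log actually shows.

```python
import re
from collections import Counter

# Assumed line shape: "... dpinger[PID]: <gateway> <target>: Alarm latency <n>us ..."
EVENT_RE = re.compile(
    r"dpinger\[\d+\]: (?P<gateway>\S+) (?P<target>\S+): "
    r"(?P<event>Alarm|Clear) latency (?P<latency_us>\d+)us"
)

def count_events(lines):
    """Count Alarm/Clear events per gateway name."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match["gateway"], match["event"])] += 1
    return counts

# Hypothetical log excerpt, not taken from a real system.
SAMPLE_LOG = [
    "Oct  1 10:00:01 fw dpinger[1234]: WAN1_GW 203.0.113.1: Alarm latency 612345us stddev 701234us loss 0%",
    "Oct  1 10:01:10 fw dpinger[1234]: WAN1_GW 203.0.113.1: Clear latency 8123us stddev 2100us loss 0%",
    "Oct  1 10:05:42 fw dpinger[1234]: WAN1_GW 203.0.113.1: Alarm latency 540221us stddev 650880us loss 0%",
    "Oct  1 10:06:00 fw php-fpm[555]: some unrelated message",
]

events = count_events(SAMPLE_LOG)
```

Comparing the timestamps of these events with the moments LAN latency spikes would help confirm or rule out the hunch about the gateway state change.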

This is a very narrow edge case though. I've never seen a network where the ISP assigned gateway gives ICMP responses bad enough for it to be flagged as offline, but everything else works perfectly fine (at least as far as I can tell).

Disabling the gateway monitoring action is a decent solution for me. It lets me monitor the gateway without triggering whatever causes that latency. Do you know if there's anything happening as part of that gateway monitoring action that I'd want on a single WAN config? If not I'll leave that setting enabled permanently since I'm not doing any failover.

Of course, as you mentioned, for a dual-WAN config with failover it would be necessary to re-tune the thresholds in the gateway monitoring and tune it back when the ISP fixes their issue. I suppose using an alternate monitor IP would be a good solution too.

Thanks again for the help!

stephenw10

In a single WAN setup disabling the monitoring action is an acceptable solution. The gateway monitoring allows you to tune the latency and loss levels to match your WAN though. You should be able to set levels that are not triggered in normal use but still do trigger if it actually goes down.

The ISP supplied gateway does not have to respond to ping at all. And if it does, it doesn't have to prioritise it. It's not that unusual to see the gateway drop pings when it's under load but still route traffic just fine. A lot of devices like that would have separate control and data planes, and it's the control plane which would usually have to respond to pings to its own IP.
Setting the monitoring IP to some external site also gives you a much better idea of the actual state of your connectivity. For example, monitoring the gateway wouldn't show an outage that is inside the ISP but upstream of the gateway.
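
As a rough illustration of the threshold tuning idea, the sketch below compares a rolling average of ping replies against a latency threshold and counts how often the alarm state flips. It is a simplified model, not dpinger's actual algorithm, and the window size, sample values, and thresholds are all hypothetical.

```python
from collections import deque

def count_state_changes(samples_ms, window, latency_high_ms):
    """Count alarm on/off transitions for a rolling-average latency check."""
    recent = deque(maxlen=window)  # keeps only the last `window` samples
    in_alarm = False
    transitions = 0
    for rtt in samples_ms:
        recent.append(rtt)
        average = sum(recent) / len(recent)
        now_in_alarm = average > latency_high_ms
        if now_in_alarm != in_alarm:
            transitions += 1
            in_alarm = now_in_alarm
    return transitions

# Hypothetical erratic gateway: mostly 2 ms replies with short 2000 ms bursts.
ERRATIC = ([2] * 20 + [2000] * 5) * 3 + [2] * 20

flaps_at_500 = count_state_changes(ERRATIC, window=10, latency_high_ms=500)
flaps_at_1500 = count_state_changes(ERRATIC, window=10, latency_high_ms=1500)
```

With these made-up numbers the lower threshold flips the alarm on and off around every burst, while the higher one never triggers. A gateway that stops answering entirely would still be caught by the loss threshold, which this sketch does not model.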

Steve