dpinger gateway monitoring - strange issue
-
Since 9:15 am Eastern today I have been having instability on my IPsec tunnels and on my WAN gateway monitoring IP (AT&T DNS server, 68.94.156.11). At first, I thought this was due to an upstream issue within the AT&T network, as one of my IPsec tunnel endpoints is on AT&T Fiber. Then I started seeing instability on another IPsec endpoint that's on OptimumOnline (New York). I am running eBGP peering over my IPsec tunnels, so I also got the extra email spam of route flapping all day. My inbox is not happy.
So I ran some mtr traces and noticed that the AT&T modem I am using at my location, an Arris BGW210-700 in passthrough mode, was seeing drops. I thought it was just a modem issue, so I restarted it. The issue persisted: I kept losing my routing neighbors and getting gateway packet loss emails. My last troubleshooting step was to change the WAN monitor IP from AT&T's DNS server to the default gateway for my WAN, which is 162.193.210.1. It's not best practice to do this, but I was out of ideas.
All packet loss and BGP down alerts have stopped. I'm stable. What, if any, is the relationship between dpinger and IPsec? Why did swapping the monitor address for my WAN_DHCP gateway cause all alerts to stop?
Edit: My IPsec tunnels use the remote IPsec endpoint, another firewall, as the monitor IP. For example, the monitor is the far tunnel endpoint, 10.6.106.2/30, while this pfSense with the instability is 10.6.106.1/30.
[Image: IPsec tunnels flapping]
I made the change at 1:45 pm Eastern.
-
dpinger monitors across the VTI links when you have the remote side set up as a gateway. That can be more sensitive than local monitoring simply because the probes travel a longer route with more hops.
The WAN monitoring you had set was seeing enough packet loss that it would have been triggering the gateway alarm action. That will have been restarting numerous things, including VPNs and BGP. There's a good chance it was just the target not responding, though, whilst the actual connectivity remained good, or at least good enough.
Rather than changing the monitor IP to something local, it would be better to disable the gateway alarm action to prevent the service restarts.
Steve
-
@stephenw10 Thanks for the added color. OK, so if I understand you correctly, if the WAN_DHCP monitoring IP is seeing packet loss, that will interrupt IPsec tunnel connectivity as well? So if WAN_DHCP is getting packet loss, IPsec will restart the tunnels? Why does a gateway alarm restart the IPsec and BGP processes?
Edit: This is one of the emails I get, from syslog:
-
@stephenw10 ok I had to read over the documentation again but I think I see what you’re getting at.
My packet loss thresholds are 10/20.
So losing 20 packets marks the gateway as down. pf probably removes the gateway, the default route and the next hop from the routing table, so naturally anything relying on it, such as IPsec, will fail too. I suppose raising my threshold would've masked the issue.
Am I right on this?
-
It's 10% and 20% loss, not a total packet count. When you only have a single gateway, pfSense will not remove it as the default route, but it will still run all the gateway scripts, which restart things. The gateway action is almost entirely for multi-WAN setups, where a gateway-down event needs to restart services on an alternative WAN connection.
Yes, changing the gateway thresholds would prevent the alarms and hence the gateway events, but simply disabling the action also does that whilst still logging the alarms.
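To make the "percent, not packets" point concrete, here is a minimal Python sketch of how percentage loss thresholds play out over an averaging window. It is a simplified model, not dpinger's actual code. The 10/20 thresholds come from this thread; the 500 ms probe interval, the 60000 ms time period and the use of >= comparisons are assumptions modelled on what are believed to be pfSense's defaults, so check your own gateway's advanced settings.

```python
from typing import Sequence

# Assumed values: believed to be pfSense's default probe interval and
# averaging time period. Your gateway's advanced settings may differ.
PROBE_INTERVAL_MS = 500
TIME_PERIOD_MS = 60_000
# Thresholds from this thread, expressed as percentages, not packet counts.
LOSS_LOW_PCT = 10
LOSS_HIGH_PCT = 20


def loss_percent(replies: Sequence[bool]) -> float:
    """Share of probes in the window that got no reply, as a percentage."""
    if not replies:
        return 0.0
    lost = sum(1 for ok in replies if not ok)
    return 100.0 * lost / len(replies)


def loss_status(replies: Sequence[bool],
                low: float = LOSS_LOW_PCT,
                high: float = LOSS_HIGH_PCT) -> str:
    """Classify a window of probe results (>= comparisons are an assumption)."""
    pct = loss_percent(replies)
    if pct >= high:
        return "down"      # in this model, the level at which an alarm would fire
    if pct >= low:
        return "warning"
    return "online"


if __name__ == "__main__":
    probes = TIME_PERIOD_MS // PROBE_INTERVAL_MS    # probes per averaging window
    window = [False] * 20 + [True] * (probes - 20)  # 20 lost probes
    print(probes, round(loss_percent(window), 1), loss_status(window))
```

With these assumed settings there are 120 probes per window, so 20 lost probes is about 16.7% loss, a warning rather than down; it takes 24 lost probes (20%) in the window to reach the high threshold.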
-
@stephenw10 Thanks as always. I'm curious about the gateway scripts: what are they, and where can I find them?
The restarting of things with the packet loss is what tripped me up yesterday.
I'm going to move forward with your suggestion and disable the action, BUT I still find alerting, such as packet loss, very useful for diagnosing circuit health. Do I just disable gateway monitoring to, in effect, disable the gateway scripts? And to confirm, once I disable it, will I still get emails/alerts about packet loss?
-
No, you want monitoring enabled in order to log events and quality data. Just disable the gateway monitoring action. It's a setting just below that one.
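A rough Python model of the distinction described here: with monitoring on, alarms are logged (and can be emailed) either way, and only the monitoring action triggers the service restarts. The class and field names are hypothetical stand-ins for the "Disable Gateway Monitoring" and "Disable Gateway Monitoring Action" checkboxes, and the restart list is illustrative, taken from the services this thread saw restarting; the real work is done by whatever /etc/rc.gateway_alarm triggers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GatewayConfig:
    # Hypothetical names mirroring the two checkboxes on the gateway edit page.
    disable_monitoring: bool = False
    disable_monitoring_action: bool = False


@dataclass
class Gateway:
    name: str
    config: GatewayConfig
    alarm_log: List[str] = field(default_factory=list)
    restarted: List[str] = field(default_factory=list)


# Illustrative only: the services this thread saw restarting.
SERVICES_TOUCHED_BY_ACTION = ["ipsec", "bgp"]


def handle_alarm(gw: Gateway, status: str) -> None:
    """Simplified model of what happens when a gateway alarm fires."""
    if gw.config.disable_monitoring:
        # No probes are sent, so there are no alarms and no quality data.
        return
    # Alarms are logged regardless of the action setting.
    gw.alarm_log.append(f"{gw.name}: {status}")
    if not gw.config.disable_monitoring_action:
        # Stand-in for the scripts kicked off via /etc/rc.gateway_alarm.
        gw.restarted.extend(SERVICES_TOUCHED_BY_ACTION)


if __name__ == "__main__":
    wan = Gateway("WAN_DHCP", GatewayConfig(disable_monitoring_action=True))
    handle_alarm(wan, "down")
    print(wan.alarm_log, wan.restarted)
```

Running it shows the "down" alarm recorded in the log while the restart list stays empty, which is the behaviour the suggestion above is after: keep the data, skip the restarts.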
-
@stephenw10
Do you know if there is any documentation on these gateway scripts: what they do and how they are tied to dpinger?
-
There is no specific documentation I'm aware of. We were discussing it internally just yesterday.
However you can see what is triggered in /etc/rc.gateway_alarm
-
@stephenw10 Perfect thank you. I think we're settled here.
My two cents: a quick blurb in the documentation noting what happens when there is instability. Knowing that VPNs will restart would've been helpful, as I was troubleshooting what I thought was an upstream issue, whereas this was, at its core, a gateway action caused by my monitor IP.
-
I agree. Exactly what we were discussing yesterday.
This also applies: https://redmine.pfsense.org/issues/13416
-
@stephenw10 This is what I was going to respond with in my two cents comment, but I let it go.
The Redmine issue is spot on. If you are doing a multi-WAN setup, then as part of the configuration you should explicitly enable gateway actions, because that's the whole point. Otherwise, keep the gateway action disabled.
The RRD graphs are very valuable, so I would keep the monitoring enabled for sure. Thanks again for your help. I think you're 10/10 with my issues now?