Routing for failover - how to monitor underlying uplinks...



  • Hi,

    I've found myself in the position where I have two independent, moderately reliable internet connections... I originally intended to implement my own router using Debian... then evaluated pfSense and found it much easier to configure.

    I'm very happy with routing based upon fail-over. However... I still need to know when either of my uplinks fails... as, otherwise, I'd not know to raise a fault ticket. My existing 'brain-dead-simple' cron-job script, running on a server behind the router, no longer informs me of a failed uplink... as, now, the fail-over is automatic - and does not need me to switch cables.

    I can't believe I'm the only user of the fail-over feature who still needs to know when either uplink fails. What's the recommended strategy to deal with this?

    (Thanks in advance for any suggestions...)



  • Hi @sjh ,

    Do you have pfSense set up to send email alerts? If not, make sure to enable that under System => Advanced => Notifications. I created a new Gmail account just for that purpose, so that could be an option.

    Once you have that set up, make sure that the Default Gateway under System => Routing => Gateways is set to automatic. This should be all that's needed to receive an email when, let's say, your Tier 1 gateway fails and the Tier 2 takes over. That email notification should still go out, since the default gateway is set to automatic.

    I would also suggest setting your monitor IP to the second or third hop along the route to your ISP (depending on the network) for a better indication of when the link is down. You may want to test this with the gateway monitoring action disabled at first, to be sure the link does not get taken down just because that next hop is not ping friendly. I've had that happen and had to switch to the third hop.

    Raffi


  • Galactic Empire

    @Raffi_ said in Routing for failover - how to monitor underlying uplinks...:

    Do you have pfSense setup to send email alerts? If not, make sure to enable that under System => Advanced => Notifications. I created a new gmail account just for that purpose, so that could be an option.

    Create a gateway group and enable email notifications.

    https://docs.netgate.com/pfsense/en/latest/routing/gateway-settings.html

    https://forum.netgate.com/topic/111569/howto-notifications-with-gmail-smtp

    You'd get an email like this:-

    Notifications in this message: 3
    
    23:49:06 MONITOR: NORDVPN_UK906_VPNV4 has packet loss, omitting from routing group NORDVPN
    10.8.1.33|10.8.1.33|NORDVPN_UK906_VPNV4|5.572ms|26.866ms|16%|loss
    23:49:08 53135MONITOR: NORDVPN_UK906_VPNV4 is available now, adding to routing group NORDVPN
    10.8.0.15|10.8.0.15|NORDVPN_UK906_VPNV4|0.166ms|0.041ms|0.0%|none
    23:49:08 53135MONITOR: NORDVPN_US2896_VPNV4 is available now, adding to routing group NORDVPN
    10.8.3.59|10.8.3.59|NORDVPN_US2896_VPNV4|1.748ms|0.073ms|0.0%|none
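
If you want to post-process these notification emails, the pipe-delimited status lines can be split into fields. This is only a minimal sketch: the field names (`monitor_ip`, `source_ip`, `rtt_ms`, `rtt_stddev_ms`, and so on) are my reading of the sample above, not a documented format, and `parse_status_line` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class GatewayStatus:
    # Field names are assumptions inferred from the sample email,
    # not taken from pfSense documentation.
    monitor_ip: str
    source_ip: str
    gateway: str
    rtt_ms: float
    rtt_stddev_ms: float
    loss_pct: float
    state: str


def _number(text: str, unit: str) -> float:
    """Strip a trailing unit such as 'ms' or '%' and convert to float."""
    if text.endswith(unit):
        text = text[: -len(unit)]
    return float(text)


def parse_status_line(line: str) -> GatewayStatus:
    """Parse one 'a|b|NAME|1.0ms|2.0ms|0.0%|none' style line."""
    fields = line.strip().split("|")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    monitor_ip, source_ip, gateway, rtt, stddev, loss, state = fields
    return GatewayStatus(
        monitor_ip=monitor_ip,
        source_ip=source_ip,
        gateway=gateway,
        rtt_ms=_number(rtt, "ms"),
        rtt_stddev_ms=_number(stddev, "ms"),
        loss_pct=_number(loss, "%"),
        state=state,
    )
```

Lines that do not have exactly seven fields (for example the "MONITOR: ..." lines) raise `ValueError`, so a caller can simply skip them.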
    


  • Many thanks... I'd completely overlooked those settings... and had disappeared down a rabbit-hole looking at more complicated solutions. This mostly solves my problem.

    Two other things I'd still like to be able to do:

    • Monitor latency, for each ISP, over time. (Collecting timings for a few pings, to a handful of relevant remote sites, each hour would suffice.)
    • Notify if either ISP has problems routing to a particular site-of-interest (to me)... as, even if an ISP is up, if it doesn't route packets to my services, I've still got a problem - as I wouldn't be able to fail-over to it.

    Is there a similarly easy way to solve these?
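
Both checks listed above (hourly latency samples, and reachability of a particular site per ISP) could also be scripted from cron. The following is a rough sketch rather than a recommended pfSense method. It assumes a BSD-style `ping` whose `-S` option sets the source address (Linux `ping` uses `-I` instead), and it assumes that sourcing packets from a given WAN address actually sends them out through that uplink, which depends on your routing setup. `parse_ping_summary` and `probe` are hypothetical helper names.

```python
import re
import subprocess

# Summary lines printed by common ping implementations, e.g.
#   "5 packets transmitted, 5 packets received, 0.0% packet loss"
#   "round-trip min/avg/max/stddev = 10.1/12.3/15.0/1.2 ms"   (BSD)
#   "rtt min/avg/max/mdev = 10.1/12.3/15.0/1.2 ms"            (Linux)
LOSS_RE = re.compile(r"([\d.]+)% packet loss")
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping_summary(output):
    """Return (loss_pct, avg_rtt_ms); avg_rtt_ms is None if nothing replied."""
    loss = LOSS_RE.search(output)
    if loss is None:
        raise ValueError("no packet-loss summary found in ping output")
    rtt = RTT_RE.search(output)
    avg = float(rtt.group(2)) if rtt else None
    return float(loss.group(1)), avg


def probe(target, source_ip, count=5):
    """Ping `target` from `source_ip` and return (loss_pct, avg_rtt_ms).

    Assumption: '-S' selects the source address (BSD ping); change it to
    '-I' on Linux. Whether this forces a particular uplink is up to your
    routing configuration.
    """
    result = subprocess.run(
        ["ping", "-c", str(count), "-S", source_ip, target],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return parse_ping_summary(result.stdout)
```

An hourly cron job could call `probe()` once per (uplink address, site) pair, append the results to a CSV file for the latency history, and send a mail whenever the loss figure for a site-of-interest is 100%.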


  • Galactic Empire

    Status -> Monitoring

    You could do one remote site per ISP by changing the monitor IP.

    [Image: Screenshot 2019-06-13 at 20.37.57.png]

    FYI you can't use the same monitoring IP on multiple links.



  • Latency is reported under Status => Monitoring as shown below. Set up the graph to show Quality of the gateways. I think the delay average is what you're looking for. You can change the time scale of the graph using the settings (wrench icon in the upper right). You can also set up a second gateway for the right axis to show the statistics for two gateways on the same graph at the same time.
    [Image: 9d74bdcc-8f36-41a8-bc56-d8d39a9d6285-image.png]
    The monitoring interval/period can be changed under System => Routing => Gateways, in each gateway's Advanced settings, but I would suggest leaving them alone. You get good sampling with the defaults. More details on the settings are in the book.
    https://docs.netgate.com/pfsense/en/latest/book/routing/gateway-settings.html

    I think the second question would be solved with the last suggestion in my previous post. Run a tracert to google.com going out from the WAN interface for the link you want to monitor. Assuming the first hop will be pfSense, ignore that. The second will likely be the gateway itself, such as your modem; ignore that too. The third hop might be a routing device in the ISP's network. That's really what you want to use for a monitor IP, since it will tell you that the link has a connection to the ISP. It's not a 100% guarantee of service, but I think it's better than using something further down the line or closer.

    I recently set this up and found that the closest pingable device on my ISP's network responded to ping and seemed reliable, but after a few hours it would stop responding for long enough to mark the link as down. The link was not actually down!! Of course, that took my main WAN link out of use. It was not a big deal, but it did switch us over to our inferior 4G LTE emergency service, which was obviously not intended. I ended up disabling the gateway monitoring action and switching the monitoring address to the next pingable device on the tracert, and this time I monitored it for a month to make sure it wouldn't give me trouble. Learn from my mistake and test this over a period of time before going live with it, unless you don't mind it switching over to the other link.
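
One way to make the "test it over a period of time" advice above concrete is to log ping results for the candidate monitor IP yourself and look for the longest silent stretch before trusting it. A minimal sketch, assuming you have recorded `(unix_time, replied)` samples in chronological order; `longest_outage` is a hypothetical helper, and any tolerance you compare its result against is your own choice, not a pfSense setting.

```python
def longest_outage(samples):
    """Return the longest run of missed replies, in seconds.

    `samples` is an iterable of (unix_time, replied) pairs in
    chronological order. A run is measured from its first missed reply
    to the next successful reply, or to the last sample if the log ends
    while the target is still silent.
    """
    longest = 0.0
    run_start = None   # time of the first missed reply in the current run
    last_time = None   # time of the most recent sample seen
    for when, replied in samples:
        if replied:
            if run_start is not None:
                longest = max(longest, when - run_start)
                run_start = None
        elif run_start is None:
            run_start = when
        last_time = when
    if run_start is not None:
        # The log ended during an outage; count what we observed of it.
        longest = max(longest, last_time - run_start)
    return longest
```

If the longest silent stretch in a month of samples is longer than you would tolerate before a failover, that hop is probably a poor monitor IP even though the link itself was fine.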



  • New update on my own experience. We had a storm in our area that took out a bunch of trees, and we had a momentary power outage. Coincidentally (or not), a few hours later the monitoring IP on my ISP's network stopped responding to pings completely, after months of doing so reliably. My traffic kept working because I had disabled automatic failover due to issues with my backup gateway (another story). Long story short, go with @NogBadTheBad's setting of using Google DNS (8.8.8.8) for the monitor IP. One way or another, Google DNS should respond to ping reliably. In my case, I think my ISP may have routed traffic along a different path, maybe because the storm took out the part of it I was trying to ping? I have no idea, but it was very coincidental.

