Failover notifications



  • Hi all,

    For a few weeks we have been receiving the following email notifications:

    MONITOR: WAN1GW is down, omitting from routing group SecondaryFailover
    MONITOR: WAN1GW is down, omitting from routing group PrimaryFailover

    Different gateway from a different day:

    MONITOR: GW_OPT1 is down, omitting from routing group SecondaryFailover
    MONITOR: GW_OPT1 is down, omitting from routing group PrimaryFailover

    They always come in bursts of 4 and always at quiet times (i.e. middle of the night).

    Example log entries:

    Aug 16 04:42:33 check_reload_status: Reloading filter
    Aug 16 04:42:33 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Aug 16 04:42:33 check_reload_status: Restarting ipsec tunnels
    Aug 16 04:42:33 check_reload_status: updating dyndns WAN1GW
    Aug 16 04:42:06 php-fpm[61527]: /rc.filter_configure_sync: Message sent to admin@example.com OK
    Aug 16 04:42:06 php-fpm[61527]: /rc.filter_configure_sync: MONITOR: WAN1GW is down, omitting from routing group SecondaryFailover
    Aug 16 04:42:06 php-fpm[61527]: /rc.filter_configure_sync: Message sent to admin@example.com OK
    Aug 16 04:42:06 php-fpm[61527]: /rc.filter_configure_sync: MONITOR: WAN1GW is down, omitting from routing group PrimaryFailover
    Aug 16 04:42:05 php-fpm[60973]: /rc.dyndns.update: Message sent to admin@example.com OK
    Aug 16 04:42:05 php-fpm[60973]: /rc.dyndns.update: MONITOR: WAN1GW is down, omitting from routing group SecondaryFailover
    Aug 16 04:42:05 php-fpm[60973]: /rc.dyndns.update: Message sent to admin@example.com OK
    Aug 16 04:42:05 php-fpm[60973]: /rc.dyndns.update: MONITOR: WAN1GW is down, omitting from routing group PrimaryFailover
    Aug 16 04:42:01 check_reload_status: Reloading filter
    Aug 16 04:42:01 check_reload_status: Restarting OpenVPN tunnels/interfaces
    Aug 16 04:42:01 check_reload_status: Restarting ipsec tunnels
    Aug 16 04:42:01 check_reload_status: updating dyndns WAN1GW

    We don't run IPsec or OpenVPN on the firewall.

    Apart from the alerts, I couldn't find any evidence of disruption: AWS Route53 health checks running every 30 seconds never pick anything up, the firewall doesn't reboot, etc.

    System info:

    Version 2.2.6-RELEASE (i386)
    built on Mon Dec 21 14:50:36 CST 2015
    FreeBSD 10.1-RELEASE-p25
    You are on the latest version.

    Platform nanobsd (1g)
    NanoBSD Boot Slice pfsense0 / ada0s1 (rw)
    CPU Type Geode(TM) Integrated Processor by AMD PCS

    Hardware: LinITX ALIX 2D3 LX800 (3NIC+USB) pfSense Firewall Kit

    Some people suggested it could be due to the firewall's resources being exhausted.
    I doubt it, since even at peak times CPU and memory usage oscillate around 30-40%.

    Are these alerts false positives?
    Is this a known bug that was fixed somewhere between version 2.2.6, which we currently run, and the latest available (2.3.4-p1 as of today)?
    I know we should be able to upgrade by replacing the 1GB CF card with a bigger one.

    Please advise.

    Regards
    Adam
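
A quick way to see why the emails arrive in bursts of 4: in the log excerpt in this post, the same WAN1GW event is reported for two routing groups, and by two different scripts (/rc.filter_configure_sync and /rc.dyndns.update). The sketch below just counts those lines. It is a rough illustration based only on the pasted log format; the regular expression and the function name are my own and are not part of pfSense.

```python
import re
from collections import Counter

# The four MONITOR lines from the log excerpt above (newest first).
LOG = """\
Aug 16 04:42:06 php-fpm[61527]: /rc.filter_configure_sync: MONITOR: WAN1GW is down, omitting from routing group SecondaryFailover
Aug 16 04:42:06 php-fpm[61527]: /rc.filter_configure_sync: MONITOR: WAN1GW is down, omitting from routing group PrimaryFailover
Aug 16 04:42:05 php-fpm[60973]: /rc.dyndns.update: MONITOR: WAN1GW is down, omitting from routing group SecondaryFailover
Aug 16 04:42:05 php-fpm[60973]: /rc.dyndns.update: MONITOR: WAN1GW is down, omitting from routing group PrimaryFailover
"""

# Hypothetical pattern, derived only from the lines shown in this thread.
PATTERN = re.compile(
    r"php-fpm\[\d+\]: (?P<script>/\S+): MONITOR: (?P<gw>\S+) is down, "
    r"omitting from routing group (?P<group>\S+)"
)


def count_notifications(log_text):
    """Count MONITOR notifications per (script, gateway, routing group)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[(match["script"], match["gw"], match["group"])] += 1
    return counts


counts = count_notifications(LOG)
scripts = sorted({script for script, _, _ in counts})
groups = sorted({group for _, _, group in counts})

print("notifications:", sum(counts.values()))
print("scripts:", scripts)
print("routing groups:", groups)
```

Two scripts times two routing groups gives the four notifications, so a burst of 4 looks like one gateway event rather than four separate failures.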



  • It has just happened again and I have spotted the following log entries:

    Sep 26 12:32:14 apinger: alarm canceled: WAN1GW(8.8.8.8) *** down ***
    Sep 26 12:31:46 apinger: ALARM: WAN1GW(8.8.8.8) *** down ***
    Sep 22 14:14:19 apinger: alarm canceled: WAN1GW(8.8.8.8) *** loss ***
    Sep 22 14:13:30 apinger: ALARM: WAN1GW(8.8.8.8) *** loss ***

    The "loss" alarm from 4 days ago didn't send any email notifications.

    The monitor seems to be a little oversensitive.
    Can this be adjusted somewhere?
    Would you recommend some other monitor IP rather than Google public DNS?
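
To put a number on "oversensitive", the apinger lines above can be paired up to see how long each alarm lasted. This is a minimal sketch that assumes the log format is exactly as pasted; the helper names are made up, and the year is a placeholder because the log lines carry none (it does not affect the durations). Month names are parsed with `%b`, which assumes an English locale.

```python
from datetime import datetime

# apinger lines from the post above (newest first).
LOG = """\
Sep 26 12:32:14 apinger: alarm canceled: WAN1GW(8.8.8.8) *** down ***
Sep 26 12:31:46 apinger: ALARM: WAN1GW(8.8.8.8) *** down ***
Sep 22 14:14:19 apinger: alarm canceled: WAN1GW(8.8.8.8) *** loss ***
Sep 22 14:13:30 apinger: ALARM: WAN1GW(8.8.8.8) *** loss ***
"""


def parse_line(line, year):
    """Split one apinger line into (timestamp, state, target, kind)."""
    stamp, _, rest = line.partition(" apinger: ")
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    state, _, detail = rest.partition(": ")
    target, _, kind = detail.partition(" *** ")
    return when, state, target, kind.rstrip(" *")


def alarm_durations(log_text, year=2017):
    """Pair each ALARM with its 'alarm canceled' line; return seconds per alarm.

    year is an assumed placeholder, since syslog-style lines omit it.
    """
    open_alarms = {}
    durations = []
    # The log is newest-first, so walk it in reverse to get chronological order.
    for line in reversed(log_text.strip().splitlines()):
        when, state, target, kind = parse_line(line, year)
        key = (target, kind)
        if state == "ALARM":
            open_alarms[key] = when
        elif state == "alarm canceled" and key in open_alarms:
            started = open_alarms.pop(key)
            durations.append((target, kind, int((when - started).total_seconds())))
    return durations


durations = alarm_durations(LOG)
for target, kind, seconds in durations:
    print(f"{target} {kind}: {seconds} s")
# WAN1GW(8.8.8.8) loss: 49 s
# WAN1GW(8.8.8.8) down: 28 s
```

Both alarms cleared in under a minute, which is consistent with brief packet loss towards 8.8.8.8 rather than a real outage.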



  • First, it's time to upgrade your pfSense! 2.2.6 is pretty old, and one of the best things to come in 2.3+ was that apinger (the gateway monitoring daemon) was replaced by dpinger, which is infinitely more reliable. Anyone who's been using pfSense for more than a couple of years will remember with much angst the nightmare of wrestling with apinger.

    Once you've done that, I highly suggest you read https://doc.pfsense.org/index.php/Multi-WAN#Optional_Tweaks and experiment with the latency & loss thresholds.

    The messages about IPsec/OpenVPN/dyndns are not important and do not indicate a problem. They are basically just debug messages from code paths that, in your case, are not being hit.

    Good luck. If you need more specific help feel free to come back and ask.
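
For anyone wondering what "experimenting with the latency & loss thresholds" means in practice, here is a rough sketch of threshold-based gateway classification. It is not the actual apinger or dpinger logic: the `Thresholds` class, the `classify` function, the threshold numbers and the use of `>=` are all assumptions chosen for illustration. The idea is simply that a window of probe results is reduced to a loss percentage and an average latency, and those are compared against a low (warning) and a high (down) threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Illustrative placeholder values, not confirmed pfSense defaults.
    latency_low_ms: float = 200.0
    latency_high_ms: float = 500.0
    loss_low_pct: float = 10.0
    loss_high_pct: float = 20.0


def classify(rtts_ms, thresholds=Thresholds()):
    """Classify a window of probes; each entry is an RTT in ms, or None if lost."""
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    replies = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    if not replies:
        return "down"  # every probe in the window was lost
    avg_ms = sum(replies) / len(replies)
    if loss_pct >= thresholds.loss_high_pct or avg_ms >= thresholds.latency_high_ms:
        return "down"
    if loss_pct >= thresholds.loss_low_pct or avg_ms >= thresholds.latency_low_ms:
        return "warning"
    return "ok"


# Ten probes with two lost replies (20% loss) and otherwise healthy latency.
window = [20, 25, None, 22, 21, None, 24, 20, 23, 22]

strict = classify(window)
relaxed = classify(window, Thresholds(loss_low_pct=25.0, loss_high_pct=50.0))
print(strict, relaxed)
```

With the stricter placeholder thresholds, two dropped probes out of ten are enough to mark the gateway down; with the relaxed ones the same window is treated as healthy. That trade-off between fast detection and false alarms is what the tuning is about.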

