Gateway not resuming from DOWN state



  • Hi,

    I have been having this issue ever since I set up dual WAN. The primary link (DSL/cable) is stable, so it is normally not affected, but the backup link (a cellular modem with an Ethernet interface) isn't perfectly stable.
    The cellular/backup gateway goes into the DOWN state because too many pings are lost (>20%), then soon afterwards (within a minute or so) it returns to the UP state (I'm notified by email every time it happens).
    Sometimes it's stable for a whole day or more, sometimes it happens a couple of times a day.

    The problem is that after a certain number of these DOWN/UP events (typically after a month or so), the gateway no longer recovers from the DOWN state. It goes down because >20% of pings are lost, but then it stays DOWN forever.

    The last email I receive looks like this:
    5:46:10 MONITOR: WAN_BACKUP_DHCP is down, omitting from routing group WAN_GROUP
    8.8.4.4|10.81.62.92|WAN_BACKUP_DHCP|39.595ms|2.508ms|23%|down
    and there's no email about the gateway going up again.
    In the dashboard, it's shown as red/down with 100% ping loss.
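
In case it helps anyone reading these notifications, here is a minimal Python sketch of how I read that status line. The field meanings (monitor IP, source IP, gateway name, average RTT, RTT standard deviation, loss, status) are my assumption based on the values shown, not something I have confirmed in the pfSense documentation. The names `GatewayStatus`, `parse_status_line` and `exceeds_loss_threshold` are hypothetical, and the 20% figure is simply the loss threshold mentioned above.

```python
from typing import NamedTuple

# Loss threshold taken from the post (>20% lost pings marks the gateway down).
LOSS_DOWN_THRESHOLD_PCT = 20.0


class GatewayStatus(NamedTuple):
    # Field meanings are assumed from the sample line, not from pfSense docs.
    monitor_ip: str
    source_ip: str
    name: str
    rtt_ms: float
    rtt_stddev_ms: float
    loss_pct: float
    status: str


def _strip_suffix(value: str, suffix: str) -> str:
    """Remove a trailing unit such as 'ms' or '%' if present."""
    return value[: -len(suffix)] if value.endswith(suffix) else value


def parse_status_line(line: str) -> GatewayStatus:
    """Split one pipe-delimited notification line into typed fields."""
    parts = line.strip().split("|")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {line!r}")
    monitor_ip, source_ip, name, rtt, stddev, loss, status = parts
    return GatewayStatus(
        monitor_ip=monitor_ip,
        source_ip=source_ip,
        name=name,
        rtt_ms=float(_strip_suffix(rtt, "ms")),
        rtt_stddev_ms=float(_strip_suffix(stddev, "ms")),
        loss_pct=float(_strip_suffix(loss, "%")),
        status=status,
    )


def exceeds_loss_threshold(gw: GatewayStatus) -> bool:
    """True when reported loss is above the assumed 20% down threshold."""
    return gw.loss_pct > LOSS_DOWN_THRESHOLD_PCT


if __name__ == "__main__":
    sample = "8.8.4.4|10.81.62.92|WAN_BACKUP_DHCP|39.595ms|2.508ms|23%|down"
    gw = parse_status_line(sample)
    print(gw.name, gw.loss_pct, gw.status, exceeds_loss_threshold(gw))
```

With the sample line, the loss is 23%, which is only just above the threshold, yet the dashboard later shows 100% loss. That gap is what makes me think the monitor itself gets stuck.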

    I have been researching this and it's absolutely not a problem with the cellular modem, the cellular network, etc. Restarting the modem does not help. It's something in pfSense.

    What helps is one of two things:

    • reboot pfSense
      or
    • disable/re-enable cellular/backup gateway interface

    Then instantly the cellular gateway returns to online/up state.

    So far I had only seen this on the cellular link, because that is the one with periodic stability issues.
    But recently my cable operator had problems throughout one day, and I was experiencing ping losses on the primary link for the whole day.
    It was extremely annoying because I received dozens of notification emails that day.

    And guess what? After a certain number of such down/up events, the primary/cable gateway died for good. And again, restarting the cable modem did not help. In the dashboard it was shown as down with 100% ping loss.
    I had to manually disable/enable the primary gateway in pfSense, and it instantly came back online.

    So it seems that after a certain number of down/up events (I'm unsure whether the number is fixed, or what it is), the ping monitor in pfSense malfunctions and marks the gateway as down permanently, until pfSense is rebooted or the specific gateway is disabled and re-enabled.

    Can you help with this?

    Thanks
