Failover regularly fails to return to Tier 1
-
Hi,
I have a dozen pfSense systems with multi-WAN. ONE of them is acting up, and I don't know why ;)
For whatever reason it does not ALWAYS generate a 'cancel alarm'.
Apinger keeps running and outputs the correct value. The log below is a perfect example of what I mean.
- At 7:54 AM a successful flip-flop occurs and all is well.
- At 8:21 AM a similar situation arises: the GW is removed from routing. After a while the gateway comes back online, but the alarm is never canceled ==> filter_reload is never called ==> users are stuck on GW2 until I manually initiate a filter_reload.
Does anyone know how to quickly resolve or debug this issue? Currently I've created a cron job that runs check_reload_status every 15 minutes, but that should not be needed.
Kind regards
Sep 4 11:12:49 check_reload_status: Reloading filter
Sep 4 10:57:58 sshd[55891]: Accepted keyboard-interactive/pam for root from 172.20.20.21 port 49334 ssh2
Sep 4 10:54:16 php: /index.php: Successful webConfigurator login for user 'admin' from 172.20.20.21
Sep 4 10:54:16 php: /index.php: Successful webConfigurator login for user 'admin' from 172.20.20.21
Sep 4 10:49:02 dhclient: Creating resolv.conf
Sep 4 10:49:02 dhclient: RENEW
Sep 4 09:49:03 dhclient: Creating resolv.conf
Sep 4 09:49:03 dhclient: RENEW
Sep 4 09:23:43 dnsmasq[39882]: possible DNS-rebind attack detected: duvel.shilsec.lan
Sep 4 09:23:43 dnsmasq[39882]: possible DNS-rebind attack detected: duvel.shilsec.lan
Sep 4 09:19:07 php: /index.php: Successful webConfigurator login for user 'admin' from 172.20.20.21
Sep 4 09:19:07 php: /index.php: Successful webConfigurator login for user 'admin' from 172.20.20.21
Sep 4 08:49:02 dhclient: Creating resolv.conf
Sep 4 08:49:02 dhclient: RENEW
Sep 4 08:21:44 php: : MONITOR: GW_WAN has high latency, removing from routing group
Sep 4 08:21:44 php: : MONITOR: GW_WAN has high latency, removing from routing group
Sep 4 08:21:42 check_reload_status: Reloading filter
Sep 4 08:21:41 apinger: alarm canceled: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 08:21:32 apinger: ALARM: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 08:16:05 dhcpd: parse_option_buffer: malformed option dhcp.streettalk-directory-assistance-server (code 76): option length exceeds option buffer length.
Sep 4 07:55:02 check_reload_status: Reloading filter
Sep 4 07:55:01 apinger: alarm canceled: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 07:54:52 apinger: ALARM: GW_WAN(8.8.8.8) *** GW_WANdelay ***
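One low-effort way to debug this is to scan the system log for the two failure patterns described above: an apinger ALARM that is never followed by an 'alarm canceled', and a MONITOR 'removing from routing group' entry that is not promptly followed by a filter reload. The Python sketch below assumes the log was exported as plain text in the format shown (newest first) and that the message wording matches this pfSense/apinger version. The function names are hypothetical helpers, not part of pfSense.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Patterns match the message formats in the log excerpt above; other
# pfSense/apinger versions may word these lines differently (assumption).
ALARM = re.compile(r"apinger: ALARM: ([^(\s]+)\(")
CANCEL = re.compile(r"apinger: alarm canceled: ([^(\s]+)\(")
REMOVED = re.compile(r"MONITOR: (\S+) has .*removing from routing group")
RELOAD = re.compile(r"check_reload_status: Reloading filter")


def chronological(lines: Iterable[str]) -> List[Tuple[datetime, str]]:
    """Parse the 'Mon D HH:MM:SS' prefix and return entries oldest first.

    The excerpt lists newest first, so it is reversed before a stable sort,
    which keeps same-second entries in their original relative order.
    Syslog omits the year; a placeholder year of 1900 keeps entries comparable.
    """
    entries = []
    for line in reversed(list(lines)):
        if line.strip():
            stamp = " ".join(line.split()[:3])
            ts = datetime.strptime("1900 " + stamp, "%Y %b %d %H:%M:%S")
            entries.append((ts, line))
    return sorted(entries, key=lambda e: e[0])


def uncanceled_alarms(lines: Iterable[str]) -> List[Tuple[str, datetime]]:
    """Return (gateway, time) for every apinger ALARM with no later cancel."""
    open_alarms = {}
    for ts, line in chronological(lines):
        m = ALARM.search(line)
        if m:
            open_alarms[m.group(1)] = ts
            continue
        m = CANCEL.search(line)
        if m:
            open_alarms.pop(m.group(1), None)
    return sorted(open_alarms.items(), key=lambda kv: kv[1])


def removal_to_reload_gaps(
    lines: Iterable[str],
) -> List[Tuple[str, datetime, Optional[timedelta]]]:
    """For each gateway removal, report the delay until the next filter reload.

    None means no filter reload follows the removal anywhere in the log.
    """
    entries = chronological(lines)
    results, seen = [], set()
    for i, (ts, line) in enumerate(entries):
        m = REMOVED.search(line)
        if not m or (m.group(1), ts) in seen:
            continue  # not a removal, or a duplicate of a line already counted
        seen.add((m.group(1), ts))
        nxt = next((t for t, l in entries[i + 1:] if RELOAD.search(l)), None)
        results.append((m.group(1), ts, nxt - ts if nxt is not None else None))
    return results


# Relevant lines copied from the excerpt above (newest first).
LOG = """\
Sep 4 11:12:49 check_reload_status: Reloading filter
Sep 4 08:21:44 php: : MONITOR: GW_WAN has high latency, removing from routing group
Sep 4 08:21:44 php: : MONITOR: GW_WAN has high latency, removing from routing group
Sep 4 08:21:42 check_reload_status: Reloading filter
Sep 4 08:21:41 apinger: alarm canceled: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 08:21:32 apinger: ALARM: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 07:55:02 check_reload_status: Reloading filter
Sep 4 07:55:01 apinger: alarm canceled: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 07:54:52 apinger: ALARM: GW_WAN(8.8.8.8) *** GW_WANdelay ***
"""

if __name__ == "__main__":
    print(uncanceled_alarms(LOG.splitlines()))
    for gw, ts, gap in removal_to_reload_gaps(LOG.splitlines()):
        print(gw, ts.time(), gap)
```

Run against these lines of the excerpt, it reports no uncanceled alarm, and a gap of 2:51:05 between the 08:21:44 removal and the next filter reload at 11:12:49.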
-
Forgot to mention exactly what I added to the every-15-minutes cron job:
/etc/rc.filter_configure
The problem I'm experiencing happens multiple times a day. I'm not even sure why it is being triggered.
I've currently set the latency high set point to 450 ms and the packet-loss high set point to 50%.
RRD shows a maximum latency of 200 ms and a maximum packet loss of 2%, yet there have been multiple gateway-down events in the last 8 hours. Any clues?
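As a quick sanity check on these numbers, the RRD maxima sit well below both thresholds (200 ms against 450 ms, 2% against 50%). The sketch below assumes a simple strict greater-than comparison of a single reading against each high set point; apinger's own sample averaging is not modeled, and the function name is hypothetical.

```python
from typing import List

# Simplified model (assumption): a threshold trips when a single reading is
# strictly greater than its high set point. apinger's averaging is ignored.
LATENCY_HIGH_MS = 450  # latency high set point from the post
LOSS_HIGH_PCT = 50     # packet-loss high set point from the post


def exceeds_set_points(latency_ms: float, loss_pct: float) -> List[str]:
    """Return the names of the thresholds a (latency, loss) reading would trip."""
    tripped = []
    if latency_ms > LATENCY_HIGH_MS:
        tripped.append("latency")
    if loss_pct > LOSS_HIGH_PCT:
        tripped.append("loss")
    return tripped


if __name__ == "__main__":
    # RRD maxima reported in the post: 200 ms latency, 2% packet loss.
    print(exceeds_set_points(200, 2))  # []
    print(exceeds_set_points(500, 2))  # ['latency']
```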
Edit: oh yeah, I just noticed the monitor IP WAS 8.8.8.8. I've removed the manual override, and it's now monitoring its actual gateway. Oddly enough, I set the monitor IP to 8.8.8.8 on all my sites and there are no issues with it, except on Site Alpha ;)