Failover regularly fails to return to tier 1

heper

Hi,

I have a dozen pfSense systems with multi-WAN. ONE of them is acting up, and I don't know why ;)

For whatever reason, it does not ALWAYS generate an 'alarm canceled' event.
apinger keeps running and reports the correct values.

The log below (newest entries first) is a perfect example of what I mean.
- At 7:54 am a successful flip-flop occurs and all is well.
- At 8:21 am a similar situation arises: the gateway is removed from routing. After a while the gateway comes back online, but the alarm is never canceled ==> filter_reload is never called ==> users are stuck on GW2 until I manually initiate a filter_reload.

Does anyone know how to quickly resolve or debug this issue? For now I've created a cron job to manually run check_reload_status every 15 minutes, but that should not be needed.

Kind regards

Sep 4 11:12:49 	check_reload_status: Reloading filter
Sep 4 10:57:58 	sshd[55891]: Accepted keyboard-interactive/pam for root from 172.20.20.21 port 49334 ssh2
Sep 4 10:54:16 	php: /index.php: Successful webConfigurator login for user 'admin' from 172.20.20.21
Sep 4 10:54:16 	php: /index.php: Successful webConfigurator login for user 'admin' from 172.20.20.21
Sep 4 10:49:02 	dhclient: Creating resolv.conf
Sep 4 10:49:02 	dhclient: RENEW
Sep 4 09:49:03 	dhclient: Creating resolv.conf
Sep 4 09:49:03 	dhclient: RENEW
Sep 4 09:23:43 	dnsmasq[39882]: possible DNS-rebind attack detected: duvel.shilsec.lan
Sep 4 09:23:43 	dnsmasq[39882]: possible DNS-rebind attack detected: duvel.shilsec.lan
Sep 4 09:19:07 	php: /index.php: Successful webConfigurator login for user 'admin' from 172.20.20.21
Sep 4 09:19:07 	php: /index.php: Successful webConfigurator login for user 'admin' from 172.20.20.21
Sep 4 08:49:02 	dhclient: Creating resolv.conf
Sep 4 08:49:02 	dhclient: RENEW
Sep 4 08:21:44 	php: : MONITOR: GW_WAN has high latency, removing from routing group
Sep 4 08:21:44 	php: : MONITOR: GW_WAN has high latency, removing from routing group
Sep 4 08:21:42 	check_reload_status: Reloading filter
Sep 4 08:21:41 	apinger: alarm canceled: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 08:21:32 	apinger: ALARM: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 08:16:05 	dhcpd: parse_option_buffer: malformed option dhcp.streettalk-directory-assistance-server (code 76): option length exceeds option buffer length.
Sep 4 07:55:02 	check_reload_status: Reloading filter
Sep 4 07:55:01 	apinger: alarm canceled: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 07:54:52 	apinger: ALARM: GW_WAN(8.8.8.8) *** GW_WANdelay ***
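
To make the pattern easier to spot in a longer log, here is a minimal Python sketch. It is not part of pfSense. The function names `parse` and `removals_without_reload`, the 60-second window and the placeholder year are all assumptions made for illustration (syslog lines carry no year). It reads lines in the format shown above and lists every "removing from routing group" message that is not followed by a "Reloading filter" message within the window.

```python
import re
from datetime import datetime, timedelta

# A few lines copied from the log above (newest first, whitespace normalized).
LOG = """\
Sep 4 11:12:49 check_reload_status: Reloading filter
Sep 4 08:21:44 php: : MONITOR: GW_WAN has high latency, removing from routing group
Sep 4 08:21:42 check_reload_status: Reloading filter
Sep 4 08:21:41 apinger: alarm canceled: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 08:21:32 apinger: ALARM: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 07:55:02 check_reload_status: Reloading filter
Sep 4 07:55:01 apinger: alarm canceled: GW_WAN(8.8.8.8) *** GW_WANdelay ***
Sep 4 07:54:52 apinger: ALARM: GW_WAN(8.8.8.8) *** GW_WANdelay ***
"""

# "Mon D HH:MM:SS" timestamp followed by the rest of the message.
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(.*)$")


def parse(log_text, year=2000):
    """Return (timestamp, message) tuples sorted oldest first.

    The year is a placeholder (assumption): the log lines do not contain one.
    """
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unparseable lines
        ts = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        events.append((ts, match.group(2)))
    return sorted(events, key=lambda event: event[0])


def removals_without_reload(events, window=timedelta(seconds=60)):
    """Return timestamps of gateway removals with no filter reload
    logged within `window` after the removal."""
    reloads = [ts for ts, msg in events if "Reloading filter" in msg]
    stale = []
    for ts, msg in events:
        if "removing from routing group" in msg:
            if not any(ts < reload_ts <= ts + window for reload_ts in reloads):
                stale.append(ts)
    return stale


if __name__ == "__main__":
    for ts in removals_without_reload(parse(LOG)):
        print("removal without follow-up reload:", ts.strftime("%b %d %H:%M:%S"))
```

On the excerpt above this flags the 08:21:44 removal: that message is logged two seconds after the 08:21:42 reload, and the next reload in the log is the one at 11:12:49. This only shows the ordering of the log messages; it does not explain why it happens.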

heper

Forgot to mention what exactly I added to the every-15-minutes cron job:

/etc/rc.filter_configure

The problem I'm experiencing happens multiple times a day. I'm not even sure why it is being triggered.
I've currently set the latency high set-point to 450 ms and the packet loss high set-point to 50%.
RRD shows a max latency of 200 ms and a max packet loss of 2% … still, there have been multiple gateway-down events in the last 8 hours.
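
One possibility worth ruling out, stated here as an assumption and not something confirmed on this box: if the RRD graph plots values averaged over an interval while the alarm reacts to shorter-term measurements, a brief spike can cross the 450 ms set-point without ever showing up in the graph. The sketch below uses made-up sample values purely to illustrate that arithmetic; `LATENCY_HIGH_MS` is just the set-point mentioned above and the function name is hypothetical.

```python
from statistics import mean

LATENCY_HIGH_MS = 450  # latency high set-point from the post


def spike_hidden_by_average(samples_ms, threshold_ms=LATENCY_HIGH_MS):
    """Return True if at least one sample exceeds the threshold
    while the average of all samples stays at or below it."""
    has_spike = any(sample > threshold_ms for sample in samples_ms)
    return has_spike and mean(samples_ms) <= threshold_ms


if __name__ == "__main__":
    # Illustrative values only (NOT measured data): one 900 ms spike
    # among otherwise normal probes within a single graph interval.
    samples = [120, 130, 900, 110, 125]
    print("average:", mean(samples))
    print("spike hidden by average:", spike_hidden_by_average(samples))
```

If that is what is going on, the raw monitoring values around the time of an alarm would be more telling than the graph maxima.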

Any clues?

Edit: oh yeah, I just noticed the monitor IP WAS 8.8.8.8. I've removed the manual override and it's now monitoring its actual gateway. Oddly enough, I set the monitor IP to 8.8.8.8 on all my sites and there are no issues with it, except on Site Alpha ;)