2.0-BETA5 (Tue Jan 25) not failing over properly in multiwan…

  • I tested load balancing by unplugging the cable from the cable modem. In the system log, apinger detects that the link is down, but it does not trigger any action to remove the route…

    Jan 29 18:41:53 apinger: ALARM: WAN2( *** down ***

    I waited a while and checked rules.debug for the route-to entries, but the gateway that is supposed to be down is still listed.

    cat rules.debug | grep route-to

    GWWAN = " route-to ( em0 71.x.x.x ) "
    GWWAN2 = " route-to ( em1 98.x.x.x ) " <-- supposed to be down...
    GWLOADBALANCE = "  route-to { ( em0 71.x.x.x ) ( em1 98.x.x.x )  }  round-robin  " <--GW still listed
    GWWAN_WAN2 = "  route-to { ( em0 71.x.x.x )  }  " <-- Tier two did not switch over...
    GWWAN2_WAN = "  route-to { ( em1 98.x.x.x )  }  "
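
    To make this check repeatable, here is a rough Python sketch that parses route-to macro lines like the ones above and lists every macro that still references a given gateway. The function names (parse_route_to, macros_using) are made up for illustration, and the parsing assumes the macro layout shown in this post. The single-gateway macro (GWWAN2) is expected to keep its own gateway; the group macros are the ones to watch.

```python
import re

# Matches one "( interface gateway )" pair inside a route-to macro.
PAIR_RE = re.compile(r"\(\s*(\S+)\s+(\S+)\s*\)")


def parse_route_to(lines):
    """Map each macro name to its list of (interface, gateway) pairs."""
    macros = {}
    for line in lines:
        if "route-to" not in line or "=" not in line:
            continue
        name, _, value = line.partition("=")
        macros[name.strip()] = PAIR_RE.findall(value)
    return macros


def macros_using(macros, gateway):
    """Return the names of macros that still route through `gateway`."""
    return sorted(
        name
        for name, pairs in macros.items()
        if any(gw == gateway for _, gw in pairs)
    )


# Lines copied from the grep output above (addresses are masked there too).
sample = [
    'GWWAN = " route-to ( em0 71.x.x.x ) "',
    'GWWAN2 = " route-to ( em1 98.x.x.x ) "',
    'GWLOADBALANCE = "  route-to { ( em0 71.x.x.x ) ( em1 98.x.x.x )  }  round-robin  "',
    'GWWAN_WAN2 = "  route-to { ( em0 71.x.x.x )  }  "',
    'GWWAN2_WAN = "  route-to { ( em1 98.x.x.x )  }  "',
]

print(macros_using(parse_route_to(sample), "98.x.x.x"))
# → ['GWLOADBALANCE', 'GWWAN2', 'GWWAN2_WAN']
```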

  • Is WAN1 also down at the time?

    If both WANs are down, it will assume they are both up… The surrounding log messages should help figure out what is going on.
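
    As an illustration of that fallback, here is a minimal Python sketch of tiered gateway selection. It is not the pfSense code: the function name active_members, the group dictionaries, and the tier numbers are assumptions made up for the example. It drops gateways reported down, uses the lowest remaining tier, and treats every member as up when all of them are down.

```python
def active_members(group, down):
    """Pick the gateways a group should route through.

    group: dict mapping gateway name -> tier number (1 = preferred).
    down:  set of gateway names currently reported down.
    """
    up = {name: tier for name, tier in group.items() if name not in down}
    if not up:
        # Assumed fallback, following the reply above: when every member
        # is down, treat them all as up and keep the configured settings.
        up = dict(group)
    best = min(up.values())
    return sorted(name for name, tier in up.items() if tier == best)


# Hypothetical group definitions, for illustration only.
loadbalance = {"WAN": 1, "WAN2": 1}
failover = {"WAN2": 1, "WAN": 2}

print(active_members(loadbalance, {"WAN2"}))         # → ['WAN']
print(active_members(failover, {"WAN2"}))            # → ['WAN']
print(active_members(loadbalance, {"WAN", "WAN2"}))  # → ['WAN', 'WAN2']
```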

  • I'll try again today, but WAN1 is up. There are no apinger messages indicating that WAN1 is down as well. I have seen messages along the lines of "all gateways are unavailable, using configured settings", but in this case I was simulating a link-down by pulling the TV cable from the modem, so Ethernet was still connected to the modem.

  • I tried unplugging the cable from WAN2 again. I get the apinger down message, but nothing else happens after that. I did the same on WAN after bringing WAN2 back up and saw the same behavior: no messages after the apinger alarm.

    It seems like the filter reload process is not kicking off when the alarm goes off. Is there anything I can check in the system?

  • I debugged this some more and it seems like apinger is working correctly and calling

    "check_reload_status: reloading filter"

    However, check_reload_status does not seem to register anything, so it never calls filter_configure_sync(), which calls filter_generate_gateways(), which in turn calls return_gateway_groups_array() to remove the downed gateway…

    Hope this helps a little. Please look at /usr/local/sbin/check_reload_status...
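
    For anyone who wants to check their own logs for the same symptom, here is a rough Python sketch that flags apinger ALARM lines that are not followed by a check_reload_status message within the next few log lines. The function name alarms_without_reload and the window of 20 lines are arbitrary choices, and the sample log below is constructed for illustration (timestamps are placeholders), not copied from a real system.

```python
def alarms_without_reload(log_lines, window=20):
    """Return apinger ALARM lines with no reload message in the next `window` lines."""
    missing = []
    for i, line in enumerate(log_lines):
        if "apinger: ALARM" not in line:
            continue
        following = log_lines[i + 1 : i + 1 + window]
        if not any("check_reload_status" in later for later in following):
            missing.append(line)
    return missing


# Constructed sample: the first alarm is followed by a reload, the second is not.
sample_log = [
    "<time> apinger: ALARM: WAN2( *** down ***",
    "<time> check_reload_status: reloading filter",
    "<time> apinger: ALARM: WAN( *** down ***",
]

for alarm in alarms_without_reload(sample_log):
    print("no reload seen after:", alarm)
```

    With the sample above, only the second alarm is reported.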

  • I am sorry, but if you see that message, it means that check_reload_status called that function.
    Possibly something else is wrong in your setup.

  • I put debug log statements in all of those functions and I see no output. I can't see into the binary, so that is all I can debug.

  • Upgraded to latest and everything seems to work now on simulated link down. Cheers!