Something is wrong in gwlb.inc…



  • For those who have been reading my posts: I have been having MAJOR load balancing issues when links go down. I have played around with the code and found a function that does not work properly.

    In the function return_gateway_groups_array():

    				foreach($gateways_status as $status) {
    					if ($status['name'] != $gwname) {
    						continue;
    					}
    
    

    The check if ($status['name'] != $gwname) never finds a matching name, so every iteration hits continue and the code after the check never runs. If I change != to ==, the code after the check runs twice, which is wrong. What is this check really supposed to do? If the code after the check never runs, you will always get:

    	php: : All gateways are unavailable, proceeding with configured XML settings!
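
    To make the effect of that check concrete, here is a minimal sketch of a name-matching loop of the same shape. It is not the gwlb.inc code: the function name collect_group_members, the array layout, and the sample names and status values are assumptions chosen for illustration.

    ```php
    <?php
    // Hypothetical, simplified stand-in for the loop quoted above.
    // The function name and array shapes are assumptions, not gwlb.inc code.
    function collect_group_members(array $gateways_status, array $member_names): array
    {
        $members = [];
        foreach ($member_names as $gwname) {
            foreach ($gateways_status as $status) {
                if ($status['name'] != $gwname) {
                    continue; // not this member's status entry, try the next one
                }
                // Only reached when the two names are equal.
                $members[] = $gwname;
            }
        }
        return $members;
    }

    // Illustrative status entries keyed by one naming scheme.
    $status = [
        ['name' => 'WAN',  'status' => 'online'],
        ['name' => 'WAN2', 'status' => 'online'],
    ];

    // Members named with the same scheme are found.
    $matched = collect_group_members($status, ['WAN', 'WAN2']);

    // Members named with a different scheme never pass the check.
    $unmatched = collect_group_members($status, ['wan', 'opt1']);

    var_dump($matched);
    var_dump($unmatched);
    ```

    In this sketch the first call collects both members, while the second returns an empty array because every iteration hits continue. An empty result like that is presumably what leads to the "All gateways are unavailable" fallback.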
    


  • Looks like the comparison I was pointing out is comparing two different things…

    Jun 23 21:45:25 php: : Comparison WAN2 | opt1
    Jun 23 21:45:25 php: : Comparison WAN | opt1
    Jun 23 21:45:25 php: : 1. Current GWNAME: opt1 | TIER: 1
    Jun 23 21:45:25 php: : Comparison WAN2 | wan
    Jun 23 21:45:25 php: : Comparison WAN | wan
    Jun 23 21:45:25 php: : 1. Current GWNAME: wan | TIER: 2
    Jun 23 21:45:25 php: : All gateways are unavailable, proceeding with configured XML settings!
    Jun 23 21:45:25 php: : Comparison WAN2 | opt1
    Jun 23 21:45:25 php: : Comparison WAN | opt1
    Jun 23 21:45:25 php: : 1. Current GWNAME: opt1 | TIER: 2
    Jun 23 21:45:25 php: : Comparison WAN2 | wan
    Jun 23 21:45:25 php: : Comparison WAN | wan
    Jun 23 21:45:25 php: : 1. Current GWNAME: wan | TIER: 1
    Jun 23 21:45:25 php: : All gateways are unavailable, proceeding with configured XML settings!
    Jun 23 21:45:25 php: : Comparison WAN2 | opt1
    Jun 23 21:45:25 php: : Comparison WAN | opt1
    Jun 23 21:45:25 php: : 1. Current GWNAME: opt1 | TIER: 1
    Jun 23 21:45:25 php: : Comparison WAN2 | wan
    Jun 23 21:45:25 php: : Comparison WAN | wan

    Edit: Part of the problem is that the gateway groups are not saved using the names WAN and WAN2, but rather WAN, OPT1, and OPT2. The second issue is that I do not know whether the comparison WAN != wan is case-insensitive or not.



  • Which snapshot is it?  Maybe you got one that was between some apinger changes.

    By the way, there shouldn't really be any issues of whether the comparison is case sensitive or not.  The problem appears to be that each is using a name from a different type of source.  There are functions to do conversions between them, which could be used here if necessary.
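
    A small sketch may help separate the two questions. In PHP, == and != on non-numeric strings such as these are case-sensitive, and strcasecmp() is the standard case-insensitive comparison. The name pairs are taken from the "Comparison" log lines in this thread; the helper names_match is hypothetical and only exists for this illustration.

    ```php
    <?php
    // Hypothetical helper: compare two names either exactly or ignoring case.
    function names_match(string $a, string $b, bool $ignore_case): bool
    {
        return $ignore_case ? strcasecmp($a, $b) === 0 : $a === $b;
    }

    // Pairs as they appear in the "Comparison" log lines.
    $pairs = [['WAN', 'wan'], ['WAN2', 'opt1']];

    foreach ($pairs as [$status_name, $gwname]) {
        printf(
            "%s vs %s: exact=%s, ignoring case=%s\n",
            $status_name,
            $gwname,
            var_export(names_match($status_name, $gwname, false), true),
            var_export(names_match($status_name, $gwname, true), true)
        );
    }
    ```

    Ignoring case would make WAN match wan, but WAN2 and opt1 still differ, which is why the mismatch looks like a naming-source problem rather than a case problem.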



  • I'm definitely using a very recent snapshot (6/22/2010). I have been monitoring the repository and have not seen any apinger/gateway/load balancer updates; otherwise I would have updated to that snapshot to test.

    I'm wondering why not many people see this problem? I already started from a fresh config file with 6/22.



  • I fixed the problem by changing one line in gwlb.inc

    $apingercfg .= "target \"{$gateway['monitor']}\" {\n";
    $apingercfg .= "	description \"{$gateway['name']}\"\n";
    

    to

    $apingercfg .= "target \"{$gateway['monitor']}\" {\n";
    $apingercfg .= "	description \"{$gateway['friendlyiface']}\"\n";
    

    All is working well now, and routes are removed from the routing group when I simulate a link going down.

    I added one more debug message to the code, and now I no longer get the "All gateways are unavailable" message :)

    Jun 24 07:54:29 	php: : MONITOR: opt1 is online, adding to routing group
    Jun 24 07:54:29 	php: : MONITOR: wan is online, adding to routing group
    Jun 24 07:54:29 	php: : MONITOR: opt1 is online, adding to routing group
    Jun 24 07:54:29 	php: : MONITOR: wan is online, adding to routing group
    Jun 24 07:54:29 	php: : MONITOR: opt1 is online, adding to routing group
    Jun 24 07:54:29 	php: : MONITOR: wan is online, adding to routing group
    Jun 24 14:54:28 	check_reload_status: reloading filter
    Jun 24 07:54:10 	apinger: alarm canceled: opt1(4.2.2.3) *** down ***
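
    As a rough illustration of what the one-line change does to the generated apinger target block, here is a minimal sketch. The function build_target and the sample gateway array are hypothetical. The pairing of WAN2 with opt1 is an assumption based on the log lines in this thread, and the real gwlb.inc writes more directives per target than shown here.

    ```php
    <?php
    // Hypothetical helper: build a trimmed-down apinger target block,
    // taking the description from whichever gateway key is requested.
    function build_target(array $gateway, string $description_key): string
    {
        $cfg  = "target \"{$gateway['monitor']}\" {\n";
        $cfg .= "\tdescription \"{$gateway[$description_key]}\"\n";
        $cfg .= "}\n";
        return $cfg;
    }

    // Illustrative gateway entry (values assumed from the logs above).
    $gateway = [
        'name'          => 'WAN2',
        'friendlyiface' => 'opt1',
        'monitor'       => '4.2.2.3',
    ];

    // Before the change: description carries the gateway name.
    echo build_target($gateway, 'name');

    // After the change: description carries the friendly interface name,
    // which is the kind of name the gateway group members use.
    echo build_target($gateway, 'friendlyiface');
    ```

    If the status name that gets compared against the group members comes from this description field, as the fix suggests, then the second form would make the two sides of the comparison use the same naming scheme.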
    


  • I'm wondering why not many people see this problem? I already started from a fresh config file with 6/22.

    I've had issues with the gateway load balancing, but I haven't had time to track them down yet, so I wasn't sure if it was my configuration or a bug in the system.

    I've seen in redmine that there is an issue with the link hotplug event that hasn't been resolved yet, and I thought maybe that was causing my problem.

    http://redmine.pfsense.org/issues/656

