Multi WAN seems to be poorly implemented



  • I'm using pfsense 2.0 in a multi-LAN / multi-WAN scenario.
    Quite often, especially on https connections (shopping sites or our own hosting server) we're getting kicked.
    This is typical behaviour for badly implemented multi-WAN which I didn't expect on pfsense.

    My general feeling of pfsense is that it's robust and flexible, but this is a bit of a downside….

    I believe it's a bug that has been reported a long time ago and hasn't been solved yet.
    I don't understand why it takes that long and even less why pfsense 2.0 is considered to be stable.

    For the time being I created a gateway group where the different gateways are in different tiers.
    I'm using that gateway group for port 80,443 and some other ports on which we're using http-traffic.

    To avoid load-balancing problems it would be best to always use the same gateway for the same IP for at least several hours.

    Any info regarding this issue is welcome.
    I would like to know if anyone is at least trying to solve it.



  • Does it also happen with sticky connections enabled?



  • Best to never load balance HTTPS, but sticky connections will work around that most of the time.



  • Yes, the sticky connections option is (and was) enabled….

    But, what's that bug that's mentioned in the list?
    I haven't read about it being solved.



  • It's not an issue in 2.0 with multi-WAN.



  • Horde gives me the explicit message that I'm coming from another IP all of a sudden….
    What does "sticky" mean?
    I mean... what's the timeframe?


  • Rebel Alliance Developer Netgate

    Sticky keeps a client_ip+gateway_ip association for as long as a connection state is active. So as long as the browser holds the connection open.

    On 2.1 I have added a box to let the user specify the length of time after the state expires to hold the association, so it can be made longer.


  • Banned

    When does 2.1 go live??


  • Rebel Alliance Developer Netgate

    It will probably be sometime after March for that. Still lots of work to go there.

    2.0.1 will be coming out shortly (maybe today) but it doesn't include that change.



  • jimp,

    Is the upcoming change to "sticky" in pfsense v2.1 still using pf's "sticky-address" feature?

    In the *bsd mailing lists there have been several requests over the years to increase the sticky timeout to e.g. 30min, but no clear solution.


  • Rebel Alliance Developer Netgate

    Yes, it is done the same way. The timeout for that is controlled by pf's src.track timer (see pfctl -st).
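
To read that timer from a script, something like the following could work. It is only a sketch: the function name `src_track_timeout` is made up for illustration, and it assumes the `pfctl -st` output contains a line of the form `src.track 1800s`, as shown later in this thread.

```shell
# Hypothetical helper: print the src.track timeout (in seconds) found in
# `pfctl -st` output read from stdin. Prints nothing if the line is absent.
src_track_timeout() {
    awk '$1 == "src.track" { sub(/s$/, "", $2); print $2 }'
}
```

On the firewall it would be called as `pfctl -st | src_track_timeout`. A result of 0 would mean the association is dropped as soon as the client's last state expires.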



  • Thx jimp.

    What would happen in a load-balancing pfsense setup with two gateways and sticky enabled if one gateway failed? What would prevent pf from remembering the source tracking entries of old (finished) sessions for the src.track period (e.g. 30 minutes) and continuing to send traffic to the failed gateway?

    I have in mind the issue raised in
    http://lists.freebsd.org/pipermail/freebsd-pf/2006-December/002860.html
    http://lists.freebsd.org/pipermail/freebsd-pf/2006-December/002861.html

    In that example they suggest using pfctl -K to remove source tracking entries.
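
For reference, a minimal sketch of what such a cleanup might look like from the shell. `pfctl -K` kills the source tracking entries originating from a host or network, and `pfctl -s Sources` lists the source tracking table. The wrapper name `drop_src_track` and the `PFCTL` override (used for dry runs) are hypothetical, and whether pfSense needs this at all after a WAN failure is not settled in this thread.

```shell
# Hypothetical wrapper around `pfctl -K`, which removes the source tracking
# entries for one client (or network) so pf can pick a gateway afresh.
# Set PFCTL=echo to print the arguments instead of touching pf.
drop_src_track() {
    if [ -z "$1" ]; then
        echo "usage: drop_src_track <host-or-network>" >&2
        return 1
    fi
    ${PFCTL:-pfctl} -K "$1"
}
```

Run as root, `drop_src_track 192.168.1.10` would clear the entries for that (example) client. Comparing `pfctl -s Sources` before and after shows whether anything was removed.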


  • Rebel Alliance Developer Netgate

    I believe the way our multi-wan works none of that info applies, as it does some behind-the-scenes magic when a WAN fails that may be doing the right thing here already. Ermal would have to say for sure though.



  • But what code does that checkbox execute?
    Can I do something from the console….?



  • To see the code, do a search, e.g.

    fgrep lb_use_sticky /etc/inc/*

    Basically it enables the "sticky-address" feature of pf.


  • Rebel Alliance Developer Netgate

    The code triggered by the checkbox to activate sticky can be found in /etc/inc/interfaces.inc

    As for the GUI field to set the timeout manually, here is what does that:
    https://github.com/bsdperimeter/pfsense/commit/4573641589d50718b544b778cea864cfd725078a



  • I really meant…
    can I write some value to some pseudo-file in /proc, or doesn't it work that way?

    The checkbox / input field in the PHP page does something...
    Can you tell us what it does, or can this be backported to 2.0?

    I'm insisting because it's a bit of a show stopper for my implementation....


  • Rebel Alliance Developer Netgate

    Read my last post again, I provided everything you'd need to know to make the change on 2.0.1 by hand.



  • Did you edit your previous post?….
    Anyhow

    I applied those 2 patches (I wasn't able to create a patch file from it, but downloaded the 2 raw files with fetch)
    I set the timeout to 3600 seconds which should be sufficient.
    If this works I will have the best of both worlds (equal spreading of bandwidth and stability)

    Tomorrow I will know if it does its job better.




  • @frater:

    Tomorrow I will know if it does its job better.

    So, what's your impression of the new src.track feature (i.e. sticky with a timeout)?

    And have you tested how it handles failure of one of the two gateways? (see my question in the previous page)

    TIA.



  • Because there was still one site that gave problems I didn't dare to post immediately.
    But I never had any problems accessing the 2 hosting servers of our own.

    I think it's working…
    I assume it's properly working when your line goes down. The behaviour will depend a lot on what it should do with the current states (flush them or keep them open).

    I just updated to 2.0.1, so I needed to patch it again.
    I have now done it in a way that is easy to share with the community.
    It's a bit less work now....

    I do think it should have been backported to 2.0.1.

    Cheers and happy new year

    
    cd /usr/local/www
    cp -p system_advanced_misc.php system_advanced_misc.php.2.01
    patch system_advanced_misc.php <system_advanced_misc.php.patch
    cd /etc/inc
    cp -p filter.inc filter.inc.201
    patch filter.inc <filter.inc.patch

    /etc/inc/filter.inc.patch
    

    281a282,284
    > if (isset($config['system']['lb_use_sticky']) && is_numeric($config['system']['srctrack']) && ($config['system']['srctrack'] > 0))
    >       $rules .= "set timeout src.track {$config['system']['srctrack']}\n";

    
    /usr/local/www/system_advanced_misc.php.patch
    

    58a59
    > $pconfig['srctrack'] = $config['system']['srctrack'];
    105c106
    <              if($_POST['lb_use_sticky'] == "yes")
    ---
    >              if($_POST['lb_use_sticky'] == "yes") {
    107c108,109
    <              else
    ---
    >                      $config['system']['srctrack'] = $_POST['srctrack'];
    >              } else
    192a195,200
    > function sticky_checked(obj) {
    >       if (obj.checked)
    >               jQuery('#srctrack').attr('disabled',false);
    >       else
    >               jQuery('#srctrack').attr('disabled','true');
    > }
    269c277
    <                                                                      />


    onClick="sticky_checked(this)" />
    278a287,292

    " class="formfld unknown" >

    >                                                                      "By default this is 0, so source tracking is removed as soon as the state expires. " .
                                                                          "Setting this timeout higher will cause the source/destination relationship to persist for longer periods of time."); ?>
    387c401
    <                                                                      ---



  • I upgraded to 2.0.1 and re-applied the patch….
    It's not working  :-(

    I am getting kicked from our hosting server from time to time. 'tcpdump' is running at the same time and as you can see it all of a sudden switches to the other connection.
    This is not sticky at all......

    I did think it was working properly before I upgraded, although I did get some strange behaviour on other servers....

    Can I check if srctrack is really set properly?
    I have little knowledge of FreeBSD / pf

    tcpdump -i eth0 -n 'port 8443'

    14:33:33.041319 IP 89.250.179.117.29754 > 46.243.24.60.pcsync-https: P 1949:2786(837) ack 2746 win 16425
    14:33:33.041658 IP 46.243.24.60.pcsync-https > 89.250.179.117.29754: . 2746:4206(1460) ack 2786 win 96
    14:33:33.041676 IP 46.243.24.60.pcsync-https > 89.250.179.117.29754: P 4206:4254(48) ack 2786 win 96
    14:33:33.092396 IP 89.250.179.117.19784 > 46.243.24.60.pcsync-https: . ack 98209 win 16425
    14:33:33.102207 IP 89.250.179.117.19784 > 46.243.24.60.pcsync-https: P 8928:9765(837) ack 98209 win 16425
    14:33:33.102225 IP 89.250.180.164.13586 > 46.243.24.60.pcsync-https: S 929859587:929859587(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
    14:33:33.102315 IP 46.243.24.60.pcsync-https > 89.250.180.164.13586: S 1120370230:1120370230(0) ack 929859588 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 7>
    14:33:33.102667 IP 46.243.24.60.pcsync-https > 89.250.179.117.19784: . 98209:99669(1460) ack 9765 win 228
    14:33:33.102681 IP 46.243.24.60.pcsync-https > 89.250.179.117.19784: P 99669:99685(16) ack 9765 win 228
    14:33:33.108718 IP 89.250.180.164.38584 > 46.243.24.60.pcsync-https: S 1307358421:1307358421(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
    14:33:33.108751 IP 46.243.24.60.pcsync-https > 89.250.180.164.38584: S 2972341496:2972341496(0) ack 1307358422 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 7>
    14:33:33.114151 IP 89.250.180.164.16897 > 46.243.24.60.pcsync-https: S 2970339047:2970339047(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
    14:33:33.114183 IP 46.243.24.60.pcsync-https > 89.250.180.164.16897: S 1688705918:1688705918(0) ack 2970339048 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 7>
    14:33:33.120833 IP 89.250.180.164.48746 > 46.243.24.60.pcsync-https: S 794117234:794117234(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
    14:33:33.120865 IP 46.243.24.60.pcsync-https > 89.250.180.164.48746: S 3021423565:3021423565(0) ack 794117235 win 5840 <mss 1460,nop,nop,sackOK,nop,wscale 7>
    14:33:33.126519 IP 89.250.180.164.52196 > 46.243.24.60.pcsync-https: S 1506349235:1506349235(0) win 8192 <mss 1460,nop,wscale 2,nop,nop,sackOK>
    14:33:33.126550 IP 46.243.24.60.pcsync-https > 89.250.180.164.52196: S 3390423845:3390423845(0) ack 1506349236 win 5840
    

    again….  can't this feature be backported?????
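
A capture like this can be summarized mechanically. The sketch below is an assumption-laden helper (the name `sources_for_dst` is invented here): it reads `tcpdump -n` text output in the format shown above and prints each distinct source IP that sent packets to a given destination, with a packet count. More than one source IP for a single client would suggest that its connections are being spread across WANs.

```shell
# Hypothetical helper: list the distinct source IPs seen sending to one
# destination in `tcpdump -n` text output (read from stdin).
# $1 is the destination exactly as tcpdump prints it, e.g.
# "46.243.24.60.pcsync-https" (without the trailing colon).
sources_for_dst() {
    awk -v dst="$1:" '
        $2 == "IP" && $4 == ">" && $5 == dst {
            src = $3
            sub(/\.[^.]*$/, "", src)   # drop the trailing ".port"
            count[src]++
        }
        END { for (s in count) print s, count[s] }
    ' | sort
}
```

On the target server it could be fed a bounded capture, for example `tcpdump -i eth0 -n -c 200 'port 8443' | sources_for_dst 46.243.24.60.pcsync-https`. For the capture above it would report two source addresses, 89.250.179.117 and 89.250.180.164.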



  • @frater:

    Can I check if srctrack is really set properly?

    You can check if there is a rule. From shell prompt run:

    pfctl -sr | fgrep src.track


  • Rebel Alliance Developer Netgate

    Also check:

    pfctl -st
    

    To see the current timer values.



  • pfctl -st | grep src.track
    src.track 1800s
    pfctl -sr | grep src.track

    So, I guess I didn't implement the patch properly this time….

    Can you supply a patch file or just the 2 files already patched?


  • Rebel Alliance Developer Netgate

    The fun thing about git is each commit is a patch. If you have the commit id, you have the patch. If you do

    git show 4573641589d50718b544b778cea864cfd725078a
    

    Then you get something usable as a patch file.

    commit 4573641589d50718b544b778cea864cfd725078a
    Author: jim-p <jimp@pfsense.org>
    Date:   Tue Nov 15 16:28:45 2011 -0500
    
        Add a gui field to set the source tracking timeout for sticky connections.
    
    diff --git a/etc/inc/filter.inc b/etc/inc/filter.inc
    index 29864df..fdd43b7 100644
    --- a/etc/inc/filter.inc
    +++ b/etc/inc/filter.inc
    @@ -280,6 +280,8 @@ function filter_configure_sync($delete_states_if_needed = true) {
     		/* User defined maximum table entries in Advanced menu. */
     		$rules .= "set limit table-entries {$config['system']['maximumtableentries']}\n";
     	}
    +	if (isset($config['system']['lb_use_sticky']) && is_numeric($config['system']['srctrack']) && ($config['system']['srctrack'] > 0))
    +		$rules .= "set timeout src.track {$config['system']['srctrack']}\n";
    
     	// Configure flowtable support if enabled.
     	flowtable_configure();
    diff --git a/usr/local/www/system_advanced_misc.php b/usr/local/www/system_advanced_misc.php
    index d25c96d..e1da772 100644
    --- a/usr/local/www/system_advanced_misc.php
    +++ b/usr/local/www/system_advanced_misc.php
    @@ -56,6 +56,7 @@ $pconfig['proxyuser'] = $config['system']['proxyuser'];
     $pconfig['proxypass'] = $config['system']['proxypass'];
     $pconfig['harddiskstandby'] = $config['system']['harddiskstandby'];
     $pconfig['lb_use_sticky'] = isset($config['system']['lb_use_sticky']);
    +$pconfig['srctrack'] = $config['system']['srctrack'];
     $pconfig['gw_switch_default'] = isset($config['system']['gw_switch_default']);
     $pconfig['preferoldsa_enable'] = isset($config['ipsec']['preferoldsa']);
     $pconfig['racoondebug_enable'] = isset($config['ipsec']['racoondebug']);
    @@ -102,9 +103,10 @@ if ($_POST) {
     		else
     			unset($config['system']['proxypass']);
    
    -		if($_POST['lb_use_sticky'] == "yes")
    +		if($_POST['lb_use_sticky'] == "yes") {
     			$config['system']['lb_use_sticky'] = true;
    -		else
    +			$config['system']['srctrack'] = $_POST['srctrack'];
    +		} else
     			unset($config['system']['lb_use_sticky']);
    
     		if($_POST['gw_switch_default'] == "yes")
    @@ -190,6 +192,12 @@ include("head.inc");
     		print_info_box($savemsg);
     ?>
    


  • Should there really be any output for

    pfctl -sr | grep src.track  ???

    And what would this output be?
    I can't find anything wrong with my /etc/inc/filter.inc nor the /usr/local/www/system_advanced_misc.php

    Replacing "/usr/local/www/system_advanced_misc.php" with the 2.1 version doesn't make a difference….

    Did you apply the patch?

    I really thought it was working in my 2.0 setup.


  • Rebel Alliance Developer Netgate

    If pfctl -st shows anything except 0s for src.track, then it's working. 0s (zero seconds) is the default value of src.track.

    pfctl -sr wouldn't show it, but it would be in /tmp/rules.debug
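
The second check can be scripted as well. A rough sketch, assuming the generated ruleset lives at `/tmp/rules.debug` as stated above and that the patched code writes a line beginning with `set timeout src.track`; the function name `ruleset_src_track` is hypothetical:

```shell
# Hypothetical helper: print the src.track timeout written into a generated
# ruleset file (default: /tmp/rules.debug). Returns non-zero if no
# "set timeout src.track N" line is present, which would suggest the
# patched filter.inc did not emit it.
ruleset_src_track() {
    file=${1:-/tmp/rules.debug}
    awk '$1 == "set" && $2 == "timeout" && $3 == "src.track" { print $4; found = 1 }
         END { exit (found ? 0 : 1) }' "$file"
}
```

Calling `ruleset_src_track` with no argument inspects `/tmp/rules.debug`. If it prints nothing while `pfctl -st` still reports a non-zero src.track, the value may have come from somewhere other than the patched line.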



  • I currently have a situation in which I'm sure that traffic to one server is not consistently leaving over the same interface…
    This has been proven by doing a tcpdump on the target server...
    A new connection to that same server should go over the same interface.....

    That's the behaviour one needs in a multi-WAN setup (I believe it should be that way always).
    If not, you'll get kicked....

    I don't understand how one can live with multi-WAN if it can suddenly decide to talk over a different interface....
    This is asking for trouble and will result in people getting constantly kicked...
    I therefore don't understand why this subject is treated with such a low priority....

    Unless I'm totally wrong, of course...
    But no-one has said that thus far either....



  • I don't quite get how sticky works. If I analyzed it correctly, with sticky enabled multi-threaded HTTP or Usenet downloads won't be load balanced at all?



  • It's because you only look at it from your point of view.
    If you want multiple connections to the same server, this isn't for you….

    I'm using pfsense in multi-LAN, multi-WAN environment....
    traffic will then be equally spread over the different intranet/Internet connections, but once a source address is using a certain target, I don't want it to switch suddenly.

    Many targets don't like this and you will get kicked constantly from websites because you're all of a sudden coming from somewhere else....



  • kevindd992002,

    When you have sessions (as many sites do nowadays), sticky must be working or you will get random access errors.



  • But about the stickyness….

    Is the  SRC/DST relation only on IP-level or is it SRC:port/DST:port?

    If the port is included as well it's not really a solution for this problem....
    If a SRC-IP/DST-IP relation is established I want it to follow the same route from then on...

    The WAN-IP's should be considered as endpoints... once the traffic is on the Internet it may of course follow different routes.


  • Rebel Alliance Developer Netgate

    @frater:

    I don't understand how one can live with multi-WAN if it can suddenly decide to talk over a different interface….
    This is asking for trouble and will result in people getting constantly kicked...
    I therefore don't understand why this subject is treated with such a low priority....

    Most sites are smart enough to handle this, through some combination of session tracking, cookie tracking and whatnot. Only certain sites will freak out if the IP changes during a session.

    Note that a session is different than a connection. A specific connection will always stay on a certain WAN, as long as the browser/client holds it open. If the browser closes a connection and/or opens a new one, then that one could go across another WAN.

    Switching IPs in the middle of a connection isn't that uncommon, consider a client on 3G/Wifi/Wired that could switch between connections automatically in some cases, or if someone roams between two different APs connected to two different WANs.


  • Rebel Alliance Developer Netgate

    @frater:

    But about the stickyness….

    Is the  SRC/DST relation only on IP-level or is it SRC:port/DST:port?

    If the port is included as well it's not really a solution for this problem....
    If a SRC-IP/DST-IP relation is established I want it to follow the same route from then on...

    The WAN-IP's should be considered as endpoints... once the traffic is on the Internet it may of course follow different routes.

    The "stickyness" is between the client IP and a gateway. It has nothing to do with the destination.

    So if ClientA makes a connection over WAN2, then everything it does (until its states all expire) will go over WAN2, it will not load balance.
    If ClientB makes a connection over WAN1, then it will use WAN1 for everything (until its states all expire).
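
As a toy illustration of that rule (not pf's actual code), the sketch below hands out gateways round-robin and remembers one gateway per client IP for as long as that client still has states. The input format, the function name `sticky_model`, and the `open`/`close` events are invented for the example. Destinations are deliberately ignored, mirroring the point that the association has nothing to do with the destination, and the association lapses immediately at zero states, as with a src.track of 0.

```shell
# Toy model of sticky connections, NOT pf's implementation.
# Arguments: gateway names. Stdin: one event per line, either
#   open <client-ip>    a new connection (state) from that client
#   close <client-ip>   one of that client's states expires
# For every "open" it prints "<client-ip> <gateway>".
sticky_model() {
    [ "$#" -gt 0 ] || return 1             # need at least one gateway
    awk -v gws="$*" '
        BEGIN { n = split(gws, gw, " "); next_gw = 1 }
        $1 == "open" {
            if (states[$2] < 1) {          # no live states: pick round-robin
                assigned[$2] = gw[next_gw]
                next_gw = next_gw % n + 1
            }
            states[$2]++
            print $2, assigned[$2]
        }
        $1 == "close" && states[$2] > 0 {
            states[$2]--                   # at zero the association lapses
        }
    '
}
```

Feeding it events with `sticky_model WAN1 WAN2` shows a client keeping its gateway across several `open` lines, and possibly landing on the other gateway once all of its states have been closed. A longer src.track timeout would correspond to delaying that lapse after the last `close`.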



  • Alright, so as long as sticky is enabled I don't need to make a firewall rule that will route HTTPS traffic through my failover route?

    I know basic networking but don't understand most of the things you guys mentioned. I use multi-WAN right now by simply making a "route" with two tiers and directing LAN traffic (except HTTPS) through that route; at least that's what I know how to configure in pfsense 2.0.1. When I download over HTTP, say a driver from Nvidia's website, I use Internet Download Manager, which starts a multi-threaded download that maximizes the speed available from my two modems. What will I get if I enable sticky?


  • Rebel Alliance Developer Netgate

    @kevindd992002:

    Alright, so as long as sticky is enabled I don't need to make a firewall rule that will route HTTPS traffic through my failover route?

    True.

    @kevindd992002:

    I know basic networking but don't understand most of the things you guys mentioned. I use multi-WAN right now by simply making a "route" with two tiers and directing LAN traffic (except HTTPS) through that route; at least that's what I know how to configure in pfsense 2.0.1. When I download over HTTP, say a driver from Nvidia's website, I use Internet Download Manager, which starts a multi-threaded download that maximizes the speed available from my two modems. What will I get if I enable sticky?

    That wouldn't do what it does now. All those connections from that single client would go over a single WAN. It wouldn't load balance.



  • @jimp:

    @kevindd992002:

    Alright, so as long as sticky is enabled I don't need to make a firewall rule that will route HTTPS traffic through my failover route?

    True.

    @kevindd992002:

    I know basic networking but don't understand most of the things you guys mentioned. I use multi-WAN right now by simply making a "route" with two tiers and directing LAN traffic (except HTTPS) through that route; at least that's what I know how to configure in pfsense 2.0.1. When I download over HTTP, say a driver from Nvidia's website, I use Internet Download Manager, which starts a multi-threaded download that maximizes the speed available from my two modems. What will I get if I enable sticky?

    That wouldn't do what it does now. All those connections from that single client would go over a single WAN. It wouldn't load balance.

    Thanks for that info. So in essence, multi-thread downloading does not work while sticky is enabled? Is this true for all cases?


  • Rebel Alliance Developer Netgate

    @kevindd992002:

    Thanks for that info. So in essence, multi-thread downloading does not work while sticky is enabled? Is this true for all cases?
    That wouldn't do what it does now. All those connections from that single client would go over a single WAN. It wouldn't load balance.

    A multi-threaded download would still function, but it would not use multiple WANs, so it really depends on what you mean by "not work".

    What sticky does is quite simple: All connections from a client get associated with a single gateway so long as any states exist for the client.



  • @jimp:

    @kevindd992002:

    Thanks for that info. So in essence, multi-thread downloading does not work while sticky is enabled? Is this true for all cases?
    That wouldn't do what it does now. All those connections from that single client would go over a single WAN. It wouldn't load balance.

    A multi-threaded download would still function, but it would not use multiple WANs, so it really depends on what you mean by "not work".

    What sticky does is quite simple: All connections from a client get associated with a single gateway so long as any states exist for the client.

    Oh shoot! Yeah, I get you now. For some reason, I associated multi-threaded with multi-WAN.

