Failover support added for Load balancing in latest snapshot



  • Thanks to Seth Mos (databeestje) we now have failover support for load balancing. That is, you can set it up to prefer a gateway for specific traffic, and if pfSense detects an issue with that gateway, it will fail over to the next one in the pool, and so on.

    You can grab the latest snapshot from http://snapshots.pfsense.com/FreeBSD6/RELENG_1/

    Please test, test, test!
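    To make the two pool modes concrete, here is a minimal Python sketch of the selection logic as it is described in this thread: failover uses the first healthy gateway in the pool's order, while load balancing spreads traffic round-robin over all healthy gateways. This is not pfSense's code (pfSense does this inside its pf rules); the function names and gateway labels are hypothetical.

```python
from itertools import cycle


def failover_gateway(pool, healthy):
    """Failover mode: return the first healthy gateway in pool order, or None."""
    for gateway in pool:
        if gateway in healthy:
            return gateway
    return None


def balanced_gateways(pool, healthy):
    """Load balancing mode: cycle through the healthy gateways round-robin."""
    live = [gateway for gateway in pool if gateway in healthy]
    if not live:
        return iter([])  # nothing usable: an empty iterator
    return cycle(live)


pool = ["WAN", "OPT1"]  # hypothetical pool, in order of preference
print(failover_gateway(pool, {"WAN", "OPT1"}))  # WAN
print(failover_gateway(pool, {"OPT1"}))         # OPT1, once WAN's monitor IP stops answering
```

    Because the preferred gateway is chosen again as soon as it is healthy, this model also fails back automatically when the first gateway recovers.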



  • My load balancing was working before I updated to this latest snapshot.
    After upgrading and reconfiguring load balancing, all connections are active, but I cannot access any websites.
    Changing the gateway in the LAN rule from the balancer to default makes access to websites possible, but then no load balancing happens! All traffic uses only the default WAN gateway! :o
    Please help!



  • Wow! It works for me. I created a load balancer pool with the "failover" radio button selected, then created a firewall rule for HTTP access with the failover pool as its gateway, and now I am writing this message over my backup connection. Fantastic!



  • Selecting failover or load balancing mode in the load balancer does not fix my problem. How did you load the new snapshot, via a full install or the firmware upgrade utility? I used the firmware upgrade utility. Any comments, please? I'm already planning to reformat my firewall to restore it.
    But I want the failover feature! :(



  • I don't see why this would not work for load balancing, since that code has not changed. I actually use load balancing with this code at work on a 2-WAN setup and it works for me, so I cannot replicate this.

    Are you using 2 DHCP WANs?



  • WAN is via PPPoE (ADSL); WAN2 is static (also ADSL).
    Load balancing was working before I upgraded to your latest snapshot (1-7-2007), so I'm sure my configs are working.
    I loaded it via firmware upgrade, not a full install. Is this an issue?

    I cannot access the internet if the selected gateway in the LAN default rule is the balancer, but I can when I change it to default. My rules are all default, nothing special except for the 127.0.0.1 rule for FTP.



  • cheeky: I also updated my pfSense through the firmware update feature, but I hadn't used load balancing before on this installation, so it's a new pool. Check your monitor IPs: make sure they are different from each other and are also reachable.



  • And make sure the monitor IPs are on the network of the ISP. Adding Google as a monitor IP, which most people do, is WRONG.

    The IP needs to be a few hops out on the SAME ISP. Traceroute out of each of the WANs and find a next-hop router to use as the monitor IP. Just pulling one out of your head is wrong and is asking for trouble.
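    A rough sketch of picking a monitor IP from saved traceroute output, assuming you ran something like `traceroute -n -s <WAN address> <some destination>` out of each WAN. The parser and its default hop number are hypothetical, and you still have to confirm that the chosen hop belongs to that WAN's ISP.

```python
import ipaddress


def hop_addresses(traceroute_output):
    """Map hop number -> first IPv4 address answering at that hop (`traceroute -n` text)."""
    hops = {}
    for line in traceroute_output.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # header line or anything that is not a hop line
        for field in fields[1:]:
            try:
                hops[int(fields[0])] = str(ipaddress.IPv4Address(field))
                break  # the first address on the line is enough
            except ValueError:
                continue  # "*" (no answer), a timing like "8.120", or "ms"
    return hops


def pick_monitor_ip(traceroute_output, hop=2):
    """Return the address seen at `hop` (hypothetical default), or None if it never answered."""
    return hop_addresses(traceroute_output).get(hop)
```

    For example, if hop 2 answered from 203.0.113.9, `pick_monitor_ip(output)` returns that address; a hop that only printed `* * *` returns None.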



  • Monitor IPs need to be unique across all interfaces, so you cannot add the Google IP for multiple interfaces.

    And that's ignoring the fact that the IP probably lives across a transatlantic link or some such.

    Do a traceroute out of each internet connection and find yourself the upstream router. That is far more reliable, and it differs for each interface. After changing the firewall rule, did you actually wait for the rules to finish loading?
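    The uniqueness rule is easy to check before saving a pool. A minimal sketch, assuming the pool is described as a hypothetical {interface: monitor IP} mapping:

```python
from collections import defaultdict


def duplicate_monitor_ips(monitors):
    """Given {interface: monitor IP}, return {ip: [interfaces]} for IPs used more than once."""
    users = defaultdict(list)
    for interface, ip in monitors.items():
        users[ip].append(interface)
    return {ip: names for ip, names in users.items() if len(names) > 1}


# Hypothetical pool where the same public IP was reused on two WANs.
print(duplicate_monitor_ips({"WAN": "192.0.2.1", "OPT1": "192.0.2.1", "OPT2": "198.51.100.1"}))
# {'192.0.2.1': ['WAN', 'OPT1']}
```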



  • I just set up a test environment in my lab and tested this with a failover pool of WAN and OPT1, both set to DHCP, and it works like a charm. Good job, Seth!

    Btw, lots of people have demanded this feature, and now that it is available there are only a few testers? Come on, all you load balancing users out there, we need some feedback! ::)



  • Firstly, let me say great job, guys. Keep up the good work.

    Can someone get an updated/easier howto posted? I think this would help adoption.
    I have looked at two different articles, one from the wiki, and one from somewhere
    else on the site.  They are slightly different, and that makes things even more confusing
    for someone who hasn't done this before.

    That being said, I seem to have gotten mine to work well with three WANs. I do have a problem
    that has caused me to turn off load balancing. As soon as I create a firewall rule setting the
    default route to the load balancer, I can't access my IPSEC clients.

    I have tried creating different rules, etc. to get traffic to pass over IPSEC, but have failed.

    I am the IPSEC host; the rest of the clients are all mobile. I was looking for a way to set IPSEC
    to use the default gateway, or force it to one LAN, but can't seem to find a way to do so.

    I tried creating the following LAN rule, figuring that IPSEC could reach my network but my
    network couldn't communicate back. The IP 111.111.111.111 used below stands for the original
    default gateway IP.

    *  LAN net  *  192.168.2.0/24  *  111.111.111.111 Default LAN -> IPSEC

    Help, please :)



  • @hoba:

    Btw, lots of people have demanded this feature, and now that it is available there are only a few testers? Come on, all you load balancing users out there, we need some feedback! ::)

    Failover is only available in the latest snapshot, so we can only test it in a test environment. For example, my test environment is my home network.



  • @Sn3ak:

    Firstly, let me say great job, guys. Keep up the good work.

    Can someone get an updated/easier howto posted? I think this would help adoption.
    I have looked at two different articles, one from the wiki, and one from somewhere
    else on the site.  They are slightly different, and that makes things even more confusing
    for someone who hasn't done this before.

    The new page is a lot easier. Just add an interface and its monitor IP to the server list using the add button.
    Or just pick the gateway of each interface as the monitor IP; that works in a pinch.

    That being said, I seem to have gotten mine to work well with three WANs. I do have a problem
    that has caused me to turn off load balancing. As soon as I create a firewall rule setting the
    default route to the load balancer, I can't access my IPSEC clients.

    Are your IPSEC clients in another subnet, or are they assigned addresses in the LAN address range?
    If they have different addresses, you need to create an "allow any from LAN to VPN subnets" rule with the default gateway assigned.



  • Is there a doc available on how to set up the load balancing function?



  • If you have 2 WANs, go to Services -> Load Balancer, create a new pool of type gateway, add the interfaces and monitor IPs, then save and apply.
    Then go to Firewall -> Rules -> LAN, edit the LAN -> any rule, and change the gateway from default to the pool you just created.

    Good luck.



  • Can I have 2 pools at the same time, one with simple load balancing and the other with failover?

    I was thinking that failover would be used for SSL stuff and load balancing for everything else.



  • Yes, that will work fine.



  • I have set up load balancing using DSL (PPPoE) as the WAN interface and cable (DHCP) as an optional interface. I added a load balancing gateway pool as described in this thread, but it does not work properly. If I use the load balancing gateway, DNS name resolution doesn't work for any clients on my network.



  • Add static routes for the DNS servers forcing the traffic out the correct interfaces.
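    The idea is to pin each ISP's DNS server to the gateway of the WAN it belongs to. As a hypothetical illustration (the addresses are documentation ranges and the mapping is made up), this builds the equivalent FreeBSD `route add -host` commands; on pfSense you would normally enter the same routes under System -> Static Routes in the web GUI instead.

```python
import ipaddress


def dns_route_commands(dns_routes):
    """Build `route add -host <dns server> <gateway>` lines from {dns_server: wan_gateway}."""
    commands = []
    for dns_server, gateway in dns_routes.items():
        # Validate both addresses so a typo fails loudly instead of producing a bad route.
        ipaddress.IPv4Address(dns_server)
        ipaddress.IPv4Address(gateway)
        commands.append(f"route add -host {dns_server} {gateway}")
    return commands


# Hypothetical: each ISP's resolver goes out through that ISP's own gateway.
for line in dns_route_commands({"192.0.2.53": "192.0.2.1", "198.51.100.53": "198.51.100.1"}):
    print(line)
```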



  • Hi all,

    I've been trying this new feature. I have two WANs, and one of them is very costly per Mb. If my top gateway becomes available again, will it switch back after a failover?

    Also, I was wondering: why does the gateway in my routing table always stay set to the top one in my pool when I look at my routes?

    Martin



  • Yes, it will switch back.  Not sure what you are asking about the route table but we do not route multi-wan via regular routing.  It is handled via PF itself.



  • Great function :D. At home I have a 100 Mbit line connected to the city's MAN and an ADSL line as a secondary link.
    It's great to have something that automates switching between the WANs if the primary line goes down, instead of manually reconnecting cables as I do today.

    @databeestje:

    If you have 2 WANs, go to Services -> Load Balancer, create a new pool of type gateway, add the interfaces and monitor IPs, then save and apply.
    Then go to Firewall -> Rules -> LAN, edit the LAN -> any rule, and change the gateway from default to the pool you just created.

    Good luck.

    My problem appears at

    add the interfaces

    because only one NIC is in the list, the NIC named "WAN".
    I have my secondary ISP on the OPT1 NIC, but I cannot choose it.

    Both ISPs issue IP addresses via DHCP. The ADSL unit is a modem with 4 switch ports.
    The 100 Mbit MAN line is a simple Ethernet twisted-pair cable.

    The computer running pfSense has 1 onboard 3Com NIC and 2 3Com 3C905 PCI cards.

    How do I tell the failover function that the OPT1 NIC is a WAN NIC, so that it shows up in the list named "Interface Name" on the load_balancer_pool_edit.php page?



  • Only NICs that have a gateway assigned will be listed in the selection. I guess your OPT1 WAN is not connected and/or has no DHCP lease yet. Make sure it has an IP and gateway assigned first, then revisit the pool creation screen.
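    The rule above amounts to a simple filter. A minimal sketch, using a hypothetical dictionary of interface state rather than anything pfSense actually exposes:

```python
def selectable_interfaces(interfaces):
    """Return the interfaces that currently have a gateway, i.e. those a pool could use."""
    return [name for name, settings in interfaces.items() if settings.get("gateway")]


# Hypothetical state: OPT1 has no DHCP lease yet, so it has no gateway.
state = {
    "WAN": {"ip": "192.0.2.10", "gateway": "192.0.2.1"},
    "OPT1": {"ip": None, "gateway": None},
    "LAN": {"ip": "192.168.1.1"},
}
print(selectable_interfaces(state))  # ['WAN']
```

    Once OPT1 obtains a lease and a gateway, it would appear in the list as well.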



  • Thanks, that did the trick  ;).

    Is there a way to control the

    ping interval,
    ping reply timeout,
    number of ping timeouts needed before it fails over,
    and number of successful pings on the primary ISP needed before it fails back?

    If it is not currently possible to control the above values manually,
    is there a way to find out what they are today, even if they are hardcoded?



  • @Veni:

    Thanks, that did the trick  ;).

    Is there a way to control the

    ping interval,
    ping reply timeout,
    number of ping timeouts needed before it fails over,
    and number of successful pings on the primary ISP needed before it fails back?

    Not currently.

    @Veni:

    If it is not currently possible to control the above values manually,
    is there a way to find out what they are today, even if they are hardcoded?

    1 second timeout, one ping every 5 seconds. Newer snapshots have been changed to a ping interval of 3 seconds and a timeout of 2 seconds.
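    For reference, a small Python model of a ping-based gateway monitor. The interval and timeout figures come from the answer above; the consecutive-loss and consecutive-success thresholds (`down_after`, `up_after`) are hypothetical knobs of the kind Veni asked about, not pfSense settings. The bound assumes a ping goes out every `interval` seconds and the timeout is shorter than the interval.

```python
from dataclasses import dataclass, field


@dataclass
class GatewayMonitor:
    """Up/down state of one gateway, driven by a stream of ping results.

    down_after / up_after are hypothetical thresholds: how many consecutive
    lost (or answered) pings it takes to flip the state.
    """
    down_after: int = 1
    up_after: int = 1
    up: bool = True
    streak: int = field(default=0, init=False)  # consecutive results disagreeing with `up`

    def record(self, reply_received: bool) -> bool:
        """Feed one ping result (True = reply within the timeout); return the new state."""
        if reply_received == self.up:
            self.streak = 0  # result agrees with the current state
            return self.up
        self.streak += 1
        needed = self.down_after if self.up else self.up_after
        if self.streak >= needed:
            self.up = not self.up
            self.streak = 0
        return self.up


def worst_case_detection(interval, timeout, down_after=1):
    """Seconds from link loss until the gateway is marked down, in the worst case.

    The link dies just after a reply, the next ping goes out `interval` later and
    is declared lost `timeout` after that; each extra required loss adds one interval.
    """
    return down_after * interval + timeout


print(worst_case_detection(5, 1))  # 6 (original values)
print(worst_case_detection(3, 2))  # 5 (newer snapshots)
```

    With a single lost ping triggering failover, the bound is 5 + 1 = 6 seconds for the original values and 3 + 2 = 5 seconds for the newer ones; requiring more consecutive losses trades slower failover for fewer false alarms.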



  • Thanks.
    That was the fastest response on a web-based forum I have ever seen :).



  • It's alive ;D.

    Failover took at most about 5 seconds, and I could browse the web and check my IP address to be sure which ISP I was using.
    Failback was the same, only a couple of seconds. Thanks, everybody :D.

    A question about port forwarding and failover:
    When creating a rule under Firewall/NAT/Port Forward, the first parameter is Interface.
    Is there a way to choose my load balancer pool named "Failover" as the Interface parameter,
    or do I have to clone every port forward rule so that it also applies to the OPT1 interface?



  • You have to add separate rules/forwards for each interface.
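    The cloning is mechanical: the same forward, once per WAN interface. A minimal sketch with a hypothetical, simplified rule record (it does not model pfSense's own config format); each copy still has to be entered under Firewall/NAT/Port Forward.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PortForward:
    """Hypothetical, simplified port forward entry."""
    interface: str
    protocol: str
    external_port: int
    target_ip: str
    target_port: int


def clone_for_interfaces(rule, interfaces):
    """Return one copy of `rule` per interface, changing only the interface field."""
    return [replace(rule, interface=name) for name in interfaces]


web = PortForward("WAN", "tcp", 80, "192.168.1.10", 80)
for forward in clone_for_interfaces(web, ["WAN", "OPT1"]):
    print(forward)
```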



  • I added static routes for my DNS servers and even tried using DNS servers from OpenDNS, but I still can't get DNS to work properly. I can ping outside my network by IP address, but not by domain name.



  • I had a very similar problem. It took a couple of minutes after a reboot before the problem started, and it did not affect clients
    on the network using the pfSense computer as a DNS server, but pfSense's own internet DNS lookups (not local static mappings)
    stopped working. Squid was unable to resolve, ping from the pfSense console was unable to resolve, and the Packages tab in the web
    GUI was unable to resolve.

    Hoba posted a response to my issue, and the problem has not shown itself again since.
    The only thing I still cannot understand is why my problem appeared while I was running on the primary WAN link,
    and only after a couple of minutes. No failure was ever recorded on the primary WAN link (nor did I notice one).
    Still, Hoba's response solved my problem.

    http://forum.pfsense.org/index.php/topic,3467.0.html



  • I got this working. Sort of.

    It's buggy, though.

    I set it all up: LB status shows both links up, and interface status shows both links up. I disconnect WAN1 and it takes close to 5 minutes for it to fail over. While the interface status instantly shows the connection down, the load balancer status takes forever to update.

    Being mindful of the state table, I test against a different destination, and eventually traffic begins to cross WAN2.

    I reconnect WAN1, and it took 10 minutes for the LB status to show that this connection was back, while again the interface status showed it instantly. Traffic never switches back to WAN1. By never I mean I waited for more than 90 minutes. I cleared the state tables etc. The route table shows the WAN1 gateway as the default, but all traffic still passes through the WAN2 interface.

    Even if I change the gateway on my outbound rule to explicitly specify only the WAN1 gateway, all traffic passes through WAN2. Yes, I waited for the rules to build. Yes, I flushed the states. Yes, both interfaces are up. :)

    The way the load balancer updates the interface status seems to be screwy. In fact, at times it won't update the interface status of all my pools the same way. See the attached image for an example. Explain that one. :)

    Running the 2-09 snapshot.

    Rebooting restores traffic to WAN1. Rinse and repeat.

    Suggestions?

    Oh… the monitor IPs are on the far side of both connections, on the ISP networks.




  • Followup: I've added static routes for the IPs I'm monitoring on each interface. It made zero difference.



  • @Sn3ak:

    That being said, I seem to have gotten mine to work well with three WANs. I do have a problem
    that has caused me to turn off load balancing. As soon as I create a firewall rule setting the
    default route to the load balancer, I can't access my IPSEC clients.

    This is confirmed. I have a patch for this, and I will commit it soon. This should show up in our Valentine's release.

    You could also create a rule from the LAN subnet to the VPN subnet above the load balancer rule to negate this effect.
    We now handle this in the background. I only recently stumbled upon this.
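    The workaround relies on rules being evaluated top-down, with the first match winning. The sketch below models that with hypothetical subnets (192.168.1.0/24 as the LAN, 192.168.2.0/24 as the VPN subnet from Sn3ak's rule) and a pool named LoadBalancer: with the VPN rule above the pool rule, VPN traffic keeps the default gateway; in the opposite order, it is sent to the pool.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical, simplified LAN rule: source and destination as CIDR strings."""
    source: str
    destination: str  # "0.0.0.0/0" means any
    gateway: str      # "default" or the name of a load balancer pool


def select_gateway(rules, src, dst):
    """Return the gateway of the first rule that matches (first match wins)."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in rules:
        if src_ip in ipaddress.ip_network(rule.source) and dst_ip in ipaddress.ip_network(rule.destination):
            return rule.gateway
    return None  # no rule matched, so the traffic is blocked


lan_to_vpn = Rule("192.168.1.0/24", "192.168.2.0/24", "default")
lan_to_any = Rule("192.168.1.0/24", "0.0.0.0/0", "LoadBalancer")

print(select_gateway([lan_to_vpn, lan_to_any], "192.168.1.20", "192.168.2.5"))  # default
print(select_gateway([lan_to_any, lan_to_vpn], "192.168.1.20", "192.168.2.5"))  # LoadBalancer
```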



  • @nexusone:

    Reconnect WAN1, this took 10 minutes for the lb status to show that this connection was back. again the interface status showed it instantly. Traffic never switches back to WAN1. By never I mean I waited for more than 90 minutes. I cleared the state tables etc. The route table shows the WAN1 gw as the default. But all traffic still passes the WAN2 interface.

    Even if I change the gateway on my outbound rule to explicitly specify only the gw of WAN1 all the traffic passes WAN2. Yes I waited for the rules to build. Yes I flushed the states. Yes both interfaces are up. :)

    Are both your WAN interfaces DHCP, perhaps?

    I do not know what sort of hardware you have, but in my home setup (with a secondary wireless link) it takes about 45 seconds for the rules to be generated. This is on an Eden 933 with 256 MB RAM. I am running from a CF card, which slows the process down quite a bit, though.

    Also keep in mind that it's common for upstream routers to implement an ICMP rate limit, which might affect the load balancer's gateway detection.

    On the command page you can execute the following command to see whether it has regenerated the correct routes.

    grep round /tmp/rules.debug
    

    This should output all the filter rules that use the load balancer pools. You should check if these are correct.
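    The same check as a Python sketch, in case you want to post-process the output, assuming you have Python available wherever a copy of the file is. It assumes the pool rules in /tmp/rules.debug contain pf's `round-robin` keyword, which is presumably what the `grep round` above matches; the function names are hypothetical.

```python
def matching_lines(text, needle="round"):
    """Lines of `text` containing `needle`, like `grep round`."""
    return [line for line in text.splitlines() if needle in line]


def pool_rules(path="/tmp/rules.debug"):
    """Read the generated ruleset and return the lines that mention the pools."""
    with open(path) as ruleset:
        return matching_lines(ruleset.read())
```

    Calling `pool_rules()` returns the same lines the grep prints, as a list you can inspect further.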

    We will be implementing a few other fixes to check for down interfaces in the future as well.



  • Neither of my interfaces is DHCP; both have static addresses. Normally I can ping either of my monitor IPs until I'm blue in the face without any ICMP rate-limiting complications. The hardware is a Dell PowerEdge 860 with 2 Intel gig-E ports and 2 Broadcom gig-E ports, and it has been great. Both WAN ports are on the Broadcom ports.

    On the interface status page I see link state changes reflected immediately. The load balancer status page lags anywhere from a few minutes to forever before reflecting these changes. I've checked the routes as you suggested, and it does not appear that the route is updated when the interface status changes, which in turn affects the load balancer's ability to ping the monitor address and control the pool.

    I don't have anything particularly tricky in my config.

    2 WANs, both static address connections.
    Both are properly configured, as both do work with some coaxing.
    1 LAN, 1 DMZ (DMZ presently not used)
    No complicated NAT or port forwards or anything.
    Only a single rule allowing all traffic from LAN to pass to *.

    Monitor IPs are good. I've checked and double-checked. They are both on the far side of my WAN links and have working static routes controlling which interface is used, to avoid "false positives" so that they accurately reflect connection status as up or down.

    Any help or suggestions are appreciated. I really need to get this sorted out.




  • Hi all,

    I'm using 1.0.1-SNAPSHOT-02-14-2007 and trying to use the load balancing feature.

    My setup is as follows:

    LAN 192.168.1.254/24
    WAN PPPoE ADSL with dynamic IP
    WAN2 DHCP from a wireless network (Alvarion 5.4 GHz), static IP

    I created a pool (gateway, load balancing) and a new rule for traffic from LAN with my pool as the gateway.

    Both WANs are marked online in status/load.

    But when I try to access websites, I have to click twice to load the page, so I think only one WAN is working.

    My system log is filling with these messages:

    kernel: arplookup 80.8.244.1 failed: host is not on local network
    kernel: arpresolve: can't allocate route for 80.8.244.1

    This IP address is the gateway assigned by my first ISP (I have an IP address with a /32 on my WAN).

    Is something misconfigured, or is there a workaround for this problem?

    Thanks



  • @regis:

    Hi all,

    I'm using 1.0.1-SNAPSHOT-02-14-2007 and trying to use the load balancing feature.

    My setup is as follows:

    LAN 192.168.1.254/24
    WAN PPPoE ADSL with dynamic IP
    WAN2 DHCP from a wireless network (Alvarion 5.4 GHz), static IP

    When the IP address changes on your WANs, the LB will fail. You have to have static IP addresses.

    A workaround is to put a simple router between your pfSense box and the modems.



  • DHCP WANs are supported for balancing, but this requires the DHCP address to stay the same,
    e.g. a static DHCP-assigned address or a static PPPoE-assigned address.

    This is a product limitation. I am not currently considering fixing it.

    Cheers,

    Seth



  • Thanks for the answer.

    I'll consider replacing my DSL modem with a basic router.



  • No more comments on my situation?

    I'll be upgrading to the 2-14 snap this evening. Can I expect any change in behavior?


