Changing load balancing algorithm



  • Is it possible to change the load balancing algorithm used by pfSense, to take account of link usage?

    I find the round robin algorithm pretty useless - as soon as 1 of my 3 links is downloading a large file, every third request will try to use the same link, which is already maxed out.

    Is the load balancing within the pfSense source code, or is it an external BSD module? If the latter, which one?



  • Is there anywhere else I can ask this?

    I am a programmer, so not opposed to modifying the existing code.



  • If you are a kernel coder, then you would have to modify pf(4).



  • Load-balancing based on interface bandwidth usage would probably be non-trivial to implement.

    Just a reminder - if you have different bandwidth pipes, you can use multiple entries in the round robin pool for the 'larger' pipe so it's more likely to get hit.
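The weighting trick above can be sketched as follows. This is only an illustration of the idea in Python, not pfSense or pf code; the gateway names and weights are hypothetical.

```python
from itertools import cycle


def build_pool(weights):
    """Expand {gateway: weight} into a flat round-robin pool.

    A gateway with weight 4 appears four times, so it receives
    four out of every sum(weights) new connections.
    """
    pool = []
    for gateway, weight in weights.items():
        pool.extend([gateway] * weight)
    return pool


# Hypothetical links: WAN1 is assumed to be 4x as fast as WAN3.
weights = {"WAN1": 4, "WAN2": 2, "WAN3": 1}
pool = build_pool(weights)

# Plain round robin over the expanded pool.
picker = cycle(pool)
one_full_round = [next(picker) for _ in range(len(pool))]
```

Note that the choice still ignores how busy each link currently is; the weights only change how often a link is picked.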



  • I know I can use multiple entries to round robin one interface more than another, but that doesn't really solve the problem.

    As soon as I start downloading a large file from WAN1 (which is in the pool 4 times, because it is 4 times faster than WAN3), WAN1 will be maxed out. The next 3 external connections will also use WAN1, but will be really slow, because WAN1 is busy serving the file request. Meanwhile the other 2 WAN connections are idle.

    I was not imagining changing the algorithm to be a trivial task, but surely it is not all that complex…

    Every 2 seconds or so, gather the number of bytes received from each port since the last poll, subtract it from the maximum allowed throughput of the port to calculate the spare capacity, and stuff it in an easily accessible variable for that port. The data must be available somewhere, because you can plot graphs of it.

    Every time a packet has to be load balanced, look through the table, and choose whichever port has the largest available capacity.

    Tune to suit (varying the 2 seconds, or perhaps keeping a rolling average from samples taken more often).

    Obviously there are multi-threading issues and so on, but it's a SMOP.

    Nikki
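The polling idea in the post above could look roughly like this. It is a userland sketch in Python, not how pf or pfSense works; the class name `LinkMonitor`, the capacities and the counter values are all assumptions made for illustration, and counter wrap-around and locking are deliberately left out.

```python
class LinkMonitor:
    """Track spare capacity per link from periodically polled byte counters."""

    def __init__(self, capacities):
        # capacities: assumed maximum throughput of each link, in bytes/second.
        self.capacities = dict(capacities)
        self.last_counters = {name: None for name in capacities}
        # Until the first real sample arrives, treat every link as idle.
        self.spare = {name: float(cap) for name, cap in capacities.items()}

    def poll(self, counters, interval):
        """Update spare capacity from cumulative received-byte counters.

        counters: {link: total bytes received so far}
        interval: seconds elapsed since the previous poll
        """
        for name, total in counters.items():
            previous = self.last_counters[name]
            self.last_counters[name] = total
            if previous is None:
                continue  # first sample: no rate can be computed yet
            rate = (total - previous) / interval
            self.spare[name] = max(0.0, self.capacities[name] - rate)

    def pick(self):
        """Return the link with the largest spare capacity."""
        return max(self.spare, key=self.spare.get)


# Hypothetical numbers: WAN1 is 4x as fast as WAN3 and busy with a big download.
monitor = LinkMonitor({"WAN1": 4_000_000, "WAN2": 2_000_000, "WAN3": 1_000_000})
monitor.poll({"WAN1": 0, "WAN2": 0, "WAN3": 0}, interval=2.0)
monitor.poll({"WAN1": 8_000_000, "WAN2": 0, "WAN3": 0}, interval=2.0)
choice = monitor.pick()
```

With these sample numbers WAN1 is saturated, so the next choice falls on WAN2. A rolling average over several shorter samples could replace the single 2-second window, as suggested under "tune to suit".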



  • I wonder if it would make sense in your scenario to "simply" trick apinger into treating the interface as bad/down?  Take a look at /var/etc/apinger.conf and its friends.

    I don't know much about how the outbound load balancer works, and I'd certainly defer to ermal, but I imagine you could come up with a userland solution to some of this. However, I don't know what the impact on an existing TCP connection would be if the load balancer decided to mark a gateway as 'bad'. This may be why ermal suggested that pf hacking would be necessary.



  • 2.0 fixes this. You can define a per-GW RTT loss and such, which will automatically remove a GW from the pool during slow-downs.

    EDIT: it looks like we kill states on GW removal; we need to make that a tunable.



  • @sullrich:

    2.0 fixes this. You can define a per-GW RTT loss and such, which will automatically remove a GW from the pool during slow-downs.

    EDIT: it looks like we kill states on GW removal; we need to make that a tunable.

    Please define an "RTT loss". Is this a complete loss of return packets from ping, or just return packets exceeding some round trip time?

    I'm still prepared to get my hands dirty and have a look at the code. But life would be a lot easier if someone would tell me chapter and verse which code to look at, with maybe a pointer to the development environment needed.

    Of course, it is possible that no-one here knows.



  • RTT loss is a measurable property: for example, if 10 out of 20 packets get lost, an alarm fires and the gateway is marked as down.

    The code is there if you know how to write and read PHP.

    And all the above recommendations were written by 2 very knowledgeable people on pfSense (me and Scott).
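The "10 out of 20 packets lost" rule described above can be illustrated with a sliding window over probe results. This is a hedged Python sketch of the rule only, not the actual apinger or pfSense code; the class name `LossAlarm` and its parameters are invented for the example.

```python
from collections import deque


class LossAlarm:
    """Flag a gateway as down when too many recent probes went unanswered."""

    def __init__(self, window=20, threshold=10):
        # Keep only the most recent `window` probe results.
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, replied):
        """Store one probe result: True if a reply came back, False if lost."""
        self.results.append(replied)

    def is_down(self):
        """Return True when lost probes in the window reach the threshold."""
        lost = sum(1 for ok in self.results if not ok)
        return lost >= self.threshold


alarm = LossAlarm(window=20, threshold=10)
for _ in range(10):
    alarm.record(True)    # ten answered probes
for _ in range(9):
    alarm.record(False)   # nine lost probes: still below the threshold
down_before = alarm.is_down()
alarm.record(False)       # tenth loss within the last 20 probes
down_after = alarm.is_down()
```

Here `down_before` is False and `down_after` is True. A check on round trip time could be added in the same way by recording measured delays instead of booleans.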



  • Thanks very much for the help so far.

    I take it you only know pfSense, and not the load balancing code in pf?

