pfSense 1.2.3-RC2 Outbound Load Balancer Replaced

databeestje

Hi,

Recently we have replaced the old mechanism to detect link availability (slbd) with the mechanism we use in 2.0 (apinger).

The switchover should be complete now and brings no configuration changes.
The mechanism from 2.0 is a lot better for the following reasons:

It only marks a connection down when 10 consecutive pings fail.
A single process monitors all the connections.
When you create multiple load balancer pools, it monitors each unique monitor address only once instead of once per pool.
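
As a rough illustration of the first and third points, the sketch below marks a monitor address down only after 10 failed pings in a row, and creates one monitor per unique address no matter how many pools reference it. This is not apinger's actual code (apinger is written in C). The names `MonitorState` and `unique_monitors`, the pool layout, and the rule that a single reply brings the link back up are assumptions made for the example.

```python
from dataclasses import dataclass

DOWN_THRESHOLD = 10  # failed pings in a row before a link is marked down (from the post)


@dataclass
class MonitorState:
    """Tracks consecutive ping failures for one monitor address (hypothetical type)."""
    address: str
    consecutive_failures: int = 0
    down: bool = False

    def record(self, reply_received: bool) -> bool:
        """Feed one ping result; return True if the link is considered down."""
        if reply_received:
            # Assumption: a single reply clears the failure count and the down flag.
            self.consecutive_failures = 0
            self.down = False
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= DOWN_THRESHOLD:
                self.down = True
        return self.down


def unique_monitors(pools: dict[str, list[str]]) -> dict[str, MonitorState]:
    """Create one MonitorState per unique monitor address across all pools."""
    monitors: dict[str, MonitorState] = {}
    for addresses in pools.values():
        for address in addresses:
            monitors.setdefault(address, MonitorState(address))
    return monitors
```

With two pools that both use the same two monitor addresses, `unique_monitors` returns two monitors rather than four, which is the saving described above.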

The Status -> Load Balancer screen included in pfSense 1.2.3-RC2 will show all the pools with their members and the current values for Latency and Loss.
** One side note here: while the system is booting up, apinger might not have enough data yet, so the Latency and Loss values might be empty. These generally fill in when you refresh about 10 seconds later.
** When a member is down it will be marked red and the last measured latency will also be shown here.
** When a member has a high delay or packet loss it will not be excluded from the rules but it will invoke a filter reload. The status screen will show it as yellow.
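
The three notes above can be read as a small classification rule. The sketch below is only an illustration of that reading, not the code behind the status screen. The function name `member_status`, the two threshold values, and the labels "unknown" and "normal" are hypothetical; the post does not say what counts as a high delay or high loss.

```python
from typing import Optional

# Hypothetical thresholds; the post does not state the real values.
LATENCY_WARN_MS = 200.0
LOSS_WARN_PERCENT = 10.0


def member_status(down: bool,
                  latency_ms: Optional[float],
                  loss_percent: Optional[float]) -> str:
    """Classify one pool member for display on a status page."""
    if down:
        return "red"      # member is down; the last measured latency is still shown
    if latency_ms is None or loss_percent is None:
        return "unknown"  # not enough data yet, for example shortly after boot
    if latency_ms >= LATENCY_WARN_MS or loss_percent >= LOSS_WARN_PERCENT:
        return "yellow"   # degraded, but still included in the rules
    return "normal"
```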

I hope this information helps.

The inbound load balancer for server pools remains the old slbd. That has not changed.

Regards,

Seth

biatche

So, is the latest snapshot 1.2.3-RC2? Or is this just advance information?

databeestje

This is a bit ahead of the curve. There is none yet, but this is for reference.

redbaron

I've noticed changes in outbound load balancing in the 1.2.3 snapshot from 29.05. I have a failover setup with two WAN interfaces. It works much better than slbd except for one thing: it stops working after these log messages:

Jun 3 09:12:07 apinger: 217.66.16.44: Received packets buffer: ################################################## ####................
Jun 3 09:12:07 apinger: 217.66.16.44: Lost packet count mismatch (-4!=0)!
Jun 3 09:12:07 apinger: 77.72.248.2: Received packets buffer: ################################################## ####................
Jun 3 09:12:07 apinger: 77.72.248.2: Lost packet count mismatch (-4!=0)!

After that it doesn't actually do any failover (but the quality RRD graphs are still being updated). If I do a no-op save of the load balancing pool, it becomes functional again (I believe apinger is simply restarted). I have no idea when it breaks; in general it doesn't work for longer than 1 day, so I have to re-save the pool settings manually to bring it back to life.

databeestje

I have committed a fix to prevent apinger from quitting when this happens. You will still see this log message occasionally though.

We are still tracking down the source of this message, as we have not seen it before in the past year that we have run apinger on 2.0. Where it is coming from is a mystery as of yet.

redbaron

The big problem with apinger is that it switches back to the main WAN as soon as it receives a ping reply from it. If the main WAN keeps jumping between up and down, pfSense switches WAN too frequently.

When the main WAN loses pings, it waits for a timeout and then switches to the backup one. The same behaviour should apply in the opposite direction: if the main WAN is in a failed state and a ping is received, wait some time, and switch back to the main WAN only if no ping loss is detected.
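
The behaviour requested here amounts to hysteresis in both directions. A minimal sketch of the idea, assuming it can be expressed as a count of consecutive ping results, is shown below. It does not describe how apinger or pfSense behave; the class name `FailbackGate` and the `up_after` parameter are hypothetical.

```python
class FailbackGate:
    """Change WAN state only after a run of consecutive results that contradict it."""

    def __init__(self, down_after: int = 10, up_after: int = 10) -> None:
        self.down_after = down_after  # lost pings in a row before failing over
        self.up_after = up_after      # replies in a row before failing back (hypothetical)
        self.main_up = True
        self._streak = 0              # current run of results that contradict main_up

    def record(self, reply_received: bool) -> bool:
        """Feed one ping result for the main WAN; return True if the main WAN should be used."""
        if reply_received == self.main_up:
            # The result agrees with the current state, so any pending switch is cancelled.
            self._streak = 0
        else:
            self._streak += 1
            limit = self.down_after if self.main_up else self.up_after
            if self._streak >= limit:
                self.main_up = not self.main_up
                self._streak = 0
        return self.main_up
```

With this gate a flapping main WAN stays on the backup: every lost ping during the recovery window resets the count, so the switch back happens only after `up_after` clean replies in a row.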

cheesyboofs

Thank you for backporting this.

It's good that you guys have not shut the door on 1.2.x and are still actively tweaking it.

databeestje

Looks like all those debug messages were caused by my lack of C skills. It may or may not be fixed in upcoming snapshots.

databeestje

@redbaron:

The big problem with apinger is that it switches back to the main WAN as soon as it receives a ping reply from it. If the main WAN keeps jumping between up and down, pfSense switches WAN too frequently.

This problem is not specific to apinger; if you have two or more DHCP WANs, it fails in exactly the same way. The filter code in 1.2 has no link detection, which makes things worse.

This is a not-so-common failure, as most of the time it will do the right thing. However, sometimes dhclient will remove the static route for the monitor IP on the interface, which causes this behaviour.

GoldServe

@databeestje:

Looks like all those debug messages were caused by my lack of C skills. It may or may not be fixed in upcoming snapshots.

May I ask which file was updated, or what was changed, to fix these error messages? I'm using an Aug 8th build and everything is working perfectly for me. I don't want to upgrade, but I do want to fix this rather annoying error message in the logs every night.

Cheers.

kevindd992002

Is there a guide for this new load balancing scheme?

cmb

@kevindd992002:

Is there a guide for this new load balancing scheme?

The changes were only in the back end; the front end is completely identical to how it's always been.

kambeeng

Thank you for the information. I hope the load balancing is better than before, and the traffic shaping too :D

smbsmb

Can the new load balancer work with 2 or more PPPoE/PPTP connections?

wdavid

Could this be the reason that I have problems routing IP addresses that end with specific numbers (223-239)? I described the problem in my post: http://forum.pfsense.org/index.php/topic,19763.0.html