pfSense 1.2.3-RC2 Outbound Load Balancer Replaced

redbaron

I've noticed changes in outbound load balancing in the 1.2.3 snapshot from 29.05. I have a failover setup with two WAN interfaces. It works much better than slbd except for one thing: it stops working after these log messages:

Jun 3 09:12:07 apinger: 217.66.16.44: Received packets buffer: ################################################## ####................
Jun 3 09:12:07 apinger: 217.66.16.44: Lost packet count mismatch (-4!=0)!
Jun 3 09:12:07 apinger: 77.72.248.2: Received packets buffer: ################################################## ####................
Jun 3 09:12:07 apinger: 77.72.248.2: Lost packet count mismatch (-4!=0)!

Then it doesn't actually do any failover (but the quality RRD graphs are still being updated). If I do a no-op save of the load balancing pool, it becomes functional again (I believe that apinger is simply restarted). I have no idea when it breaks; in general it doesn't work for longer than a day, so I have to re-save the pool settings manually to bring it back to life.

databeestje

I have committed a fix to prevent apinger from quitting when this happens. You will still see this log message occasionally though.

We are still tracking the source of this message as we have not seen this message before in the past year that we ran apinger on 2.0. So where this is coming from is a mystery as of yet.

redbaron

The big problem with apinger is that it switches back to the main WAN as soon as it receives a ping from it. If the main WAN is flapping between up and down, pfSense switches WAN too frequently.

If the main WAN loses pings, it waits for a timeout and switches to the backup one. The same behaviour should apply in the opposite direction, i.e. if the main WAN is in a failed state and a ping is received, wait some time and switch back to the main WAN only if no ping loss was detected.
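
A minimal sketch of the failback delay suggested above, in Python. This is not how apinger or pfSense implement it; the class name FailbackGate and the thresholds fail_after and recover_after are hypothetical, and counting consecutive pings stands in for "wait some time".

```python
class FailbackGate:
    """Hypothetical helper: pick the active WAN, with a hold-down before failback."""

    def __init__(self, fail_after=3, recover_after=10):
        self.fail_after = fail_after        # consecutive lost pings before failover
        self.recover_after = recover_after  # consecutive good pings before failback
        self.lost = 0
        self.good = 0
        self.active = "main"

    def observe(self, main_ping_ok):
        """Feed one ping result for the main WAN; return the WAN to use."""
        if main_ping_ok:
            self.good += 1
            self.lost = 0
        else:
            self.lost += 1
            self.good = 0  # any loss restarts the failback wait

        if self.active == "main" and self.lost >= self.fail_after:
            self.active = "backup"
        elif self.active == "backup" and self.good >= self.recover_after:
            self.active = "main"
        return self.active
```

With this, a single ping reply from a flapping main WAN does not trigger a switch; only an unbroken run of recover_after replies does.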

cheesyboofs

Thank you for backporting this.

It's good that you guys have not shut the door on 1.2.x and are still actively tweaking it.

databeestje

Looks like all those debug messages were caused by my lack of C skills. It may or may not be fixed in upcoming snapshots.

databeestje

@redbaron:

The big problem with apinger is that it switches back to the main WAN as soon as it receives a ping from it. If the main WAN is flapping between up and down, pfSense switches WAN too frequently.

This problem is not specific to apinger; if you have two or more DHCP WANs, it fails in exactly the same way. The filter code in 1.2 has no link detection, which makes things worse.

This is a not-so-common failure, as most of the time it will do the right thing. However, sometimes dhclient will remove the static route for that monitor IP on the interface, causing this behaviour.

GoldServe

@databeestje:

Looks like all those debug messages were caused by my lack of C skills. It may or may not be fixed in upcoming snapshots.

May I ask which file was updated, or what was changed, to fix these error messages? I'm using an Aug 8th build and everything is working perfectly for me. I don't want to upgrade, but I do want to get rid of this rather annoying error message that shows up in the logs every night.

Cheers.

kevindd992002

Is there a guide for this new load balancing scheme?

cmb

@kevindd992002:

Is there a guide for this new load balancing scheme?

The changes were only in the back end, the front end is completely identical to how it's always been.

kambeeng

Thank you for the information. I hope the load balancing is better than before, and also the traffic shaping :D

smbsmb

Can the new load balancer work with 2 or more PPPoE/PPTP connections?

wdavid

Could this be the reason that I have problems routing IP addresses that end with specific numbers (223-239)? The problem is described in my post: http://forum.pfsense.org/index.php/topic,19763.0.html