Load balancing: Monitoring too sensitive?



  • Hi all, I'm testing pfSense 1.0.1 for use as a dual-WAN firewall.  Right now I have a T-1 and a Comcast cable modem.  Both are set up in an outgoing LB pool, with the monitoring IPs set to the gateway at each ISP.  I am getting occasional failures on the pings used for monitoring, for whatever reason.  In the logs, these failures only last for 5 seconds, or one monitoring cycle, leading me to believe the connection isn't really down, but perhaps the gateway on the other end is congested and just dropped the ICMP packet.

    In any event, is there some way to adjust the monitoring interval, or build some leeway into it?  I would never fail a test just because a single ping didn't complete.  I would prefer to have the system mark the gateway down only after a number of consecutive ping failures.  I'm guessing three would be sufficient, but if there is some way to configure this in the interface, that would be all the better.  I'm guessing this monitoring is built into the slbd daemon, so fixing this myself is beyond my skills.  Scripts I can handle, compiled code not so much.

    Anyone have any insight on this?
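
To make the request concrete, here is a minimal Python sketch of the "consecutive failures" rule described above. It is not how slbd works; the class name `GatewayMonitor`, its `threshold` parameter, and the `record` method are hypothetical names chosen for illustration, and the default of three simply reflects the guess in this post.

```python
class GatewayMonitor:
    """Hypothetical helper: marks a gateway down only after `threshold`
    consecutive failed probes, and up again on the first success."""

    def __init__(self, threshold=3):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0
        self.is_up = True

    def record(self, probe_succeeded):
        """Feed in one probe result; return the current up/down state."""
        if probe_succeeded:
            # Any successful ping resets the failure streak.
            self.consecutive_failures = 0
            self.is_up = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.threshold:
                self.is_up = False
        return self.is_up


if __name__ == "__main__":
    monitor = GatewayMonitor(threshold=3)
    # One dropped ping, a reply, then three dropped pings in a row.
    for result in [False, True, False, False, False]:
        print(result, "->", "up" if monitor.record(result) else "down")
```

With this rule, a single dropped ICMP packet in one 5-second cycle would not take the gateway out of the pool; only the third failure in a row would.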



  • Try a different monitor IP and see if that helps. However, having some more control over the ping and timeout settings would be nice to have, imo. Bill has some equipment now to work on the multi-WAN and load balancing code again. Maybe he can integrate something like timeouts, intervals, serial ping loss… whatever makes sense.



  • @hoba:

    Try a different monitor IP and see if that helps. However, having some more control over the ping and timeout settings would be nice to have, imo. Bill has some equipment now to work on the multi-WAN and load balancing code again. Maybe he can integrate something like timeouts, intervals, serial ping loss… whatever makes sense.

    Technically I don't have all the equipment just yet, but should by year's end.  Also, if my memory serves me, the code sends out three pings and all of them have to fail for the gateway to be considered down.  Either way, this is code I expect to start working on again in the new year and making better.

    –Bill
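
For comparison, the behavior recalled above (three pings per check, all of which must fail) could be sketched like this in Python. This is only an illustration of the described rule, not the actual slbd C code; `gateway_down_this_cycle`, `probe`, and `pings_per_cycle` are hypothetical names, and `probe` stands in for whatever sends a single ping and reports whether a reply came back.

```python
def gateway_down_this_cycle(probe, pings_per_cycle=3):
    """Return True only if every probe in this monitoring cycle fails.

    `probe` is a zero-argument callable (an assumption for this sketch)
    that returns True when a ping reply was received.
    """
    for _ in range(pings_per_cycle):
        if probe():
            # A single reply is enough to treat the gateway as up.
            return False
    return True


if __name__ == "__main__":
    # Simulated cycle: two dropped pings, then a reply.
    replies = iter([False, False, True])
    print(gateway_down_this_cycle(lambda: next(replies)))  # False: still up
```

If the monitoring really behaves this way, a gateway flagged down for a single cycle would have missed all three pings in that cycle, not just one.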



  • Thanks for the responses, guys.

    I've already switched IPs a few times.  I'm not certain why they're failing, particularly my T-1 ISP's IPs, as that connection is usually solid as a rock.  Comcast, on the other hand, not so much…

    Anyway, I'm looking forward to any additions or configuration options you can give us in that area.  Bill, if there's anything you need from me, please let me know.  Implementing this box is going to save my employer about $2500, so I'm sure I can throw some spare hardware or a bounty your way if need be.

    Thanks,
    Steven



  • @stevenbdjr:

    Thanks for the responses, guys.

    I've already switched IPs a few times.  I'm not certain why they're failing, particularly my T-1 ISP's IPs, as that connection is usually solid as a rock.  Comcast, on the other hand, not so much…

    Anyway, I'm looking forward to any additions or configuration options you can give us in that area.  Bill, if there's anything you need from me, please let me know.  Implementing this box is going to save my employer about $2500, so I'm sure I can throw some spare hardware or a bounty your way if need be.

    Thanks,
    Steven

    At this point I'm somewhat tied up and obligated to work on the traffic shaper.  However, if you have a serious proposition, I'd be willing to entertain it.

    –Bill



  • I'm seeing a similar issue on my end too. I've got wireless internet and a fractional T1. The wireless is faster but a bit jittery, so it drops a ping now and then. I'd also like to be able to adjust the threshold at which a line is actually marked down. Can we adjust the threshold ourselves in the code?



  • No, it's hardcoded in the slbd binary afaik unless Bill tells you something else.



  • @hoba:

    No, it's hardcoded in the slbd binary afaik unless Bill tells you something else.

    I'm still sure it's 3 pings.  But I'm probably catching the exit code wrong.  This is in the C code for our variant of slb - it's not going to be a quick PHP change.

    –Bill

