Relayd sends traffic to a host that is down in 2.0-RC1



  • Hi,

    We're running "2.0-RC1 (amd64) built on Fri Apr 29 21:19:09 EDT 2011" in a production
    environment providing load balancing to two servers. The load balancer is typically seeing 1500
    requests/sec at peak times, splitting them across two front-end systems and has been in
    this configuration for about a year as the build time suggests.

    Recently, we tried to take one of the front-end systems out to upgrade it, but we noticed
    that relayd still sends a small fraction of the total traffic to the 'down' host in the pool,
    even after it has acknowledged that the host is down. I'd say we're probably seeing
    10 requests/sec going to that host after it is marked down in the pool.

    I've seen some recommendations to upgrade to 2.0.1. I can't see any specific fixed bug
    that sounds like our problem, but we'll probably upgrade to 2.0.1 speculatively
    just to see.

    Under what circumstances might relayd (or pf) continue to send traffic to a pool host that is
    down? What kind of diagnostics can I do to get a better handle on what pf/rdr-to is doing?

    Note that I'm seeing SYN packets being sent to the down host, so these are new TCP connections.

    Regards,
    Mark
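
One way to put a number on the "SYNs to the down host" observation is to capture traffic on the firewall with `tcpdump -n -tt` and count initial SYNs per second to the down host's address. The sketch below is only an illustration: it assumes the usual one-line tcpdump text format with `Flags [S]`, and the function name and addresses are hypothetical.

```python
import re
from collections import Counter

# Assumed input: text lines from `tcpdump -n -tt`, for example
#   1304126349.100000 IP 10.0.0.5.51234 > 10.0.0.11.80: Flags [S], seq 1, win 65535, length 0
LINE_RE = re.compile(
    r"^(?P<ts>\d+)\.\d+ IP \S+ > (?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): "
    r"Flags \[(?P<flags>[^\]]+)\]"
)


def syns_per_second(lines, down_host):
    """Count initial SYNs (flags exactly 'S', not 'S.') per second to down_host."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("dst") == down_host and m.group("flags") == "S":
            counts[int(m.group("ts"))] += 1
    return dict(counts)
```

Feeding it a capture taken while the host is marked down shows whether new connections are still being directed to that address, and at roughly what rate.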



  • So much has changed since RC1 that you need to upgrade. I review every single commit, and I can't remember offhand whether anything related to that has changed, because a LOT has changed. I know for a fact that what you're describing works in the 2.0 and 2.0.1 release versions; prior to that, so much has changed that I'm not sure. It may have been broken for a couple of hours that day and you just got an unlucky snapshot. There are lots of possibilities.



  • Ok, I guessed you were going to say that. :)

    Just as a quick hint, what can I look at besides 'relayctl show summary' to see where
    in the redirection rulesets it still thinks the down host is a suitable redirection target?

    If that's not an easy answer, that's fine, and thanks for the upgrade suggestion.

    Mark



  • Check the pfctl output for the relayd anchor: "pfctl -a relayd -sn", IIRC. That's off the top of my head, so if it's wrong, it's close to that.
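
As a rough sketch of how that check could be scripted, the Python below runs the command quoted above and filters its output for a given address. The helper names and the address are hypothetical, and the exact anchor path may differ on a given install. relayd normally keeps pool members in pf tables, so the NAT rules may show only a table name; in that case the same filter can be applied to the output of `pfctl -t <table> -T show` (with `-a` and the appropriate anchor, if the table lives in one).

```python
import re
import subprocess


def pfctl_nat_rules(anchor="relayd"):
    """Run `pfctl -a <anchor> -sn` (the command from the thread); needs root."""
    return subprocess.run(
        ["pfctl", "-a", anchor, "-sn"],
        capture_output=True, text=True, check=True,
    ).stdout


def lines_mentioning(output, address):
    """Return the lines of pfctl output that reference exactly this IPv4 address."""
    # Guard both sides so that 10.0.0.11 does not match 10.0.0.110 or 110.0.0.11.
    pattern = re.compile(r"(?<![\d.])" + re.escape(address) + r"(?!\d)")
    return [line.strip() for line in output.splitlines() if pattern.search(line)]


# Example use on the firewall (address is a placeholder):
#   print(lines_mentioning(pfctl_nat_rules(), "10.0.0.11"))
```

If the down host's address still appears in the active redirection target while relayd reports it as down, that points at the pf side rather than the health check.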



  • Just as an update, the upgrade to 2.0.1 seemed to resolve the symptoms we saw and
    the load balancing is behaving as advertised now. :)

    Is there anywhere I can get a concise list of relayd (or related pf) changes? I'd love to
    track down where this one got fixed.

    Cheers,
    Mark



  • There are too many different areas that could impact it, and too big a timeframe, for there to be any remotely short list. The fix could be in the front-end or back-end PHP source in one git repo, or in any number of kernel patches in another repo, and you'd have to wade through 1000+ changes. If you want to dig, see http://github.com/bsdperimeter/pfsense and http://github.com/bsdperimeter/pfsense-tools



  • Ok, thanks. Does pfSense make any custom changes to the pf code in the FreeBSD kernel?


  • Yes we do, the patches are all in the tools repo.

