Relayd sends traffic to a host that is down in 2.0-RC1
We're running "2.0-RC1 (amd64) built on Fri Apr 29 21:19:09 EDT 2011" in a production
environment, load balancing across two front-end servers. At peak times the load balancer typically
sees 1500 requests/sec split across the two systems, and it has been in this configuration for
about a year, as the build date suggests.
Recently, we tried to take one of the front-end systems out of the pool to upgrade it, but we noticed
that relayd still sends a small fraction of the total traffic to the 'down' host, even
after it has marked the host as down. I'd estimate that about 10 requests/sec still reach it
after the host is shown as down in the pool.
I've seen some recommendations to upgrade to 2.0.1, but I can't find a specific bug fix that
sounds like our problem. We'll probably upgrade to 2.0.1 speculatively anyway,
just to see.
Under what circumstances might relayd (or pf) continue to send traffic to a pool host that is
down? What kind of diagnostics can I do to get a better handle on what pf/rdr-to is doing?
Note that I'm seeing SYN packets being sent to the down host, so these are new TCP connections.
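One way to quantify the stray SYNs independently of relayd and pf is to capture traffic on the load balancer and count new connection attempts per destination. The following is a minimal Python sketch, not part of pfSense or relayd: the function name count_syns is hypothetical, and it assumes the IPv4 TCP text format printed by `tcpdump -n` (lines such as `IP 192.0.2.1.51234 > 198.51.100.10.80: Flags [S], ...`). It counts packets with SYN set and ACK clear, i.e. new connection attempts rather than SYN-ACK replies.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical helper. Assumes `tcpdump -n` one-line output for IPv4 TCP, e.g.
# "12:00:00.000000 IP 192.0.2.1.51234 > 198.51.100.10.80: Flags [S], seq 1, ..."
_LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.\d+ > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]*)\]"
)


def count_syns(lines: Iterable[str]) -> Counter:
    """Count initial SYNs (SYN set, ACK clear) per destination "ip:port"."""
    counts: Counter = Counter()
    for line in lines:
        m = _LINE.search(line)
        if m is None:
            continue  # not an IPv4 TCP line in the assumed format
        flags = m.group("flags")
        # In tcpdump's notation 'S' is SYN and '.' is ACK, so SYN without ACK
        # is a client opening a new connection, not a server's SYN-ACK.
        if "S" in flags and "." not in flags:
            counts[f"{m.group('dst')}:{m.group('dport')}"] += 1
    return counts
```

For example, capture with `tcpdump -n -l -i <iface> 'tcp port 80' > capture.txt` on the interface facing the pool servers (interface and port are placeholders), then call `count_syns(open("capture.txt")).most_common()`. A nonzero count for the down host's address would confirm that new connections are still being redirected to it.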
So much has changed since RC1 that you need to upgrade. I review every single commit, and I can't remember offhand whether anything related to this has changed, because a LOT has changed. I know for a fact that what you're describing works in the 2.0 and 2.0.1 release versions; before that, so much changed that I'm not sure. It may have been broken for a couple of hours that day and you just got an unlucky snapshot. There are lots of possibilities.
Ok, I guessed you were going to say that. :)
Just as a quick hint, what can I look at besides 'relayctl show summary' to see where
in the redirection rulesets it still thinks the down host is a suitable redirection target?
If that's not an easy answer, that's fine, and thanks for the upgrade suggestion.
Check the pfctl output for the relayd anchor: "pfctl -a relayd -sn", IIRC. That's off the top of my head, so if it's not exactly right, it's something close to that.
Just as an update, the upgrade to 2.0.1 seemed to resolve the symptoms we saw and
the load balancing is behaving as advertised now. :)
Is there anywhere I can get a concise list of relayd (or related pf) changes? I'd love to
track down where this one got fixed.
There are too many different areas that could affect it, and too big a timeframe, for there to be anything like a short list. The fix could be in the front-end or back-end PHP source in one git repo, or in any number of kernel patches in another repo, and you'd have to wade through 1000+ changes. If you want to dig: http://github.com/bsdperimeter/pfsense and http://github.com/bsdperimeter/pfsense-tools
Ok, thanks. Does pfSense make any custom changes to the pf code in the FreeBSD kernel?
Yes we do, the patches are all in the tools repo.