HAproxy - Some Working, Some Timeout
-
I have an HA firewall pair running 2.4.5p1 with relayd. Ahead of the upgrade to 2.6.0 I installed HAproxy, leaving it disabled. I configured all the frontends, backends and firewall rules on the primary firewall by creating one of each and cloning them, changing relevant IPs and ports, likewise the WAN rules to allow 80 and 443 to the CARP VIPs hosting the HAproxy front-ends.
The upgrade on the primary firewall went fine, relayd was removed, all CARP VIPs transferred back fine from the secondary and I enabled HAproxy. Some of the frontends work, some just timeout. The HAproxy stats page says all the backends are up (some have hosts which are down, but all have hosts that are up).
I'm using basic health checks and no ACLs. The HAproxy stats page reports the backend hosts up that we expect. The firewall rules on the WAN simply allow TCP 80 and 443 from anywhere to the WAN CARP VIPs used by each HAproxy frontend.
As far as I can see the only difference between the ones that work and the ones which don't is that the backend hosts live on different LANs. Those on one LAN work, those on two others don't. There were no packets being blocked by the firewall that I've observed. With a short window for diagnostics, no obvious visible config problems and needing to get it working again I failed back to the secondary firewall which was still on 2.4.5p1 using relayd. I downloaded the config from the upgraded primary firewall and have observed the same issue in a replica environment.
I tcpdumped my WAN and VLAN interfaces and I can observe requests come in, hit the backends and come back again so I don't understand why it's timing out, though I'm not sure of the content of what is coming back.
Curiously, before I was able to investigate much, people reported that the secondary firewall wasn't passing traffic to the relayd virtual servers that had been working under HAproxy but the ones that didn't work under HAproxy were working. Again, with the maintenance window running short for investigation I had to reinstall the primary with 2.4.5p1 and restore it's original config. Now everything is working perfectly again.
Has anybody seen such issues or know how I can diagnose it? I can't find any obvious config differences between the two firewalls in terms of interfaces, HAproxy/relayd config or firewall rules but it must be something weird or a failure of understanding on my part. It seems too coincidental that the ones that worked in HAproxy on the primary didn't work with relayd on the secondary firewall and the ones that didn't work in HAproxy were fine on the secondary, and they all work fine with relayd on the primary.
I'm building a secondary firewall to fully replicate the environment as best as I can so config and screenshots to follow. Happy to provide anything that you think might help.
-
Finished building a replica environment with 2 firewalls on 2.6.0. I failed over to the secondary firewall and am somewhat pleased to see that the HAproxy frontends that worked and didn't work on the primary are the same on the secondary so that rules out any weird split brain config nightmares.
-
So the issue appears to be to do with using "Use Client-IP to connect to backend servers" in my backends. Some work with it enabled, some don't and I don't fully understand why that is, but turning it off on the backends for the frontends that don't work means they start working.
-
@ads76
Possibly a routing issue?Transparent mode is a bad hack anyway and should not be used.
-
Thanks for your suggestion man. It turns out I'm an idiot who didn't understand quite how it worked and didn't look at the settings properly (isn't that always the case?). I leave the explanation here for anyone who makes the same mistake in future.
I enabled "Use Client-IP to connect to backend servers", but didn't set the interface it should use to connect to the backend servers in the drop-down which appears when you enable the use client-IP setting. That's why the LAN ones worked, it's the default interface setting, but the ones on other LANs didn't and I'm a bonehead for not noticing it needed to be set to the correct interface. Literally the last line of this guide brought it to my attention:
https://github.com/PiBa-NL/pfsense-haproxy-package-doc/wiki/haproxy_pass_clientip_to_webserver
You're probably right about the routing on the backend devices for the issue when I was failed over to secondary firewall, I've asked the people that run it to check as I don't have access.
Thanks for taking the time to reply.
-
@ads76
Thanks for coming back with the solution.