HAProxy TCP/use client ip and carp cluster problem

brlamnr

I had HAProxy working fine in TCP mode with "Use Client IP" enabled, and pfSense configured as a router (no NAT). Using tcpdump, I could see traffic on the internal interface reaching the load-balanced servers with the client IP address, and on the way back, on the external side, I could see the packets leaving the pfSense LB with the IP address where HAProxy is listening.

The client I am using for testing is outside, and the backend servers use the pfSense box as their default gateway; no other static routes are present.

I had to add redundancy, so I added a second box (each uses a Chelsio 10Gbps NIC; LAN and WAN are VLANs off this interface) and configured CARP. No issues with the pfSense devices themselves: everything looks OK, the virtual IPs answer, configs sync, etc.

HAProxy no longer works as expected: the client no longer receives the return packets. A tcpdump session on both the LAN and WAN interfaces shows that the servers are seeing the client's packets with their native IP addresses, but on the way back most of the packets don't get through the master pfSense. Just in case: HAProxy was configured to listen on the VIP and all the CARP-related settings were enabled (no issues there).

So I disabled CARP temporarily, configured HAProxy to listen on the WAN interface, and tried to connect from the client to this new IP address. Same behavior: it didn't work.

I removed CARP and left only one appliance, just as before, and things started working again using TCP mode/Client IP.

When I disable "Use Client IP" (using CARP), the client works, but the servers see the LB's IP address.

When I configure SSL offloading (using CARP), the client also connects fine.

Any ideas what could be causing the behavior described above?

Thanks.

PiBa

Not really sure, but going to give it a guess anyhow :).

Are the servers using the proper CARP gateway to send the reply to when using 'Client IP'?
Perhaps try a "tcpdump -eni <lan-nic> host 192.168.1.2 and port 80" in an SSH session and check what traffic is being sent/received between HAProxy and the webserver. Also make sure the correct MAC addresses are indeed used if you seem to see the proper traffic.
Check for the SYN [S], SYN-ACK [S.], ACK [.] handshake between HAProxy and the webserver; that would be the start of the TCP connection, which is probably already failing to happen.
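
If the capture is long, the handshake check can be scripted. What follows is only a rough sketch in Python, assuming the usual one-line text output of tcpdump; the sample lines, the client address 203.0.113.5 and the MAC addresses are made up for illustration, and the helper names are hypothetical.

```python
import re

# Matches the "src > dst: Flags [...]" portion of a tcpdump text line.
# With -e the Ethernet header comes first; it is not followed by
# ": Flags", so the regex skips past it to the IP part.
FLAGS_RE = re.compile(r"(\S+) > (\S+): Flags \[([^\]]+)\]")


def tcp_steps(lines):
    """Return (src, dst, flags) for every line that could be parsed."""
    steps = []
    for line in lines:
        match = FLAGS_RE.search(line)
        if match:
            steps.append(match.groups())
    return steps


def handshake_completed(lines):
    """True if [S], [S.] and [.] show up in that order."""
    wanted = ["S", "S.", "."]
    position = 0
    for _src, _dst, flags in tcp_steps(lines):
        if position < len(wanted) and flags == wanted[position]:
            position += 1
    return position == len(wanted)


# Made-up sample lines (hypothetical client and MAC addresses).
sample = [
    "12:00:00.000001 02:00:00:00:00:01 > 00:50:56:aa:bb:cc, ethertype IPv4 (0x0800), "
    "length 74: 203.0.113.5.50000 > 10.3.128.11.443: Flags [S], seq 1, win 65535, length 0",
    "12:00:00.000200 00:50:56:aa:bb:cc > 00:00:5e:00:01:01, ethertype IPv4 (0x0800), "
    "length 74: 10.3.128.11.443 > 203.0.113.5.50000: Flags [S.], seq 9, ack 2, win 65535, length 0",
    "12:00:00.000400 02:00:00:00:00:01 > 00:50:56:aa:bb:cc, ethertype IPv4 (0x0800), "
    "length 66: 203.0.113.5.50000 > 10.3.128.11.443: Flags [.], ack 10, win 65535, length 0",
]

print(handshake_completed(sample))
```

This only looks at the order of the flags, not at sequence numbers or at which connection a packet belongs to, so it is best fed a capture already filtered down to one client and one server.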

By the way, replies from the server would not need to make it to the WAN side one-to-one: since HAProxy is the actual client for the webserver, the ipfw firewall rule should intercept the reply traffic, and HAProxy decides when it's time to send a reply to a possible client, if applicable. That is a separate TCP connection and does not necessarily have the same packet sizes or count.

brlamnr

@piba said in HAProxy TCP/use client ip and carp cluster problem:

Not really sure, but going to give it a guess anyhow :).

Any help is appreciated, thanks.

Are the servers using the proper CARP gateway to send the reply to when using 'Client IP'?

Yes. As a matter of fact, everything was working fine when there was only one appliance. The only thing I did when adding the second one was to assign new physical IP addresses to the working appliance and add the original ones as virtual IPs. When I first tried, I had only the CARP master up, and packets would not get back to the client (same behavior after I added the second one, which shows up as backup in the CARP panel).

Perhaps try a "tcpdump -eni <lan-nic> host 192.168.1.2 and port 80" in an SSH session and check what traffic is being sent/received between HAProxy and the webserver. Also make sure the correct MAC addresses are indeed used if you seem to see the proper traffic.

When I was troubleshooting the problem, I had tcpdump listening on both the LAN and WAN interfaces (port 443) on the haproxy appliance:

Not using client-ip (works fine): on the LAN side, the SRC addr is the LAN interface address of HAProxy. Replies from the server go to that SRC addr and all works fine (the client sees the virtual WAN IP address).
When using client-ip: on the LAN side, the SRC addr is the client's, and the replies from the server are also addressed to the client's IP address.

Check for the SYN [S], SYN-ACK [S.], ACK [.] handshake between HAProxy and the webserver; that would be the start of the TCP connection, which is probably already failing to happen.

The handshake sequence looks OK (as seen in both tcpdump captures, from the LAN and WAN interfaces). The problem starts after the client sends the "client hello": the server answers with "server hello change cipher", but this packet doesn't show up on the WAN interface and, of course, never reaches the client. The client then sends a "TCP retransmission Client hello", which is seen and ACKed by the server; the server then retransmits the "server hello change cipher", which again is not received by the client (it doesn't show up on the WAN interface).

I can send you the tcpdump capture files if you would like to take a look.

By the way, replies from the server would not need to make it to the WAN side one-to-one: since HAProxy is the actual client for the webserver, the ipfw firewall rule should intercept the reply traffic, and HAProxy decides when it's time to send a reply to a possible client, if applicable. That is a separate TCP connection and does not necessarily have the same packet sizes or count.

I am going to move on to SSL offloading (it worked fine with CARP on the same physical appliances). The reason I tried to use client-ip was that the CPUs on the appliances are Intel Atom and not likely to handle the load. I am switching to virtualized pfSense instances on high-end platforms, with the CPU/bandwidth to handle the expected utilization.

For now, I will leave it running with "client-ip" disabled, until the replacement virtual appliances are deployed.

Again, thanks for your help.

PiBa

@brlamnr said in HAProxy TCP/use client ip and carp cluster problem:

The handshake sequence looks OK (as seen in both tcpdump captures, from the LAN and WAN interfaces). The problem starts after the client sends the "client hello": the server answers with "server hello change cipher", but this packet doesn't show up on the WAN interface and, of course, never reaches the client.

If you take one of the two pfSense systems offline, does that 'fix' the issue? Even if you still use CARP on the then-single box?
Do you have pfsync state synchronization enabled? Are the real interfaces ordered the same (wan=em1 lan=em2 and wan=em1 lan=em2, or something similar)? Does the firewall log show anything? Can you disable pfsync for testing? As for the tcpdump, please do check the MAC addresses in the traffic, not only the IP addresses; the -e parameter of tcpdump shows them. Though I guess if the first TCP handshake succeeds, those are not the actual issue.

brlamnr

@piba said in HAProxy TCP/use client ip and carp cluster problem:

@brlamnr said in HAProxy TCP/use client ip and carp cluster problem:

The handshake sequence looks OK (as seen in both tcpdump captures, from the LAN and WAN interfaces). The problem starts after the client sends the "client hello": the server answers with "server hello change cipher", but this packet doesn't show up on the WAN interface and, of course, never reaches the client.

If you take one of the two pfSense systems offline, does that 'fix' the issue? Even if you still use CARP on the then-single box?

No, it didn't fix the issue. I started with only one pfSense in the CARP pair (I hadn't built the second one yet), and the problem started there. I then added the second one, thinking HAProxy would be expecting it. No luck, same behavior.

I disabled CARP manually on the primary and the backup became master, but nothing changed; it still didn't work.

Do you have pfsync state synchronization enabled? Are the real interfaces ordered the same (wan=em1 lan=em2 and wan=em1 lan=em2, or something similar)? Does the firewall log show anything? Can you disable pfsync for testing? As for the tcpdump, please do check the MAC addresses in the traffic, not only the IP addresses; the -e parameter of tcpdump shows them. Though I guess if the first TCP handshake succeeds, those are not the actual issue.

Pfsync synchronization is enabled; it also works for synchronizing HAProxy's settings. When I started, I only had one firewall up, so pfsync wasn't active then. I will try disabling it and test later, and will report back.

The order of the interfaces is correct. In my case, I am using VLANs, wan=cxl0.78, LAN=cxl0.79

There is nothing in the logs (checked using cli clog).

The tcpdump sessions were written to a file, and later reviewed with Wireshark. Now that I took a closer look at the MAC addresses, I just noticed the following (capture on the pfsense interface facing the server):

Incoming packets generated by the client show the SRC MAC address as the adapter's physical MAC (in my case, the OUI is Chelsio's); the DST MAC is the server's MAC (the OUI is VMware's).
Outbound packets (answers from the server) show the SRC MAC as per above, but the DST MAC address is now the VRRP/CARP MAC address.
This pattern is the same during the handshake (which works) right up to the point where things stop (the reply to the client hello).

Since it worked when no CARP was in place, it appears the problem would be in the reply to the VRRP/CARP MAC (?), but then why does the reply to the SYN get back fine?
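
For anyone sorting through a similar capture: CARP virtual MACs for IPv4 use the same 00:00:5e:00:01:xx range as VRRP, with the last byte being the VHID, so they are easy to pick out mechanically. A small Python sketch (the function name and the example MACs are made up for illustration):

```python
# CARP (like VRRP for IPv4) uses virtual MACs of the form
# 00:00:5e:00:01:<VHID>, where the last byte is the VHID in hex.
CARP_PREFIX = "00:00:5e:00:01:"


def carp_vhid(mac):
    """Return the VHID if mac is a CARP/VRRP IPv4 virtual MAC, else None."""
    mac = mac.lower().replace("-", ":")
    if len(mac) == 17 and mac.startswith(CARP_PREFIX):
        return int(mac[-2:], 16)
    return None


# Example MACs are made up; they are not taken from the captures above.
print(carp_vhid("00:00:5E:00:01:4E"))  # 78
print(carp_vhid("00:50:56:aa:bb:cc"))  # None
```

A destination MAC in that range on the server's replies only tells you the server resolved its gateway to the CARP VIP; it does not by itself say which box picked the frame up.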

Thanks.

PiBa

@brlamnr
You do have the vSwitch of ESX configured to allow spoofing and promiscuous mode? I'm not even sure packets should ever be using the MAC of the VMware host itself. Or do you have the real hardware NICs passed through to the pfSense VM? I've never done that.

brlamnr

@piba said in HAProxy TCP/use client ip and carp cluster problem:

@brlamnr
You do have the vSwitch of ESX configured to allow spoofing and promiscuous mode? I'm not even sure packets should ever be using the MAC of the VMware host itself. Or do you have the real hardware NICs passed through to the pfSense VM? I've never done that.

That's a good point. While preparing to deploy the virtual pfSense instances that will eventually replace the ones above, I read about the vSwitch requirements. I didn't think about it here, since the boxes with the problem are physical appliances connected to a physical switch, but you never know. I will ask to have those settings enabled on the vSwitch (or port group) the servers connect to. Will report back.

Thanks.

PiBa

@brlamnr
Only the pfSense VM that is using the CARP IP would need such special vSwitch 'permissions'; the webservers should not need them. If they are currently still on hardware, that should not be required. I'm kind of running low on ideas, though.

brlamnr

@piba said in HAProxy TCP/use client ip and carp cluster problem:

@brlamnr
Only the pfSense VM that is using the CARP IP would need such special vSwitch 'permissions'; the webservers should not need them. If they are currently still on hardware, that should not be required. I'm kind of running low on ideas, though.

The servers are actually virtual servers; the pfSense boxes are physical appliances. I'll give it a try tomorrow anyway, nothing to lose.

brlamnr

@brlamnr said in HAProxy TCP/use client ip and carp cluster problem:

@piba said in HAProxy TCP/use client ip and carp cluster problem:

@brlamnr
Only the pfSense VM that is using the CARP IP would need such special vSwitch 'permissions'; the webservers should not need them. If they are currently still on hardware, that should not be required. I'm kind of running low on ideas, though.

The servers are actually virtual servers; the pfSense boxes are physical appliances. I'll give it a try tomorrow anyway, nothing to lose.

The changes on the vSwitch didn't make any difference. As soon as client-ip is turned on, the client stops seeing the server.

PiBa

The 'real' clients connect to HAProxy, and that connection is likely still working properly, as nothing changes on that side when enabling use-client-ip on HAProxy for the backend connection.
HAProxy, however, probably no longer sees the reply from the server, which is strange, as you see the replies in the packet captures.

Do you have any plugins like Suricata/Snort running? Or do you use the captive portal, which also uses ipfw for some 'low level' firewall tasks?

brlamnr

@piba said in HAProxy TCP/use client ip and carp cluster problem:

The 'real' clients connect to HAProxy, and that connection is likely still working properly, as nothing changes on that side when enabling use-client-ip on HAProxy for the backend connection.
HAProxy, however, probably no longer sees the reply from the server, which is strange, as you see the replies in the packet captures.

Do you have any plugins like Suricata/Snort running? Or do you use the captive portal, which also uses ipfw for some 'low level' firewall tasks?

No, there are no plugins nor captive portals. The appliance was configured to do load balancing only.

PiBa

@brlamnr
Can you check the result of the command 'ipfw show'?

brlamnr

@piba said in HAProxy TCP/use client ip and carp cluster problem:

@brlamnr
Can you check the result of the command 'ipfw show'?

Here is the output after activating client-ip:

00010 0 0 fwd ::1 tcp from 10.3.128.10 443 to any in recv cxl0.79
00011 108 20412 fwd ::1 tcp from 10.3.128.11 443 to any in recv cxl0.79
65535 48732381 4651172490 allow ip from any to any

PiBa

@brlamnr
Hmm, it looks like that has IPv6 and IPv4 mixed together.

Mine currently looks like:

00012           0              0 fwd 127.0.0.1 tcp from 192.168.8.15 444 to any in recv em1

Maybe the root cause lies there.
Can you try adding a rule manually?

ipfw add 50 fwd 127.0.0.1 tcp from 10.3.128.11 443 to any in recv cxl0.79

That is, even though the existing rule is counting traffic and so 'seems' to work.
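
To spot this kind of mix on a box with many rules, something like the Python sketch below could be run over the 'ipfw show' output. It is only an illustration: it assumes the three leading columns (rule number, packets, bytes) shown above, the function name is made up, and whether the mixed address families are really the cause here is still a guess.

```python
import ipaddress
import re

# rule number, packet count, byte count, then "fwd <target> <proto> from <source>"
FWD_RE = re.compile(r"^(\d+)\s+\d+\s+\d+\s+fwd\s+(\S+)\s+\S+\s+from\s+(\S+)")


def mixed_family_rules(ipfw_output):
    """Return rule numbers whose fwd target and source differ in IP version."""
    mismatches = []
    for line in ipfw_output.splitlines():
        match = FWD_RE.match(line.strip())
        if not match:
            continue  # not a fwd rule
        rule, target, source = match.groups()
        try:
            # a fwd target may carry a ",port" suffix; strip it
            target_ip = ipaddress.ip_address(target.split(",")[0])
            source_ip = ipaddress.ip_address(source)
        except ValueError:
            continue  # "any", tables or masks are skipped in this sketch
        if target_ip.version != source_ip.version:
            mismatches.append(rule)
    return mismatches


# Output as posted earlier in this thread.
posted = """\
00010 0 0 fwd ::1 tcp from 10.3.128.10 443 to any in recv cxl0.79
00011 108 20412 fwd ::1 tcp from 10.3.128.11 443 to any in recv cxl0.79
65535 48732381 4651172490 allow ip from any to any
"""

print(mixed_family_rules(posted))
```

Both fwd rules from the posted output are reported, while a rule like the 127.0.0.1 one is not.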

brlamnr

@brlamnr said in HAProxy TCP/use client ip and carp cluster problem:

@piba said in HAProxy TCP/use client ip and carp cluster problem:

@brlamnr
Can you check the result of the command 'ipfw show'?

Here is the output after activating client-ip:

00010 0 0 fwd ::1 tcp from 10.3.128.10 443 to any in recv cxl0.79
00011 108 20412 fwd ::1 tcp from 10.3.128.11 443 to any in recv cxl0.79
65535 48732381 4651172490 allow ip from any to any

It didn't work. Same behavior. Thanks.