CARP performance issue on ESXi
-
Hi,
I am having CARP performance issues in my ESXi/VMware environment.
I have a cluster of 4 ESXi hosts with a DvSwitch on v6.7. Each ESXi host is connected with 4 x 10 Gb/s links (1 management, 1 vMotion, 2 VM network). On 2 of them I run 2 pfSense VMs configured as an HA cluster. Each pfSense has 2 physical interfaces, both VMXNET3: 1 connected to the WAN network and 1 to a "Trunk" network. I enabled promiscuous mode, MAC address changes and forged transmits on the DvSwitch ports allocated to them. In addition to the WAN interface, I have 5 VLAN interfaces configured. I configured CARP on each of these interfaces so that the VMs don't lose connectivity. The cluster itself works fine: if the master reboots or loses an interface, the slave takes over the VIP and the VMs don't even lose their connection.
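For context on why those DvSwitch security settings matter: as far as I know, an IPv4 CARP VIP is answered from a virtual MAC address derived from its VHID, 00:00:5e:00:01:<vhid> (the same range VRRP uses), rather than from the vNIC's own MAC. The vSwitch therefore has to accept frames sourced from that MAC (MAC changes / forged transmits) and deliver frames addressed to it (promiscuous mode). A minimal sketch of that mapping; the VHIDs 1 to 6 in the example are hypothetical placeholders for whatever is configured on the six VIPs:

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the IPv4 CARP virtual MAC for a VHID: 00:00:5e:00:01:<vhid in hex>."""
    if not 1 <= vhid <= 255:
        raise ValueError("VHID must be between 1 and 255")
    return f"00:00:5e:00:01:{vhid:02x}"


if __name__ == "__main__":
    # Hypothetical VHIDs, one per CARP VIP (WAN + 5 VLAN interfaces).
    for vhid in range(1, 7):
        print(vhid, carp_virtual_mac(vhid))
```

Comparing these MACs with what the DvSwitch port actually learns can help confirm that VIP traffic is reaching the current master.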
My problem is that I get poor throughput on the VIPs, while it works great directly on the firewall interface IPs. For example, I put 3 VMs on the same ESXi host to take the physical network out of the equation:
- A = VM on the WAN network, with IP 172.17.254.247
- B = VM on the LAN network, with IP 172.19.18.33
- P = pfSense with IPs 172.17.254.42 and 172.19.18.251, and CARP VIPs 172.17.254.43 and 172.19.18.254 (there are obviously more interfaces, but they are not useful for this example)
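To make the direct-IP vs VIP comparisons repeatable, the runs can be driven from a script using iperf3's JSON output (`-J`) and reading the sender summary. This is a minimal sketch; the helper names are hypothetical, port 5250 is the one used in the tests here, and it assumes iperf3 is installed on the client VM and a server is listening on P:

```python
from __future__ import annotations

import json
import subprocess


def iperf3_command(host: str, port: int = 5250, seconds: int = 10) -> list[str]:
    """Build an iperf3 TCP client command line with JSON output (-J)."""
    return ["iperf3", "-c", host, "-p", str(port), "-t", str(seconds), "-J"]


def run_iperf3(host: str, port: int = 5250, seconds: int = 10) -> dict:
    """Run the client and return the parsed JSON report (raises if iperf3 fails)."""
    proc = subprocess.run(
        iperf3_command(host, port, seconds),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def summarize(report: dict) -> tuple[float, int]:
    """Return (sender Gbit/s, sender retransmits) from an iperf3 -J report."""
    sent = report["end"]["sum_sent"]
    return sent["bits_per_second"] / 1e9, sent.get("retransmits", 0)
```

For example, from A: `for ip in ("172.17.254.42", "172.17.254.43"): print(ip, summarize(run_iperf3(ip)))` compares the WAN interface IP with the WAN VIP in one go.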
If I run iperf3 on A against P's WAN interface IP, I get these results:
Connecting to host 172.17.254.42, port 5250
[  4] local 172.17.254.247 port 34838 connected to 172.17.254.42 port 5250
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.11 GBytes  18.2 Gbits/sec    0    860 KBytes
[  4]   1.00-2.00   sec  1.56 GBytes  13.4 Gbits/sec   45   1.05 MBytes
[  4]   2.00-3.00   sec  2.45 GBytes  21.1 Gbits/sec    0   1.17 MBytes
[  4]   3.00-4.00   sec  2.23 GBytes  19.2 Gbits/sec    0   1.26 MBytes
[  4]   4.00-5.00   sec  1.41 GBytes  12.1 Gbits/sec    5    966 KBytes
[  4]   5.00-6.00   sec   788 MBytes  6.61 Gbits/sec    0   1.01 MBytes
[  4]   6.00-7.00   sec  2.59 GBytes  22.2 Gbits/sec    0   1.05 MBytes
[  4]   7.00-8.00   sec  1.98 GBytes  17.0 Gbits/sec    0   1.08 MBytes
[  4]   8.00-9.00   sec  2.23 GBytes  19.2 Gbits/sec    0   1.15 MBytes
[  4]   9.00-10.00  sec  2.22 GBytes  19.1 Gbits/sec    0   1.24 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  19.6 GBytes  16.8 Gbits/sec   50             sender
[  4]   0.00-10.00  sec  19.6 GBytes  16.8 Gbits/sec                  receiver
iperf Done.
If I run iperf3 on A against P's WAN VIP, I get these results:
Connecting to host 172.17.254.43, port 5250
[  4] local 172.17.254.247 port 57642 connected to 172.17.254.43 port 5250
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.04 GBytes  8.89 Gbits/sec    0    479 KBytes
[  4]   1.00-2.00   sec  1.08 GBytes  9.26 Gbits/sec    0   1024 KBytes
[  4]   2.00-3.00   sec  1.09 GBytes  9.36 Gbits/sec    0   1.06 MBytes
[  4]   3.00-4.00   sec  1.09 GBytes  9.35 Gbits/sec    0   1.10 MBytes
[  4]   4.00-5.00   sec  1.09 GBytes  9.35 Gbits/sec    0   1.11 MBytes
[  4]   5.00-6.00   sec  1.09 GBytes  9.35 Gbits/sec    0   1.12 MBytes
[  4]   6.00-7.00   sec  1.09 GBytes  9.34 Gbits/sec    0   1.14 MBytes
[  4]   7.00-8.00   sec  1.09 GBytes  9.35 Gbits/sec    0   1.15 MBytes
[  4]   8.00-9.00   sec  1.09 GBytes  9.34 Gbits/sec    0   1.15 MBytes
[  4]   9.00-10.00  sec  1.09 GBytes  9.35 Gbits/sec    0   1.15 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  10.8 GBytes  9.29 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  10.8 GBytes  9.29 Gbits/sec                  receiver
iperf Done.
If I run iperf3 on B against P's LAN interface IP, I get these results:
Connecting to host 172.19.18.251, port 5250
[  4] local 172.19.18.33 port 34702 connected to 172.19.18.251 port 5250
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.87 GBytes  24.7 Gbits/sec   14   1.31 MBytes
[  4]   1.00-2.00   sec   691 MBytes  5.80 Gbits/sec    0   1.31 MBytes
[  4]   2.00-3.00   sec  2.73 GBytes  23.5 Gbits/sec  106   1.19 MBytes
[  4]   3.00-4.00   sec   638 MBytes  5.35 Gbits/sec    0   1.19 MBytes
[  4]   4.00-5.00   sec  2.53 GBytes  21.7 Gbits/sec  178    889 KBytes
[  4]   5.00-6.00   sec   508 MBytes  4.26 Gbits/sec    0    947 KBytes
[  4]   6.00-7.00   sec  2.57 GBytes  22.1 Gbits/sec   83    933 KBytes
[  4]   7.00-8.00   sec   679 MBytes  5.69 Gbits/sec    0   1.01 MBytes
[  4]   8.00-9.00   sec  2.53 GBytes  21.7 Gbits/sec   72   1.22 MBytes
[  4]   9.00-10.00  sec   515 MBytes  4.32 Gbits/sec    0   1.22 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec  453             sender
[  4]   0.00-10.00  sec  16.2 GBytes  13.9 Gbits/sec                  receiver
iperf Done.
with a weird up/down alternation.
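To put a number on that alternation, the per-second bitrates from the B to 172.19.18.251 run can be split into even and odd seconds. This is only a sketch over the values already shown above; it describes the pattern and does not explain its cause:

```python
from statistics import mean, pstdev

# Per-second bitrates (Gbit/s) copied from the B -> 172.19.18.251 run above.
direct_lan = [24.7, 5.80, 23.5, 5.35, 21.7, 4.26, 22.1, 5.69, 21.7, 4.32]

high = direct_lan[0::2]  # seconds 0-1, 2-3, ... (the fast intervals)
low = direct_lan[1::2]   # seconds 1-2, 3-4, ... (the slow intervals)

print(f"overall: mean {mean(direct_lan):.1f} Gbit/s, stdev {pstdev(direct_lan):.1f}")
print(f"even seconds: mean {mean(high):.1f} Gbit/s")
print(f"odd seconds:  mean {mean(low):.1f} Gbit/s")
```

The overall mean matches the 13.9 Gbit/s summary, but every fast second is above 20 Gbit/s and every slow one below 6 Gbit/s, so the average hides two very different regimes.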
If I run iperf3 on B against P's LAN VIP, I get these results:
Connecting to host 172.19.18.254, port 5250
[  4] local 172.19.18.33 port 43800 connected to 172.19.18.254 port 5250
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   258 MBytes  2.16 Gbits/sec  515    267 KBytes
[  4]   1.00-2.00   sec   313 MBytes  2.63 Gbits/sec  592    296 KBytes
[  4]   2.00-3.00   sec   313 MBytes  2.63 Gbits/sec  488    249 KBytes
[  4]   3.00-4.00   sec   309 MBytes  2.59 Gbits/sec  542    209 KBytes
[  4]   4.00-5.00   sec   319 MBytes  2.68 Gbits/sec  513    293 KBytes
[  4]   5.00-6.00   sec   315 MBytes  2.64 Gbits/sec  752    419 KBytes
[  4]   6.00-7.00   sec   313 MBytes  2.63 Gbits/sec  555    269 KBytes
[  4]   7.00-8.00   sec   318 MBytes  2.67 Gbits/sec  670    296 KBytes
[  4]   8.00-9.00   sec   319 MBytes  2.68 Gbits/sec  463    297 KBytes
[  4]   9.00-10.00  sec   314 MBytes  2.64 Gbits/sec  527    287 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  3.02 GBytes  2.59 Gbits/sec  5617            sender
[  4]   0.00-10.00  sec  3.02 GBytes  2.59 Gbits/sec                  receiver
iperf Done.
with no up/down alternation, but a much lower speed.
Every test against a CARP VIP is slower, and by a lot on the LAN side, even though all the VMs are on the same ESXi host...
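Expressed as a ratio, using only the 10-second sender averages from the four runs above, the VIP penalty is much larger on the LAN side. A small sketch of that arithmetic:

```python
# Sender-side 10-second averages (Gbit/s) taken from the four runs above.
results = {
    "WAN": {"direct": 16.8, "vip": 9.29},
    "LAN": {"direct": 13.9, "vip": 2.59},
}


def vip_ratio(direct: float, vip: float) -> float:
    """Fraction of the direct interface-IP throughput still reached via the CARP VIP."""
    return vip / direct


for side, r in results.items():
    print(f"{side}: the VIP reaches {vip_ratio(r['direct'], r['vip']):.0%} of the direct throughput")
```

This prints 55% for WAN and 19% for LAN, i.e. the LAN VIP loses roughly four fifths of the throughput seen on the interface IP.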
But the worst is when I test between A and B, i.e. routed through the firewall (first from A to B, then from B to A):
Connecting to host raoult.test.esante-bfc.fr, port 5201
[  4] local 172.17.254.247 port 47080 connected to 172.19.18.33 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  26.2 MBytes   220 Mbits/sec  139   24.0 KBytes
[  4]   1.00-2.00   sec  20.1 MBytes   168 Mbits/sec   53   36.8 KBytes
[  4]   2.00-3.00   sec  26.5 MBytes   222 Mbits/sec   52   36.8 KBytes
[  4]   3.00-4.00   sec  18.3 MBytes   154 Mbits/sec   56   15.6 KBytes
[  4]   4.00-5.00   sec  27.3 MBytes   229 Mbits/sec   76   36.8 KBytes
[  4]   5.00-6.00   sec  19.6 MBytes   165 Mbits/sec   38   38.2 KBytes
[  4]   6.00-7.00   sec  9.23 MBytes  77.4 Mbits/sec  146   28.3 KBytes
[  4]   7.00-8.00   sec  17.2 MBytes   144 Mbits/sec   70   31.1 KBytes
[  4]   8.00-9.00   sec  3.90 MBytes  32.7 Mbits/sec   56   52.3 KBytes
[  4]   9.00-10.00  sec  18.1 MBytes   152 Mbits/sec  126   26.9 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   186 MBytes   156 Mbits/sec  812             sender
[  4]   0.00-10.00  sec   186 MBytes   156 Mbits/sec                  receiver
iperf Done.
Connecting to host 172.17.254.247, port 5201
[  4] local 172.19.18.33 port 57548 connected to 172.17.254.247 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   182 MBytes  1.53 Gbits/sec  5085   161 KBytes
[  4]   1.00-2.00   sec   195 MBytes  1.64 Gbits/sec  3487   134 KBytes
[  4]   2.00-3.00   sec   140 MBytes  1.18 Gbits/sec  2529  67.9 KBytes
[  4]   3.00-4.00   sec   152 MBytes  1.28 Gbits/sec  2993   204 KBytes
[  4]   4.00-5.00   sec   138 MBytes  1.16 Gbits/sec  2839   136 KBytes
[  4]   5.00-6.00   sec   164 MBytes  1.38 Gbits/sec  3113  86.3 KBytes
[  4]   6.00-7.00   sec   132 MBytes  1.11 Gbits/sec  2538   263 KBytes
[  4]   7.00-8.00   sec   178 MBytes  1.49 Gbits/sec  4870   256 KBytes
[  4]   8.00-9.00   sec   174 MBytes  1.47 Gbits/sec  4615   151 KBytes
[  4]   9.00-10.00  sec   170 MBytes  1.42 Gbits/sec  3471   140 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.59 GBytes  1.36 Gbits/sec  35540           sender
[  4]   0.00-10.00  sec  1.59 GBytes  1.36 Gbits/sec                  receiver
iperf Done.
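The retransmission counts follow the same ordering as the bitrates. Since the runs moved very different amounts of data, normalizing retransmits per GByte transferred makes them easier to compare. A sketch using only the summary lines above (iperf3 prints byte counts in binary units, so 186 MBytes is taken as 186/1024 GBytes); it only rescales the existing numbers and says nothing about the root cause:

```python
# (transferred GBytes, sender retransmits) for each 10-second run above.
runs = {
    "A -> WAN IP":  (19.6, 50),
    "A -> WAN VIP": (10.8, 0),
    "B -> LAN IP":  (16.2, 453),
    "B -> LAN VIP": (3.02, 5617),
    "A -> B":       (186 / 1024, 812),
    "B -> A":       (1.59, 35540),
}


def retr_per_gbyte(gbytes: float, retr: int) -> float:
    """Sender retransmissions per GByte actually transferred."""
    return retr / gbytes


# Print the runs from the cleanest to the worst.
for name, (gbytes, retr) in sorted(runs.items(), key=lambda kv: retr_per_gbyte(*kv[1])):
    print(f"{name:14s} {retr_per_gbyte(gbytes, retr):8.0f} retransmits/GByte")
```

The LAN VIP and both routed A/B directions stand out by orders of magnitude compared with the direct interface IPs.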
I cannot explain this horrible performance :/
I tried a lot of tweaks, mostly with the offload parameters, and the fastest setup is with all offload boxes unchecked in Advanced > Networking. LRO had the most impact: if I check its box, the speed is divided by 2 on the VIP and by 4 on the direct interface! I also tried changing the load balancing / packet routing policy on the DvSwitch, but was not able to determine the best one in my case.
I searched a lot, but most speed problems I found relate to offloading, and from what I read, CARP should not be affected by pfSense settings but rather by the network configuration. I looked at and changed settings on the DvSwitch, but nothing has helped so far. If anyone has an idea, I am all ears.
-
After more tests, the most balanced performance I can get is, in the end, with the LRO offload box checked: it greatly decreases iperf throughput to the firewall interface (2-3 Gb/s instead of 15-20 Gb/s), but increases throughput through the firewall between A and B (2-3 Gb/s instead of less than 500 Mb/s).
I ran all these tests on the same ESXi host, so where did my throughput go???