How to improve firewall throughput of virtualized pfSense
-
Hey all,
I am asking for your input on how to improve my firewall throughput, which should be a solid 1 Gbit/s but is actually around 500-700 Mbit/s, maybe 800 in good moments.
My setup:
ML350p Gen8 with 2x E5-2690, ~200 GB RAM, VMware ESXi 7.0.3, build 19482537.
The virtual pfSense has 10 cores of one of the chips to itself and 12 GB of RAM. As per the guide, I created a virtual switch in ESXi and a port group for each VLAN I need. pfSense is connected to each of these port groups, as are the other VMs (ignore vlan0400).
The virtual switch in turn is connected to a real switch through 4 of the server's interfaces (LACP, trunk). My test device, for example, sits on this real switch.
pfSense is running 22.05-RELEASE, but the problem has persisted since I started 4 years ago.
The enabled packages with probably the biggest impact are Snort and pfBlocker. The VLANs are separated by firewall rules; for testing purposes I added a rule to allow iperf3 (port 5201) from vlan0020 to vlan0010 and vice versa.
situation:
Alright: if I test from a VM in vlan0010 to a device connected to the real switch that is also in vlan0010, I get a more or less solid gigabit. So I can rule out the virtual switch, the server's ports, my real switch, and my test device as the source of the issue. If I try the same thing from a VM in vlan0020 to my test device on the real switch in vlan0010, I only get the described 500-750 Mbit/s.
So it appears to be the firewall.
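For concreteness, the inter-VLAN numbers above come from plain iperf3 runs along these lines (a sketch; the 10.0.10.x address is a made-up placeholder for the test device on vlan0010):

```shell
# On the test device in vlan0010, start an iperf3 server
# (listens on port 5201 by default, matching the firewall rule):
iperf3 -s

# From the VM in vlan0020, run a 20-second test against it:
iperf3 -c 10.0.10.50 -t 20

# Same test in the reverse direction (server sends, client receives):
iperf3 -c 10.0.10.50 -t 20 -R
```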
what I tried:
- I tried changing the recommended VMXNET3 interface to E1000 and back
- I disabled in pfSense:
  - Hardware Checksum Offloading
  - Hardware TCP Segmentation Offloading
  - Hardware Large Receive Offloading
- I disabled the aforementioned higher-impact packages like Snort
Strangely, when I run the test long enough, I can see in htop that one core maxes out at 100% while the others idle at 10-20%. I know my cores are ten years old, but is that really the issue here? Another commonly listed cause is drivers, but I tried my very best to get the latest SPPs for my HP. I am using 4-port 331T adapters and ESXi seems to have no issue with them otherwise (hence the full gigabit when not going through the firewall).
... any input is highly appreciated!
-
While writing this I checked a few things and googled the newest state of things. I unchecked "Disable Hardware Checksum Offloading" and "Disable Hardware TCP Segmentation Offloading" in the UI and added
net.inet.tcp.tso=0
net.inet.udp.checksum=0
to my system tunables. Now speed through pfSense reaches the 90 MByte/s mark, almost there. Seems like the UI settings did not properly apply the setting? I don't know... If I ENABLE "Hardware Large Receive Offloading (LRO)", my speed through the firewall is abysmally low, like 2 Mbit/s. But I noticed that there are no maxed-out cores anymore when doing this.
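In case someone wants to reproduce this: the two tunables can also be checked and set from the pfSense shell (a sketch; the System Tunables UI is the supported, persistent way to set them):

```shell
# Check the current values of the two sysctls:
sysctl net.inet.tcp.tso
sysctl net.inet.udp.checksum

# Set them at runtime (lost on reboot unless added as System Tunables):
sysctl net.inet.tcp.tso=0
sysctl net.inet.udp.checksum=0
```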
So I guess my pfSense is not properly using LRO. I am not sure why or what is going on. But if I disable LRO I get the 90 MByte/s with a maxed-out core (which is probably why I'm not getting full gigabit speed), and if I enable it I only get 10 Mbit/s.
(LRO is enabled in the ESXi settings.)
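To see whether LRO actually took effect on the guest side, the per-interface offload flags can be inspected and toggled with ifconfig (a sketch; vmx0 is a placeholder for your VMXNET3 interface name):

```shell
# Show the interface; look for LRO / TSO4 / RXCSUM in the options=... line:
ifconfig vmx0

# Toggle LRO on one interface at runtime (standard FreeBSD syntax):
ifconfig vmx0 -lro   # disable LRO
ifconfig vmx0 lro    # enable LRO
```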
-
Another thing I noticed: directly after a reboot the throughput is almost a full gigabit, but over a few dozen seconds it starts to degrade, and after maybe 5 minutes it is back at 700-800 Mbit/s, sometimes lower.
-
OK, seems like a lot is going on here. After the setbacks of the last post I went ahead and disabled packages that I was not suspecting of having a great impact on performance, like BandwidthD and Darkstat. That seemed to improve things somewhat: now, even hours after a reboot, I get up to 900 Mbit/s, so almost gigabit. Though when running iperf, the numbers are all over the place, as you can see here:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  76.1 MBytes   638 Mbits/sec  858    660 KBytes
[  5]   1.00-2.00   sec  86.2 MBytes   723 Mbits/sec    0    754 KBytes
[  5]   2.00-3.00   sec  97.5 MBytes   818 Mbits/sec   35    625 KBytes
[  5]   3.00-4.00   sec  95.0 MBytes   797 Mbits/sec    0    732 KBytes
[  5]   4.00-5.00   sec  90.0 MBytes   755 Mbits/sec   14    611 KBytes
[  5]   5.00-6.00   sec  87.5 MBytes   734 Mbits/sec    0    714 KBytes
[  5]   6.00-7.00   sec  71.2 MBytes   598 Mbits/sec    6    570 KBytes
[  5]   7.00-8.00   sec  93.8 MBytes   786 Mbits/sec    0    684 KBytes
[  5]   8.00-9.00   sec   108 MBytes   902 Mbits/sec    0    796 KBytes
[  5]   9.00-10.00  sec   109 MBytes   912 Mbits/sec    0    894 KBytes
[  5]  10.00-11.00  sec  96.2 MBytes   807 Mbits/sec  156    694 KBytes
[  5]  11.00-12.00  sec  77.5 MBytes   650 Mbits/sec    1    576 KBytes
[  5]  12.00-13.00  sec   108 MBytes   902 Mbits/sec    0    704 KBytes
[  5]  13.00-14.00  sec   104 MBytes   870 Mbits/sec    0    809 KBytes
[  5]  14.00-15.00  sec   106 MBytes   891 Mbits/sec   55    679 KBytes
[  5]  15.00-16.00  sec   104 MBytes   870 Mbits/sec    8    576 KBytes
[  5]  16.00-17.00  sec  98.8 MBytes   828 Mbits/sec    0    697 KBytes
[  5]  17.00-18.00  sec   108 MBytes   902 Mbits/sec    0    806 KBytes
[  5]  18.00-19.00  sec  92.5 MBytes   776 Mbits/sec   46    660 KBytes
[  5]  19.00-20.00  sec  95.0 MBytes   797 Mbits/sec    0    766 KBytes
When I disable OpenVPN I get slightly better results, likewise when I disable pfBlocker, but for me those packages are worth the impact, for now.
Still, I am convinced that if I could get hardware LRO to work, my throughput would improve. Any input on how to make that work is highly appreciated.