Poor performance on high-volume traffic
-
Hi all,
I'm an ISP with 800-odd users. My traffic in the evenings reaches and holds around 280 Mbit/s total download, spread over 12 upstream ISP circuits.
pfSense is running virtualized on ESXi 6 with 4 vCPUs at 3.4 GHz and 2 GB RAM. It works perfectly fine until traffic reaches its peak; then IPTV streams start buffering and customers start complaining. At that point the resource usage on the dashboard all looks fine: CPU around 10% to 20%, state table usage around the same, mbuf usage around 10%. All my lines are balancing fine, none of them are dropping packets, and latency on my upstream circuits is normal.
The affected users see the problem in waves: everything works perfectly for 20 minutes, then the internet is almost nonexistent for the next few minutes. I had this device in testing with a dozen people connected and all was well, so it seems to be struggling with volume, even though all indicators look fine.
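For what it's worth, a few stock FreeBSD commands run from the pfSense shell during one of the "bad" windows can catch drops that the dashboard's averaged graphs hide (these are standard tools, not pfSense-specific; the exact counter names can vary slightly by FreeBSD version):

```shell
# Run during a bad window and compare against a good window.
netstat -m                            # mbuf usage, look for "requests for mbufs denied"
netstat -i                            # per-interface input/output errors and drops
sysctl net.inet.ip.intr_queue_drops   # packets dropped off the IP input queue
vmstat -i                             # interrupt rates per device
```

If `intr_queue_drops` climbs during the waves, that points at the input queue rather than CPU or states.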
The only tweaks I've made since building this firewall are:
Turned off firewall logging for "log firewall blocks"
Firewall maximum states set to 220000
Under System Tunables:
net.inet.ip.intr_queue_maxlen = 3000 (I will try pushing this up to 5000)
kern.ipc.nmbclusters = 131072
RAM disk enabled for /tmp and /var, default sizes.
I will now try setting the firewall optimization option to "aggressive", but I'm not sure this will have any effect; it's just something to try.
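As a config fragment, the two tunables above (with intr_queue_maxlen at the planned 5000) look like this; on pfSense these normally go in under System > Advanced > System Tunables rather than being edited by hand, and note that nmbclusters is a boot-time tunable:

```shell
# System Tunables equivalent (values from this thread, not recommendations):
net.inet.ip.intr_queue_maxlen=5000
kern.ipc.nmbclusters=131072

# Verify the running values from the shell:
sysctl net.inet.ip.intr_queue_maxlen kern.ipc.nmbclusters
```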
Any suggestions?
-
Check your CPU usage in ESXi (this will likely show that you are using much more CPU power than what pfSense is showing).
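A quick way to do that, assuming SSH is enabled on the ESXi host, is esxtop; per-vCPU %USED and %RDY can expose a single saturated vCPU or scheduling contention that a guest-side average hides:

```shell
# On the ESXi host shell:
esxtop                              # interactive: press 'c' for the CPU view,
                                    # watch %USED and %RDY for the pfSense VM
esxtop -b -n 10 > cpu-sample.csv    # batch mode: 10 samples for offline review
```

A high %RDY on the pfSense VM would mean it is waiting for physical CPU time even while the guest reports low usage.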
What version of pfSense are you running? What vNICs are you using? -
Hi Heper, I'm using vmxnet3 for all my 32 Mbit/s connections. I have 1x 100 Mbit/s fibre circuit on a card with direct passthrough, but the LAN is vmxnet3 as well. VMware Tools is installed.
I've now added an additional CPU, so 2 sockets with 2 cores each, 4 vCPUs total.
I'm running 2.2.4-RELEASE (amd64).
At present I have around 150 Mbit/s crossing the VM's WAN links, and CPU usage on the host is averaging 15%; this evening the traffic will get close to double that.
I tried using "aggressive" firewall state table optimization but immediately saw some adverse effects, so I'm now trying the opposite and have set it to "conservative". Of course my state table usage is now at 33%; at what percentage would it be best to increase the limit? Is it fine to let this run up to 70%?
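To watch the state table directly rather than relying on the dashboard percentage, pf's own counters can be read from the shell:

```shell
# On the pfSense shell:
pfctl -si   # "current entries" vs. searches/inserts/removals rates
pfctl -sm   # configured limits, including the hard "states" limit
```

If "current entries" in `pfctl -si` approaches the limit shown by `pfctl -sm`, new connections start failing, which would look like the waves described above.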
I've also enabled sticky connections and set them to 5 s. I'm not too sure this will help in any way, but I feel I need to try something different from my current setup and see if it alleviates the issue; of course, making too many changes at once also just muddies the waters here. Any suggestions? Change the vNIC to e1000? Turn off the sticky connections? Set the state table optimization back to "normal"?
-
Reinstall and do NOT use VMware Tools, not even the open-source variant (open-vm-tools).
Please report back after testing.
-
Also, ESXi 6.0.0 Update 1a came out just a few days ago.