PfSense Dual 10GbE ESXi 6U2 Slow
-
Dear List,
We are using a server with two 10 GbE optical NICs connected to a vSwitch 'EXTERNAL/WAN', where pfSense (using vmxnet3) is the only other connection. So, in theory, pfSense should be able to reach 10-20 Gbps. A similarly connected server (non-VM, non-pfSense) has achieved 800 MB/s without jumbo frames, so we expect to reach the same. With pfSense, however, we only get a maximum of 275 MB/s, and the clients behind pfSense get 250 MB/s.
After unchecking the following under System -> Advanced -> Networking (i.e., enabling the offloads):
- Disable hardware TCP segmentation offload
- Disable hardware large receive offload
pfSense itself was able to reach 600 MB/s; however, throughput for the clients behind it dropped to only 80 KB/s. So not a good idea.
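(For reference, a shell equivalent of those two checkboxes would be roughly the following; vmx0 as the WAN interface name is an assumption, and runtime changes made with ifconfig do not survive a reboot, unlike the GUI setting:)
# enable TSO/LRO on the WAN interface for a quick A/B test
ifconfig vmx0 tso lro
# and to turn them back off
ifconfig vmx0 -tso -lro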
However, we know the hardware can do 600 MB/s or better, so what should we do to get 600+ MB/s both on the firewall -and- on the clients behind it?
TEST: curl http://lg.core-backbone.com/files/10000MB.test > /dev/null
Note: we have currently activated 8 (out of 56) cores, and don't think CPU is an issue.
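(If disk or HTTP overhead on the test server is a concern, a memory-to-memory test such as iperf3 would isolate pure network throughput; this assumes iperf3 is available on both ends, and the host name below is only a placeholder:)
iperf3 -c test-server.example.com -P 4 -t 30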
Thanks for any help in getting the performance we are looking for.
-
This is a related post (not a VM setup but still about speeds and such). You should take a read and try out what they did and see if it helps.
https://forum.pfsense.org/index.php?topic=113011.0
-
Please note that we are talking about megaBYTES per second. Changing the offloading raised it from 300 to 600 MB/s, but then the clients behind it suffered. Setting:
hw.pci.enable_msix=0
hw.pci.enable_msi=0
did not help.
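(For anyone trying the same thing: loader tunables like these normally go into /boot/loader.conf.local on pfSense and only take effect after a reboot:)
hw.pci.enable_msix=0
hw.pci.enable_msi=0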
Interestingly, a 'top -SH' reveals
last pid: 9532; load averages: 0.51, 0.20, 0.11 up 0+04:57:10 19:03:27
159 processes: 11 running, 118 sleeping, 30 waiting
CPU: 0.8% user, 0.0% nice, 5.6% system, 6.8% interrupt, 86.7% idle
Mem: 20M Active, 121M Inact, 225M Wired, 57M Buf, 7557M Free
Swap: 2047M Total, 2047M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 128K CPU4 4 296:40 100.00% idle{idle: cpu4}
11 root 155 ki31 0K 128K CPU7 7 296:37 100.00% idle{idle: cpu7}
11 root 155 ki31 0K 128K CPU1 1 295:31 100.00% idle{idle: cpu1}
11 root 155 ki31 0K 128K RUN 3 296:35 96.97% idle{idle: cpu3}
11 root 155 ki31 0K 128K CPU5 5 296:32 93.99% idle{idle: cpu5}
11 root 155 ki31 0K 128K RUN 6 296:27 89.99% idle{idle: cpu6}
11 root 155 ki31 0K 128K CPU2 2 296:37 86.96% idle{idle: cpu2}
12 root -92 - 0K 512K CPU0 0 3:55 55.96% intr{irq258: vmx0}
86328 root 52 0 56664K 6828K select 3 0:11 48.97% curl{curl}
11 root 155 ki31 0K 128K RUN 0 292:34 48.00% idle{idle: cpu0}
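What stands out is that the vmx0 interrupt load shown is all on a single IRQ on CPU0. One way to check whether the driver is spreading load over multiple queues (vmstat -i is standard FreeBSD; whether dev.vmx.0 exposes per-queue counters on this driver/ESXi combination is an assumption):
vmstat -i | grep vmx
sysctl dev.vmx.0
-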
Any Ideas?
-
bump
-
As far as I know, the FreeBSD kernel will not be able to achieve such throughput at this time (especially not virtualized).
There is a big difference between sending/receiving that amount of traffic and routing/firewalling that amount of throughput. Read up on netmap-fwd; that will address this:
https://blog.pfsense.org/?p=1866
-
Is this something we can install on pfSense?
We consider our hardware unlimited for such a task. We could go up to 56 cores if needed.
On the netmap-fwd GitHub page, we saw values of only 600 Mbps; we are already at 300 MB/s (2400 Mbps) and are looking for 700+ MB/s in order to get closer to the full 10 GbE.
What do the pfSense experts have to say?
-
600 Mbps on a quad-core Atom.
-
Sure; we have faster HW, but how do we make it work?
-
On the netmap-fwd GitHub page, we saw values of only 600 Mbps; we are already at 300 MB/s (2400 Mbps) and are looking for 700+ MB/s in order to get closer to the full 10 GbE.
You're looking at the completely wrong number. Mbps/Gbps means nothing at all, pps is what matters. That's 600 Mbps at minimum size packets, over 1 Mpps. That'd be upwards of 10 Gbps at the average packet size of typical Internet traffic.
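Rough arithmetic behind that (ignoring Ethernet framing overhead; the ~1000-byte average packet size is an assumption):
600 Mbit/s / (64 B x 8 bit/B) ≈ 1.17 Mpps
1.17 Mpps x 1000 B x 8 bit/B ≈ 9.4 Gbit/s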
Is this something we can install on pfSense?
Not at this time.
You might be able to squeeze a bit more than what you're currently getting through ESX, but I think the best I've seen or heard of at large packet sizes inside ESX is roughly 4 Gbps at 1500 MTU.
-
Hi Chris,
Thanks for your response. I guess Mbps is not always the same. ;)
Could you provide some instructions, or would we have to engage your services to obtain at least the 4 Gbps? This is important to us.
Thanks so kindly,
Alfredo.
-
Instructions please.
-
Check the "go faster" box.
There are no instructions, nor anything I'm aware of, that would change what you're getting.
-
Interestingly, a 'top -SH' reveals
last pid: 9532; load averages: 0.51, 0.20, 0.11 up 0+04:57:10 19:03:27
159 processes: 11 running, 118 sleeping, 30 waiting
CPU: 0.8% user, 0.0% nice, 5.6% system, 6.8% interrupt, 86.7% idle
Mem: 20M Active, 121M Inact, 225M Wired, 57M Buf, 7557M Free
Swap: 2047M Total, 2047M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 128K CPU4 4 296:40 100.00% idle{idle: cpu4}
11 root 155 ki31 0K 128K CPU7 7 296:37 100.00% idle{idle: cpu7}
11 root 155 ki31 0K 128K CPU1 1 295:31 100.00% idle{idle: cpu1}
11 root 155 ki31 0K 128K RUN 3 296:35 96.97% idle{idle: cpu3}
11 root 155 ki31 0K 128K CPU5 5 296:32 93.99% idle{idle: cpu5}
11 root 155 ki31 0K 128K RUN 6 296:27 89.99% idle{idle: cpu6}
11 root 155 ki31 0K 128K CPU2 2 296:37 86.96% idle{idle: cpu2}
12 root -92 - 0K 512K CPU0 0 3:55 55.96% intr{irq258: vmx0}
86328 root 52 0 56664K 6828K select 3 0:11 48.97% curl{curl}
11 root 155 ki31 0K 128K RUN 0 292:34 48.00% idle{idle: cpu0}
Try with only 1 vCPU. Just try.
Does playing with "Disable hardware checksum offload" change anything?
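(For a quick console test, that checkbox corresponds roughly to toggling these flags; vmx0 as the interface is an assumption, and unlike the GUI setting this does not persist across reboots:)
ifconfig vmx0 -rxcsum -txcsum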