Multi-Core Advantages in pfSense
-
Hi all,
I realize this has been discussed a few times on these boards, but I'm curious what the current view is (especially with pfSense now running on FreeBSD 14) of how much advantage there is, if any, to running many slower processor cores vs. a few faster ones. For instance, can pfSense and FreeBSD effectively take advantage of 16 cores (or more) for routing/firewalling, or would one be better off getting just 4 or maybe 8 faster cores? More specifically, what's the better alternative: 16 cores operating at 2 GHz, or 8 cores operating at 2.5 GHz? I understand that there are some packages that are single threaded and thus perform better with faster individual cores, but this is more of a general question on the advantages of parallelism in the current version of pfSense/FreeBSD. Thanks in advance.
-
This may not be as short and easy to answer as you hope. From my point of view, many things come together and interact in this case.
Xeon E3, E5 and E7 CPUs are stronger than the smaller Intel Atom cores, but the Atom generations also differ: the C2000 series is outperformed by the C3000 series, and the newer C5000 and P5000 series offer more capabilities than both.
FreeBSD can use more than one CPU core and also supports HT and Turbo Boost. PPPoE is single threaded only, so if you are running pfSense without PPPoE, packet processing is multi-threaded and a network queue can run on each CPU core; with many cores and HT you can benefit from the extra cores. On top of that, if the packages you use are multi-threaded and you are not using PPPoE, then with both of those together you should see a performance gain.
Other things can add to this. With DPDK-capable CPUs and/or Ethernet ports you may benefit once more. If QuickAssist works on both sides of the VPN, you could see that part running more smoothly as well. All in all, it comes down to which pieces play well together.
It also depends on other numbers: how big is your entire network, how many clients do you have to serve, how many services, how much traffic and how many servers are inside? Is there mixed traffic such as WiFi, VoIP, file and mail traffic? Are there one or more DMZs in the network? Is pfSense routing the entire network alone, or are there other routing devices inside? What is the topology of your network (central, decentralized or distributed)? What layers are in the network (core, distribution, access)? What protocols are in use (VRRP, OSPF, eBGP/iBGP, ...)?
A small Xeon E3-12xxv3 4C/8T at >3.x GHz may be a good choice; it is made for 24/7 operation and supports up to 32 GB of RAM.
An Intel Atom C3000 with 2 to 16 cores might sit in the mid range, but for what? What exactly are we talking about here? Only you know!
A Supermicro SYS-E300-9D-8CN8TP server with 8C/16T from 2.3 GHz to 3.0 GHz can be fitted with U.2 SSDs for cache and log files, a SIM & modem and a WiFi card, all in a small footprint, and you can add a NIC with 2.5 GBit/s ports on top.
So you can see it is not easy to answer your question briefly unless you provide us with more information.
-
@dobby_ - thanks for the detailed reply. What prompted this question was me comparing and contrasting these two 1U systems from Supermicro:
Intel Atom C3958 based (16C): https://www.supermicro.com/en/products/system/1U/5019/SYS-5019A-FN5T.cfm
Intel Xeon D-1541 based (8C/16T): https://www.supermicro.com/en/products/system/1U/5018/SYS-5018D-FN4T.cfm
The Atom is slightly newer, but the Xeon has a faster single core speed. Processor TDP is slightly better on the Atom (31W vs 45W) and the Atom also comes with Quick Assist (QAT). Both systems come with 10Gbit RJ45 interfaces, which is a requirement. Given the choice between these two options, which would you choose and why? I see these priced quite similarly currently.
The only other requirements (besides having 10Gbit copper ports) are a reasonable thermal budget (50W or less processor TDP would be ideal) and enough power to route WAN speeds between 1Gbit/s and 10Gbit/s. Also some remote access VPN via OpenVPN, but only for a handful of clients. Would either of these systems work well, or would you recommend a third alternative?
Thanks again for your help and insight.
-
@tman222 the latter looks like the CPU in the 1541: https://shop.netgate.com/collections/rack-appliances/products/1541-base-pfsense
So you could compare specs.
Re: packages, OpenVPN and Snort are single core.
-
@tman222 said in Multi-Core Advantages in pfSense:
@dobby_ - thanks for the detailed reply. What prompted this question was me comparing and contrasting these two 1U systems from Supermicro:
Meeting all of your criteria may not be easy.
This would be my choice of the C3958 series:
Supermicro SYS-E300-9A-16CN8TP - Intel Atom processor C3958, 16 cores, 2.0 GHz - 2x 10GbE SFP+, 2x 10GbE LAN, 4x 1GbE LAN, dedicated LAN for IPMI 2.0
Pros:
- 4 more LAN ports (2x SFP+ / 2x 10GbE / 4x 1 GBit/s)
- WiFi capable for using the captive portal!
Cons:
- 85 watt
- ~1500 €/$
- No DPDK
- No TurboBoost
- 1 PCIe slot less
This would be my choice of the Intel Xeon D-xxxx series:
SuperServer 5019D-FN8TP
Pros:
- 2x SFP+ / 2x 10GbE / 4x 1 GBit/s (Intel x557 and Intel i350-AM4 are DPDK capable)
- Intel QAT / AES-NI / TurboBoost / HT onboard
- SIM & Modem slot + M.2 Slot + miniPCIe slot
- WiFi capable for using the captive portal!
- from 2.3 GHz to 3.0 GHz
- fast DDR4 RAM
Cons:
- 80 Watt
- High price ~1900 €/$
- Only one PCIe slot
If the money is not there, it is often better to wait a few months, save up and get something that is not so cheap but is 100% able to deal with 10 GBit/s and is, as far as that can be achieved, a bit more future-proof.
I don't know at the moment how well FreeBSD and/or pfSense work with the performance cores of some CPUs, so that is not easy to answer, but it can be found out with some deeper research.
Don't get me wrong, but if you are using third-party hardware you may need to fine-tune it to match your company's traffic exactly. On both mainboards the SFP+ ports are connected directly to the Intel SoC, so they can be used as the DMZ or LAN ports, and both also offer one PCIe slot, so adding a card if needed is no problem. I know that power consumption today is more than just a matter of green thinking, owing to higher prices in many countries, but if you want to handle 10 GBit/s and really need it as sustained throughput, that is not the point you should focus on.
-
Thank you @Dobby_ and @SteveITS for your replies and recommendations.
I did actually look at the Netgate offerings and saw that the SYS-5018D-FN4T has essentially the same specs as the 1541. I also saw that the Netgate 8200 has somewhat similar performance to the 1541, but uses an Atom chip instead (C3758R). This made me wonder how the C3958 would perform in comparison to the C3758R, with twice the number of cores, but operating at a slightly lower frequency (i.e. 2GHz vs. 2.4GHz):
https://www.intel.com/content/www/us/en/products/compare.html?productIds=97927,204840
It is a bit tough to find performance numbers on the C3958, but I did see it listed in the benchmark charts of this recent review at ServeTheHome:
https://www.servethehome.com/supermicro-x12sdv-10c-spt4f-review-intel-xeon-d-1749nt-motherboard/2/
I realize these benchmarks may not be the best representation of how the CPU would perform in a firewall setting, but the chip does tend to hold up well against some of the Xeon D peers.
I guess this brings me back to the original question - is there an advantage to having an extra 8 cores, e.g. for packet processing for NIC TX and RX queues so that the overall firewall throughput (pps) increases? I realize for anything that is single threaded there would be a performance hit given the lower single core frequency.
Thanks again for all your help.
-
If you run IDS/IPS, then a faster CPU clock is what matters.
A Xeon 4C 3.6 GHz CPU will outperform an 8C 2.2 GHz CPU any time.
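The clock-versus-cores trade-off discussed in this thread can be sketched with Amdahl's law. This is only a rough model, not a benchmark: it assumes both CPUs do the same work per clock cycle (which is not true across Atom and Xeon microarchitectures) and that the parallel part of the workload scales perfectly. The function names and the `parallel_fraction` parameter are made up for illustration.

```python
def relative_throughput(cores: int, ghz: float, parallel_fraction: float) -> float:
    """Amdahl's-law estimate of throughput in 'GHz-equivalents'.

    Assumption: equal work per clock on every CPU and perfect scaling
    of the parallel part. Real firewalls will deviate from this.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    speedup = 1.0 / (serial + parallel_fraction / cores)
    return ghz * speedup


def break_even_fraction(cores_a: int, ghz_a: float, cores_b: int, ghz_b: float) -> float:
    """Parallel fraction at which CPU A and CPU B get the same estimate."""
    numerator = ghz_a - ghz_b
    denominator = ghz_a * (1 - 1 / cores_b) - ghz_b * (1 - 1 / cores_a)
    return numerator / denominator


if __name__ == "__main__":
    # The two comparisons mentioned in this thread.
    for label, a, b in [
        ("16C @ 2.0 GHz vs 8C @ 2.5 GHz", (16, 2.0), (8, 2.5)),
        ("4C @ 3.6 GHz vs 8C @ 2.2 GHz", (4, 3.6), (8, 2.2)),
    ]:
        print(label, "break-even parallel fraction:",
              round(break_even_fraction(*a, *b), 3))
        for p in (0.0, 0.5, 0.9, 1.0):
            print(f"  p={p}: {relative_throughput(*a, p):.2f} vs "
                  f"{relative_throughput(*b, p):.2f}")
```

Under these assumptions the 16-core 2.0 GHz part only pulls ahead of the 8-core 2.5 GHz part once roughly 84% of the work is parallel, and the 8-core 2.2 GHz part only passes the 4-core 3.6 GHz part above roughly 93%. For a single-threaded process such as Snort (p close to 0) the faster clock wins in this model.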
-
@tman222 said in Multi-Core Advantages in pfSense:
is there an advantage to having an extra 8 cores, e.g. for packet processing for NIC TX and RX queues
It depends on the NICs. For example the ix NICs on the 8200 can use all cores for both Rx and Tx:
ix0: <Intel(R) X553 N (SFP+)> mem 0x80400000-0x805fffff,0x80604000-0x80607fff at device 0.0 on pci9
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 8 RX queues 8 TX queues
ix0: Using MSI-X interrupts with 9 vectors
ix0: allocated for 8 queues
ix0: allocated for 8 rx queues
ix0: Ethernet address: 90:ec:77:47:5c:e6
ix0: eTrack 0x8000084b PHY FW V65535
ix0: netmap queues/slots: TX 8/2048, RX 8/2048
But the igc NICs only use 4:
igc0: <Intel(R) Ethernet Controller I226-V> mem 0x81300000-0x813fffff,0x81400000-0x81403fff at device 0.0 on pci4
igc0: Using 1024 TX descriptors and 1024 RX descriptors
igc0: Using 4 RX queues 4 TX queues
igc0: Using MSI-X interrupts with 5 vectors
igc0: Ethernet address: 90:ec:77:47:5c:e8
igc0: netmap queues/slots: TX 4/1024, RX 4/1024
However, that's still 8 queues total, so in a router, even if using igc for both WAN and LAN, that could still use 8 cores effectively given multiple connections.
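To check this on your own hardware, one option is to pull the queue counts out of the boot log. The following is a minimal sketch that only assumes the "Using N RX queues M TX queues" line format shown in the output above; the function name and the sample text are illustrative, and other drivers may word this line differently.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "ix0: Using 8 RX queues 8 TX queues".
QUEUE_LINE = re.compile(
    r"^(?P<nic>[a-z]+\d+): Using (?P<rx>\d+) RX queues (?P<tx>\d+) TX queues"
)


def queues_per_nic(dmesg_text: str) -> Dict[str, Tuple[int, int]]:
    """Return {interface: (rx_queues, tx_queues)} found in boot log text."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in dmesg_text.splitlines():
        match = QUEUE_LINE.match(line.strip())
        if match:
            result[match["nic"]] = (int(match["rx"]), int(match["tx"]))
    return result


# Sample lines copied from the log output quoted in this post.
SAMPLE = """\
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 8 RX queues 8 TX queues
igc0: Using 1024 TX descriptors and 1024 RX descriptors
igc0: Using 4 RX queues 4 TX queues
"""

if __name__ == "__main__":
    for nic, (rx, tx) in queues_per_nic(SAMPLE).items():
        print(f"{nic}: {rx} RX queues, {tx} TX queues")
```

On a live system the input would be the saved output of `dmesg` rather than the `SAMPLE` string.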
Steve
-
Thanks @Cool_Corona and @stephenw10 for the replies.
From what I can tell the NICs I would be using do support a large enough number of RX and TX queues so that all cores could be utilized (from what I read, the X550 based NICs can support up to 64 even). I currently don't have a need for IDS/IPS, but I'm concerned about routing throughput (i.e. pps processing capability). Could a 16 core chip at 2GHz effectively route up to 10Gbit/s given multiple processing cores and NIC RX/TX queues, or would higher single core clock speed ultimately end up being a more important factor? Thanks again for all your help.
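For a sense of scale, here is a rough back-of-the-envelope sketch of the packet rate behind "10Gbit/s" and the CPU cycles available per packet. It assumes standard Ethernet framing (8 bytes of preamble plus a 12-byte inter-frame gap per frame) and, optimistically, that packets spread evenly across all cores; the function names are made up for illustration and nothing here is a measured result.

```python
# Per-frame overhead on the wire: 8-byte preamble/SFD + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second on a link for a given Ethernet frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def cycles_per_packet(cores: int, ghz: float, pps: float) -> float:
    """CPU cycles available per packet, assuming a perfectly even spread
    of packets over all cores (an optimistic assumption)."""
    return cores * ghz * 1e9 / pps


if __name__ == "__main__":
    for frame in (64, 1518):
        pps = line_rate_pps(10e9, frame)
        print(f"{frame}-byte frames: {pps / 1e6:.2f} Mpps at 10 Gbit/s")
        for cores, ghz in ((16, 2.0), (8, 2.5)):
            budget = cycles_per_packet(cores, ghz, pps)
            print(f"  {cores}C @ {ghz} GHz: about {budget:.0f} cycles per packet")
```

With full-size frames the per-packet budget is generous on either CPU; with minimum-size frames it shrinks to a couple of thousand cycles at best, which is where the traffic mix starts to matter more than the spec sheet.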
-
It depends what the traffic is. Given a large number of connections and routing/firewalling only then it should be possible to take advantage of a large number of cores.
If you want to see the highest throughput when testing from a single client against speedtest.net, then fewer cores at a higher frequency are going to give better results.
That also applies to traffic passing through any other process that is single threaded, as mentioned. So typically Snort or OpenVPN.
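Why the number of connections matters can be illustrated with a small sketch of how flows get mapped to NIC queues. This is a simplification: real NICs use their own receive-side scaling hash, and `zlib.crc32` is used here only as a stand-in. The function names and the example addresses are hypothetical.

```python
import zlib
from collections import Counter
from typing import Iterable, Tuple

# (source IP, source port, destination IP, destination port, protocol)
Flow = Tuple[str, int, str, int, str]


def queue_for_flow(flow: Flow, num_queues: int) -> int:
    """Pick a queue from a hash of the flow's 5-tuple.

    zlib.crc32 stands in for the NIC's real hash function; the point is
    only that every packet of one flow lands on the same queue.
    """
    src_ip, src_port, dst_ip, dst_port, proto = flow
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{proto}".encode()
    return zlib.crc32(key) % num_queues


def queue_usage(packets: Iterable[Flow], num_queues: int) -> Counter:
    """Count how many packets end up on each queue."""
    return Counter(queue_for_flow(flow, num_queues) for flow in packets)


if __name__ == "__main__":
    # One client running a single-stream speed test: a single flow.
    single = [("192.0.2.10", 50000, "198.51.100.1", 443, "tcp")] * 1000
    # Many clients/connections: one packet for each of 1000 distinct flows.
    many = [("192.0.2.10", 40000 + i, "198.51.100.1", 443, "tcp")
            for i in range(1000)]
    print("single flow uses queues:", sorted(queue_usage(single, 8)))
    print("many flows use queues:  ", sorted(queue_usage(many, 8)))
```

A single flow always hashes to one queue, so it is limited by one core no matter how many are present, while many flows spread across the queues and can keep more cores busy.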