CPU Usage when network used
-
Thanks for the update. Beyond what @stephenw10 already suggested, you might also consider changing (increasing) the processing limits on the ix interfaces using the following tunables:
dev.ix.Y.tx_processing_limit
dev.ix.Y.rx_processing_limit
where Y = 0...N and N is the number of ix interfaces in your system minus 1. Setting the rx and tx processing limit to -1 essentially removes the limit (i.e. makes it unlimited).
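To make the naming concrete, here is a minimal sketch that expands Y into one pair of tunables per interface. The function name and the count of 4 interfaces are made up for illustration; Y is the interface index and -1 is the value being assigned, not part of the name. It only prints name=value pairs and does not apply anything.

```python
def processing_limit_tunables(num_ix_interfaces: int, limit: int = -1) -> list[str]:
    """Build name=value pairs for every ix interface index Y = 0..N (N = count - 1)."""
    lines = []
    for y in range(num_ix_interfaces):
        lines.append(f"dev.ix.{y}.tx_processing_limit={limit}")
        lines.append(f"dev.ix.{y}.rx_processing_limit={limit}")
    return lines


# Example: a box with 4 ix interfaces (an assumed count, adjust to your hardware).
for line in processing_limit_tunables(4):
    print(line)
```

Where these pairs need to be entered (system tunables or a loader file) depends on the driver and is not something this sketch decides.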
However, even with updated tunables, it appears challenging to make up almost 6.5 Gbit/s of throughput (though I could be wrong). I have a couple more questions:
- What are the specs of the machines on either side that you are using for testing? If you put them both on the same subnet, are they able to talk at 10 Gbit/s to each other?
- Are you running any other add-on packages on pfSense currently, or is this a barebones install?
Hope this helps.
-
-
I've tried setting the hw.ix.rxd values as well as the flow control but I'm not seeing the values change even after a reboot. In PFSense I created the values and put 4096.
sysctl hw.ix.rxd
hw.ix.rxd: 2048
Same with this one; I set it to 0.
sysctl hw.ix.flow_control
hw.ix.flow_control: 3
Am I doing this wrong?
The last one appears to already be 0.
sysctl dev.ix.0.fc
dev.ix.0.fc: 0
Where should these be set? Are these also via sysctl?
dev.ix.Y.tx_processing_limit
dev.ix.Y.rx_processing_limit
- The destination is a NAS and the source is a Linux distro on a Z800 workstation. Yes, when I tried them on the same VLAN they reached 10G instantly. They will also reach it network-to-network when PF is disabled.
- Not much running on pfSense right now. Barely any firewall rules; mostly only basic connectivity, DNS, PPPoE/NAT, and 1 VPN are configured, and most of this is not used for reaching either network (all internal).
Thanks all for the help.
Cheers!
FYI:
-
Any thoughts on the above? Hoping to make sure I've at least done this correctly.
Cheers!
-
You probably need to add them as loader variables rather than system tunables as shown here:
https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html#adding-to-loader-conf-local
Steve
-
Thanks. I've put this. Does this look correct before I reboot?
#Improve Cache size
hw.ix.rxd: 4096
hw.ix.txd: 4096
#Change processing limit -1 is unlimited
dev.ix.-1.tx_processing_limit
dev.ix.-1.rx_processing_limit
Cheers!
-
@qwaven said in CPU Usage when network used:
Thanks. I've put this. Does this look correct before I reboot?
#Improve Cache size
hw.ix.rxd: 4096
hw.ix.txd: 4096
Nope, read the syntax in the documentation again.
#Change processing limit -1 is unlimited
dev.ix.-1.tx_processing_limit
dev.ix.-1.rx_processing_limit
Also nope, read it again:
@tman222 said in CPU Usage when network used:
dev.ix.Y.tx_processing_limit
dev.ix.Y.rx_processing_limit
where Y = 0...N and N is the number of ix interfaces in your system minus 1. Setting the rx and tx processing limit to -1 essentially removes the limit (i.e. makes it unlimited).
-
Thanks. Not sure I've seen documentation just going on what was posted earlier. However I've changed to this?
hw.ix.rxd="4096"
hw.ix.txd="4096"
Also I found this, I am not clear what the difference between hw and dev is.
hw.ix.tx_process_limit="-1"
hw.ix.rx_process_limit="-1"
Cheers!
-
@qwaven said in CPU Usage when network used:
Thanks. Not sure I've seen documentation just going on what was posted earlier. However I've changed to this?
hw.ix.rxd="4096"
hw.ix.txd="4096"
Also I found this, I am not clear what the difference between hw and dev is.
hw.ix.tx_process_limit="-1"
hw.ix.rx_process_limit="-1"
Cheers!
Looks better, hw. is global while dev. is per device.
-
Yeah looks good.
Not sure why, but the flow control global setting doesn't seem to work for ixgbe. It needs to be set per dev using the dev.ix.X values. That may apply to the process limits, I've never tested it.
Steve
-
So thanks all for your efforts. I'm pretty much thinking I'm sol here. :)
Rebooted with those settings, confirmed I can see them applied. Tried some tests with iperf3: single stream, -P10, and 3 separate streams on different ports. Nothing has changed, still about 3G speed.
In regards to flow control, it looked like it was already set to 0 before, so I have not forced anything via loader, etc.
Cheers!
-
flowcontrol set to 0 in hw.ix or dev.ix.X?
If the link shows as
media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
it's still enabled.
Steve
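If you want to check a whole list of media lines at once, something like the rough sketch below would do it. It only looks for rxpause/txpause inside the angle brackets, as described above; the function name is made up and the sample lines are copied from the format shown in this thread rather than read from a live system.

```python
import re


def pause_flags(media_line: str) -> set[str]:
    """Return the pause options (rxpause/txpause) listed inside <...> on one media line."""
    match = re.search(r"<([^>]*)>", media_line)
    if not match:
        return set()
    options = {opt.strip() for opt in match.group(1).split(",")}
    return options & {"rxpause", "txpause"}


# Sample lines in the format ifconfig prints; not taken from a live system.
sample = [
    "media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)",
    "media: Ethernet autoselect (10Gbase-T <full-duplex>)",
    "media: Ethernet autoselect",
]
for line in sample:
    state = "flow control still on" if pause_flags(line) else "no pause flags"
    print(f"{state}: {line}")
```

On a real box the lines would come from ifconfig | grep media instead of the hard-coded sample.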
-
Interesting yes it appears to be.
ifconfig | grep media
media: Ethernet autoselect (1000baseT <full-duplex>)
media: Ethernet autoselect (10Gbase-T <full-duplex>)
media: Ethernet autoselect
media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
media: Ethernet autoselect (1000baseT <full-duplex>)
media: Ethernet autoselect (1000baseT <full-duplex>)
media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
media: Ethernet autoselect (10Gbase-T <full-duplex>)
media: Ethernet autoselect (10Gbase-T <full-duplex>)
media: Ethernet autoselect (10Gbase-T <full-duplex>)
added this into the loader.conf.local but it appears to have had no effect.
dev.ix.0.fc=0
dev.ix.1.fc=0
dv.ix.2.fc=0
dev.ix.3.fc=0
then tried
hw.ix.flow_control="0"
which seems to have worked.
ifconfig | grep media
media: Ethernet autoselect (1000baseT <full-duplex>)
media: Ethernet autoselect (10Gbase-T <full-duplex>)
media: Ethernet autoselect
media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
media: Ethernet autoselect (1000baseT <full-duplex>)
media: Ethernet autoselect (1000baseT <full-duplex>)
media: Ethernet autoselect (10Gbase-Twinax <full-duplex>)
media: Ethernet autoselect (10Gbase-T <full-duplex>)
media: Ethernet autoselect (10Gbase-T <full-duplex>)
media: Ethernet autoselect (10Gbase-T <full-duplex>)
however after running similar iperf3 tests as before the best I've seen is about this:
[ 5] 30.00-30.04 sec 15.6 MBytes 3.11 Gbits/sec
Cheers!
-
Interesting. The opposite of what I have previously seen. Hmm.
Disappointing it didn't help. Might have reached the end of the road, at least for low-hanging-fruit type tweaks.
Steve
-
Yeah appreciate all the help from everyone. Learned a thing or two anyhow. :)
I'll likely blow this install out when I have a bit more time and virtualize it with some other stuff. Go the L3 switch route, which seems like it should work.
Cheers!
-
@qwaven were those latest tests you made with MTU 1500 or MTU 9000? Perhaps try again with MTU 9000 set on all parts of that network segment? The results strike me as similar to those at:
https://calomel.org/network_performance.html
As for other optimizations: you could check the loader.conf and sysctl.conf values set up at
https://calomel.org/freebsd_network_tuning.html
and adjust yours carefully in that direction.
-
Thanks for the info. I have applied some of this "freebsd network tuning" and I seem to have managed to make it slower. :P
May play around with it a little more, will let you know if it amounts to anything.
Cheers!
-
Something here still seems off to me that you're hitting this 3 Gbit/s limit and can't get further. I know the Atom CPU you're using is slower than the Xeon D I have in my box, but I don't expect it to be that much slower.
A couple more questions that come to mind right now:
- Have you tried a different set of SFP+ modules and/or fiber cables? If using a direct attached copper connection instead, have you tried a different cable?
- Try this for me: Open two SSH sessions to your pfSense box. On one session launch the iperf3 server, i.e. "iperf3 -s". On the other session run an iperf3 test to localhost, i.e. "iperf3 -c localhost". What do the results look like? Hopefully they're nice and high.
- Have you looked in the system's BIOS to make sure everything is configured properly? For instance, do you have any power saving settings enabled? Might want to disable them for testing (i.e. go max performance).
Hope this helps.
-
@tman222 said in CPU Usage when network used:
Something here still seems off to me that you're hitting this 3 Gbit/s limit and can't get further. I know the Atom CPU you're using is slower than the Xeon D I have in my box, but I don't expect it to be that much slower.
That's where I'm still confused. I have 16x 2 GHz cores. Most of them are not used when I do anything, which means to me that the system is not working very hard unless there is some bottleneck elsewhere I have not seen.
I'm using DAC cables. I have tried replacing them, going from used ones to brand new ones. I did not see anything change in doing so. :P
Won't be able to do the test until later but I'll try and let you know.
Re BIOS, I did check things out but I did not see anything obvious to change. I'll check again anyway.
Cheers!
-
@qwaven said in CPU Usage when network used:
That's where I'm still confused. I have 16x 2 GHz cores. Most of them are not used when I do anything, which means to me that the system is not working very hard unless there is some bottleneck elsewhere I have not seen.
You should really read the contents of the sites you are pointed at. From https://calomel.org/network_performance.html:
No matter what operating system you choose, the machine you run on will determine the theoretical speed limit you can expect to achieve. When people talk about how fast a system is they always mention CPU clock speed. We would expect an AMD64 2.4GHz to run faster than a Pentium3 1.0 GHz, but CPU speed is not the key, motherboard bus speed is.
I/O operations don't count as CPU usage, but these are the main limiting factor in packet based operations like networking and especially when it comes to routing.
-
Thanks! For reference, I have read all these pages, which is why I changed to this board from my original pfSense board, which has a fairly weak PCIe bus. :)
The page talks mostly about bus speed. So to break it down to what I know...
My system has a PCI Express 3.0 interface (20 lanes) according to the CPU info. I can't really find any specific reference on the manufacturer's page. However, I would also assume the 20 lanes are actually HSIO lanes.
http://www.cpu-world.com/CPUs/Atom/Intel-Atom%20C3958.html
The closest reference on the network performance page is 3.0 x16.
PCIe v3 x16 = 15.7 GB/s (128 GT/s) Fine for 100 Gbit firewall
However, according to Intel it probably means my PCIe bus is x8.
PCIe v3 x8 = 7.8 GB/s ( 64 GT/s) Fine for 40 Gbit firewall
The PCIe slot (not used), which is clearly stated by the manufacturer, from the chart:
PCIe v3 x4 = 3.9 GB/s ( 32 GT/s) Fine for 10 Gbit firewall
I do not see anything specifically mentioning what speed the built-in network ports are connected to the bus at. I can only assume Supermicro would at least give them enough to achieve their rated speed.
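For what it's worth, the chart figures can be reproduced with a bit of arithmetic. This is just a back-of-the-envelope sketch: it assumes the standard PCIe 3.0 rate of 8 GT/s per lane with 128b/130b encoding and ignores protocol overhead, so real-world numbers will be somewhat lower. The function and constant names are mine.

```python
# PCIe 3.0 signalling: 8 GT/s per lane, 128b/130b line encoding.
GT_PER_LANE = 8.0
ENCODING = 128 / 130


def pcie3_gbytes_per_sec(lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, ignoring protocol overhead."""
    return lanes * GT_PER_LANE * ENCODING / 8  # 8 bits per byte


# One 10 Gbit/s port moving line rate in one direction needs 10 / 8 = 1.25 GB/s.
TEN_GBIT_PORT_GBYTES = 10 / 8

for lanes in (4, 8, 16):
    bandwidth = pcie3_gbytes_per_sec(lanes)
    ports = int(bandwidth // TEN_GBIT_PORT_GBYTES)
    print(f"x{lanes}: {lanes * GT_PER_LANE:.0f} GT/s, ~{bandwidth:.2f} GB/s, "
          f"room for about {ports} x 10 Gbit/s of one-way traffic")
```

So on paper even an x4 link has headroom for a single 10 Gbit/s flow, though that says nothing about how the onboard ports are actually wired.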
I talk about CPU cores because I would think that data would be spread across all 16 of them instead of 2 or 3 when the desired throughput has not yet been met. I.e., if PF is "processing" the packets, I am assuming my CPU is involved? Or what is the purpose of putting a NIC queue per core if not to load balance across all the cores?
Also recall I can reach 10G speeds just fine w/o PF enabled. This would suggest that my bus is not the problem.
Cheers!