Enhanced Intel SpeedStep / Speed Shift - Are they fully supported?
-
It's definitely a trade-off, and the type of workload / environment matters as well. By favoring power savings over performance I also noticed an increase in latency, but it was very small: maybe an additional 200 microseconds (0.2 ms) or so to the first hop (i.e. the dpinger gateway ping), which is probably due to the CPU sitting at minimum frequency the majority of the time. On the WAN side I have a symmetric 2 Gbit fiber circuit and have not seen any increase in latency or decrease in performance there.
Where I have noticed a decrease in performance is when running an iperf3 test between two 10 Gbit hosts located on different internal network segments (i.e. basically just L3 routing). Prior to adjusting the Speed Shift settings, I could max out the bandwidth (~9.4 Gbit/s) between the hosts with one iperf3 stream (P=1). By favoring power savings, I now see closer to 8.5-9.0 Gbit/s (on average) with a single stream. Increasing to two or more streams maxes out the bandwidth again. Perhaps the load of just one stream (or maybe iperf3 in general) isn't great enough to push the CPU into the highest frequencies when power saving is favored.
In any case, it's currently not limiting in any way because I don't need to route at 10 Gbit line speed. Once WAN speeds increase to that level I will probably have to adjust / tweak the Speed Shift settings again. In the meantime, with a system that's fairly lightly loaded most of the time, I'm fine with accepting a slight increase in latency and a slight decrease in performance in exchange for 2-3 degrees C lower temperatures (on average), along with decreased power consumption.
-
@tman222
With mine set at 80% I don't see any change in throughput when routing, with 1 or 4 streams. It is a pure flat line at 9.90 Gbps. If I generate 10GbE using the router as the server or client I can detect a small ripple in the graph with 1 stream but the 9.90 Gbps average remains:
4 streams is still a flat line though:
If I simultaneously ping a gateway using the same link I do see a small increase, as you would expect when saturating it with an iPerf test:
-
Hi @RobbieTT - that's interesting! Are those tests done using 1500 byte packets or a larger MTU? Do you get similar results if testing with a client behind the router vs. from the router?
-
Going via the router or to or from the router makes no difference at 10 GbE, other than the slight ripple you can see in the graphs when you are getting the router to produce the traffic.
Same for larger MTU (which I make use of) as the PPS is high enough for small packets within the 10 GbE cap I run at.
I don't have suitable equipment to produce results at 25 GbE, but when running above 10 GbE I would expect to see a clear delta between pure LAN-to-LAN routing and generating traffic from the router, like you do on low-power CPUs (albeit at much lower speeds in those cases).
Real-world traffic through the WAN would be a lot lower due to the firewall, PPPoE limits and the limitations of pfSense/BSD at high bandwidths. I don't have a trustworthy means of testing, just enough to produce indicative results.
The platform and pfSense could do more if Netgate invested time and effort into using QAT in userspace; at the moment, pfSense is still using the CPU for things that really should be directed at the dedicated silicon. To still have the PPPoE weakness after all these years is another black mark against pfSense/BSD.
-
Figured I would join in on the fun.
I do notice a little added latency on speedtest.net, with the ping on download going from 4-5 ms to 7-8 ms, but in web browsing and downloading I still get the same speeds.
As you can see from my graphs, setting it to 80 allows the CPU to clock down to 800 MHz when idle and only scale up to 4.6 GHz when under regular speed test load.
This also has a nice effect on temps.
Before, you can see the clock speed was a constant 4.6 GHz (Intel 9700K) and it pretty much never left that speed, at least whenever telegraf queried it.
I am going to leave this at 80 and see how it feels during normal usage. Having this level of control over CPU usage really shifts the mindset towards having a much more powerful CPU, since it is there when you need it and doesn't really cost much to have it there and waiting.
BTW - this is with telegraf, pfblockerng, suricata, and ntopng all running in the background, which was probably what held the CPU at 4.6 GHz @ 50.
-
@bigjohns97 - thanks for sharing those results. I'd be curious - since you have an Intel "K" CPU, do you have MultiCore Enhancement (or similar name; this is what Asus calls it) enabled in the motherboard's BIOS? As I understand it, this setting turbos the cores more aggressively (i.e. allows all cores to run at maximum turbo frequency) and is now often enabled by default. I can see this being useful for gaming or other heavily CPU-bound workloads, but maybe not for a router/firewall. I recently disabled this setting on a desktop machine and temps dropped another 2-3 degrees Celsius with no noticeable impact on performance.
-
@tman222 I do have multicore enhancement enabled in the BIOS. I want the CPU to be able to scale as fast as Intel intended, but only when it is absolutely necessary.
I am hoping this 80 setting does exactly that.
All my pfSense machines are old gaming boxes :)
-
After close to 24 hours, here are the high-level changes that moving from 50 to 80 made for my 9700K.
Temps dropped around 5 degrees at low load / idle and almost 10 degrees under load.
Before
After
Clocks were much more balanced across the available frequencies
Before
After
Notice that before it was pretty much pegged at 4.6 GHz all the time, whereas now it can still hit that mark but the average is much closer to the middle of the range and it is very balanced.
I couldn't be happier with this setting of 80. I appreciate the thread, everyone.
Happy New Year!
-
@RobbieTT
Is the last screenshot from PingPlotter for Mac? If the answer is "yes", please tell me how accurate you find it (when running as a service) compared with SmokePing.
(I know these are different classes of software, and that the best thing about PingPlotter, as opposed to SmokePing, is its scripted and easy-to-use reactions to triggers (and forward pages in a doc…), but anyway…)
-
@Sergei_Shablovsky
Yes, it is PingPlotter, running on my Mac server so that it runs 24/7 against my WAN and occasionally against other things under test. I don't have an opinion on the other choices out there or how they compare with it, as I have just run this for years.
-
@RobbieTT
Thank you for answering!
PingPlotter really is a must-have for a Mac server. ;)
So, do you do any scripting with it?
Do you start PingPlotter as a service?
-
On an N100 it seems impossible to get it to idle at minimum speeds; even when set to 100 (with per core), it will idle at around 1.4 GHz.
However, it needs to be no higher than about 70 to reliably exceed 2.5 GHz clock speeds under the openssl benchmark.
I might need to lower it further though, as I have been seeing small CPU spikes that are enough to cause noticeable jitter on traffic, which I suspect is because the CPU isn't clocking up enough.
It is PPPoE so no doubt that is making things harder.
-
How are you checking? The GUI code will always make it ramp up. In my testing, even running sysctl raised the frequency on each core as it was read.
-
@stephenw10
Surely that will be CPU & load specific, with weaker CPUs needing to work harder with minor tasks?
-
Yup. However SpeedShift appears so fast (sensitive) that even on something relatively powerful it will react where SpeedStep would not.
-
@stephenw10 With the sysctl freq variables.
Ironically I have now dropped it from 70 to 60, as 70 wasn't ramping up to the highest clock speed during high throughput whilst 60 does.
These are the lowest clocks I have managed to see; I managed to get 1 core below 1 GHz :) This was by luck, as usually it's above 1.4 GHz.
Also, I was logged out of the UI.
# sysctl dev.cpu.0.freq dev.cpu.1.freq dev.cpu.2.freq dev.cpu.3.freq
dev.cpu.0.freq: 926
dev.cpu.1.freq: 1125
dev.cpu.2.freq: 1337
dev.cpu.3.freq: 1410
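For anyone comparing readings like these across settings, here is a minimal sketch of a helper that condenses `dev.cpu.N.freq` lines into one summary line. The function name `summarize_freqs` is made up for illustration, and it only parses text in the format shown above; it does nothing to avoid the effect, mentioned earlier in the thread, of the sysctl read itself nudging the clocks up.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): summarise per-core clock
# readings. Reads lines of the form "dev.cpu.0.freq: 926" on stdin and
# ignores everything else.
summarize_freqs() {
    awk -F': ' '
        /^dev\.cpu\.[0-9]+\.freq: / {
            f = $2 + 0                       # frequency in MHz
            if (n == 0 || f < min) min = f
            if (n == 0 || f > max) max = f
            sum += f
            n++
        }
        END {
            # avg is truncated to a whole number of MHz
            if (n > 0)
                printf "cores=%d min=%d max=%d avg=%d\n", n, min, max, int(sum / n)
        }
    '
}

# Example using the four values posted above.
printf '%s\n' \
    'dev.cpu.0.freq: 926' \
    'dev.cpu.1.freq: 1125' \
    'dev.cpu.2.freq: 1337' \
    'dev.cpu.3.freq: 1410' | summarize_freqs
```

On a live box the same function could be fed with something like `sysctl dev.cpu | summarize_freqs`, on the assumption that the freq lines appear in that output in the format above.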
-
@chrcoluk The CPU capabilities are probably more dominant than the fine controls provided by pfSense.
I also happen to have my SpeedShift set at 70 and with my router doing regular work (no VPNs or invasive tasks) the 8 physical cores sit at the figures below (NB I have disabled hyper-threading as it arguably gets more in the way than actually helping):
[24.03-RELEASE][admin@Router-7]/root: sysctl dev.cpu.0.freq dev.cpu.1.freq dev.cpu.2.freq dev.cpu.3.freq sysctl dev.cpu.4.freq sysctl dev.cpu.5.freq sysctl dev.cpu.6.freq sysctl dev.cpu.7.freq
dev.cpu.0.freq: 799
dev.cpu.1.freq: 799
dev.cpu.2.freq: 799
dev.cpu.3.freq: 799
dev.cpu.4.freq: 799
dev.cpu.5.freq: 799
dev.cpu.6.freq: 799
dev.cpu.7.freq: 799
[24.03-RELEASE][admin@Router-7]/root:
I suspect if I switched over to my Netgate 6100 I would see frequencies routinely above the minimum values.
-
@RobbieTT Sensitivity is CPU dependent it seems.
Also the N100 has no HT.
-
Some more observations.
This, I think, backs up stephenw10's observation that the sysctl command itself is ramping up clock speeds.
If I run the sysctl command to check clock speed, there is no major difference between 100 and 60; I got the sub-1 GHz output when it was set to 60.
At values below 50 it will be visibly different.
However, the CPU temperatures tell a different story. Taking 70 as the baseline, dropping it to 60 (which I did to allow higher clock speeds under heavy throughput) increases the average CPU temp by around 2C, a small difference indicating that average clocks are affected while the system is mostly idle. When I increased it all the way to 100, there was a whopping 14C drop in plotted temperatures, even though the sysctl command gives similar output.
Interestingly, when setting it to a very low number like 0 or 10, the temps are lower compared to setting it to 60. I am guessing the CPU going back to an idle state quicker has a positive effect. At really low values the temps are similar to using a value of 100.
-
@RobbieTT said in Enhanced Intel SpeedStep / Speed Shift - Are they fully supported?:
@tman222
If I generate 10GbE using the router as the server or client I can detect a small ripple in the graph with 1 stream but the 9.90 Gbps average remains:
4 streams is still a flat line though:
Which Mac app are you using here to generate this kind of traffic?