Throughput expectations on Celeron + igb driver system
-
I picked up a two-port Protectli Celeron J3060 system and have been using it for a couple of months now. I have run a variety of tests and want to make sure my expectations are set accordingly.
I have run through the Netgate hardware, performance, and troubleshooting docs and played with the suggested settings, but none of them seem to make any difference in my observed performance. On one hand this suggests the hardware itself is the source of the limits I see, but I have also noticed that some settings don't actually take effect when I make them (specifically the hw.igb.num_queues tunable, which always seems to be pegged at "1").
My primary testing is running iperf3 from inside my pfSense LAN out to an AWS EC2 t2.micro instance, with both TCP and UDP (the invocations I'm using are sketched after the list below). Here are some other details:
- LAN-side I have a single interface with multiple VLANs (WAN port is dedicated to its purpose)
- I see a max of 420 Mbit/s with TCP (50 parallel streams) and a max of 770 Mbit/s with UDP
- trying to set hw.igb.num_queues="0" always results in hw.igb.num_queues="1" after reboot (I suspect this is enforcement of some system policy, since this box only has 2 physical cores; with more than one queue the two CPUs could both be busy pulling from queues and starve the other userland processes)
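For reference, these are roughly the iperf3 invocations I'm using (the hostname is a placeholder, and the exact durations and bandwidth targets vary between runs):
# TCP: 50 parallel streams to the EC2 instance
iperf3 -c ec2-test.example.com -P 50 -t 30
# UDP: offer ~1 Gbit/s and see what actually arrives
iperf3 -c ec2-test.example.com -u -b 1G -t 30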
Observing top -aSH while running tests, this is the most interesting thing I see:
0 root -92 - 0K 304K CPU1 1 3:15 100.00% [kernel{igb0 que}]
... which tells me that the single CPU core pulling from the single igb queue is the limiting factor for my throughput.
Based on all my research, it seems like the performance I am getting is exactly what I should expect given this system's configuration. I just wanted to run this by the community to see if my interpretation of the data is consistent with what others know to be true, and if maybe there is something I'm missing.
Specifically, if anyone can confirm or correct my interpretation of the hw.igb.num_queues="1" behavior I'm seeing, that would be greatly appreciated. Thanks!
-
Well, somehow hw.igb.num_queues="1" was in my /boot/loader.conf.local, which explains my specific confusion. Really not sure how that got there. Regardless, I set the value to 0 successfully and am doing some additional testing...
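To confirm the change actually took effect after reboot, I'm checking it along these lines (standard FreeBSD commands, run from the pfSense shell):
# what the loader tunable is set to on disk
grep igb /boot/loader.conf.local
# what the driver actually picked up after boot
sysctl hw.igb.num_queues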
-
I neglected to mention earlier that I also added hw.igb.rx_process_limit="-1" to /boot/loader.conf.local, which had been set to 100 (hw.igb.tx_process_limit was already set to -1).
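So the igb-related part of my /boot/loader.conf.local now looks roughly like this (the comments are just my own understanding of the tunables):
hw.igb.num_queues="0"          # 0 = let the driver size queues to the CPU count
hw.igb.rx_process_limit="-1"   # -1 = no cap on packets processed per RX pass
hw.igb.tx_process_limit="-1"   # was already at the unlimited setting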
OK, so here's the top -aSH output with TCP:
PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 12 root     -92    -     0K   336K CPU0    0   1:47  64.41% [intr{irq259: igb0:que 0}]
 12 root     -92    -     0K   336K WAIT    1   3:14  61.33% [intr{irq260: igb0:que 1}]
 12 root     -92    -     0K   336K WAIT    1   0:43  23.41% [intr{irq264: igb1:que 1}]
 12 root     -92    -     0K   336K RUN     0   0:47  20.66% [intr{irq263: igb1:que 0}]
 11 root     155 ki31     0K    32K RUN     0  30:23  14.18% [idle{idle: cpu0}]
 11 root     155 ki31     0K    32K RUN     1  29:25  14.12% [idle{idle: cpu1}]
This test ran at right about 600 Mbit/s, which by all indications is what I should expect from a dual-core Celeron running under 2 GHz (with adequate memory).
UDP looks a little different; this is what I caught from top -aSH during the test:
PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 11 root     155 ki31     0K    32K RUN     0  37:00  78.13% [idle{idle: cpu0}]
 12 root     -92    -     0K   336K CPU1    1   3:49  69.11% [intr{irq260: igb0:que 1}]
 11 root     155 ki31     0K    32K RUN     1  35:52  28.32% [idle{idle: cpu1}]
 12 root     -92    -     0K   336K WAIT    0   0:56   9.02% [intr{irq263: igb1:que 0}]
My suspicion in the original post was correct: with the additional queues, other userland processes like top don't get scheduled as often when throughput is saturated. During the TCP test, top only refreshed once, which is how I captured the first snippet above; during the UDP test it was clear the system wasn't under nearly as much stress, as is apparent in the top snippet for the UDP test.
UDP throughput maxed out around 820 Mbit/s. FWIW, I was able to get 980 Mbit/s down on this connection using my MacBook Pro connected directly to the modem.
So I've mostly answered my own question, and hopefully this will serve as a reference for anyone testing a similar system.
-
@thedude42 said in Throughput expectations on celeron + igb driver system:
Celeron J3060
That does seem about right for that CPU. Do you have powerd enabled? Not sure how that plays with 'burst mode' on those; it is required on some CPUs for 'turbo mode'. You may be limited to 1.6 GHz.
Steve
-
Thanks for the reply @stephenw10 !
Yes, one of the troubleshooting steps was to ensure powerd was enabled (it wasn't), but when I enabled it I didn't notice any significant improvement after reboot... though that was before I tracked down the config in /boot/loader.conf.local, so whether it made much difference overall, I'm not sure.
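One thing I may do is watch the CPU frequency while a test runs to see whether powerd/burst mode is actually kicking in, with something like this (standard FreeBSD sysctls, run from the pfSense shell):
# available frequency steps for core 0
sysctl dev.cpu.0.freq_levels
# current frequency, sampled every second during an iperf3 run
while true; do sysctl dev.cpu.0.freq; sleep 1; done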
The only thing I have left to investigate is why I see such a discrepancy between speedtest.net performance to various servers (I use the Python CLI client from a host on wired Ethernet on a Dell 1U system) and iperf to an AWS instance. I see about a 100 Mbit/s difference between the two testing methods, and I'm wondering if that has more to do with the speedtest software environment than with the link.
-
I found the speedtest CLI tool (the one available for pfSense) to be a bit flaky at higher speeds. I assume you're running both tests from the host behind the firewall? Testing from the firewall will always give lower numbers.
Steve
-
Yeah, good point. I'll try the browser client on wired Ethernet and see if there is a difference. Thanks!
-