Throughput expectations on celeron + igb driver system

thedude42

I picked up a Protectli 2-port Celeron J3060 system and have been using it for a couple of months now. I have run various tests and want to make sure my expectations are set correctly.

I have run through the Netgate hardware, performance, and troubleshooting docs and played with settings, but none seem to make any difference in my observed performance. On one hand, I suspect this points to the hardware itself being the source of the limitations I see. On the other hand, some settings don't actually take effect when I change them (specifically the hw.igb.num_queues sysctl value, which always seems to be pegged at "1").

My primary test uses iperf3 from inside my pfSense LAN out to an AWS EC2 t2.micro instance. I test with both TCP and UDP. Here are some other details:

LAN-side I have a single interface with multiple VLANs (WAN port is dedicated to its purpose)
I see a max of 420 Mbit/s with TCP (50 parallel streams) and a max of 770 Mbit/s with UDP
Trying to set hw.igb.num_queues="0" always results in hw.igb.num_queues="1" after reboot (I suspect this is some enforcement of system policy: this box has only 2 physical cores, and with more than 1 queue both CPUs could be kept busy pulling from the queues and starve the other userland processes)
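
For anyone wanting to repeat this kind of measurement, here is a minimal Python sketch of how the two iperf3 runs could be assembled and their results read back. It is an illustration, not the exact commands used in this thread: the server address is a placeholder, the function names are made up for this example, and the JSON keys (`end.sum_received` for TCP, `end.sum` for UDP) are assumed from iperf3's `-J` output.

```python
import json


def iperf3_command(server, udp=False, streams=1, seconds=10, bitrate=None):
    """Build an iperf3 client command line as a list of arguments.

    `server` is a placeholder for whatever iperf3 server you test against.
    """
    cmd = ["iperf3", "-c", server, "-t", str(seconds), "-J"]
    if streams > 1:
        cmd += ["-P", str(streams)]  # parallel streams, e.g. 50 for the TCP test
    if udp:
        cmd.append("-u")
        # iperf3 limits UDP to 1 Mbit/s unless -b is given; "0" means unlimited.
        cmd += ["-b", bitrate or "0"]
    elif bitrate:
        cmd += ["-b", bitrate]
    return cmd


def mbits_per_second(report_json):
    """Extract the end-of-test throughput in Mbit/s from iperf3 -J output.

    Assumes TCP reports carry end.sum_received and UDP reports carry end.sum.
    """
    end = json.loads(report_json)["end"]
    summary = end.get("sum_received") or end.get("sum")
    return summary["bits_per_second"] / 1e6
```

The argument list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` on the LAN-side client, and the captured stdout handed to `mbits_per_second`.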

Observing top -aSH while running tests this is the most interesting thing I see:

0 root -92 - 0K 304K CPU1 1 3:15 100.00% [kernel{igb0 que}]

... which tells me that the single CPU core pulling from the single igb queue is the limiting factor for my throughput.

Based on all my research, it seems like the performance I am getting is exactly what I should expect given this system's configuration. I just wanted to run this by the community to see if my interpretation of the data is consistent with what others know to be true, and if maybe there is something I'm missing.

Specifically, if anyone can confirm or correct my interpretation of the hw.igb.num_queues="1" behavior I'm seeing that would be greatly appreciated.

Thanks!

thedude42

Well, somehow hw.igb.num_queues="1" was in my /boot/loader.conf.local which explains my specific confusion. Really not sure how that got there.

Regardless, I set the value to 0 successfully and am doing some additional testing...
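
If a tunable seems stuck at a value you did not set, it can help to search every loader configuration file for it. The following is a rough Python sketch under stated assumptions: the file list is the usual pair on a FreeBSD-based system, the helper names are invented for this example, and the parser only handles simple `name="value"` lines.

```python
import re

# Assumed locations; loader.conf.local is the one that held the stray setting here.
LOADER_FILES = ["/boot/loader.conf", "/boot/loader.conf.local"]

# Matches simple assignments such as: hw.igb.num_queues="1"
# Lines starting with '#' do not match because '#' is not a valid name character.
_ASSIGN = re.compile(r'^\s*([A-Za-z0-9_.]+)\s*=\s*"?([^"#\s]*)"?')


def find_tunable(name, text):
    """Return [(line_number, value)] for each assignment of `name` in `text`."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = _ASSIGN.match(line)
        if match and match.group(1) == name:
            hits.append((number, match.group(2)))
    return hits


def scan_files(name, paths=LOADER_FILES):
    """Return [(path, line_number, value)] across all readable files in `paths`."""
    found = []
    for path in paths:
        try:
            with open(path) as handle:
                text = handle.read()
        except OSError:
            continue  # a missing or unreadable file is simply skipped
        found += [(path, n, v) for n, v in find_tunable(name, text)]
    return found
```

Calling `scan_files("hw.igb.num_queues")` on the firewall would list each file and line where the tunable is assigned, which makes a leftover override easy to spot.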

thedude42

I neglected to mention earlier that I also set hw.igb.rx_process_limit="-1" in /boot/loader.conf.local; it had previously been set to 100 (hw.igb.tx_process_limit was already set to -1).

OK, so here's with TCP:

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   12 root       -92    -     0K   336K CPU0    0   1:47  64.41% [intr{irq259: igb0:que 0}]
   12 root       -92    -     0K   336K WAIT    1   3:14  61.33% [intr{irq260: igb0:que 1}]
   12 root       -92    -     0K   336K WAIT    1   0:43  23.41% [intr{irq264: igb1:que 1}]
   12 root       -92    -     0K   336K RUN     0   0:47  20.66% [intr{irq263: igb1:que 0}]
   11 root       155 ki31     0K    32K RUN     0  30:23  14.18% [idle{idle: cpu0}]
   11 root       155 ki31     0K    32K RUN     1  29:25  14.12% [idle{idle: cpu1}]

This test ran at right about 600 Mbit/s, which by all indications is what I should expect from a dual-core Celeron running at <2GHz (with adequate memory).
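
To read a capture like this more quickly, the per-thread WCPU figures can be summed per core. Below is a small Python sketch that does so; the function name is made up for this example, and the regular expression assumes the column layout shown in this thread (core number, TIME, WCPU, then the bracketed command).

```python
import re
from collections import defaultdict

# Matches the trailing columns of a `top -aSH` thread line:
#   ... STATE  C  TIME  WCPU%  [COMMAND]
_ROW = re.compile(r'\s(\d+)\s+\d+:\d+\s+([\d.]+)%\s+\[(.+)\]\s*$')


def busy_by_core(top_output):
    """Sum WCPU per CPU core over all non-idle kernel threads in the capture."""
    busy = defaultdict(float)
    for line in top_output.splitlines():
        match = _ROW.search(line)
        if not match:
            continue  # header lines and anything unrecognised are skipped
        core = int(match.group(1))
        wcpu = float(match.group(2))
        command = match.group(3)
        if command.startswith("idle"):
            continue  # idle threads are the complement of the busy figure
        busy[core] += wcpu
    return dict(busy)
```

Applied to the TCP capture above, this gives roughly 85% busy on each core from the igb interrupt threads alone, which lines up with the roughly 14% idle reported for each CPU.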

UDP looks a little different. This is what I caught from top -aSH during the test:

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
   11 root       155 ki31     0K    32K RUN     0  37:00  78.13% [idle{idle: cpu0}]
   12 root       -92    -     0K   336K CPU1    1   3:49  69.11% [intr{irq260: igb0:que 1}]
   11 root       155 ki31     0K    32K RUN     1  35:52  28.32% [idle{idle: cpu1}]
   12 root       -92    -     0K   336K WAIT    0   0:56   9.02% [intr{irq263: igb1:que 0}]

My suspicion in the original post was correct: with the additional queues, other userland processes such as top don't get scheduled as often when throughput is saturated. During the TCP test, top refreshed only once, which is how I captured the first listing above. During the UDP test the system clearly wasn't under nearly as much stress, as the second listing shows.

UDP throughput maxed out around 820 Mbit/s. FWIW, I was able to get 980 Mbit/s down on this connection using my MacBook Pro connected directly to the modem.

So, I answered my own question mostly, and hopefully this will be a reference for anyone testing a similar system.

stephenw10

@thedude42 said in Throughput expectations on celeron + igb driver system:

Celeron J3060

That does seem about right for that CPU. Do you have powerd enabled? I'm not sure how that interacts with 'burst mode' on those CPUs; on some it is required for 'turbo mode'. You may be limited to 1.6GHz.

Steve

thedude42

Thanks for the reply @stephenw10 !

Yes, one of the troubleshooting steps was to ensure powerd was enabled (it wasn't). When I enabled it I didn't notice any significant improvement after reboot, but that was before I tracked down the setting in /boot/loader.conf.local, so I'm not sure how much difference it made overall.

The only thing I have left to investigate is why I see such a discrepancy between speedtest.net results to various servers (I use the Python CLI client on a Dell 1U system connected by wired Ethernet) and iperf to an AWS instance. I see about a 100 Mbit/s difference between the two testing methods, and I'm wondering whether that has more to do with the speedtest software environment than the link.

stephenw10

I found the speedtest CLI tool (the one available for pfSense) to be a bit flaky at higher speeds. I assume you're running both tests from the host behind the firewall? Testing from the firewall will always give lower numbers.

Steve

thedude42

Yeah good point, I'll see about using the browser client on wired ethernet and see if there is a difference. Thanks!