CPU Usage when network used

qwaven

Hi all,

I've got a fairly new install of pfSense with the latest update. I have not installed or set up anything major such as an IDS, just fairly basic firewall rules and some NAT.

I have installed a 10G SFP+ PCIe card, a Chelsio T520-SO-CR, which I understand to be compatible with pfSense. On first boot it appeared to update some firmware on the card, and the card was detected fine.

I do have a few VLANs configured on the SFP+ port.

Generally speaking the card "works"; however, I've noticed that my CPU utilization goes way up when the box is doing just about nothing except transferring data.

For example: I've set up iperf3 as a client on pfSense and am pushing data to my NAS.

Pfsense --> SFP+ NAS (VLAN) --> directly connected switch --> NAS also on the same switch

I notice two things:

The speed is pretty weak for 10G at about 445 Mbits/sec.
The CPU graph on PFSense seems to climb from 0-5% to around 60-70% utilization once iperf starts.

Something does not seem to be working very well and I am trying to determine where the issue is. Throughput-wise it could be any number of things external to the firewall, but the CPU utilization, I would imagine, is all pfSense. I am hoping to correct both; for now, however, I would be happy to see the CPU go back to something a little more normal.

My hardware is:

CPU Type Intel(R) Pentium(R) CPU N3700 @ 1.60GHz
4 CPUs: 1 package(s) x 4 core(s)
AES-NI CPU Crypto: Yes (active)
16GB of RAM
All graphs on the dashboard show minimal to no usage when idle.

Status: up
MAC Address: 00:07:43:4c:66:48
IPv4 Address: 10.10.254.1
Subnet mask IPv4: 255.255.255.128
IPv6 Link Local: fe80::207:43ff:fe4c:6648%cxl1.254
MTU: 1500
Media: 10Gbase-Twinax <full-duplex,rxpause,txpause>
In/out packets: 74293466/127365445 (33.39 GiB/124.34 GiB)
In/out packets (pass): 74293466/127365445 (33.39 GiB/124.34 GiB)
In/out packets (block): 2825/72 (252 KiB/4 KiB)
In/out errors: 0/199
Collisions: 0

Also note that I have not set up anything like jumbo frames either. I have pretty much left the network settings and tuning at system defaults. I do not see any errors on the network.

Hoping someone(s) can assist with digging deeper to determine if anything can be corrected?

Cheers!

stephenw10

Hmm, you're running a low power laptop CPU. Is it running in its turbo mode?

You might try enabling powerd in System > Advanced > Miscellaneous.

Try running at the command line whilst you test: top -aSH

See what is using your CPU cycles and how that's spread across the cores.

Do you have other copper NICs you can test against?

Steve

qwaven

Thanks for the reply.

I am running a low power cpu but it is not a laptop :)

I am not aware of how to check turbo mode, unless this is also part of powerd? I have powerd set to Hiadaptive.

It's come to my attention that part of my issue might be that the adapter card is plugged into a PCIe slot that is too slow for its speed (10G). I would have figured this would just limit the transfer speed, though. I do not recall having CPU issues before when using the built-in 1G ports.

I will try some tests when possible via the command you gave.

Cheers!

stephenw10

Plugging it into a narrower PCIe slot, say an x8 card in an x1 slot, should still work. The total throughput will obviously be lower, but the bandwidth of a single PCIe lane is still more than 500Mbps, even for v1.

Some CPUs/boards need powerd to use turbo mode. Some are disabled by default in the BIOS.
Turbo mode frequencies are not usually reported directly. You might see it shown as 1600MHz normally and 1601MHz when turbo is enabled.

For example:

Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz
Current: 2600 MHz, Max: 3201 MHz
4 CPUs: 1 package(s) x 4 core(s)
AES-NI CPU Crypto: Yes (active)
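
A rough way to check for that "+1 MHz" marker is sketched below. It assumes the frequency list comes from the FreeBSD sysctl dev.cpu.0.freq_levels, which prints space-separated MHz/power pairs; the sample string is made up for illustration and is not from the N3700 in this thread.

```python
def parse_freq_levels(text):
    """Turn 'MHz/power MHz/power ...' into a list of MHz integers."""
    return [int(pair.split("/")[0]) for pair in text.split()]


def advertises_turbo(levels_mhz):
    """Return True when the top level sits exactly 1 MHz above the next one.

    This only reflects the reporting convention described above; it does
    not prove the CPU actually reaches its burst clock under load.
    """
    ordered = sorted(set(levels_mhz), reverse=True)
    return len(ordered) >= 2 and ordered[0] - ordered[1] == 1


# Hypothetical sample, in the same shape as dev.cpu.0.freq_levels output.
sample = "1601/2000 1600/2000 1440/1800 1280/1600"
levels = parse_freq_levels(sample)
print("highest level:", levels[0], "MHz, turbo marker:", advertises_turbo(levels))
```

If the list tops out at a round base frequency with no entry 1 MHz above it, turbo is most likely disabled in the BIOS or not exposed by the board.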

Steve

qwaven

I will do some more tests to confirm this.

For reference:

The board is: https://www.supermicro.com/products/motherboard/X11/X11SBA-LN4F.cfm
It has a PCIe 2.0 x1 slot, which is 500MB/s of throughput. Unfortunately this means I need a better board at some point if I ever want to actually utilize my 10G setup, since running dual 10G interfaces seems kinda silly right now. :)
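
The arithmetic behind that figure can be sketched as follows. The signalling rates and line encodings are the published per-lane PCIe values; the function name is only for illustration, and the result ignores packet (TLP) overhead, so real throughput will be somewhat lower.

```python
# Per-lane raw signalling rate (GT/s) and line-encoding efficiency.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def usable_gbit_per_s(generation, lanes):
    """Usable link bandwidth in Gbit/s, before protocol overhead."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency * lanes


gen2_x1 = usable_gbit_per_s(2, 1)
print(f"PCIe 2.0 x1: {gen2_x1:.1f} Gbit/s, {gen2_x1 * 1000 / 8:.0f} MB/s")
print(f"PCIe 1.x x1: {usable_gbit_per_s(1, 1):.1f} Gbit/s")
```

So an x1 slot caps a 10G card well below line rate, but it is still far above the roughly 445 Mbit/s seen in the iperf3 test.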

CPU-wise, though, I do see that mine is running at the lowest range compared to what is listed on Intel's site, which says burst can be 2.56 GHz.

https://ark.intel.com/content/www/us/en/ark/products/91830/intel-pentium-processor-n3710-2m-cache-up-to-2-56-ghz.html

Cheers!

Grimson

@qwaven said in CPU Usage when network used:

CPU-wise, though, I do see that mine is running at the lowest range compared to what is listed on Intel's site, which says burst can be 2.56 GHz.

Burst is up to 2.56 GHz, but it's up to the board/BIOS designer to decide if and how high the CPU is allowed to burst. Those decisions are usually based on how the cooling is designed; with passive cooling you often won't get the full burst speed, or any burst at all. Consult the documentation of your hardware, or their support, to see if and how far the CPU can burst on that board.

qwaven

So yeah, I tried a few things: enabling and disabling powerd, setting the powerd options to maximum, and changing anything in the BIOS I could find that might speed up the CPU.

Nothing seems to change it from 1.6 GHz. It looks likely that I cannot change this further.

Here is a random snapshot of my CPU. It was at around 50% utilization this time.

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -92 - 0K 688K - 3 1:03 95.53% [kernel{igb0 que (qid 0)}]
11 root 155 ki31 0K 64K RUN 1 19:19 78.63% [idle{idle: cpu1}]
11 root 155 ki31 0K 64K CPU0 0 19:23 67.33% [idle{idle: cpu0}]
11 root 155 ki31 0K 64K CPU2 2 19:13 64.38% [idle{idle: cpu2}]
11 root 155 ki31 0K 64K CPU3 3 19:12 30.00% [idle{idle: cpu3}]
12 root -92 - 0K 816K WAIT 1 0:19 19.35% [intr{irq265: t5nex0:1a0}]
12 root -92 - 0K 816K WAIT 2 0:15 17.80% [intr{irq267: t5nex0:1a2}]
12 root -92 - 0K 816K WAIT 1 0:15 16.78% [intr{irq266: t5nex0:1a1}]
12 root -92 - 0K 816K WAIT 3 0:15 15.07% [intr{irq268: t5nex0:1a3}]
12 root -92 - 0K 816K WAIT 0 0:16 14.61% [intr{irq269: igb0:que 0}]

Cheers!

stephenw10

What were you doing at that point? You have nearly 100% on one NIC queue. Other load looks to be spread nicely though.
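
For anyone wanting to pull the busy threads out of a paste like that, here is a minimal sketch. It assumes the plain-text layout of top -aSH shown above (WCPU percentage followed by the command) and uses a few rows copied from that output; the function name is made up.

```python
import re

# A few rows copied from the top -aSH output above.
TOP_SAMPLE = """\
0 root -92 - 0K 688K - 3 1:03 95.53% [kernel{igb0 que (qid 0)}]
11 root 155 ki31 0K 64K RUN 1 19:19 78.63% [idle{idle: cpu1}]
12 root -92 - 0K 816K WAIT 1 0:19 19.35% [intr{irq265: t5nex0:1a0}]
12 root -92 - 0K 816K WAIT 0 0:16 14.61% [intr{irq269: igb0:que 0}]
"""

# WCPU is the only percentage on each row; the command is everything after it.
ROW = re.compile(r"(?P<wcpu>\d+\.\d+)%\s+(?P<command>.+)$")


def busiest_threads(top_text, skip_idle=True):
    """Return (wcpu, command) pairs sorted from busiest to least busy."""
    rows = []
    for line in top_text.splitlines():
        match = ROW.search(line)
        if not match:
            continue  # header or blank line
        command = match.group("command").strip()
        if skip_idle and command.startswith("[idle"):
            continue  # idle threads show spare capacity, not load
        rows.append((float(match.group("wcpu")), command))
    return sorted(rows, reverse=True)


for wcpu, command in busiest_threads(TOP_SAMPLE):
    print(f"{wcpu:6.2f}%  {command}")
```

A single thread pinned near 100% while the idle threads on the other cores stay high is the pattern to look for here.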

Steve

qwaven

This was just a simple download from the internet.

Internet --> PFSense WAN --> (Nat) --> PFsense 10G interface --> Switch ---> Destination Host

igb0 would be the WAN nic (1g port)
I believe t5nex0 is the chelsio 10g card.

Cheers!

stephenw10

Hmm, well it looks like if the restriction is anywhere it's on WAN.

Were you able to test with just the igb NICs? Remove the 10G from the test?

Steve

qwaven

OK, so it took a bit of rearranging, but I have now switched to using only 1G interfaces.

Unfortunately I am not sure the results are much different.

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -92 - 0K 688K CPU3 3 5:25 94.14% [kernel{igb0 que (qid 0)}]
11 root 155 ki31 0K 64K CPU1 1 24.8H 58.08% [idle{idle: cpu1}]
11 root 155 ki31 0K 64K CPU2 2 24.8H 55.88% [idle{idle: cpu2}]
11 root 155 ki31 0K 64K RUN 0 24.8H 44.88% [idle{idle: cpu0}]
12 root -92 - 0K 816K WAIT 0 1:15 36.41% [intr{irq287: igb3:que 0}]
11 root 155 ki31 0K 64K RUN 3 24.8H 32.95% [idle{idle: cpu3}]
12 root -92 - 0K 816K CPU1 1 1:15 30.30% [intr{irq288: igb3:que 1}]
78054 root 34 0 266M 218M bpf 2 0:48 22.20% /usr/local/bin/ntopng -d

For reference the transfer was going about 40 megabytes/sec.
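
To put that figure next to the link speed, a quick conversion sketch, assuming decimal megabytes (1 MB/s = 8 Mbit/s) and a nominal 1000 Mbit/s link; the helper names are only for illustration:

```python
def mbytes_to_mbits(mbytes_per_s):
    """Convert megabytes per second to megabits per second."""
    return mbytes_per_s * 8


def link_utilisation(mbits_per_s, link_mbits_per_s):
    """Fraction of the nominal link rate in use."""
    return mbits_per_s / link_mbits_per_s


rate = mbytes_to_mbits(40)
print(f"{rate} Mbit/s, {link_utilisation(rate, 1000):.0%} of a 1G link")
```

So the transfer was running at roughly a third of what a 1G port can carry while one queue thread was already close to saturating a core.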

Cheers!

stephenw10

If you expand the window to get more output from top do you actually see more than one queue on igb0?

You said you have mostly default settings, I assume you did not set the number of igb queues? Or any other loader tunable?

Steve

qwaven

Hi Steve,

I still had the shell open from the same transfer. Here is a more complete view.
I am not clear whether kernel{igb0 que (qid 0)} is different from intr{irq269: igb0:que 0}. For igb3, however, I see [intr{irq288: igb3:que 1}] and [intr{irq287: igb3:que 0}], which still seems low given I have 4 cores, no? I have not adjusted anything like this manually.

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 155 ki31 0K 64K CPU3 3 25.4H 74.96% [idle{idle: cpu3}]
11 root 155 ki31 0K 64K RUN 1 25.4H 54.03% [idle{idle: cpu1}]
11 root 155 ki31 0K 64K RUN 0 25.3H 41.49% [idle{idle: cpu0}]
0 root -92 - 0K 688K CPU2 2 10:46 35.19% [kernel{igb0 que (qid 0)}]
11 root 155 ki31 0K 64K RUN 2 25.3H 33.86% [idle{idle: cpu2}]
12 root -92 - 0K 816K CPU1 1 3:36 31.32% [intr{irq288: igb3:que 1}]
12 root -92 - 0K 816K WAIT 0 3:40 29.27% [intr{irq287: igb3:que 0}]
12 root -92 - 0K 816K WAIT 0 5:50 17.34% [intr{irq269: igb0:que 0}]
78054 root 30 0 266M 221M RUN 1 2:13 16.83% /usr/local/bin/ntopng -d /v
78054 root 22 0 266M 221M uwait 3 0:12 9.10% /usr/local/bin/ntopng -d /v
78054 root 25 0 266M 221M uwait 0 0:11 7.71% /usr/local/bin/ntopng -d /v
78054 root 23 0 266M 221M uwait 3 0:11 7.62% /usr/local/bin/ntopng -d /v
78054 root 23 0 266M 221M nanslp 3 1:31 4.48% /usr/local/bin/ntopng -d /v
78054 root 21 0 266M 221M nanslp 1 0:48 4.16% /usr/local/bin/ntopng -d /v
78054 root 20 0 266M 221M nanslp 0 0:39 1.45% /usr/local/bin/ntopng -d /v
41253 unbound 20 0 65412K 44220K kqread 0 0:01 0.67% /usr/local/sbin/unbound -c
36170 root 21 0 98680K 39040K accept 3 0:06 0.62% php-fpm: pool nginx (php-fp
20 root -16 - 0K 16K - 0 0:37 0.57% [rand_harvestq]
0 root -92 - 0K 688K - 1 0:04 0.42% [kernel{igb3 que (qid 0)}]
12 root -92 - 0K 816K WAIT 3 0:34 0.34% [intr{irq290: igb3:que 3}]
78054 root 20 0 266M 221M bpf 1 0:03 0.25% /usr/local/bin/ntopng -d /v
198 root 20 0 9860K 4776K CPU0 0 0:07 0.25% top -aSH
75724 root 20 0 8428K 4984K kqread 0 0:04 0.21% redis-server: /usr/local/bi
23537 root 20 0 12912K 13032K usem 0 0:00 0.16% /usr/local/sbin/ntpd -g -c
12 root -72 - 0K 816K WAIT 3 0:14 0.14% [intr{swi1: netisr 0}]
50030 root 20 0 9464K 5868K select 3 0:10 0.14% /usr/local/sbin/miniupnpd -
22585 root 20 0 23592K 8804K kqread 3 0:01 0.12% nginx: worker process (ngin
12 root -60 - 0K 816K WAIT 0 1:21 0.11% [intr{swi4: clock (0)}]
65534 root 20 0 6600K 2356K bpf 3 0:07 0.08% /usr/local/sbin/filterlog -
0 root -92 - 0K 688K - 2 0:00 0.07% [kernel{igb3 que (qid 1)}]
339 root 36 0 98552K 39340K accept 1 0:13 0.07% php-fpm: pool nginx (php-fp
74721 root 20 0 50888K 35668K nanslp 3 0:02 0.07% /usr/local/bin/php -f /usr/
81162 root 20 0 6392K 2540K select 1 0:04 0.06% /usr/sbin/syslogd -s -c -c
78054 root 20 0 266M 221M nanslp 0 0:00 0.05% /usr/local/bin/ntopng -d /v
49333 dhcpd 20 0 12576K 7924K select 3 0:01 0.05% /usr/local/sbin/dhcpd -user
12 root -92 - 0K 816K RUN 2 0:20 0.04% [intr{irq289: igb3:que 2}]
78054 root 20 0 266M 221M select 0 0:00 0.04% /usr/local/bin/ntopng -d /v
19 root -16 - 0K 16K pftm 0 0:22 0.03% [pf purge]
44931 root 20 0 12904K 8152K select 0 0:01 0.03% sshd: root@pts/0 (sshd)
23537 root 20 0 12912K 13032K select 0 0:08 0.03% /usr/local/sbin/ntpd -g -c
36968 root 20 0 6900K 2444K nanslp 1 0:00 0.02% [dpinger{dpinger}]
36442 root 20 0 6900K 2444K nanslp 1 0:00 0.02% [dpinger{dpinger}]
12 root -88 - 0K 816K WAIT 0 0:06 0.01% [intr{irq257: xhci0}]
37136 root 20 0 6900K 2444K nanslp 1 0:00 0.01% [dpinger{dpinger}]
15 root -68 - 0K 80K - 3 0:05 0.01% [usb{usbus0}]
78054 root 20 0 266M 221M nanslp 0 0:00 0.01% /usr/local/bin/ntopng -d /v
15 root -68 - 0K 80K - 2 0:05 0.01% [usb{usbus0}]

Cheers!

stephenw10

@qwaven said in CPU Usage when network used:

[intr{irq290: igb3:que 3}]

It looks like you have 4 queues for igb3 which is what I expect for a 4 core CPU but I only see one for igb0.
You might try running vmstat -i to confirm you do have the expected queues for each NIC. I thought they were all on-chip in that CPU but maybe igb0 is different in which case you might try using igb3, or one of the others, as WAN.

Steve

qwaven

So with vmstat I see the correct number:

irq269: igb0:que 0 57225866 135
irq270: igb0:que 1 421673 1
irq271: igb0:que 2 425910 1
irq272: igb0:que 3 421212 1
irq273: igb0:link 11 0

irq287: igb3:que 0 94141932 223
irq288: igb3:que 1 45221540 107
irq289: igb3:que 2 27199303 64
irq290: igb3:que 3 35826209 85
irq291: igb3:link 5 0

Cheers!

stephenw10

Mmm, but all the interrupt loading is on one queue. Do you have a PPPoE WAN?
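
To put a number on that imbalance, here is a small sketch that parses the vmstat -i rows posted above and works out each queue's share of its NIC's interrupts. The regular expression assumes the "irqNNN: nic:que N total rate" layout seen in this thread, and the function name is made up.

```python
import re
from collections import defaultdict

# Rows copied from the vmstat -i output above.
VMSTAT_SAMPLE = """\
irq269: igb0:que 0 57225866 135
irq270: igb0:que 1 421673 1
irq271: igb0:que 2 425910 1
irq272: igb0:que 3 421212 1
irq273: igb0:link 11 0
irq287: igb3:que 0 94141932 223
irq288: igb3:que 1 45221540 107
irq289: igb3:que 2 27199303 64
irq290: igb3:que 3 35826209 85
irq291: igb3:link 5 0
"""

# Matches only per-queue rows; link and other interrupt rows are skipped.
QUEUE_ROW = re.compile(r"^irq\d+:\s+(\w+):que\s+(\d+)\s+(\d+)\s+\d+$")


def queue_shares(vmstat_text):
    """Map each NIC to {queue: fraction of that NIC's queue interrupts}."""
    totals = defaultdict(dict)
    for line in vmstat_text.splitlines():
        match = QUEUE_ROW.match(line.strip())
        if match:
            nic, queue, count = match.group(1), int(match.group(2)), int(match.group(3))
            totals[nic][queue] = count
    return {
        nic: {queue: count / sum(queues.values()) for queue, count in queues.items()}
        for nic, queues in totals.items()
    }


for nic, shares in queue_shares(VMSTAT_SAMPLE).items():
    summary = ", ".join(f"que {q}: {share:.1%}" for q, share in sorted(shares.items()))
    print(f"{nic}: {summary}")
```

On these numbers almost all of the igb0 (WAN) interrupts land on queue 0, while igb3 spreads its load across all four queues.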

The single thread performance of the N3700 is... not good. And potentially much worse if turbo/burst is not working.

Do you see any significant improvement if you disable ntop-ng?

Steve

qwaven

Yes, the WAN is PPPoE. Is there something I can do to make it use more queues properly?

I can try and turn ntop off later to see what happens.

Cheers!

stephenw10

Ah! OK then, currently, you are limited to a single queue on the PPPoE interface and hence a single core.

See: https://redmine.pfsense.org/issues/4821

And the upstream: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203856

You can probably gain some performance by setting the sysctl net.isr.dispatch to deferred in System > Advanced > System Tunables. That will require a reboot.

https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html#pppoe-with-multi-queue-nics

Steve

qwaven

I tried the dispatch setting:
sysctl net.isr.dispatch
net.isr.dispatch: deferred

CPU seemed to be at about 50% utilization.

interrupt total rate
cpu0:timer 122117 254
cpu2:timer 121707 253
cpu3:timer 116674 243
cpu1:timer 115728 241
irq256: ahci0 11720 24
irq257: xhci0 2850 6
irq258: hdac0 2 0
irq260: t5nex0:evt 2 0
irq269: igb0:que 0 659069 1372
irq270: igb0:que 1 1457 3
irq271: igb0:que 2 516 1
irq272: igb0:que 3 515 1
irq273: igb0:link 3 0
irq274: pcib5 1 0
irq280: pcib6 1 0
irq286: pcib7 1 0
irq287: igb3:que 0 453042 943
irq288: igb3:que 1 573830 1194
irq289: igb3:que 2 755133 1572
irq290: igb3:que 3 438318 912
irq291: igb3:link 3 0
irq292: pcib8 1 0
Total 3372690 7020

qwaven

I have also now tried disabling ntop; CPU usage looks to be maybe 8-10% lower.