Abysmal Performance after pfSense hardware upgrade
-
@stephenw10 Indeed it is, and with 12 cores I am able to run a few other things as separate VMs without affecting throughput (NtopNG being one of them).
Are you thinking that if I shift to inline mode for Suricata, I would start seeing interrupt load go up? @8ayM doesn't seem to have Suricata activated, but perhaps Ntop would have the same effect?
BTW, I changed the HW offloads this morning (none are active now), and although time of day may affect speedtest results, I did manage to get similar speeds just now.
Also tried disabling Suricata but I don't see any difference in performance...
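For reference, the offload state can be confirmed from an SSH shell. This is a minimal sketch; igc0 is a placeholder for the actual interface name, and pfSense normally manages these settings via System > Advanced > Networking:
# Show the active options and supported capabilities (igc0 = example NIC)
ifconfig -m igc0 | grep -E 'options|capabilities'
# Ad-hoc disable of the common offloads (lost at reboot; the GUI setting persists)
ifconfig igc0 -rxcsum -txcsum -tso -lro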
-
Mmm, the interrupt loading is interesting. What I'd expect to see is the load appearing on the task queue group threads, which you are also seeing.
I have to think it's ntop putting the NIC in promiscuous mode that's doing something there. I don't see that on a C3K system here:
last pid: 39097;  load averages:  0.67,  0.30,  0.21    up 2+08:27:39  21:29:14
340 threads:   6 running, 290 sleeping, 44 waiting
CPU 0:  5.5% user,  0.0% nice, 20.0% system,  0.0% interrupt, 74.5% idle
CPU 1:  2.4% user,  0.0% nice, 10.2% system,  0.0% interrupt, 87.5% idle
CPU 2:  3.1% user,  0.0% nice,  5.5% system,  0.0% interrupt, 91.4% idle
CPU 3:  3.1% user,  0.0% nice,  5.1% system,  0.0% interrupt, 91.8% idle
Mem: 98M Active, 215M Inact, 521M Wired, 3002M Free
ARC: 133M Total, 33M MFU, 93M MRU, 1121K Anon, 976K Header, 5440K Other
     99M Compressed, 244M Uncompressed, 2.47:1 Ratio
Swap: 1024M Total, 1024M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        187 ki31     0B    64K CPU2     2  55.4H  90.08% [idle{idle: cpu2}]
   11 root        187 ki31     0B    64K RUN      3  55.4H  89.88% [idle{idle: cpu3}]
   11 root        187 ki31     0B    64K CPU1     1  55.4H  85.69% [idle{idle: cpu1}]
   11 root        187 ki31     0B    64K CPU0     0  55.3H  76.17% [idle{idle: cpu0}]
    0 root        -60    -     0B  1648K -        2   0:03   4.75% [kernel{if_io_tqg_2}]
    0 root        -60    -     0B  1648K -        1   0:02   3.55% [kernel{if_io_tqg_1}]
    0 root        -60    -     0B  1648K -        3   0:04   2.29% [kernel{if_io_tqg_3}]
10536 root          4    0    84M    33M RUN      3   0:00   2.06% /usr/local/bin/python3.11 /usr/local/bin/speedtest{p
10536 root         56    0    84M    33M usem     1   0:01   1.87% /usr/local/bin/python3.11 /usr/local/bin/speedtest{p
Though it's also clearly not anywhere near the same throughput.
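A quick way to confirm the promiscuous-mode theory from a shell; a sketch, again with igc0 standing in for the real interface:
# PROMISC shows in the flags line while a pcap-based tool is attached
ifconfig igc0 | head -1
# The kernel also logs promiscuous mode transitions
dmesg | grep -i promisc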
-
Would you want me to test something on my unit?
I just finished updating to the 5.6.x build, so I may get slightly different results than the stock ntopng package, which is usually a version behind.
-
Latest test running top -HaSP
ntopng Community v.5.6.240304 rev.0 running in background. It had been disabled for most of our testing after it was suggested to do so.
-
Did you try testing with ntop-ng disabled? Also try with bandwidthd and darkstat disabled.
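One way to verify they're actually stopped, and not just disabled in the GUI; a minimal check from the shell:
# Any surviving monitoring daemons will show up here
ps ax | egrep -i 'ntopng|bandwidthd|darkstat' | grep -v egrep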
-
@stephenw10 said in Abysmal Performance after pfSense hardware upgrade:
Did you try testing with ntop-ng disabled? Also try with bandwidthd and darkstat disabled.
I'll try again when I get home
-
@stephenw10 said in Abysmal Performance after pfSense hardware upgrade:
Did you try testing with ntop-ng disabled? Also try with bandwidthd and darkstat disabled.
As requested
Performed the test disabling one at a time, announcing which ones. Then a final test again after turning all back on.
https://streamable.com/77ahrq
-
Hmm, so still interrupt load with all three disabled? There must be something else set there. You have any custom sysctls set?
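For reference, the tunables pfSense applies at boot are stored in config.xml, so they can be dumped from a shell; a sketch, assuming the stock install path:
# Each <tunable>/<value> pair under <sysctl> is applied at boot
grep -A2 '<tunable>' /conf/config.xml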
-
@stephenw10 said in Abysmal Performance after pfSense hardware upgrade:
Hmm, so still interrupt load with all three disabled? There must be something else set there. You have any custom sysctls set?
Not that I recall, but then again this config has been an evolution of my early usage of pfSense, going on about 15 years at this point. It always seemed too complicated to start over, and over time that feeling continued to grow.
Here is my current System Tunables:
-
Hmm, nothing unexpected there. You have any custom loader values in /boot/loader.conf.local?
What other packages do you have installed?
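Worth noting that /boot/loader.conf is regenerated by pfSense at boot, so loader.conf.local is the only place custom values survive; a quick check (sketch):
# Prints the local file if it exists, otherwise confirms it's absent
cat /boot/loader.conf.local 2>/dev/null || echo 'no loader.conf.local'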
-
[2.7.2-RELEASE][admin@pfSense-Edge01.scs.lan]/boot: vi loader.conf
kern.cam.boot_delay=10000
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
kern.ipc.nmbclusters="1000000"
kern.ipc.nmbjumbo9="524288"
kern.ipc.nmbjumbop="524288"
opensolaris_load="YES"
zfs_load="YES"
opensolaris_load="YES"
zfs_load="YES"
kern.cam.boot_delay=10000
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
kern.ipc.nmbclusters="1000000"
kern.ipc.nmbjumbo9="524288"
kern.ipc.nmbjumbop="524288"
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
boot_serial="NO"
autoboot_delay="3"
hw.hn.vf_transparent="0"
hw.hn.use_if_start="1"
net.link.ifqmaxlen="128"
machdep.hwpstate_pkg_ctrl="1"
net.pf.states_hashsize="4194304"
-
No loader.conf.local file though?
-
Not that I'm seeing. I could create one if there are persistent items that need to be added.
-
Ok good, nothing unexpected hiding there.
Is Snort running on the interfaces passing traffic during the test? I don't see it in any of your output.
The interrupt load shown really seems to line up with the ntop load though. It makes me wonder if something there is actually still enabled.
-
I'm not running snort ATM, but I do have pfBlockerNG running
-
Hmm, pfBlocker doesn't run continually against all traffic like that. Any load created by large lists just appears as firewall load in the task queues.
It's almost as if the NICs are running in a different mode.
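If you want to sanity-check how big those lists actually are, the pf tables can be inspected directly; a sketch, where the table name is just an example of a pfBlockerNG alias:
# List all pf tables, then count entries in one of them
pfctl -s Tables
pfctl -t pfB_PRI1_v4 -T show | wc -l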
-
Let me know if there is anything you can think of for me to try, or something else you'd like me to check.
-
The throughput you're seeing now is as expected though?
-
It is. You've just got me curious what is causing the interrupts.
I'm considering getting the 1U version of my new router. If I do, I'll perform a clean install and look to rebuild my system one brick at a time to see if I can figure out what is causing the GUI slowdown I've had since I moved to my last hardware. I can try to keep an eye on the interrupts as well.
Here are the stats from Status -> Interfaces.
I seem to be a little beyond the interrupt rate you said wouldn't be "unusual".
https://forum.netgate.com/topic/179674/netgate-6100-significant-interface-interrupt-rates/9
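For an actual per-device interrupt rate, rather than the GUI counters, vmstat works; note the rate column is averaged since boot, so run it twice a few seconds apart and compare the totals for a current figure:
# Per-device interrupt totals and average rate since boot
vmstat -i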
-
Mmm, but nowhere near 10K! I agree though, I find it odd that you see the interrupt loading in the top output and I do not on a similar C3K system. Like whilst passing 1Gbps iperf traffic on a 5100:
last pid: 57718;  load averages:  0.55,  0.36,  0.34    up 0+06:27:17  22:57:28
339 threads:   7 running, 288 sleeping, 44 waiting
CPU 0:  0.0% user,  0.0% nice, 28.6% system,  0.0% interrupt, 71.4% idle
CPU 1:  0.0% user,  0.0% nice, 23.1% system,  0.0% interrupt, 76.9% idle
CPU 2:  0.4% user,  0.0% nice, 24.7% system,  0.0% interrupt, 74.9% idle
CPU 3:  0.0% user,  0.0% nice, 34.1% system,  0.0% interrupt, 65.9% idle
Mem: 45M Active, 258M Inact, 505M Wired, 3028M Free
ARC: 127M Total, 28M MFU, 93M MRU, 416K Anon, 962K Header, 4535K Other
     92M Compressed, 229M Uncompressed, 2.48:1 Ratio
Swap: 1024M Total, 1024M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        187 ki31     0B    64K RUN      0 377:22  73.68% [idle{idle: cpu0}]
   11 root        187 ki31     0B    64K RUN      2 376:55  73.23% [idle{idle: cpu2}]
   11 root        187 ki31     0B    64K RUN      1 377:47  72.55% [idle{idle: cpu1}]
   11 root        187 ki31     0B    64K CPU3     3 376:45  72.51% [idle{idle: cpu3}]
    0 root        -60    -     0B  1648K -        0   0:06  19.51% [kernel{if_io_tqg_0}]
    0 root        -60    -     0B  1648K -        3   0:05  18.78% [kernel{if_io_tqg_3}]
    0 root        -60    -     0B  1648K CPU1     1   0:04  18.60% [kernel{if_io_tqg_1}]
57718 root         34    0    19M  8644K CPU0     0   0:03  17.05% iperf3 -c 172.21.16.8 -P 3 -t 30{iperf3}
57718 root         36    0    19M  8644K sbwait   3   0:04  16.93% iperf3 -c 172.21.16.8 -P 3 -t 30{iperf3}
57718 root         40    0    19M  8644K sbwait   1   0:03  16.74% iperf3 -c 172.21.16.8 -P 3 -t 30{iperf3}
    0 root        -60    -     0B  1648K -        1   0:36   0.14% [kernel{if_config_tqg_0}]
78943 root         20    0    14M  4716K CPU2     2   0:00   0.12% top -HaSP
    7 root        -16    -     0B    16K pftm     0   0:09   0.03% [pf purge]