Abysmal Performance after pfSense hardware upgrade
-
@Gblenn said in Abysmal Performance after pfSense hardware upgrade:
@stephenw10 said in Abysmal Performance after pfSense hardware upgrade:
Does HW offloading play a part in this? I have it activated as I'm running Suricata in legacy mode (Intel X520 NIC).
My previous system had been on a dual-port SFP+ X520. I have another dual-port card in my desktop that I've been testing with, and one in my FreeNAS box.
@SteveITS said in Abysmal Performance after pfSense hardware upgrade:
@Gblenn said in Abysmal Performance after pfSense hardware upgrade:
Does HW offloading play a part in this? I have it activated as I'm running Suricata
https://docs.suricata.io/en/suricata-7.0.2/performance/packet-capture.html#offloading
"11.2.3. OffloadingNetwork cards, drivers and the kernel itself have various techniques to speed up packet handling. Generally these will all have to be disabled.
LRO/GRO lead to merging various smaller packets into big 'super packets'. These will need to be disabled as they break the dsize keyword as well as TCP state tracking.
Checksum offloading can be left enabled on AF_PACKET and PF_RING, but needs to be disabled on PCAP, NETMAP and others."
On pfSense, inline mode uses NETMAP, and my notes from long ago said to disable offloading when using legacy mode due to false positives.
So is your thought/suggestion to check/disable these features?
The clean install's web UI is substantially more responsive than what I've grown accustomed to.
-
@8ayM said in Abysmal Performance after pfSense hardware upgrade:
So is your thought/suggestion to check/disable these features?
Yes, we check the three "offloading" checkboxes. Those need a restart.
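For anyone following along, those three GUI checkboxes correspond to per-interface flags you can inspect and toggle from a shell. A minimal sketch, assuming an ix0 interface (the interface name is illustrative):

# Show the current offload flags (look for RXCSUM, TXCSUM, TSO4, LRO)
ifconfig ix0 | grep options
# Toggle them off by hand for a quick test; the GUI checkboxes plus a
# reboot are what make the change stick
ifconfig ix0 -rxcsum -txcsum -tso -lro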
-
Well, I've gone through my stack of SFP+ DAC cables, short of a 1' one, as I really don't have room to place the unit within reach of such a short cable.
So I broke out 2 x 10Gtek 10G SFP+ modules and attached an OM4 fibre cable between the two.
I'm at a loss, as the DACs worked up until I did this firewall upgrade.
@SteveITS said in Abysmal Performance after pfSense hardware upgrade:
@8ayM said in Abysmal Performance after pfSense hardware upgrade:
So is your thought/suggestion to check/disable these features?
Yes, we check the three "offloading" checkboxes. Those need a restart.
I'll revert these changes in a moment and see what difference that makes.
Is this the state that you're suggesting?
After that I'll slap in my original mirrored m.2's as originally intended and see how things fare.
You almost can't be a normal person without spare parts to troubleshoot a lot of this. 99% of the time I'm just a packrat, until events like this occur.
Although I would like to get back to DACs for the lower power draw and heat.
-
The ix NICs in the C3K chipset are missing the lines that allow reading the link status from SFP modules. That's why RJ45 copper modules are not supported.
DAC cables normally work but you often see the link status shown as 'unknown'.
The X520 does not have that issue and I suspect that's what you're seeing here.
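An easy way to see this from a pfSense shell prompt, assuming the port shows up as ix2 (interface name illustrative):

# The media line often reports the speed as 'unknown' on a C3K ix port with a DAC
ifconfig ix2 | grep -E 'media|status'
# -v asks the driver to dump SFP/SFP+ module details, where it can read them
ifconfig -v ix2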
-
I wasn't aware I couldn't utilize an SFP+ to RJ45 module.
Making the changes as suggested by @SteveITS yielded similar results as before, speed-wise.
I'm going to try swapping back to my mirrored M.2s and see how that goes, with the SFP+ modules and fiber back to the switching infrastructure.
-
For reference:
https://downloadmirror.intel.com/732258/readme.txt
In addition, SFP+ devices based on the Intel(R) Ethernet Connection X552 and Intel(R) Ethernet Connection X553 do not support the following features:
* Speed and duplex auto-negotiation.
* Wake on LAN
* 1000BASE-T SFP Modules
Though in reality we have seen that some modules will work.
-
@stephenw10
That explains the 'Module not supported' message I'm getting at the console when using a 10Gtek Ubiquiti 10G SFP+ module. Might look at some new DACs.
Anyway, I restored the old config, made the hardware offloading changes as mentioned above, and things are looking better. Also removed traffic shaping, as I'm struggling to imagine hitting that limit short of benchmarking.
-
A 10G DAC cable will usually work in my experience. A DAC connected to 1G at the other end will almost always fail and doesn't allow setting 1G manually.
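The failure mode is easy to demonstrate; a sketch of the usual (unsuccessful) attempt, with an illustrative ix2 port:

# Trying to force 1G on an SFP+ ix port; on these NICs the setting
# typically doesn't take and the link stays down
ifconfig ix2 media 1000baseSX
ifconfig ix2 | grep media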
-
@8ayM said in Abysmal Performance after pfSense hardware upgrade:
@stephenw10
That explains the 'Module not supported' message I'm getting at the console when using a 10Gtek Ubiquiti 10G SFP+ module. Might look at some new DACs.
Anyway, I restored the old config, made the hardware offloading changes as mentioned above, and things are looking better. Also removed traffic shaping, as I'm struggling to imagine hitting that limit short of benchmarking.
So it seems the one thing that made the difference was that you turned off HW checksum offload (the first item in the list)??
@SteveITS said in Abysmal Performance after pfSense hardware upgrade:
@8ayM said in Abysmal Performance after pfSense hardware upgrade:
So is your thought/suggestion to check/disable these features?
Yes, we check the three "offloading" checkboxes. Those need a restart.
Is there no benefit at all to having any of the HW offloading active, even with e.g. an X520 NIC? I think I have always had all three turned on at both my sites (the other site has i211 NICs).
-
@stephenw10 said in Abysmal Performance after pfSense hardware upgrade:
A 10G DAC cable will usually work in my experience. A DAC connected to 1G at the other end will almost always fail and doesn't allow setting 1G manually.
Threw on one of my 10G DACs again just for giggles.
I'm going back to the modules after testing the 1' DAC. When I'm not working on the unit I can get by with that; otherwise I'll deal with the coil of fiber for now.
-
@stephenw10 said in Abysmal Performance after pfSense hardware upgrade:
The altq setting only affects hn NICs.
What output do you actually see? Suricata will likely be top of the list if you're running it. If you hit
q
while it's running it leaves the output on the console so you can copy/paste it out.
Yes absolutely, Suricata comes out at the top when I run speedtest...
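The command being described is plain FreeBSD top(1); a sketch of an invocation matching the output discussed here (flags per the man page):

# Per-thread (-H), system processes included (-S), full command lines (-a),
# per-CPU stats (-P)
top -aSHP
# Press 'q' to quit; the last screen stays on the console for copy/paste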
-
@Gblenn said in Abysmal Performance after pfSense hardware upgrade:
Is there no benefit at all to having any of the HW offloading active, even with e.g. an X520 NIC? I think I have always had all three turned on at both my sites (the other site has i211 NICs).
We run Suricata, so we disable it. There's potentially a benefit otherwise, but if it causes false positives, what's the point? E.g. packets arrive in Suricata with no checksum IIRC, so they trip that rule unless the rule is disabled.
@Gblenn said in Abysmal Performance after pfSense hardware upgrade:
Suricata comes out at the top when I run speedtest
It will eat a lot of CPU as it has to process every packet through its ruleset. It can make a noticeable difference on fast Internet connections with a slow CPU. 7G is pretty good though. :) CPU usage can also vary widely depending on what rules are enabled. (E.g. there's no sense running web server rules without a web server... or really Suricata in general without any servers... most outgoing traffic is encrypted otherwise.)
-
@Gblenn said in Abysmal Performance after pfSense hardware upgrade:
@8ayM said in Abysmal Performance after pfSense hardware upgrade:
@stephenw10
That explains the 'Module not supported' message I'm getting at the console when using a 10Gtek Ubiquiti 10G SFP+ module. Might look at some new DACs.
Anyway, I restored the old config, made the hardware offloading changes as mentioned above, and things are looking better. Also removed traffic shaping, as I'm struggling to imagine hitting that limit short of benchmarking.
So it seems the one thing that made the difference was that you turned off HW checksum offload (the first item in the list)??
@SteveITS said in Abysmal Performance after pfSense hardware upgrade:
@8ayM said in Abysmal Performance after pfSense hardware upgrade:
So is your thought/suggestion to check/disable these features?
Yes, we check the three "offloading" checkboxes. Those need a restart.
Is there no benefit at all to having any of the HW offloading active, even with e.g. an X520 NIC? I think I have always had all three turned on at both my sites (the other site has i211 NICs).
I made those changes, yes, but the item that appears to have resolved my issue is that I stopped using the 10G DACs and installed SFP+ modules and an OM4 fiber cable.
I tried 2 x Cisco SFP-H10GB-CU3M and 5 various 10Gtek cables, all SFP+.
The only DAC that looks to be working is a 1' CAB-10ZGSFP-P0.3M.
So it looks like I might be the proud owner of 7 questionable SFP+ DACs.
Results with the 1' / 0.3m DAC currently in use
-
@8ayM said in Abysmal Performance after pfSense hardware upgrade:
I tried 2 x Cisco SFP-H10GB-CU3M and 5 various 10Gtek cables, all SFP+.
The only DAC that looks to be working is a 1' CAB-10ZGSFP-P0.3M.
So it looks like I might be the proud owner of 7 questionable SFP+ DACs.
Quite the bummer! And I guess it's clear that it's the X553 that doesn't like the DACs, and not the switch (which sounds unlikely)?
[EDIT] A bit of googling reveals some reports of link problems with the X553. Checking out a few of them, it seems to boil down to driver updates. And I found a reference to OPNsense working fine with the X553. https://vyos.dev/T5619
But I also found this on servethehome.
https://forums.servethehome.com/index.php?threads/10gbit-interface-compatibility-intel-x553-mellanox-connectx-2.21477/
Look at the very bottom, where someone was able to make it work via a kernel module parameter change, setting "allow_unsupported_sfp". Not sure if that would change anything given that you get a link up; it's just the performance that sucks. But I remember having seen a similar thing, though with respect to firmware in OEM Intel cards, back when I was looking to upgrade my machine.
I have no idea if this possibility even exists in pfSense, perhaps @SteveITS or @stephenw10 knows?
-
You can set 'allow unsupported SFP' but that won't help here. It's already allowing the module; it's just unable to read or set the link speed. As far as I know there's nothing we can do about that.
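For reference, the FreeBSD-side knob is a loader tunable rather than a Linux-style module parameter; as best I can recall it looks like the following, but the exact name is worth verifying against the ix(4) driver docs:

# Add to /boot/loader.conf.local, then reboot
hw.ix.unsupported_sfp="1"
# Confirm it took:
sysctl hw.ix.unsupported_sfp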
@Gblenn What CPU are you using that's passing 8Gbps with Suricata enabled?
-
@stephenw10 said in Abysmal Performance after pfSense hardware upgrade:
You can set 'allow unsupported SFP' but that won't help here. It's already allowing the module; it's just unable to read or set the link speed. As far as I know there's nothing we can do about that.
Yeah, that's what I thought, as the link is actually up. But it seems to differ depending on the HW connected, given that at least one DAC is working. And as others have reported, with the right drivers it seems to work.
@Gblenn What CPU are you using that's passing 8Gbps with Suricata enabled?
It's an i5-11400, but I am running Suricata in legacy mode. I can't remember exactly, but I believe I got around 3.5Gbps running inline mode.
And I have it virtualized on Proxmox, set to host CPU with 4 cores assigned.
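For reference, that Proxmox CPU setup corresponds to a qm call along these lines (the VM ID 100 is illustrative):

# Pass the host CPU type through and assign 4 cores to the pfSense VM
qm set 100 --cpu host --cores 4
-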
Ah, yes, so significantly more powerful than any C3K CPU.
It is interesting that you see no interrupt load though, I agree. I suspect you would see that with Suricata in inline mode.
You do see the expected kernel mode iflib task queue processes though. That's where the traffic and pf load usually appears.
-
@stephenw10 Indeed it is, and with 12 cores I am able to run a few other things as separate VMs without affecting throughput (ntopng being one of them).
Are you thinking that if I shift to inline mode for Suricata, I would start seeing interrupt load going up? @8ayM doesn't seem to have Suricata activated, but perhaps ntopng would have the same effect?
BTW, I changed the HW offloads this morning (none activated now) and although time of day may affect speedtest results, I did manage to get similar speeds just now.
Also tried disabling Suricata but I don't see any difference in performance...
-
Mmm, the interrupt loading is interesting. What I expect to see is the task queue group values as you are seeing them.
I have to think it's ntop putting the NIC in promiscuous mode doing something there. I don't see that on a C3K system here:
last pid: 39097;  load averages: 0.67, 0.30, 0.21  up 2+08:27:39  21:29:14
340 threads:   6 running, 290 sleeping, 44 waiting
CPU 0:  5.5% user,  0.0% nice, 20.0% system,  0.0% interrupt, 74.5% idle
CPU 1:  2.4% user,  0.0% nice, 10.2% system,  0.0% interrupt, 87.5% idle
CPU 2:  3.1% user,  0.0% nice,  5.5% system,  0.0% interrupt, 91.4% idle
CPU 3:  3.1% user,  0.0% nice,  5.1% system,  0.0% interrupt, 91.8% idle
Mem: 98M Active, 215M Inact, 521M Wired, 3002M Free
ARC: 133M Total, 33M MFU, 93M MRU, 1121K Anon, 976K Header, 5440K Other
     99M Compressed, 244M Uncompressed, 2.47:1 Ratio
Swap: 1024M Total, 1024M Free

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        187 ki31     0B    64K CPU2     2  55.4H  90.08% [idle{idle: cpu2}]
   11 root        187 ki31     0B    64K RUN      3  55.4H  89.88% [idle{idle: cpu3}]
   11 root        187 ki31     0B    64K CPU1     1  55.4H  85.69% [idle{idle: cpu1}]
   11 root        187 ki31     0B    64K CPU0     0  55.3H  76.17% [idle{idle: cpu0}]
    0 root        -60    -      0B  1648K -        2   0:03   4.75% [kernel{if_io_tqg_2}]
    0 root        -60    -      0B  1648K -        1   0:02   3.55% [kernel{if_io_tqg_1}]
    0 root        -60    -      0B  1648K -        3   0:04   2.29% [kernel{if_io_tqg_3}]
10536 root          4    0    84M    33M RUN      3   0:00   2.06% /usr/local/bin/python3.11 /usr/local/bin/speedtest{p
10536 root         56    0    84M    33M usem     1   0:01   1.87% /usr/local/bin/python3.11 /usr/local/bin/speedtest{p
Though it's also clearly not anywhere near the same throughput.
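A quick way to test the promiscuous-mode theory, assuming the capture NIC is ix0 (name illustrative):

# ntopng (like anything pcap-based) puts its capture NIC in promiscuous mode;
# check the interface flags with ntopng running, then again with it stopped
ifconfig ix0 | grep -i promisc
# the kernel also logs the transitions
dmesg | grep -i promisc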
-
Would you want me to test something on my unit?
I just finished updating to the 5.6.x build, so I may have slightly different results compared to the factory ntopng package, which is usually behind.