Upgrade to 23.01: WAN speed halved
-
Hey folks,
I upgraded from 22.05 on my Lenovo ST50 server, which runs a 6C Xeon 2176G with 16GB ECC memory and 2 Chelsio T520 cards (CR and BT). The CR (LAN) is connected via 10Gig twinax to my Juniper EX2300, and the BT (WAN) is connected to my Nokia GPON ONT at 1Gig. The system has been rock-steady for quite some time (the last 300 days or so).
I completed the upgrade via console and everything went really smoothly. I've rebooted the machine several times since the upgrade, but as of now, this is what's new:
- WAN performance is halved at 470 Mbps; it was always a solid 940 Mbps on any wired gigabit device on the network via Speedtest
- CPU usage is now stuck at around 9-10%; previously it never went above 2-3% except under heavy load
- CPU interrupt load is now showing 9-10% and has ALWAYS been at 0%.
I checked all my interfaces and they retained their previous settings (full duplex, Tx, Rx), and the Juniper EX2300 xe SFP+ port is still showing 10G full duplex on LAN.
No pfSense packages are installed beyond the base install.
Any ideas what could have changed to cause the WAN speed to halve? Are these Chelsio cards no longer any good, or are the drivers being deprecated somehow?
Thanks for any input.
Here's a partial snapshot of vmstat -i. Not sure what's on IRQ16, but it looks like a potential culprit.
[23.01-RELEASE][root@pfSense]/root: vmstat -i
interrupt total rate
irq4: uart0 650 0
irq16: ig4iic0+ 84027864 36922 --> I believe this IRQ is reserved for the Intel chipset on the motherboard
cpu0:timer 133628 59
cpu1:timer 123888 54
cpu2:timer 2547516 1119
cpu3:timer 115151 51
cpu4:timer 94963 42
cpu5:timer 117391 52
EDIT UPDATE: Last night I decided to unplug my interfaces from the Lenovo ST50 and plug them into my up-and-running Netgate 8200 Plus (still running 22.05.1), configured the same as the Lenovo. As expected, everything came right up and WAN performance was again pegged at 950 Mbps across all devices.
I also kept the Lenovo server running overnight, albeit disconnected from all interfaces. Not sure if this was cron or some other cleanup happening behind the scenes, but the CPU interrupts are gone completely and CPU usage is back down to normal.
However, I swapped all connections back into the Lenovo this morning, and while the system is seemingly humming along beautifully, the WAN interface is still stuck at half its previous performance (from 950 Mbps to 480 Mbps). Really at a loss on the WAN speed drop...
EDIT 2: Since reconnecting the interfaces to the Lenovo, CPU usage is back to 10% and CPU interrupt load is back at 9%. ??!
EDIT 3: Selected the 22.05 boot environment and rebooted the server. It came right back up: CPU usage low to nonexistent, CPU interrupts at 0.2%, WAN speed at 950 Mbps up/down. (Me... facepalming myself)
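To see which source dominates a vmstat -i snapshot like the one above, the rows can be ranked by rate. The Python sketch below is an editorial illustration, not something from the thread: it assumes the three-column layout shown in the post (source, total, rate) and treats the last column as interrupts per second. In this snapshot irq16 accounts for nearly all of the listed rate. On FreeBSD a trailing + on a name such as ig4iic0+ usually means more handlers are attached to that IRQ than fit in the label, so that line may be shared with other devices.

```python
# Rank interrupt sources from `vmstat -i` output by rate (interrupts/second).
# Sketch only: assumes the layout in the post above (source, total, rate),
# where the source name may itself contain a space ("irq16: ig4iic0+").


def parse_vmstat_i(text):
    """Return a list of (source, total, rate) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            total, rate = int(fields[-2]), int(fields[-1])
        except ValueError:
            continue  # header line such as "interrupt total rate"
        source = " ".join(fields[:-2])
        if source.lower() == "total":
            continue  # summary row printed at the end of the full output
        rows.append((source, total, rate))
    return rows


def top_sources(rows, n=3):
    """Return the n busiest sources with their share of the summed rate."""
    summed = sum(rate for _, _, rate in rows) or 1  # avoid dividing by zero
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)[:n]
    return [(source, rate, rate / summed) for source, _, rate in ranked]


# The partial snapshot from the original post, with the annotation removed.
SNAPSHOT = """\
interrupt total rate
irq4: uart0 650 0
irq16: ig4iic0+ 84027864 36922
cpu0:timer 133628 59
cpu1:timer 123888 54
cpu2:timer 2547516 1119
cpu3:timer 115151 51
cpu4:timer 94963 42
cpu5:timer 117391 52
"""

if __name__ == "__main__":
    for source, rate, share in top_sources(parse_vmstat_i(SNAPSHOT)):
        print(f"{source:<20} {rate:>8}/s  {share:6.1%}")
```

Because the snapshot is partial, the shares are relative to the lines shown, not to every interrupt on the system.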
-
Hmm, I've not seen that, but it looks likely to be the NIC/driver. Are you able to try a different NIC?
-
Thanks Stephen.
Yes, I can plug an Intel X550-T2 back in (in place of the Chelsio card), which I would use for the WAN interface. Can I do that without having to reinstall CE and upgrade to Plus again? Can I halt the system, swap cards, then reboot and reconfigure?
-
It will change the NDI (Netgate Device ID), but it won't prevent it from booting, so you can test.
You might also try testing through just one NIC at a time: run iperf from the LAN to pfSense itself, or run speedtest (or iperf) from the pfSense CLI to test only the WAN.
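The one-NIC-at-a-time test can be made repeatable by saving iperf3's JSON report for each leg, for example running iperf3 -s on pfSense and iperf3 -c <pfSense LAN address> -J from a LAN host. The sketch below is hypothetical tooling, not anything pfSense ships: it assumes a TCP test, whose report carries end.sum_sent and end.sum_received (UDP tests summarize under a different key), and an arbitrary 80% threshold for calling a leg degraded.

```python
import json

# Sketch: read `iperf3 -J` output and compare a leg against the expected rate.
# Assumes a TCP test, where the report includes end.sum_sent / end.sum_received.


def iperf3_mbps(report_json):
    """Return (sent, received) throughput in Mbit/s from an iperf3 JSON report."""
    end = json.loads(report_json)["end"]
    sent = end["sum_sent"]["bits_per_second"] / 1e6
    received = end["sum_received"]["bits_per_second"] / 1e6
    return sent, received


def degraded(measured_mbps, expected_mbps, tolerance=0.8):
    """Flag a leg delivering less than `tolerance` of the expected rate.
    The 0.8 default is an arbitrary, illustrative threshold."""
    return measured_mbps < expected_mbps * tolerance


if __name__ == "__main__":
    # Illustrative report only; these are not measurements from this thread.
    sample = json.dumps({"end": {
        "sum_sent": {"bits_per_second": 941e6},
        "sum_received": {"bits_per_second": 470e6},
    }})
    sent, received = iperf3_mbps(sample)
    print(f"sent {sent:.0f} Mbit/s, received {received:.0f} Mbit/s")
    print("degraded:", degraded(received, expected_mbps=940))
```

Running the same check once for the LAN leg and once for the WAN leg separates "pfSense can't push traffic through this NIC" from "the WAN path is slow".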
-
I too seem to have had my speed roughly cut in half on my x550-T2 (down from 5 Gbps to the low 2 Gbps range) after going from 22 to 23. I haven't done much experimenting yet; I'll try stepping back to 22 at some point over the weekend. On 22, it never said what speed it had negotiated (simply reporting it as unknown), but the 5 Gbps port on the AT&T fiber hardware on the other side of the link showed the proper 5 Gbps port speed, and I could speedtest up to 5 Gbps (4.75 or so) with no problem. Now, when set to auto, it will only autonegotiate to 1 Gbps. If I set it to 5 Gbps, it won't go any higher than 2.3 or so. So now it can report link speeds other than 1 Gbps or 10 Gbps, but performance has taken a major hit.
-
If you set it to 5Gbps specifically, the other side must also be set to that; otherwise you will end up with a negotiation failure at one end. It's probably defaulting to 2.5G. Are you seeing errors or collisions on the interface?
This seems unrelated to what the OP is seeing, though.
Steve
-
OK, I did some experiments. I set the interface on the AT&T side from auto to 5 Gbps and even restarted both pfSense and the AT&T equipment, but I'm still getting no more than half the speed I had before the upgrade to 23. Setting both sides to auto (which is how they were on 22, working fine up to 5 Gbps) only ever gets me a 1 Gbps negotiated link. With both manually set to 5 Gbps, I get no better than half my old speed. I'm seeing a ton of interrupts on both x550-T2 interfaces in pfSense. I don't recall what that looked like in v22, as I didn't really look when things were working great. I think I'm going to fall back to v22 (I'll have to poke around about how to do that, as I've never needed to undo a pfSense update before), which, despite not indicating link speed, performed great. Here are a couple of screenshots of both the interface on the AT&T equipment and the pfSense x550-T2.
-
Hmm, check the boot logs to see if the NIC is coming up with the expected number of queues.
There is also a sysctl you can use to set the advertised speeds available for link negotiation. That might allow it to negotiate at 5G correctly.
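Steve doesn't name the sysctl. On FreeBSD's ix(4) driver, which handles the X550, our understanding is that it is dev.ix.<unit>.advertise_speed, a bitmask of the speeds offered during autonegotiation. The Python sketch below only builds that value; the bit assignments (including 0x10 for 2.5G and 0x20 for 5G) are an assumption to verify with sysctl -d on the running system before applying anything.

```python
import re

# ASSUMPTION: bit flags for dev.ix.<unit>.advertise_speed as we understand
# FreeBSD's ix(4) driver; confirm with "sysctl -d dev.ix.0.advertise_speed".
ADVERTISE_BITS = {
    "100M": 0x1,
    "1G": 0x2,
    "10G": 0x4,
    "10M": 0x8,
    "2.5G": 0x10,
    "5G": 0x20,
}


def advertise_mask(speeds):
    """OR together the flags for every speed the NIC should advertise."""
    mask = 0
    for speed in speeds:
        mask |= ADVERTISE_BITS[speed]
    return mask


def advertise_command(ifname, speeds):
    """Build the sysctl command line for an ix interface such as 'ix1'."""
    match = re.fullmatch(r"ix(\d+)", ifname)
    if match is None:
        raise ValueError(f"not an ix interface: {ifname!r}")
    unit = match.group(1)
    return f"sysctl dev.ix.{unit}.advertise_speed={advertise_mask(speeds)}"


if __name__ == "__main__":
    # Keep autonegotiation, but advertise 1G through 10G including 2.5G/5G.
    print(advertise_command("ix1", ["1G", "2.5G", "5G", "10G"]))
```

Advertising 2.5G and 5G with autonegotiation left on is the alternative to forcing 5 Gbps on both ends; whether the AT&T equipment then settles on 5G is not something this sketch can predict.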
-
I haven't had the chance to update the firmware on one of my X550-T2s yet, but will do so and test on my end. For the time being, I kept both Chelsio cards installed, did a fresh install from 2.6 CE to 22.05 (and installed the 23.01 security patches), and the system is running super snappy, with full symmetrical 950 Mbps WAN speeds.
I'll grab another NVMe SSD, update the X550 firmware, reinstall and upgrade to 23.01, and retest WAN speeds sometime this week to check whether the issue is my Nokia GPON ONT interacting with the updated Chelsio (and potentially X550) drivers in 23.01, then report back.
-
Are any of you using traffic shaping? I have a post in the traffic shaping forum about my speed being cut in half with traffic shaping. I turned it off and that solved my problem. Unfortunately, due to another (known) bug, I can’t turn it back on to do more experimentation.
-
Thanks for sharing. Unless traffic shaping is enabled by default on a fresh install (which I assume it is not), I am not currently using it.
-
@brachy33 Correct. It is not.
-
I can also confirm I'm experiencing the same issue. I'm running pfSense as a VM on unRaid and passing through the Ethernet controller (Chelsio Communications Inc T520-CR). I get half of my FiOS 1 Gb speed on WAN. Rolled back to 22.05 for now.
-
@stephenw10 I'm not an expert on what it should be reporting, but here are the interface lines from the boot log:
ix0: <Intel(R) X550-T2> mem 0xd1000000-0xd11fffff,0xd1400000-0xd1403fff at device 0.0 on pci1
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 4 RX queues 4 TX queues
ix0: Using MSI-X interrupts with 5 vectors
ix0: allocated for 4 queues
ix0: allocated for 4 rx queues
ix0: Ethernet address: ********************
ix0: PCI Express Bus: Speed 8.0GT/s Width x4
ix0: eTrack 0x80000492 PHY FW V523
ix0: netmap queues/slots: TX 4/2048, RX 4/2048
ix1: <Intel(R) X550-T2> mem 0xd1200000-0xd13fffff,0xd1404000-0xd1407fff at device 0.1 on pci1
ix1: Using 2048 TX descriptors and 2048 RX descriptors
ix1: Using 4 RX queues 4 TX queues
ix1: Using MSI-X interrupts with 5 vectors
ix1: allocated for 4 queues
ix1: allocated for 4 rx queues
ix1: Ethernet address: ******************
ix1: PCI Express Bus: Speed 8.0GT/s Width x4
ix1: eTrack 0x80000492 PHY FW V523
ix1: netmap queues/slots: TX 4/2048, RX 4/2048
-
That's fine, 4 Tx/Rx queues on each. More than enough to handle 5Gbps unless your CPU is extremely low power.
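The queue check can also be scripted against boot-log lines like the ones posted above (copied from dmesg output, for instance). This sketch assumes the "Using N RX queues N TX queues" wording shown in that log and divides a link rate evenly across queues, which is only an average: RSS hashes each flow to a single queue, so one TCP stream never spreads across all four.

```python
import re

# Matches boot-log lines such as "ix0: Using 4 RX queues 4 TX queues".
QUEUE_RE = re.compile(r"^(\w+): Using (\d+) RX queues (\d+) TX queues")

# Excerpt of the boot log posted above.
BOOT_LOG = """\
ix0: Using 2048 TX descriptors and 2048 RX descriptors
ix0: Using 4 RX queues 4 TX queues
ix1: Using 2048 TX descriptors and 2048 RX descriptors
ix1: Using 4 RX queues 4 TX queues
"""


def queue_counts(boot_log):
    """Map each interface to its (rx_queues, tx_queues) from the boot log."""
    counts = {}
    for line in boot_log.splitlines():
        match = QUEUE_RE.match(line.strip())
        if match:
            counts[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return counts


def average_gbps_per_queue(link_gbps, rx_queues):
    """Average load per RX queue if traffic spreads evenly. RSS pins each
    flow to one queue, so a single stream is handled by one queue."""
    return link_gbps / rx_queues


if __name__ == "__main__":
    for ifname, (rx, tx) in queue_counts(BOOT_LOG).items():
        per_queue = average_gbps_per_queue(5.0, rx)
        print(f"{ifname}: {rx} RX / {tx} TX queues, ~{per_queue:.2f} Gbit/s per RX queue at 5 Gbit/s")
```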
What CPU is it? Does it report the expected clock speed?
-
@stephenw10 It's an i5-6500 with 8 GB RAM running on a 256 GB NVMe drive. Under full load, the CPU doesn't break a sweat. Interrupts on the interfaces in the status screen continue to pile up. Any other things to check or tweak? I may roll back to v22 soon so I can get proper network performance back.
https://imgur.com/a/6hBSAn1
Thanks
-
Is the CPU speed reported correctly by:
sysctl dev.cpu | grep freq
I've seen one report of another 6th-gen CPU stuck at 800MHz.
Seeing interrupts on the interfaces like that is expected. If it's moving a lot of traffic that rate isn't that high either. It would be interesting to see how that value compares in 22.05 but I suspect it will be similar.
Steve
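Whether an interface interrupt rate is "high" depends on how many frames the link is moving. A rough floor on the frame rate assumes full-size 1500-byte frames plus 38 bytes of Ethernet framing overhead per frame; the sketch below computes that floor and the average number of frames handled per interrupt. Both are back-of-the-envelope figures, not a model of the ix driver's interrupt moderation, and the function names are illustrative.

```python
# Ethernet per-frame overhead on the wire: 14 B header + 4 B FCS
# + 8 B preamble/SFD + 12 B inter-frame gap = 38 B.
WIRE_OVERHEAD_BYTES = 38


def min_frames_per_second(throughput_bps, mtu_bytes=1500):
    """Fewest frames/s that can carry throughput_bps at the given MTU.
    Smaller packets (ACKs, small UDP) push the real figure higher."""
    return throughput_bps / ((mtu_bytes + WIRE_OVERHEAD_BYTES) * 8)


def frames_per_interrupt(frames_per_second, interrupts_per_second):
    """Average frames serviced per interrupt (the batching factor)."""
    return frames_per_second / interrupts_per_second


if __name__ == "__main__":
    for label, bps in (("1 Gbit/s", 1e9), ("5 Gbit/s", 5e9)):
        print(f"{label}: at least {min_frames_per_second(bps):,.0f} frames/s")
```

At 1 and 5 Gbit/s that floor is roughly 81,000 and 406,000 frames per second, which is why a busy link can show an interrupt count that climbs steadily without anything being wrong.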
-
@stephenw10 Hmmm, CPU freq seems OK to me:
dev.cpu.3.freq_levels: 3200/-1
dev.cpu.3.freq: 1152
dev.cpu.2.freq_levels: 3200/-1
dev.cpu.2.freq: 1194
dev.cpu.1.freq_levels: 3200/-1
dev.cpu.1.freq: 1149
dev.cpu.0.freq_levels: 3200/-1
dev.cpu.0.freq: 1190
And I wasn't aware of what those interrupt values should look like. Not something I ever really paid attention to when things were working well. :)
-
Yup, that looks like what I'd expect with Speed Shift support, which is new in 23.01.
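The sysctl output above can be checked mechanically. The sketch below parses lines in the dev.cpu.N.freq and dev.cpu.N.freq_levels format shown in the post and lists cores below a chosen floor. With Speed Shift, a low reading at idle is normal, so a check like this is only meaningful when sampled under load; the 900 MHz floor in the example is a hypothetical choice aimed at the "stuck at 800MHz" case.

```python
import re

FREQ_RE = re.compile(r"^dev\.cpu\.(\d+)\.freq:\s*(\d+)")
LEVELS_RE = re.compile(r"^dev\.cpu\.(\d+)\.freq_levels:\s*(.+)$")

# Output posted earlier in this thread.
SAMPLE = """\
dev.cpu.3.freq_levels: 3200/-1
dev.cpu.3.freq: 1152
dev.cpu.2.freq_levels: 3200/-1
dev.cpu.2.freq: 1194
dev.cpu.1.freq_levels: 3200/-1
dev.cpu.1.freq: 1149
dev.cpu.0.freq_levels: 3200/-1
dev.cpu.0.freq: 1190
"""


def parse_cpu_freqs(sysctl_output):
    """Return {core: (current_mhz, highest_listed_mhz)}."""
    current, highest = {}, {}
    for line in sysctl_output.splitlines():
        line = line.strip()
        if m := FREQ_RE.match(line):
            current[int(m.group(1))] = int(m.group(2))
        elif m := LEVELS_RE.match(line):
            # freq_levels lists "MHz/milliwatts" pairs; -1 means power unknown.
            mhz = [int(level.split("/")[0]) for level in m.group(2).split()]
            highest[int(m.group(1))] = max(mhz)
    return {core: (current[core], highest.get(core)) for core in sorted(current)}


def cores_below(freqs, floor_mhz):
    """Cores currently reporting less than floor_mhz (sample under load)."""
    return sorted(core for core, (mhz, _) in freqs.items() if mhz < floor_mhz)


if __name__ == "__main__":
    freqs = parse_cpu_freqs(SAMPLE)
    print(freqs)
    print("below 900 MHz:", cores_below(freqs, 900))
```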
-
Quick update: I installed a new X550-T2 card, flashed it to the latest firmware, then installed and configured from scratch: 2.6.0 CE --> 22.05 with security patches --> 23.01. WAN is connected to the same Metronet GPON Nokia ONT and... full WAN performance is back. 950 Mbps/950 Mbps symmetrical.
Of course, now I'm seeing a small number of errors IN accumulating on the WAN interface (this happened before with this NIC as the WAN interface), but it's running at the correct speeds. The errors IN on this card are why I went to the Chelsio T520-BT in the first place; that card plays nice with my ONT, but its speeds were halved when I first upgraded to 23.01.
Next step is to double-check the firmware on the Chelsio card, update the Lenovo server BIOS to the latest version, then go through the entire install process again and see whether the Chelsio card can stay at full speed on 23.01.
UPDATE: With the system using the x550-T2 for WAN on 23.01, after a restart there are no more errors IN on WAN from the GPON ONT! Almost 1 TB of data transferred since that reboot and no errors on either WAN or LAN. Beauty!! Looks like 23.01 is gravy.