[SOLVED] Upgraded to 2.5.0 Now Seeing Ping Spikes
-
After spending a couple more days poring over network settings and hardware tunables, unfortunately I have still not been able to resolve this ping spike issue. However, it does look like it is limited to the Chelsio card in my system, so I'm not sure if this is a driver issue or something that changed between FreeBSD 11.3 and FreeBSD 12.2. Thankfully the intermittent spikes appear to be limited to ICMP traffic, i.e. I haven't noticed any slowdown or excessive jitter on regular UDP/TCP traffic. One thing I might try is a fresh install of 2.5.0 to see if that resolves it, unless someone out there has any suggestions on what to try next to troubleshoot this further. Thanks again for your help.
-
Ran a few ping flood tests between two hosts on different subnets (each subnet connected to a different port on the Chelsio card) and over the course of 25k packets or so there was 0% packet loss (good news) but a decent amount of variability (jitter), from sub-0.1 ms to 100+ ms (bad news). Watching the ping tests I could see that ping requests were being sent and queued up a few packets deep before being answered and the queue being emptied. This repeated over and over during the test. The hardware I'm using is plenty fast - could this be pointing towards some sort of buffering or CPU interrupt problem perhaps?
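To put a number on the jitter described above, the per-reply times from a ping run can be summarized the same way ping's own summary line does. Below is a minimal Python sketch, assuming the replies contain the usual "time=... ms" field and that mdev is the population standard deviation of the round-trip times (my understanding of how iputils ping computes it). The function name and the sample lines are made up for illustration and were not captured from the system in this thread.

```python
import re
import statistics

# Matches the per-reply RTT field that ping prints, e.g. "time=0.175 ms".
RTT_RE = re.compile(r"time=([\d.]+)\s*ms")


def rtt_stats(ping_lines):
    """Return (min, avg, max, mdev) in ms for the replies found in ping_lines."""
    rtts = [float(m.group(1)) for line in ping_lines if (m := RTT_RE.search(line))]
    if not rtts:
        raise ValueError("no RTT samples found")
    # pstdev = population standard deviation (assumed equivalent of ping's mdev).
    return min(rtts), statistics.fmean(rtts), max(rtts), statistics.pstdev(rtts)


# Hypothetical sample replies: three fast answers and one delayed one.
sample = [
    "64 bytes from 10.0.0.2: icmp_seq=1 ttl=63 time=0.2 ms",
    "64 bytes from 10.0.0.2: icmp_seq=2 ttl=63 time=0.2 ms",
    "64 bytes from 10.0.0.2: icmp_seq=3 ttl=63 time=100.0 ms",
    "64 bytes from 10.0.0.2: icmp_seq=4 ttl=63 time=0.2 ms",
]

lo, avg, hi, mdev = rtt_stats(sample)
print(f"min/avg/max/mdev = {lo:.3f}/{avg:.3f}/{hi:.3f}/{mdev:.3f} ms")
```

A single delayed reply is enough to pull the average and mdev far away from the minimum, which is the pattern to look for when replies are being held in a queue and released in bursts.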
-
Found a useful tool called qperf (similar to iperf) that allows one to measure latency and transfer speeds between two hosts on a network:
https://www.opsdash.com/blog/network-performance-linux.html
Ran a number of UDP and TCP qperf tests between two 10Gbit hosts that are located in different network segments (which then uplink via a separate port on the Chelsio card to the pfSense firewall). Transfer speeds and latency looked perfectly normal across a variety of different data transfer sizes and test lengths. At this point I do think the ping spike/jitter I'm observing may just be limited to ICMP packets which, while a bit annoying, is something I can deal with. Is it possible for those packets to be getting de-prioritized somehow? Thanks again for your help.
-
Well, perhaps a step closer to finding a resolution. Found out today that the Chelsio firmware / driver was updated in FreeBSD 12.2 (on which pfSense 2.5.0 is based):
https://www.freebsd.org/releases/12.2R/relnotes/
So, all this could be a driver related issue - will try to reach out to Chelsio support directly.
-
We use neither Snort nor pfBlocker-NG and we’re seeing the same behaviour on a beefy system with Intel NIC. The spikes become more frequent the more users are on during the day. I’ll report back if we find the underlying cause of it.
-
Update: I heard back from Chelsio support this morning and was told that there was a bug in the driver's transmit path that they have fixed recently (looks like a couple weeks ago), which seems directly applicable given the behavior I've observed / described:
cxgbe(4): Fixes to tx coalescing.
- The behavior implemented in r362905 resulted in delayed transmission of packets in some cases, causing performance issues. Use a different heuristic to predict tx requests.
- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing entirely. It can be changed at any time. There is no change in default behavior.
FreeBSD 14-current and FreeBSD 13:
https://lists.freebsd.org/pipermail/dev-commits-src-all/2021-February/002084.html
https://lists.freebsd.org/pipermail/dev-commits-src-all/2021-February/002263.html
FreeBSD 12.2 (relevant to pfSense 2.5.0):
https://lists.freebsd.org/pipermail/dev-commits-src-all/2021-February/002794.html
https://lists.freebsd.org/pipermail/dev-commits-src-all/2021-February/002795.html
How do I apply this driver fix to my current install of pfSense 2.5.0? Is there an easy way for me to do that using the System Patches package? Or do I need to raise a request on Redmine to include it in the next release?
Thanks again for all your help.
-
That is already in master: https://github.com/pfsense/FreeBSD-src/commits/devel-12/sys/dev/cxgbe
So should be in 2.6 snapshots. Can you test one? Other things may be broken there....
Steve
-
@stephenw10 said in Upgraded to 2.5.0 Now Seeing Ping Spikes:
That is already in master: https://github.com/pfsense/FreeBSD-src/commits/devel-12/sys/dev/cxgbe
So should be in 2.6 snapshots. Can you test one? Other things may be broken there....
Steve
Unfortunately I only have one production system with that card, and I would be reluctant to upgrade it to the unstable development branch. Do I have any other options to bring this fix into my current 2.5.0 environment? Thanks again.
-
So I tried grabbing the Chelsio driver modules from the latest FreeBSD 12.2 stable release:
t4fw_cfg.ko t5fw_cfg.ko t6fw_cfg.ko if_cxgbe.ko
I copied them over to /boot/modules, made sure permissions were correct, and then added the necessary load lines to loader.conf.local as described here:
However, upon reboot the updated driver modules aren't actually being loaded, which I can see from kldstat. It seems that the system still prefers the older Chelsio driver that was compiled into the kernel over these updated driver modules. If I try to load the updated modules manually using kldload, I get error messages that the interfaces already exist in the kernel.
Am I out of luck here, or is there any way for the system to use the newer driver modules instead of what was compiled into the kernel?
Thanks in advance for your help.
-
Well, unless I'm misunderstanding this thread, I think I may just be out of luck unless I recompile the kernel with the newer modules (which wouldn't really be an option):
https://forums.freebsd.org/threads/custom-kernel-driver-modules.14998/
Does anyone else have any thoughts / suggestions? Thanks in advance.
-
Happy to say that this issue has been fixed in the latest 2.5.1 snapshots:
https://redmine.pfsense.org/issues/11602
Before - 2.5.0:
--- X.X.X.X ping statistics ---
500 packets transmitted, 500 received, 0% packet loss, time 701ms
rtt min/avg/max/mdev = 0.175/25.373/109.791/27.343 ms, pipe 8
After - 2.5.1-RC:
--- X.X.X.X ping statistics ---
500 packets transmitted, 500 received, 0% packet loss, time 96ms
rtt min/avg/max/mdev = 0.181/0.267/0.365/0.040 ms
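For anyone who wants to compare their own before/after runs, the two summary lines above can be parsed and compared with a few lines of Python. This is only a rough sketch: the function name and regular expression are my own, and it assumes the Linux-style "rtt min/avg/max/mdev" summary format shown in the results above.

```python
import re

# Matches the summary line, e.g. "rtt min/avg/max/mdev = 0.181/0.267/0.365/0.040 ms".
SUMMARY_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_summary(text):
    """Return a dict with min, avg, max and mdev (all in ms) from ping's summary."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no rtt summary line found")
    return dict(zip(("min", "avg", "max", "mdev"), map(float, match.groups())))


# Summary lines copied from the 2.5.0 and 2.5.1-RC results above.
before = parse_summary("rtt min/avg/max/mdev = 0.175/25.373/109.791/27.343 ms, pipe 8")
after = parse_summary("rtt min/avg/max/mdev = 0.181/0.267/0.365/0.040 ms")

for key in ("min", "avg", "max", "mdev"):
    ratio = before[key] / after[key]
    print(f"{key}: {before[key]} ms -> {after[key]} ms ({ratio:.1f}x)")
```

With the numbers above, the minimum RTT is essentially unchanged while the average drops by roughly 95x and the maximum by roughly 300x, which fits delayed transmission rather than a slower path.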
A big thank you to the entire Netgate / pfSense team for addressing this so quickly.