[SOLVED] Upgraded to 2.5.0 Now Seeing Ping Spikes
-
Hi all,
I upgraded from 2.4.5p1 to 2.5.0 this morning. Everything went smoothly, and the system and all packages/services came back up fine. Everything seems to be working OK, except that I'm now seeing ping spikes when:
- Pinging the firewall directly
- Pinging through the firewall (i.e. machine in one LAN subnet pinging a machine in another LAN subnet)
- Pinging websites on the broader internet from behind the firewall (e.g. Google, Microsoft, etc.)
The behavior I'm seeing is that at the standard rate of 1 ping per second, I'll usually see a spike every 5-10 packets (sometimes a bit more than this), but if I increase the ping rate (e.g. 1 ping every 0.1 seconds), I'll pretty consistently see a spike every 5 packets or so. By spike, I mean that ping times jump from under 1 ms all the way up to 50-100 ms.
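To put a number on "a spike every 5 packets or so", one option is to flag samples that sit far above the typical RTT and then count the packets between consecutive spikes. The sketch below is a minimal illustration under stated assumptions: `find_spikes`, `spike_gaps`, the 10x-median factor and the 5 ms floor are all hypothetical choices, and the sample RTTs are made up rather than taken from this thread.

```python
from statistics import median


def find_spikes(rtts_ms, factor=10.0, floor_ms=5.0):
    """Return indices of RTT samples that look like spikes.

    A sample counts as a spike if it exceeds both `floor_ms` and
    `factor` times the median RTT. Both thresholds are arbitrary
    illustration values, not anything ping itself reports.
    """
    baseline = median(rtts_ms)
    threshold = max(floor_ms, factor * baseline)
    return [i for i, rtt in enumerate(rtts_ms) if rtt > threshold]


def spike_gaps(indices):
    """Number of packets between consecutive spikes."""
    return [b - a for a, b in zip(indices, indices[1:])]


# Hypothetical RTT samples in ms (not measured data from this thread).
samples = [0.3, 0.4, 0.3, 0.5, 62.0, 0.3, 0.4, 0.3, 0.4, 85.0,
           0.3, 0.3, 0.4, 0.3, 48.0]
spikes = find_spikes(samples)
print(spikes)              # [4, 9, 14]
print(spike_gaps(spikes))  # [5, 5]
```

Using the median as the baseline keeps the threshold from being dragged upward by the spikes themselves, which a plain average would be.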
I never had issues with this under 2.4.5p1 or any versions prior to that. Here are the rough firewall specs:
- Supermicro 5018D-FN8T
- Chelsio T540-SO-CR used for LAN interfaces (cxgbe driver)
- Intel i210 used for WAN interface (used to be igb driver, but now em/iflib in FreeBSD 12+)
In terms of packages, I do have Snort and pfBlockerNG installed. I did try disabling Snort, but that seemed to make no difference.
Does anyone know what the problem might be or how I could go about troubleshooting this further? Thanks in advance for your help.
-
The other thing I notice when running a high-rate ping (i.e. with sub-second intervals) is that the summary at the end reports a pipe value (e.g. pipe 8), i.e. several ping packets were outstanding and unanswered at the same time. This is quite perplexing to me. Are ICMP packets somehow lower priority all of a sudden? Thanks again.
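The pipe figure is the peak number of echo requests that were in flight (sent but not yet answered) at the same time. A minimal sketch of that idea, assuming you have per-probe send times and RTTs in seconds; `max_in_flight` is a hypothetical helper that mirrors the concept, not ping's actual code, and the example run is made up:

```python
def max_in_flight(send_times, rtts):
    """Peak number of requests sent but not yet answered.

    send_times and rtts are parallel lists in seconds.
    """
    events = []
    for sent, rtt in zip(send_times, rtts):
        events.append((sent, 1))         # request leaves
        events.append((sent + rtt, -1))  # reply arrives
    # Tuples sort by time; at equal times -1 sorts before +1,
    # so a reply is counted before a send at the same instant.
    events.sort()
    outstanding = peak = 0
    for _, delta in events:
        outstanding += delta
        peak = max(peak, outstanding)
    return peak


# Hypothetical 0.1 s ping run (10 probes). Probes 3-5 are held back and
# all answered together at about t = 0.55 s, like a bursty stall.
send_times = [i * 0.1 for i in range(10)]
rtts = [0.0003] * 10
rtts[3], rtts[4], rtts[5] = 0.25, 0.15, 0.05
print(max_in_flight(send_times, rtts))  # 3
```

With every reply arriving before the next probe goes out, the peak stays at 1, which is why a healthy sub-second ping normally shows no pipe value at all.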
-
After spending a couple more days poring over network settings and hardware tunables, unfortunately I have still not been able to resolve this ping spike issue. However, it does look like it is limited to the Chelsio card in my system, so I'm not sure whether this is a driver issue or something that changed between FreeBSD 11.3 and FreeBSD 12.2. Thankfully the intermittent spikes appear to be limited to ICMP traffic, i.e. I haven't noticed any slowdown or excessive jitter on regular UDP/TCP traffic. One thing I might try is a fresh install of 2.5.0 to see if that resolves it, unless someone out there has suggestions on what to try next to troubleshoot this further. Thanks again for your help.
-
Ran a few ping flood tests between two hosts on different subnets (each subnet connected to a different port on the Chelsio card), and over the course of 25k packets or so there was 0% packet loss (good news) but a decent amount of variability (jitter), from under 0.1 ms to 100 ms+ (bad news). Watching the ping tests, I could see that ping requests were being sent and queued up a few packets deep before being answered and the queue being emptied. This repeated over and over during the test. The hardware I'm using is plenty fast, so could this be pointing towards some sort of buffering or CPU interrupt problem?
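To compare runs like this, it helps to reduce the RTTs to the same min/avg/max/mdev figures ping prints. As far as I know, Linux iputils ping computes mdev as the population standard deviation, sqrt(mean(rtt^2) - mean(rtt)^2); the sketch below assumes that formula, and `rtt_summary` is a hypothetical helper name.

```python
import math


def rtt_summary(rtts_ms):
    """Return (min, avg, max, mdev) for a list of RTTs in ms.

    mdev here is the population standard deviation, which is
    assumed (not verified) to match what iputils ping reports.
    """
    n = len(rtts_ms)
    avg = sum(rtts_ms) / n
    mean_sq = sum(r * r for r in rtts_ms) / n
    # Clamp at zero so float rounding can't produce sqrt of a tiny negative.
    mdev = math.sqrt(max(mean_sq - avg * avg, 0.0))
    return min(rtts_ms), avg, max(rtts_ms), mdev


# Two hypothetical runs: a steady one and one with a queue-then-drain spike.
print(rtt_summary([0.2, 0.3, 0.2, 0.3]))
print(rtt_summary([0.2, 0.3, 0.2, 95.0]))
```

A large mdev relative to avg, as in the second run, is the jitter signature described above; loss stays at 0% because the delayed replies still arrive.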
-
Found a useful tool called qperf (similar to iperf) that allows one to measure latency and transfer speeds between two hosts on a network:
https://www.opsdash.com/blog/network-performance-linux.html
Ran a number of UDP and TCP qperf tests between two 10Gbit hosts that are located in different network segments (which then uplink via a separate port on the Chelsio card to the pfSense firewall). Transfer speeds and latency looked perfectly normal across a variety of different data transfer sizes and test lengths. At this point I do think the ping spike/jitter I'm observing may just be limited to ICMP packets which, while a bit annoying, is something I can deal with. Is it possible for those packets to be getting de-prioritized somehow? Thanks again for your help.
-
Well, perhaps a step closer to finding a resolution. Found out today that the Chelsio firmware / driver was updated in FreeBSD 12.2 (on which pfSense 2.5.0 is based):
https://www.freebsd.org/releases/12.2R/relnotes/
So, all this could be a driver related issue - will try to reach out to Chelsio support directly.
-
We use neither Snort nor pfBlocker-NG and we’re seeing the same behaviour on a beefy system with Intel NIC. The spikes become more frequent the more users are on during the day. I’ll report back if we find the underlying cause of it.
-
Update: I heard back from Chelsio support this morning and was told that there was a bug in the driver's transmit path that they have fixed recently (looks like a couple weeks ago), which seems directly applicable given the behavior I've observed / described:
cxgbe(4): Fixes to tx coalescing.
- The behavior implemented in r362905 resulted in delayed transmission of packets in some cases, causing performance issues. Use a different heuristic to predict tx requests.
- Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx coalescing entirely. It can be changed at any time. There is no change in default behavior.
FreeBSD 14-CURRENT and FreeBSD 13:
https://lists.freebsd.org/pipermail/dev-commits-src-all/2021-February/002084.html
https://lists.freebsd.org/pipermail/dev-commits-src-all/2021-February/002263.html
FreeBSD 12.2 (relevant to pfSense 2.5.0):
https://lists.freebsd.org/pipermail/dev-commits-src-all/2021-February/002794.html
https://lists.freebsd.org/pipermail/dev-commits-src-all/2021-February/002795.html
How do I apply this driver fix to my current install of pfSense 2.5.0? Is there an easy way to do that using the System Patches package, or do I need to raise a request on Redmine to have it included in the next release?
Thanks again for all your help.
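Why would transmit coalescing hurt a low-rate ping but not bulk TCP/UDP? A toy model makes the trade-off visible: if the driver holds packets hoping more will arrive to fill a batch, sparse traffic ends up waiting for a hold timer, while busy traffic fills batches quickly. The sketch below is NOT the cxgbe heuristic (the commit above only says the old heuristic delayed packets in some cases); `coalesced_tx_times`, `tx_delays`, `batch_size` and `max_hold` are hypothetical. Setting `batch_size=1` plays the role of disabling coalescing, which the hw.cxgbe.tx_coalesce tunable is described as doing.

```python
def coalesced_tx_times(arrivals, batch_size, max_hold):
    """Toy model: hold packets until `batch_size` are queued or the
    oldest has waited `max_hold` seconds, then send the whole batch.

    Returns the send time of each packet, in arrival (FIFO) order.
    """
    sent = []
    queue = []
    for t in arrivals:
        # If the oldest queued packet timed out before t, flush at its deadline.
        if queue and t >= queue[0] + max_hold:
            flush_at = queue[0] + max_hold
            sent.extend(flush_at for _ in queue)
            queue.clear()
        queue.append(t)
        if len(queue) >= batch_size:
            sent.extend(t for _ in queue)  # batch full: send immediately
            queue.clear()
    if queue:  # leftovers go out when their hold timer expires
        flush_at = queue[0] + max_hold
        sent.extend(flush_at for _ in queue)
    return sent


def tx_delays(arrivals, batch_size, max_hold):
    """Extra transmit delay added to each packet by the toy model."""
    sent = coalesced_tx_times(arrivals, batch_size, max_hold)
    return [s - a for s, a in zip(sent, arrivals)]


pings = [0.0, 1.0, 2.0, 3.0]  # 1 ping per second
print(tx_delays(pings, batch_size=1, max_hold=0.05))  # no coalescing
print(tx_delays(pings, batch_size=4, max_hold=0.05))  # each ping waits ~50 ms
```

In the toy model a dense burst fills the batch almost immediately, so throughput tests such as qperf look normal while isolated echo requests pay the full hold time, which is consistent with the symptoms reported in this thread.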
-
That is already in master: https://github.com/pfsense/FreeBSD-src/commits/devel-12/sys/dev/cxgbe
So should be in 2.6 snapshots. Can you test one? Other things may be broken there....
Steve
-
@stephenw10 said in Upgraded to 2.5.0 Now Seeing Ping Spikes:
That is already in master: https://github.com/pfsense/FreeBSD-src/commits/devel-12/sys/dev/cxgbe
So should be in 2.6 snapshots. Can you test one? Other things may be broken there....
Steve
Unfortunately I only have one production system with that card, and I would be reluctant to upgrade it to the unstable development branch. Do I have any other options to bring this fix into my current 2.5.0 environment? Thanks again.
-
So I tried grabbing the Chelsio driver modules from the latest FreeBSD 12.2 stable release:
t4fw_cfg.ko t5fw_cfg.ko t6fw_cfg.ko if_cxgbe.ko
I copied them over to /boot/modules, made sure the permissions were correct, and then added the necessary load lines to loader.conf.local as described here:
However, upon reboot the updated driver modules aren't actually being loaded, which I can see from kldstat. It seems that the system still prefers the older Chelsio driver that was compiled into the kernel over these updated driver modules. If I try to load the updated modules manually using kldload, I get error messages saying the interfaces already exist in the kernel.
Am I out of luck here, or is there any way for the system to use the newer driver modules instead of what was compiled into the kernel?
Thanks in advance for your help.
-
Well, unless I'm misunderstanding this thread, I think I may just be out of luck unless I recompile the kernel with the newer modules (which wouldn't really be an option):
https://forums.freebsd.org/threads/custom-kernel-driver-modules.14998/
Does anyone else have any thoughts / suggestions? Thanks in advance.
-
Happy to say that this issue has been fixed in the latest 2.5.1 snapshots:
https://redmine.pfsense.org/issues/11602
Before - 2.5.0:
--- X.X.X.X ping statistics ---
500 packets transmitted, 500 received, 0% packet loss, time 701ms
rtt min/avg/max/mdev = 0.175/25.373/109.791/27.343 ms, pipe 8
After - 2.5.1-RC:
--- X.X.X.X ping statistics ---
500 packets transmitted, 500 received, 0% packet loss, time 96ms
rtt min/avg/max/mdev = 0.181/0.267/0.365/0.040 ms
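For anyone wanting to check their own before/after results the same way, the rtt summary line is easy to parse. A minimal sketch, assuming the Linux iputils output format shown above; `parse_rtt_line` is a hypothetical helper, and treating a missing pipe value as 1 is an assumption based on ping omitting it when no more than one probe was outstanding.

```python
import re

SUMMARY_RE = re.compile(
    r"rtt min/avg/max/mdev = "
    r"(?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+)/(?P<mdev>[\d.]+) ms"
    r"(?:, pipe (?P<pipe>\d+))?"
)


def parse_rtt_line(line):
    """Parse an iputils-style 'rtt min/avg/max/mdev' line into a dict."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"no rtt summary in: {line!r}")
    stats = {k: float(m.group(k)) for k in ("min", "avg", "max", "mdev")}
    # Assumption: no 'pipe N' suffix means at most one probe was in flight.
    stats["pipe"] = int(m.group("pipe")) if m.group("pipe") else 1
    return stats


before = parse_rtt_line("rtt min/avg/max/mdev = 0.175/25.373/109.791/27.343 ms, pipe 8")
after = parse_rtt_line("rtt min/avg/max/mdev = 0.181/0.267/0.365/0.040 ms")
print(f"avg RTT improved about {before['avg'] / after['avg']:.0f}x, "
      f"pipe {before['pipe']} -> {after['pipe']}")
```

Note that the minimum RTT barely changes between the two runs; the improvement is entirely in the average, maximum and mdev, which fits the picture of occasional delayed transmits rather than a uniformly slower path.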
A big thank you to the entire Netgate / pfSense team for addressing this so quickly.