Massive jitter issues after upgrading to 23.0x
-
Hello,
First time posting here. I've been running Netgate gear as our fiber transit WAN router for years, and it has always been excellent. Our current box is a 7100, which had been running pfSense Plus 21.xx for just under 3 years. About a month ago we decided to do a network-wide upgrade and made the jump from v21 to v23.01 (and have now upgraded to 23.05 RC). After the 23.01 upgrade everything seemed fine at first, but then we noticed a lot of ping spikes from our main monitoring machine, which runs PingPlotterPro against around 50 internal and external targets. After ruling out many other things on our network, we've narrowed it down with certainty to the 7100, and the issue started right after the upgrade. I have tried all of the following:
- reducing network traffic
- wiring the monitoring machine directly into the pfSense on a switched port on the WAN block
- approaches for bufferbloat, including limiters and such
- disabling the firewall and NAT completely
- disabling HCO, TSO, HLRO and toggling ALTQ
- monitoring system loads
- toggling jumbo frames, flow control and all sorts of other settings
Nothing has made a dent. I do have an old Netgate box lying unused with an older pfSense Plus version, but I wanted to at least post here before ripping apart our edge network, because it is a live production environment. I also have very extensive PingPlotterPro data that I can post if needed.
In short: my main monitoring machine used to see a local ping of less than 1ms to both the OPT and WAN interfaces on the Netgate box, with virtually no jitter. Basically ruler flat. Since the upgrade it still averages less than 1ms, but we get ping spikes every 10-90s that can reach up to 80ms. Searching Google for what these graphs resemble, they look almost like a textbook Intel Puma jitter graph, although our ISP provides a dedicated 1gbit up/down MMF connection directly from the node, with no modem to speak of.
Any help would be extremely appreciated, as I've been at this for nearly a month now.
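For anyone wanting to quantify the pattern described above (a sub-1ms baseline with occasional spikes up to 80ms) from exported ping data, here is a minimal Python sketch. It assumes the samples have already been exported as (timestamp in seconds, RTT in ms) pairs; the function name `summarize_rtts` and its threshold parameters are hypothetical and not part of PingPlotterPro or pfSense. It uses a simple mean absolute difference between consecutive RTTs as the jitter figure and reports the gaps between spikes, which can hint at a periodic cause.

```python
from statistics import median


def summarize_rtts(samples, spike_factor=10.0, min_spike_ms=5.0):
    """Summarize a non-empty list of (timestamp_s, rtt_ms) ping samples.

    Hypothetical helper: a sample counts as a spike when its RTT exceeds
    both spike_factor * median RTT and min_spike_ms.
    """
    rtts = [rtt for _, rtt in samples]
    baseline = median(rtts)  # raises StatisticsError on an empty list

    # Simple jitter figure: mean absolute difference between consecutive RTTs.
    diffs = [abs(b - a) for a, b in zip(rtts, rtts[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0

    threshold = max(spike_factor * baseline, min_spike_ms)
    spike_times = [t for t, rtt in samples if rtt > threshold]

    # Seconds between consecutive spikes; a steady value suggests a periodic trigger.
    gaps = [b - a for a, b in zip(spike_times, spike_times[1:])]

    return {
        "baseline_ms": baseline,
        "jitter_ms": jitter,
        "spike_times": spike_times,
        "spike_gaps_s": gaps,
    }


if __name__ == "__main__":
    demo = [(0, 0.4), (1, 0.5), (2, 0.4), (3, 80.0), (4, 0.5),
            (5, 0.4), (45, 0.5), (46, 35.0), (47, 0.4)]
    print(summarize_rtts(demo))
```

The median is used as the baseline so that a handful of large spikes does not drag the threshold upward the way a mean would.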
-
@thekrynn That jitter sounds very much like one of two issues:
Since 22.05 I have noticed exactly that kind of jitter if you are using NtopNG or other promiscuous-mode capture packages such as Snort/Suricata. There seems to be quite an interrupt penalty every once in a while, causing the ping spike.
I have also noticed that the sensitivity of 1GbE link autonegotiation has gone up with the new drivers. At least at my place, I had to fix my switch and pfSense port to 1Gbit full-duplex to avoid intermittent latency/throughput drops, because autonegotiation seemed to “run again” and cause issues.
-
jimp moved this topic from Problems Installing or Upgrading pfSense Software
-
@keyser Thank you for the reply. I went through your list of recommendations.
I currently don't have any specialty packages; the installed package list is completely empty, and I confirmed the default promiscuous mode setting was off on the interfaces. As for autonegotiation, that was a great idea, since I have seen issues like that in the past with a few machines on my network. I set the interface I use to monitor everything to manual 1Gbit full duplex and matched it on the internally facing pfSense interface, but it had no effect... the issue persists.
I'm also adding 3 screenshots to this post.
The first one shows the v23 upgrade on 5/4 at around 4am: the graphs are clean before it and noisy after. Five targets are monitored:
#1: a public IP around 5ms away
#2: the pfsense fiber transit WAN IP on our end
#3: the pfsense fiber gateway, 2ms away
#4: the pfsense internal OPT IP on our allocated block that our pingplotter monitoring interface is also on
#5: 1.1.1.1
The second one shows the most recent 60 minutes. The 3 packet drops are from me switching autonegotiation to manual on a few of the ports. Four targets are monitored:
#1: a device on our internal WAN block, which is unaffected: we can ping it without going through the pfSense, and pings to it from all the other machines on the network are clean as well
#2: the pfsense fiber transit WAN IP on our end
#3: the pfsense fiber gateway, 2ms away
#4: the pfsense internal OPT IP on our allocated block that our pingplotter monitoring interface is also on
The third one shows pings to 1.1.1.1, 8.8.8.8 and 9.9.9.9.
A curious thing about all this is that some destinations seem more affected than others, and it's very random and chaotic.
-
Do you see pfSense logging anything when you hit a latency spike? Like filter reloads, perhaps?
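One way to check this is to line up spike timestamps against log timestamps. Below is a minimal Python sketch, assuming the spike times and log entries have already been extracted and converted to epoch seconds. Parsing the pfSense log format is deliberately left out, and `correlate_spikes` and `window_s` are hypothetical names, not part of any pfSense tool.

```python
from bisect import bisect_left


def correlate_spikes(spike_times, log_events, window_s=2.0):
    """Map each spike timestamp to the log messages within +/- window_s seconds.

    spike_times: iterable of spike timestamps (epoch seconds).
    log_events: list of (timestamp, message) tuples, assumed already parsed
    from the firewall's logs (the parsing step is not shown here).
    """
    events = sorted(log_events)  # sort by timestamp so we can binary-search
    times = [t for t, _ in events]
    matches = {}
    for spike in spike_times:
        # Index of the first event at or after (spike - window_s).
        i = bisect_left(times, spike - window_s)
        nearby = []
        while i < len(times) and times[i] <= spike + window_s:
            nearby.append(events[i][1])
            i += 1
        matches[spike] = nearby
    return matches


if __name__ == "__main__":
    spikes = [100.0, 160.0]
    logs = [(99.5, "filter reload"), (130.0, "dhcp lease"), (161.0, "filter reload")]
    print(correlate_spikes(spikes, logs))
```

If most spikes pick up the same kind of message (filter reloads, for example), that points at a candidate cause; if the lists come back empty, the spikes are not tied to anything that gets logged.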