Chasing latency
-
I've spent a bit of time chasing latency to/from pfSense recently, and I thought it would be worth posting about. Perhaps it will save someone else a few hours of frustration.
The issue started with trying to track latency spikes on the WAN connection. In an effort to eliminate various components before talking with the ISP, I set up various latency monitoring points, both on the firewall and off the firewall. The surprising results seen in the local network quickly became an investigation all on its own…
What I initially saw is shown in the first two images. The first graph shows the latency when pinging from the firewall (SG-4860) to a directly connected (no switch) host in the DMZ. The second graph shows the latency when pinging from a host in the LAN through the firewall to the same host in the DMZ.
As you can see, both are nice sawtooth graphs. Clearly cyclic, but with a long period: 20 minutes or so. Pretty unusual. Even more unusual was that on rare occasions it would just flatten out for 5 or 10 minutes. No discernible pattern. And any attempt at interactive diagnosis would cause an immediate reversion to sawtooth. I spent time combing through logs trying to correlate events. "Really, did an IPsec rekey just cause my latency to drop?!? A pfBlocker list update? Seriously?"
The answer turns out to be yes. I finally found the cause: powerd with HiAdaptive. On the SG-4860 at least, it's bad news. After disabling powerd, the latency appears as one would expect (graphs 3 and 4). And disabling powerd had little or no effect on core temperature. Win-win.
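For anyone who wants to put a number on a sawtooth period like the one described above rather than eyeballing a graph, here is a minimal sketch using autocorrelation. It is not what was used in this thread; the function names are made up for illustration, and it assumes the latency samples were collected at a fixed interval.

```python
from statistics import mean


def autocorrelation(samples, lag):
    """Normalized autocorrelation of `samples` at the given lag."""
    mu = mean(samples)
    centered = [s - mu for s in samples]
    variance = sum(c * c for c in centered)
    if variance == 0:
        return 0.0  # a perfectly flat series has no structure to find
    overlap = range(len(samples) - lag)
    return sum(centered[i] * centered[i + lag] for i in overlap) / variance


def estimate_period(samples):
    """Estimate the dominant period, in samples, or None if nothing cyclic shows up.

    Small lags always correlate strongly with a slowly changing signal, so
    skip ahead to the first lag where the correlation goes negative and
    take the strongest peak after that point.
    """
    max_lag = len(samples) // 2
    acf = [autocorrelation(samples, lag) for lag in range(1, max_lag + 1)]
    first_negative = next((i for i, v in enumerate(acf) if v < 0), None)
    if first_negative is None:
        return None
    best = max(range(first_negative, len(acf)), key=lambda i: acf[i])
    return best + 1  # acf[0] corresponds to lag 1


# Synthetic sawtooth, NOT measured data: one sample per minute, 20-sample ramp.
synthetic = [i % 20 for i in range(200)]
period = estimate_period(synthetic)
```

The result is in units of samples, so it has to be multiplied by the sampling interval. With one sample per minute, a lag of 20 corresponds to a 20 minute period.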
![latency directly connect host with powerd.png](/public/imported_attachments/1/latency directly connect host with powerd.png)
![latency through pfsense with powerd.png](/public/imported_attachments/1/latency through pfsense with powerd.png)
![latency directly connect host without powerd.png](/public/imported_attachments/1/latency directly connect host without powerd.png)
![latency through pfsense without powerd.png](/public/imported_attachments/1/latency through pfsense without powerd.png)
-
If you have a SoC or a CPU that can run at more than one frequency and you disable PowerD (HiAdaptive), you could run into other problems. If the CPU is then stuck at a single frequency, say around 600 MHz, and a heavier load comes along that needs the full CPU frequency, you will not be happy with it.
-
With powerd disabled, the CPU runs at half speed (1200 MHz). Yes, this will hamper performance when a task comes along that requires intensive CPU for an extended period. But the trade-off is worth it for the across-the-board latency reduction.
Another approach is to enable powerd and set it to Maximum, but this does bump the temp a degree or two. It may be worth it. More research and testing…
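If you want to check which frequency the CPU is actually sitting at, FreeBSD exposes the current speed and the available levels through the `dev.cpu.0.freq` and `dev.cpu.0.freq_levels` sysctls when the cpufreq driver is attached. The sketch below parses that output. The helper names are hypothetical, and the sample level string is illustrative only (not copied from an SG-4860), so check what your own unit reports.

```python
import subprocess


def read_sysctl(name):
    """Return the value of a sysctl as a string (run this on the firewall itself)."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def parse_freq_levels(text):
    """Parse space-separated 'MHz/power' pairs into a list of MHz, highest first."""
    levels = []
    for token in text.split():
        freq, _, _power = token.partition("/")
        levels.append(int(freq))
    return sorted(levels, reverse=True)


def describe(current_mhz, levels):
    """Summarize the current frequency relative to the highest available level."""
    top = levels[0]
    return f"{current_mhz} MHz of {top} MHz max ({100 * current_mhz / top:.0f}%)"


# Illustrative values only; on a real box use:
#   levels = parse_freq_levels(read_sysctl("dev.cpu.0.freq_levels"))
#   current = int(read_sysctl("dev.cpu.0.freq"))
sample_levels = parse_freq_levels("2400/-1 1800/-1 1200/-1 600/-1")
summary = describe(1200, sample_levels)
```

Sampling `dev.cpu.0.freq` alongside the latency probes would also show whether frequency changes line up with the latency steps.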
-
More research and testing…
I would consider that the right way, in my eyes. The SG-xxxx units come pre-tuned, and the developers know much more than we do about these things. Perhaps someone from the staff is watching this thread. I would also try it once with Adaptive instead of HiAdaptive.
NIC tuning
Depending on the drivers your NICs are using, or on the mbuf size
Squid tuning
If you are using Squid, you might also want to look at this page
Squid performance tuning
From the middle of the page on, there are also nice tuning tips to get it working better.
By the way, what are you using in front of the SG-xxxx unit? A pure modem or a router?
-
@BlueKobold:
and the developers know much more than we do about these things. Perhaps someone from the staff is watching this thread.
I'm fairly certain that the coder of dpinger has a red phone he can use to contact the devs ;)
It makes sense that there is some added latency the moment the CPU throttles up or down, as there will be some oddness on those cycles. Not too many people will see this as a big issue (<1 ms).
-
The default config for the SG-4860 has powerd enabled with HiAdaptive. I tested Adaptive briefly, but the baseline latency was 30% or so higher.
I am currently testing with powerd enabled with Max, and will probably leave it that way. Initial indications are that this gives a small additional reduction in latency, with a small bump in temperature. I haven't looked at power costs, but the SG-4860 doesn't draw a lot to begin with so I'm not very concerned about that. I am planning to monitor it over time and re-evaluate when I install 2.3.
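For comparing baselines between powerd modes, the median is a reasonable choice because it ignores the occasional spike. This is just a sketch of the arithmetic: the function name is made up, and the numbers are synthetic placeholders, not measurements from these tests.

```python
from statistics import median


def baseline_change_percent(reference_ms, candidate_ms):
    """Percent change of the candidate's median latency relative to the reference."""
    reference = median(reference_ms)
    candidate = median(candidate_ms)
    return 100.0 * (candidate - reference) / reference


# Synthetic placeholder samples in milliseconds, each with one outlier spike.
mode_a = [0.20, 0.21, 0.19, 0.20, 0.50]
mode_b = [0.26, 0.27, 0.25, 0.26, 0.60]
change = baseline_change_percent(mode_a, mode_b)
```

A positive result means the second mode has the higher baseline.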
The real mystery to me is the source of the 20 minute period with HiAdaptive. It doesn't line up with the CPU utilization at all. I hate mysteries, but don't have time to pursue this one.
I'm not sure I understand what you mean by "in front", but the test configuration looks like this:
LAN host (RCC-VE 4860) <--> TP-Link Switch (TL-SG2216) <--> pfSense (SG-4860) <--> DMZ host (SG-2440)
The LAN and DMZ hosts are running Linux, 4.1.5 kernel.
@BlueKobold:
I would consider that the right way, in my eyes. The SG-xxxx units come pre-tuned, and the developers know much more than we do about these things. Perhaps someone from the staff is watching this thread. I would also try it once with Adaptive instead of HiAdaptive.
By the way, what are you using in front of the SG-xxxx unit? A pure modem or a router?
-
I'm not sure I understand what you mean by "in front", but the test configuration looks like this:
I was asking about a modem, a router, or perhaps something like an ONT for a fiber line.
LAN host (RCC-VE 4860) <--> TP-Link Switch (TL-SG2216) <--> pfSense (SG-4860) <--> DMZ host (SG-2440)
And please, what OS is running on which device?
Where is the Internet link? On which side, I mean?
The LAN and DMZ hosts are running Linux, 4.1.5 kernel.
- OK, but which OS is installed there? IPFire, IPCop, DD-WRT, OpenWRT, RouterOS, fli4l, ClearOS, or a plain Ubuntu or other Linux distro?
- Is anything bridged together? Are any ports bridged in this scenario?
- Are all of these units doing NAT or firewall rules?
-
There wasn't an internet connection involved in the tests shown. The tests were local Ethernet only. The LAN connection involved a switch, but the pfSense to DMZ host latency is literally just over a cross-connect. No bridging was involved in the graphs I posted, but for what it's worth, I've also tested in a bridged LAN setup and it appears to have no impact, positive or negative. The Linux distro is Gentoo. Over the course of several weeks, other tests were run to individually check the latency of the various components involved: switch, multiple hosts, etc. The sawtooth with pfSense was consistent throughout.