Snort performance issues
-
Hi All,
I am running pfSense 2.5.2 on Proxmox. I am running 5 interfaces (LAN, 2 VLANs, 2 Gateways) with limiters, and for the most part it works well. I get about 2 Gbps throughput between VLANs and up to the gateway, which is more than enough to max out my 1 Gbps fibre internet connection.
However, this is with Snort disabled. Once I turn on Snort, throughput drops to 360-400 Mbps (legacy mode). Inline mode is even worse.
I've had a look at the system processes while running speed tests, and the Snort process does take up a lot of CPU (80% of a single core; the VM has 4 cores from an old 8-core Xeon E5), but there is still around 40% idle on each of the 4 cores. I have tried the following changes to try and tweak performance:-
- Increasing CPU to 8 cores - negligible impact
- Doubling RAM from 4 GB to 8 GB - negligible impact (RAM was never an issue anyway - I had 40% free even on 4 GB)
- Changing the Snort detection / search method - I have tried all options - AC did increase average throughput by about 20 Mbps, but the peak was only slightly higher
- Turning on 'search optimise' and 'checksum check disable' within Snort - negligible impact
- Changing the active rules - I started with around 18 rules active and have tried reducing that to 4, and then to none - negligible impact
What else should I be trying to improve performance? A 60% hit to throughput seems odd when the hardware doesn't appear to be overloaded. Apart from turning off all the pre-processors (which give warnings about breaking dependencies), I'm not sure what to do.
Any help appreciated, thanks.
-
Snort is a single-threaded application, so it will only run on a single core. It does not matter how many CPUs or cores you have; Snort is only going to use a single one.
A drop in throughput is to be expected because it takes CPU cycles to grab packets and analyze them. As the number of enabled rules goes up, the performance hit will become more significant. Inline IPS Mode operation will be the biggest hit to performance, especially in a virtual machine because everything is a software construct (software is emulating all the hardware). Are you using hardware pass-through for the NICs assigned to the firewall?
I agree the 60% performance hit is a bit extreme, though. The first question is: how are you running the speed tests? If you are using the firewall itself as either the "source" or "destination" of the speed test traffic, that will negatively impact your result. That's because pfSense is a firewall, not a server. As a firewall, it is optimized for passing traffic "through it" from one interface to another; it is not optimized to run applications "on it" the way a server would be. So running an Internet speed test client on the firewall is not the optimal way to test. The correct way is to have a client on the LAN talk either to a client on another local firewall interface (say, a DMZ) or to a client out on the WAN. That latter method, though, brings the Internet itself into play.
-
@bmeeks I've been running speed tests through the firewall - both using iperf3 between clients on different VLANs that run through pfSense, and speedtest.net to my ISP's speedtest server. The Internet would come into play using speedtest.net, but the effect is very obvious (e.g. running 4 speed tests sequentially, alternating between Snort off/on, gives 910 Mbps, 360 Mbps, 890 Mbps, 380 Mbps).
I'm using virtualised NICs, not passthrough. Seeing that I get reasonably high throughput without Snort, I didn't think it would be a NIC issue? I could try passthrough to test it, but it's not really a suitable long-term solution, as it would mean losing the aggregated uplink between the Proxmox server and the switch if I need to pass one or more cards through.
Anyway, I will try that and report back.
-
Tests I've seen from others in the past usually show somewhere between a 20% and 30% drop in performance with Snort enabled. The number (and even type) of enabled rules has a big impact on the performance hit.
But it's mostly just how things are with an IDS/IPS, especially when everything is virtualized.
You could try switching over to Suricata. It is multithreaded, and so long as you don't get elephant flows, it can distribute the load across all available CPUs and cores. It is important that you have traffic that generates enough unique Toeplitz hashes to spread the load across cores, though. Typically a single speed test won't do it as that is one flow and will therefore be restricted to a single core. All traffic from a given flow must be processed by the same core, or you will have packet ordering and reassembly problems. Multithreaded IDS/IPS applications need to see a lot of different flows in order to effectively utilize all the CPU processing available.
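To illustrate the flow-pinning idea, here is a minimal Python sketch. It is not the real Toeplitz hash (real NICs and capture frameworks compute that over the packet headers with a secret key); an ordinary hash of the flow 5-tuple stands in for it, and the addresses, ports, and core count are made up for the example:

```python
# Sketch: how a multithreaded IDS/IPS maps flows to worker cores.
# A real deployment computes a Toeplitz hash over the packet headers;
# Python's built-in hash() of the 5-tuple stands in for it here.

N_CORES = 4  # hypothetical worker count

def core_for_flow(src_ip: str, src_port: int,
                  dst_ip: str, dst_port: int, proto: str = "tcp") -> int:
    # Every packet of a flow carries the same 5-tuple, so it always
    # maps to the same core. That preserves packet ordering within the
    # flow, but it also means one big flow (like a single speed test)
    # can never use more than one core.
    return hash((src_ip, src_port, dst_ip, dst_port, proto)) % N_CORES

# A single speed-test flow: every packet lands on one core.
print(core_for_flow("10.0.1.5", 50512, "203.0.113.7", 5201))

# Many distinct flows (different source ports): the load spreads out.
cores = {core_for_flow("10.0.1.5", 49000 + n, "203.0.113.7", 443)
         for n in range(100)}
print(sorted(cores))  # typically [0, 1, 2, 3]
```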
-
@bmeeks some interesting results to report back.
Running the NICs as passthrough, throughput went up by about 150 Mbps, so high 400s/low 500s Mbps on average.
Removing the speed limiter (which was set at 800 Mbps, so in theory shouldn't have had any impact), speeds went up by another 100 Mbps - so low 600s.
That lands me at around a 30% performance hit, so within the expected range. Not great, but at least a trade-off worth considering, I guess.
Thanks for your help
-
@kanemari said in Snort performance issues:
@bmeeks some interesting results to report back.
Running the NICs as passthrough, throughput went up by about 150 Mbps, so high 400s/low 500s Mbps on average.
Removing the speed limiter (which was set at 800 Mbps, so in theory shouldn't have had any impact), speeds went up by another 100 Mbps - so low 600s.
That lands me at around a 30% performance hit, so within the expected range. Not great, but at least a trade-off worth considering, I guess.
Thanks for your help
It would have been helpful if you had mentioned in the original post that limiters were configured. They can most definitely have an influence, as you saw when they were disabled.
Because virtual machines generally run so well, we tend to gloss over the fact that "everything" about them is a piece of running software. Things that are handled seamlessly at very high speeds in hardware on bare metal (like NIC interrupts, DMA transfers, etc.) require lots of CPU software instructions to replicate in a virtual machine: the hypervisor is duplicating all of the hardware registers, and even what is normally onboard firmware in the NIC, in software. So with the NICs not passed through (meaning the VM cannot offload any of this work to real hardware), the CPU finally hits the limit of single-core performance while running the OS, emulating all of the hardware and hardware actions, running Snort, and handling traffic coming and going on the interfaces at near line rate. Removing some of the workload, such as stopping Snort, frees up enough cycles for the throughput to inch up a little.
Depending on the exact NIC type in your VM, you might increase performance a bit more by tweaking any `sysctl` parameters exposed by the NIC hardware driver. Google research would be your friend there.
The test values I was quoting came from user tests on bare metal, so not virtualized.
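As a purely illustrative sketch (assuming Intel NICs on the igb(4) driver - substitute whatever tunables your actual driver exposes), something like this will show which knobs are present before you start researching them:

```python
#!/usr/bin/env python3
# Sketch: survey a few NIC driver tunables on a FreeBSD/pfSense box.
# Assumes Intel igb(4) NICs; the OID names are examples to look into,
# not a definitive tuning list. Loader tunables are set in
# /boot/loader.conf.local and take effect after a reboot.
import subprocess

OIDS = [
    "hw.igb.rx_process_limit",    # packets processed per RX interrupt
    "hw.igb.max_interrupt_rate",  # interrupt moderation ceiling
    "hw.igb.num_queues",          # RX/TX queue pairs (0 = autodetect)
]

for oid in OIDS:
    result = subprocess.run(["sysctl", "-n", oid],
                            capture_output=True, text=True)
    value = result.stdout.strip() if result.returncode == 0 else "not present"
    print(f"{oid} = {value}")
```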
-
@bmeeks said in Snort performance issues:
It would have been helpful if you had mentioned in the original post that limiters were configured.
I did, first sentence (edit: second sentence actually :)) :-
"I am running pfSense 2.5.2 on Proxmox. I am running 5 interfaces (LAN, 2 VLANs, 2 Gateways) with limiters"
I will have a look at the sysctl parameters. Thanks again for your help.
-
@kanemari said in Snort performance issues:
@bmeeks said in Snort performance issues:
It would have been helpful if you had mentioned in the original post that limiters were configured.
I did, first sentence (edit: second sentence actually :)) :-
"I am running pfSense 2.5.2 on Proxmox. I am running 5 interfaces (LAN, 2 VLANs, 2 Gateways) with limiters"
I will have a look at the sysctl parameters. Thanks again for your help.
Sorry, I missed that word when reading.
-
The Netgate team did some throughput testing a while back, shortly after the Inline IPS Mode option was added to the Snort package. I no longer have the emails we swapped, but if I recall correctly there was about a 100 Mbit/s penalty with Snort running a moderate rule set. This was likely on one of their higher-end appliances (but bare metal, so not apples-to-apples compared to a virtualized setup).
One thing that also matters is the type of traffic in the throughput test: small packets versus large packets, for example. I'm talking about the payload size when I say "small" and "large". Every packet has a certain amount of CPU overhead to get processed, and that overhead stays pretty constant whether the packet is a 64-byte ICMP request or a full-frame 1500-byte TCP packet. The overhead I'm talking about is wrapped up in pulling the packet off the wire and copying it into kernel memory for processing.
But when looking at throughput, almost all speed tests just measure total bits or bytes received over some unit of time (typically per second). As an example, about 42 1500-byte packets deliver the same amount of "data" as 1000 64-byte ICMP reply packets. Obviously the CPU is going to do a lot more work over the same time interval to process 1000 packets compared to only 42. The truer measure of throughput is pps (packets per second), which removes the "packet size" variable from the equation. So one thing some throughput tests do is mix up the composition of packets in the test data stream: they generate a mix of small, medium, and maximum payload packets in an attempt to more closely mimic real-world traffic.
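To put numbers on that, here is the arithmetic as a small Python sketch. The 500 Mbps target is an arbitrary example figure, and the calculation counts payload only (ignoring header and framing overhead):

```python
# Throughput vs. packets per second: the same speed-test number can
# represent very different amounts of per-packet CPU work.

def pps_for_throughput(bits_per_second: float, payload_bytes: int) -> float:
    """Packets per second needed to deliver a given payload throughput."""
    return bits_per_second / (payload_bytes * 8)

TARGET = 500e6  # a hypothetical 500 Mbps speed-test result

for size in (64, 512, 1500):  # small, medium, and full-frame payloads
    print(f"{size:>5}-byte packets: {pps_for_throughput(TARGET, size):>10,.0f} pps")

# 64-byte packets need ~23x the packet rate of 1500-byte packets for
# the same reported throughput, so the fixed per-packet cost (and the
# IDS inspection cost) is paid ~23x as often.
```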
So my point with the long description above is that you should not assign too much importance to the results of a speed test. Instead, examine the real day-to-day performance of your network. Do things seem to still work at the same speed? Do web sites still load at the same rate? Are users complaining about speed slowdowns? An IDS/IPS will slow down your packet processing by some amount. That's just unavoidable, because you have added a large amount of extra work for the CPU to do. It has to pull the packets into the IDS/IPS, analyze them by comparing each packet to all of the signatures, and then make a go/no go decision for the packet. The CPU cycles expended doing that have to come from somewhere.