Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0
-
Thanks @bmeeks for this additional info - it's very helpful!
I do have some good news! I was able to make some more progress and got everything working in native netmap mode on the Chelsio interfaces. It turns out that the missing tunable was hw.cxgbe.fl_pktshift, which needs to be set to 0: https://lists.freebsd.org/pipermail/freebsd-net/2016-January/044433.html
I had considered setting this tunable before but didn't change it because I thought the default value was already 0. Turns out I was reading the wrong version of the FreeBSD documentation and in 11.3 it is actually set to 2 by default (Doh! It's always something simple isn't it? :)). In any case, once I set the pktshift value to 0, traffic started passing fine.
So to summarize, to use native netmap support with Chelsio cards, set the following tunables in your loader.conf.local: hw.cxgbe.num_vis=2 and hw.cxgbe.fl_pktshift=0
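Putting the two together, a /boot/loader.conf.local fragment might look like this (the comments are my gloss on what each tunable does, based on the posts above and cxgbe(4)):

```shell
# /boot/loader.conf.local -- Chelsio (cxgbe) native netmap setup
hw.cxgbe.num_vis=2       # expose an extra virtual interface (vcxl*) per port; these support netmap natively
hw.cxgbe.fl_pktshift=0   # FreeBSD 11.3 defaults this to 2, which breaks netmap traffic; must be 0
```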
This will create a virtual interface for each physical interface, e.g. cxl0 will have vcxl0, and so on. These virtual interfaces support netmap natively. You can confirm this by running dmesg and looking for netmap in the output, e.g.:

dmesg | grep netmap
You should see netmap support for the Chelsio virtual interfaces, including any other interfaces in your firewall that support netmap natively. For sample output, please see my previous post above. Finally you need to assign the virtual interface(s) to your network segment(s) and enable Snort Inline mode.
==================
Unfortunately, performance is not much better for me. Even when using netmap natively on the Chelsio card (as opposed to emulated mode) and enabling Snort Inline (netmap) mode, I was still only able to pass about 200-300Mbit/s between two different 10Gbit LAN interfaces (separate physical subnets, no VLANs) using an iperf3 test. This is very similar to what I saw in emulated netmap mode. With Snort legacy (pcap) mode, I usually get around 4Gbit/s. WAN performance is even worse, limited to 50-80 Mbit/s down/up on a symmetric 1Gbit fiber connection. I do not think the issue lies with the Chelsio netmap support, because if I enable Snort inline IPS mode solely on the WAN interface, which is Intel igb, and not on any of the Chelsio interfaces, I see the exact same WAN down/up speed limitation.

@bmeeks you had mentioned this above:
Lastly, there is a different netmap interface API version in FreeBSD-11 versus FreeBSD-12. Only FreeBSD-12 supports multiple host rings in netmap. FreeBSD-11 exposes only a single pair of Tx/Rx host rings.
Do you think I may be hitting this (single host ring) limitation? Is there any way to tune Snort? Also, do you think it might be worth trying Suricata instead, if it handles this interfacing differently (better)? Or would I be better off trying pfSense 2.5.0 at this point?
Thanks again for your help and insight. I really appreciate it.
-
@tman222 said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
@bmeeks you had mentioned this above:
Lastly, there is a different netmap interface API version in FreeBSD-11 versus FreeBSD-12. Only FreeBSD-12 supports multiple host rings in netmap. FreeBSD-11 exposes only a single pair of Tx/Rx host rings.
Do you think I may be hitting this (single host ring) limitation? Is there any way to tune Snort? Also, do you think it might be worth trying Suricata instead, if it handles this interfacing differently (better)? Or would I be better off trying pfSense 2.5.0 at this point?
Thanks again for your help and insight. I really appreciate it.
Perhaps, but there is also the issue of Snort being single-threaded. I'm not a kernel threading guru, so I'm not sure whether multi-threaded operation would help. My gut says "yeah", but there are other places where bottlenecks can occur besides just with the netmap packet processing. I don't know if you have duplicate hardware, but if you did, you could test using pfSense-2.5 snapshots. They are based on FreeBSD-12 and have the multiple host ring support (but only if the native NIC also exposes multiple rings).
Also, with Legacy Mode Blocking using PCAP, you would need to enable all the Snort stats to see if any packets are in fact getting dropped by Snort. You would not notice this on your speed test throughput because Snort is working on copies of the packets while the originals have gone on to the kernel when using Legacy Blocking. So the PCAP engine might be dropping some of the copied packets at high line rates. I think there are some stats for that when Statistics Mode is enabled (on the PREPROCESSORS tab). But I might be confusing some Suricata features here, so I don't want to be quoted on that.
When I first deployed the Inline IPS option with netmap on pfSense-2.5, the Netgate team did some testing of their own. If I recall correctly, they were able to get close to Gigabit line rates, but I don't know how many rules they had enabled. I'm thinking probably not very many. The Snort binary is likely not optimized for inline operation. Remember that Snort's history has been IDS first and foremost. The inline mode was added later as I understand it. And it's likely that code is not streamlined. Snort natively, through its DAQ library, supports two different inline modes on FreeBSD. One uses ipfw while the other uses netmap. The change I made to DAQ on pfSense was to allow the netmap mode to support host stack interfaces. Natively, DAQ only supported physical interfaces, so setting up an inline IPS mode between a NIC and the OS itself was not possible. You could only specify two different physical NIC ports as the netmap endpoints. This was not optimum for pfSense, so I added the ability to open and use host stack rings in DAQ.
-
Thanks @bmeeks - I decided to take a step back and just enable inline mode on the WAN interface (Intel igb) and ran an iperf3 test between my network and a cloud based VM. Doing so, I see the same 200-300Mbit/s up/down transfer limit I saw when doing local testing between the two Chelsio interfaces (usually I can get ~900/900 up/down to this VM when I have Snort legacy or pcap mode enabled). This makes me wonder now if the limitation simply comes down to Snort not being fully optimized for IPS mode, my firewall's CPU not being fast enough, or a combination of both. The CPU in my firewall is a Xeon D-1518, which is a quad core 2.2GHz chip with Hyperthreading. I do see CPU usage close to maxed when using Snort inline (netmap) mode on the igb interface, but I also see the same when using Snort legacy (pcap) mode. So I'm not fully convinced yet that it's just raw CPU cycles that are limiting performance.
As such, do you think it's worth giving Suricata a try? I did have a couple questions:
- Suricata does support multi-threading, correct?
- Is Suricata better suited to be an IPS than Snort?
- Is it easy enough to migrate between the two, i.e. would my suppression list(s) transfer over pretty easily?
Thanks again for all your help.
-
@tman222 said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
As such, do you think it's worth giving Suricata a try? I did have a couple questions:
- Suricata does support multi-threading, correct?
- Is Suricata better suited to be an IPS than Snort?
- Is it easy enough to migrate between the two, i.e. would my suppression list(s) transfer over pretty easily?
Thanks again for all your help.
You can certainly test, but I'm not sure Suricata will fare much better. Even its multithreaded operation still has certain bottlenecks. This used to be the big arguing point between the Snort and Suricata guys when Suricata first came out. The Snort team pooh-poohed multithreaded operation -- that is, until they decided to introduce it in Snort3, where they now play it up ... so who knows what to believe.
Suricata is not necessarily better than Snort in IPS mode. Also, Suricata currently does NOT have multiple host rings support on FreeBSD-12. They are still using the older NETMAP_API version 13 interface. Only NETMAP_API version 14 has the multiple host rings.
As for your Suppress Lists, they can be freely copied and pasted if you save them off on your client PC, but Suricata will not automatically "find" them and use them when you install it. Snort and Suricata are totally independent in terms of configuration parameters. And the Suppress List is stored within the config.xml file of the firewall, in the configuration section associated with the particular package the list is used with. So when I say copy and paste, I literally mean open your list for editing in Snort, copy all of the contents out, and paste into, say, a Notepad file in Windows. Then install Suricata, create a Suppress List, and paste in the text from the Notepad file. If you do that, don't forget to go to the INTERFACE SETTINGS tab and assign the Suppress List to the Suricata interface where it belongs.
-
Thanks @bmeeks - I did end up running a few more speed tests with inline mode enabled on just the WAN (igb) interface and (oddly) saw quite a range of numbers: From as little as 100Mbit/s up/down (speedtest.net) to almost full line speed (Netflix's Fast.com) with several tests coming in between 200 to 300 Mbit/s (similar to what I saw with iperf3). I did only have a handful of ET rules enabled on the WAN interface, but even adding in the Connectivity rule set didn't have too big of an impact on speed (just a slight decrease). I also took some time to study the netmap documentation:
https://www.freebsd.org/cgi/man.cgi?query=netmap&sektion=4&manpath=FreeBSD+11.3-RELEASE
Thanks for providing some additional insight above about how you chose to change DAQ to use a netmap connection to the host stack. I'm trying to visualize how this all fits together so I can narrow down the relevant bottlenecks - would it be something like this?
NIC --> Netmap --> Snort --> Netmap --> Host Stack
I'm still not convinced yet that a lack of processing power is the primary issue for me - the bottleneck could very well be the single host ring between netmap and the host stack, as the (Chelsio NIC) does support multiple netmap queues on its virtual interfaces. I did enable preprocessing statistics as well on the WAN (igb) interface and looked at the output, but it's not giving me much insight into where the majority of CPU time is being spent (to help narrow down and potentially alleviate any bottlenecks). I thought I also read somewhere that there is a way to profile Snort in order to see that - do you know if that's possible to do on the pfSense / FreeBSD implementation?
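For what it's worth, stock Snort 2.9 does ship rule and preprocessor profiling, but only when the binary is compiled with --enable-perfprofiling; whether the pfSense build enables that I can't say. If it does, a snort.conf fragment like this would print the most expensive rules and preprocessors at exit (values shown are just examples):

```shell
# snort.conf fragment (Snort 2.9.x; requires a --enable-perfprofiling build)
config profile_rules: print 20, sort avg_ticks
config profile_preprocs: print 10, sort total_ticks
```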
In any case, I think further exploratory analysis is probably warranted before switching over to Suricata. I do have some additional hardware lying around here to build a test system for pfSense 2.5, including an additional Intel and Chelsio NIC, along with a CPU that will have better single thread performance (Intel i3-8100 3.6GHz quad core). I would first need to test with 2.4.5 to see if the CPU is indeed the primary bottleneck and then move up to 2.5 to see if now having the additional host rings will increase throughput. In the meantime, I will probably explore the netmap tunable parameters a bit more to see if any of them might make a noticeable difference on performance (especially parameters related to buffer and ring sizing).
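As a starting point for that exploration, these are the sizing knobs documented in netmap(4); the values shown are only examples to experiment with, not recommendations:

```shell
# /etc/sysctl.conf -- illustrative netmap sizing knobs (see netmap(4))
dev.netmap.buf_size=2048          # bytes per packet buffer slot
dev.netmap.generic_ringsize=1024  # ring size used by emulated (generic) netmap mode
dev.netmap.generic_rings=1        # number of rings used by emulated mode
```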
Thanks again for all your help!
-
@tman222
I don't know the specific setup of your iperf3 testing, but when using iperf you should test THROUGH the firewall and not TO the firewall. In other words, do not put either the iperf client or server on the firewall. They both should instead be on workstations or servers hanging off two different firewall interfaces so that you test through the firewall.

When you run iperf on the firewall itself, whether it's the client or server piece, there is some CPU time and potential throughput chewed up by the iperf executable.

I mention this just because some other users in the past have tested with iperf installed on the firewall and have gotten misleading throughput numbers as a result.
-
@bmeeks said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
@tman222
I don't know the specific setup of your iperf3 testing, but when using iperf you should test THROUGH the firewall and not TO the firewall. In other words, do not put either the iperf client or server on the firewall. They both should instead be on workstations or servers hanging off two different firewall interfaces so that you test through the firewall.

When you run iperf on the firewall itself, whether it's the client or server piece, there is some CPU time and potential throughput chewed up by the iperf executable.

I mention this just because some other users in the past have tested with iperf installed on the firewall and have gotten misleading throughput numbers as a result.

Hi @bmeeks - sorry for not being clear in my original post, but the iperf3 test was performed between a (bare metal) Linux host on my home network and a cloud based VM (i.e. I was testing through the firewall).
One thing I forgot to ask - when you have a moment, do you mind asking the folks at Netgate what hardware and any specific sysctl tuning they used that allowed them to get close to 1Gbit/s line rate using Snort in IPS (inline) mode? I realize it was for 2.5, but it would be great to have another data point to calibrate my results to and help me further narrow down what might be causing the performance issues. Thanks again!
-
@bmeeks - well, I might have been wrong. It is starting to look like the CPU may be the bottleneck here. I experimented and reduced the number of rules on the WAN interface, and network throughput started to increase. By the time I was down to just the basic Connectivity IPS policy (and no ET rules), I was able to get very close to line speed using an iperf3 test (again, between a local Linux host and a cloud based VM). As I started to increase the number of ET rules, throughput dropped again accordingly.
From all this I gather that Snort needs quite a beefy CPU with great single-thread performance to run in IPS (inline) mode and achieve high throughput (I'm actually starting to wonder now whether 1Gbit/s is even possible with a modest rule set and Snort operating in inline mode). Perhaps it's worth trying out Suricata after all to see if the multi-threading would help?
Thanks again for your help and insight.
-
@tman222 said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
@bmeeks - well, I might have been wrong. It is starting to look like the CPU may be the bottleneck here. I experimented and reduced the number of rules on the WAN interface, and network throughput started to increase. By the time I was down to just the basic Connectivity IPS policy (and no ET rules), I was able to get very close to line speed using an iperf3 test (again, between a local Linux host and a cloud based VM). As I started to increase the number of ET rules, throughput dropped again accordingly.
From all this I gather that Snort needs quite a beefy CPU with great single-thread performance to run in IPS (inline) mode and achieve high throughput (I'm actually starting to wonder now whether 1Gbit/s is even possible with a modest rule set and Snort operating in inline mode). Perhaps it's worth trying out Suricata after all to see if the multi-threading would help?
Thanks again for your help and insight.
The number of enabled rules and exactly what those rules are doing will have a large impact on throughput. That's why I preach to folks to carefully select the rules they enable so that they match the potential threats faced without bogging down the CPU and thus throttling throughput. Rules that perform complex pattern matching are naturally going to be more processor intensive than a simple rule triggering on an IP address or something similarly easy to examine.
One of my favorite examples to illustrate the "enable only what you need to address the attack surfaces you expose" mantra is to not enable the DNS server, web server or email server rules if you are not running such services behind your firewall. If you don't run a DNS server, then attacks against DNS servers are of no concern in your network. And there are other examples from the inventory of rules.
-
@bmeeks I am hitting the pidfile suffix length issue as well:
FATAL ERROR: Invalid pidfile suffix: _vtnet0.100. Suffix must less than 11 characters and not have ".." or "/" in the name.
I suspect it never came up in testing because it appears to require a combination of:
- pfSense running in a VM with the virtio network adapter (this is the Netgate-recommended NIC type for Proxmox)
- VLANs handled in the guest, instead of the hypervisor presenting multiple virtual NICs (there are good reasons to do this in some setups)
- A 3- or 4-digit VLAN ID (perhaps most people use short VLAN IDs?)
Unfortunately, it seems it isn't really possible to rename the interface on pfSense, as it gets reverted every reboot. And re-configuring the site's VLANs isn't exactly an ideal solution either.
Instead of changing the pfSense package to create a random pidfile suffix, is the Snort team open to increasing the maximum length? 12 characters would allow 10 virtio NICs and the maximum VLAN ID of 4095. 13 characters would allow 100 virtio NICs, which "should be enough for anybody" (famous last words).
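To make the arithmetic concrete, here is a quick shell check of the failing suffix from the error above (the leading underscore matches how the suffix appears in the error message):

```shell
# "_vtnet0.100" = "_" + interface name + "." + VLAN ID
suffix="_vtnet0.100"
printf '%s\n' "${#suffix}"   # prints 11 -- not "less than 11", so Snort rejects it
```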
-
@DAVe3283 said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
@bmeeks I am hitting the pidfile suffix length issue as well:
FATAL ERROR: Invalid pidfile suffix: _vtnet0.100. Suffix must less than 11 characters and not have ".." or "/" in the name.
I suspect it never came up in testing because it appears to require a combination of:
- pfSense running in a VM with the virtio network adapter (this is the Netgate-recommended NIC type for Proxmox)
- VLANs handled in the guest, instead of the hypervisor presenting multiple virtual NICs (there are good reasons to do this in some setups)
- A 3- or 4-digit VLAN ID (perhaps most people use short VLAN IDs?)
Unfortunately, it seems it isn't really possible to rename the interface on pfSense, as it gets reverted every reboot. And re-configuring the site's VLANs isn't exactly an ideal solution either.
Instead of changing the pfSense package to create a random pidfile suffix, is the Snort team open to increasing the maximum length? 12 characters would allow 10 virtio NICs and the maximum VLAN ID of 4095. 13 characters would allow 100 virtio NICs, which "should be enough for anybody" (famous last words).

I think I have a fix. I'm testing now to be sure there are no adverse side effects.
The 11-character limit imposed by the Snort binary stems, most likely, from the 16-character limit on process names in most Unix-based operating systems. The Snort binary prepends "snort" to any PID file suffix the user provides. So that's 5 characters right there, leaving 11 for the user's suffix. As you say, with certain combinations of long interface names and VLAN IDs, the user-supplied suffix can exceed 11 characters.
The only real need is to be able to uniquely identify the PID file for a given interface so the GUI can control the Snort binary. I can accomplish that using a different interface value generated each time you create a Snort interface. That is the UUID field. It's a random value between 1 and 65535, so it will never exceed a 5-character string and thus will be safely within the 11-character Snort limit. It is random when generated at the time the Snort interface is created, but stays static after that.
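A quick sanity check of that scheme (the leading underscore mirrors the -R _3137 form that shows up in startup commands later in this thread; the exact suffix format is an assumption on my part):

```shell
# Worst-case UUID is 65535 (5 digits); even with an underscore prefix the
# suffix is only 6 characters, safely under Snort's 11-character limit.
uuid=65535
suffix="_${uuid}"
printf '%s\n' "${#suffix}"   # prints 6
```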
Look for the fix with this change in the near future. Hopefully later today if the pfSense team can quickly get the change reviewed and merged.
-
-
@bmeeks said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
@Beerman and @DAVe3283, plus any others affected by the Snort startup failure with the "PID filename suffix must be less than 11 characters" error.
The fix for this has been posted for review and merging by the pfSense developer team. Look for Snort package version 4.1.2_1 in the near future.
Thx!
-
Both pull requests have been merged. I've already updated my production system, but my SG-5100 was not affected by the bug.
-
@bmeeks said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
Both pull requests have been merged. I've already updated my production system, but my SG-5100 was not affected by the bug.
Hi @bmeeks - after this update, Snort will no longer start for me. I see these errors:
/tmp/snort_vcxl0_startcmd.php: The command '/usr/local/bin/snort -R _3137 -D -q --suppress-config-log -Q --daq netmap -l /var/log/snort/snort_vcxl03137 --pid-path /var/run --nolock-pidfile --no-interface-pidfile -G 3137 -c /usr/local/etc/snort/snort_3137_vcxl0/snort.conf -i vcxl0^:vcxl0' returned exit code '1', the output was ''
and
FATAL ERROR: /usr/local/etc/snort/snort_3137_vcxl0/snort.conf(174) => Did not find specified IIS Unicode codemap in the specified IIS Unicode Map file.
Any ideas?
-
@tman222 said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
@bmeeks said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
Both pull requests have been merged. I've already updated my production system, but my SG-5100 was not affected by the bug.
Hi @bmeeks - after this update, Snort will no longer start for me. I see these errors:
/tmp/snort_vcxl0_startcmd.php: The command '/usr/local/bin/snort -R _3137 -D -q --suppress-config-log -Q --daq netmap -l /var/log/snort/snort_vcxl03137 --pid-path /var/run --nolock-pidfile --no-interface-pidfile -G 3137 -c /usr/local/etc/snort/snort_3137_vcxl0/snort.conf -i vcxl0^:vcxl0' returned exit code '1', the output was ''
and
FATAL ERROR: /usr/local/etc/snort/snort_3137_vcxl0/snort.conf(174) => Did not find specified IIS Unicode codemap in the specified IIS Unicode Map file.
Any ideas?
Easy fix. Your unicode.map file got clobbered somehow. Simply delete the Snort package and install it again. Do NOT use the "update" icon. Use the Delete icon and then reinstall from the Available Packages tab.

Not sure why some users hit this error. I have never gotten it in all the hundreds of times I've upgraded and/or green-field installed Snort on my test virtual machines.
The only possible explanation might be if you are not using any Snort Subscriber Rules. Those rules contain this file. The Emerging Threats rules do not. So if this file gets clobbered, and you don't have the Snort Subscriber Rules enabled, then the only way to get the file back is to reinstall Snort.
-
@bmeeks - thanks, that worked! After the package first installed, I actually experienced a system crash - the error pointed to netmap rx queue over-allocation. I had been trying to tune netmap device parameters a bit for performance. I went ahead and reduced the number of netmap rx/tx queues and all has been well since. Not sure if these issues were ultimately related, but the tuning could potentially have been a contributing factor. Thanks again for all your help.
-
I went ahead today and changed my entire Snort configuration over from legacy (pcap) mode (IDS) to inline (netmap) mode (IPS) and learned quite a bit along the way. Inline mode does indeed require quite a bit of CPU horsepower to get decent throughput. While I had no trouble hitting 1Gbit/s+ using Snort's legacy (pcap) mode, I could barely scratch 200Mbit/s once I enabled inline mode. By running a variety of tests I concluded that the slowdown can be attributed to a combination of running out of CPU resources and a netmap limitation in FreeBSD 11.3, namely that only one rx/tx host ring exists between netmap and the host stack.
Through some additional tuning and adjustment, I was able to get that throughput number up some - below are some of my notes:
-
I have Snort enabled in inline mode on a Chelsio T540-SO-CR 4-port SFP+ card, where two interfaces are physical-only interfaces (no VLANs) and the other two interfaces each have three VLANs on them. Netmap works natively on the two physical-only interfaces as soon as virtual interfaces are enabled on the Chelsio card, as these have netmap support (see my previous post above for more details). Netmap will still work in emulated mode on the other two interfaces that have VLANs on them (please see https://github.com/luigirizzo/netmap/issues/302). Interestingly, the performance (throughput) is actually better if netmap operates in emulated mode on the virtual interfaces as opposed to the regular interfaces, so it is worth changing the VLANs over to the virtual interfaces as well.
-
I then took a very hard look at the rules I had enabled to try to reduce processing overhead. Some rule sets are particularly taxing on the CPU (I noticed this especially in the Emerging Threats rule set, for example). Ultimately, I followed an approach similar to what was suggested here https://forum.ipfire.org/viewtopic.php?t=21521 and used a combination of Snort VRT, Snort Community, and ET Open rule sets. At present I have ~14000 rules active on each interface, which gives me a throughput of approximately 350-400Mbit/s up and down. With a bit more tweaking of the Snort Balanced IPS policy I hope to get this closer to 500Mbit/s. And this may very well be the realistic limit that the 2.2GHz Xeon D-1518 CPU in my firewall can handle. However, since it is a multicore CPU, this is essentially the throughput limit per interface, i.e. with multiple interfaces sending data simultaneously I am able to max out my 1Gbit fiber circuit (since CPU cores will be dedicated to the different Snort instances running on the various interfaces). To get better throughput than this I would probably have to upgrade to a faster CPU in the 3.5-4GHz+ range. Having multiple rx/tx host rings in FreeBSD 12 (pfSense 2.5) should also help. I do see netmap transmit errors from time to time when I'm maxing out the throughput on a given interface, but hopefully multiple host rings will help solve this as well.
Overall, this was quite a humbling experience and really made me appreciate the need to carefully tune IDS/IPS rule sets to get the desired performance while at the same time not trading off too much security benefit. Thanks again to @bmeeks for maintaining this package and for all your help and insight - it's truly appreciated.
-
-
@bmeeks said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
@Beerman and @DAVe3283, plus any others affected by the Snort startup failure with the "PID filename suffix must be less than 11 characters" error.
The fix for this has been posted for review and merging by the pfSense developer team. Look for Snort package version 4.1.2_1 in the near future.
The fix is working for me. Thx! :)
-
@bmeeks said in Upcoming Snort Package Updates for pfSense-2.4.5 and pfSense-2.5.0:
@Beerman and @DAVe3283, plus any others affected by the Snort startup failure with the "PID filename suffix must be less than 11 characters" error.
The fix for this has been posted for review and merging by the pfSense developer team. Look for Snort package version 4.1.2_1 in the near future.
Updated all my instances and everything seems to be working, thanks!