Suricata blocking IPs on passlist, legacy mode blocking both
-
Just wanted to update, still running strong on my noisier interface with alerts only blocking external IPs. I haven't had another interface alert yet to confirm they are functioning properly, but I am hopeful this was the trick to get everything back on track. I'm curious to see if it works for @btspce once tested. I'll confirm my end for sure once other interfaces alert and operate as expected.
A warning flagging the setting as a conflict would be a nice addition. I remember reading that it was an issue for inline in the past, but honestly I thought I already had it disabled. I don't get into that settings screen in the GUI very often, so a warning would be a welcome addition, although I'm not likely to ever toggle it again at this point. It will stay disabled along with the other offloading settings.
-
@sgnoc @bmeeks Now testing with hardware checksum offloading off.
Disabled hardware checksum offloading on backup firewall and rebooted.
Enabled suricata on interfaces again from primary firewall.
Suricata started on backup firewall without any issues.
Disabled hardware checksum offloading on primary firewall and rebooted.
The wan vip plus other vips and ip addresses on wan were instantly blocked by suricata on the secondary firewall during the failover caused by the reboot of the primary firewall. (It blocked the wan vip and the primary fw wan ip, among other things.)
Result: Primary firewall came back as master on all interfaces, while the secondary firewall was master on wan and backup on the other interfaces. Removed the wan vip blocks on the secondary firewall and traffic started to flow. Carp status went back to master/backup as it should once the blocks were removed.
So random addresses still have a chance of being blocked during failover in this newer version of pfsense and suricata. @bmeeks explained the timing issue well, and it seems to hit us when there is enough traffic on wan.
The solution will probably be to switch to inline when using carp/ha to avoid this, as we have never seen this timing issue on the older versions. Last working version was pfsense 23.05.1 and suricata 6.
We had around 300 Mbit/s of incoming traffic on wan during the failover above.
No interface flapping in suricata.log and everything has been working for 1 hour now, but failover will be an issue, it seems, if enough traffic is hammering the interface during failover.
-
@btspce:
I believe there are CARP configuration changes with respect to timeouts that might help your issue. Essentially you would want to lengthen the time CARP allows the "leader" to appear in an offline state before switching roles.
I can certainly see how CARP changing who is primary and who is secondary would cause the Suricata "flapping issue". And the resulting IP deletions and additions result in the timing windows that allow blocks to happen when you don't want them.
Perhaps one future solution is "sleeping" the interface monitoring thread in Suricata for some period of time before it begins changing out IP addresses in the Radix Tree. But then you could create a window where something like a VPN interface is brought up by the kernel AFTER Suricata had started and manually scanned the interface IPs. The VPN IP might not be in the Radix Tree at that point (because it was not present at Suricata startup) and get blocked much like what is happening with your WAN IP now. You might simply fix one issue and simultaneously create another one for other users.
In short, there is no easy solution on the Suricata side. The better way to address this would be to stretch out the CARP deadtime so that Suricata has a chance to get up and running on the interfaces BEFORE the CARP daemon decides the primary is down and switches to a secondary.
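To put rough numbers on that idea, here is a minimal sketch. It assumes FreeBSD's CARP rule that a backup declares the master down after about 3 * advbase + advskew/256 seconds without advertisements, where advbase is the Base value and advskew is the Skew value from the pfSense VIP settings. The helper name carp_master_down_seconds is made up for illustration, and the formula should be checked against the carp(4) documentation for your release.

```shell
#!/bin/sh
# Hypothetical helper: estimate how long a CARP backup waits before it
# promotes itself to master.
# Assumption: timeout = 3 * advbase + advskew / 256 seconds.
carp_master_down_seconds() {
    # $1 = advbase in seconds, $2 = advskew in units of 1/256 second
    awk -v base="$1" -v skew="$2" \
        'BEGIN { printf "%.2f\n", 3 * base + skew / 256 }'
}

carp_master_down_seconds 1 0     # Base 1, Skew 0
carp_master_down_seconds 10 0    # Base raised to 10, Skew 0
```

Under that assumption, Base 1 with Skew 0 gives a window of about 3 seconds, and raising Base to 10 stretches it to about 30 seconds. Whether that is long enough depends on how long Suricata needs to load its rules and settle on each interface.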
-
@btspce said in Suricata blocking IPs on passlist, legacy mode blocking both:
Now testing with hardware checksum offloading off.
Disabled hardware checksum offloading on backup firewall and rebooted.
While @sgnoc says disabling checksum offloading worked for him, I don't see how it can actually impact what's happening. The checksum offloading results in Suricata (or anything monitoring on the kernel end of the network connection) seeing invalid packet checksums. It does not alter what IP addresses are or are not in the packets and how they would be found (or not found) in a Radix Tree search. I also don't see how it could cause an IP to be deleted from and then later added back to an interface.
I guess it is possible the checksum offloading is causing something funky to happen at the NIC hardware level. If that is the case, then the actual NIC driver might be cycling the interface down and back up, and something like that would cause IP addresses to be deleted and added back as the interface was cycled. But you would expect that behavior to be noted in the pfSense system log.
-
@bmeeks I thought of that earlier and as a test raised the Base Advertising Frequency from 1 to 10 on wan only, but it didn't help there. But that was with hardware checksum offloading enabled. So I will probably have to redo that test, maybe with an even higher base number, to rule that out. But I think the next step will be to move this suricata instance from wan to an internal interface, switch to inline and see if it runs stable on these Netgate 6100s.
I'm not sure disabling hardware checksum offloading did anything in our case either.
Perhaps the combination of running ET Pro rules (longer rule loading time), the amount of traffic at the time suricata starts and carp/ha makes this setup more likely to hit the issue.
-
@btspce said in Suricata blocking IPs on passlist, legacy mode blocking both:
@bmeeks I thought of that earlier and as a test raised the Base Advertising Frequency from 1 to 10 on wan only, but it didn't help there. But that was with hardware checksum offloading enabled. So I will probably have to redo that test, maybe with an even higher base number, to rule that out. But I think the next step will be to move this suricata instance from wan to an internal interface, switch to inline and see if it runs stable on these Netgate 6100s.
I'm not sure disabling hardware checksum offloading did anything in our case either.
Perhaps the combination of running ET Pro rules (longer rule loading time), the amount of traffic at the time suricata starts and carp/ha makes this setup more likely to hit the issue.
I agree that the presence of CARP/HA is likely the cause of this problem. As I mentioned before, it's not a configuration I've ever tested with Suricata (nor Snort, for that matter). And the more traffic flowing over the interface, the more likely it is that a packet will trigger an alert while one of the interface IPs has been deleted from the Radix Tree (and before it gets added back to the tree).
So, do you not run HA on the internal interfaces? I would think that wherever CARP/HA is in place (WAN, LAN, or elsewhere) that the interface flapping would happen.
While Inline IPS Mode will eliminate permanent blocks of an interface IP, it can still result in traffic interruptions if a DROP rule triggers. But those interruptions should not impact packets associated with the CARP protocol unless a rule false positives on the traffic.
Also be aware that Inline IPS Mode is not available for all NIC types, but it should be available and work for the NICs in the SG-6100 box.
-
@bmeeks We do run HA on internal interfaces as well. Moving the suricata instance from wan to one of the internal interfaces is simply to limit the traffic it sees when switching to inline, as the load will increase. But it's not perfect either, because now we have to rearrange or bypass some of the internal traffic which does not need to be scanned by suricata to limit the throughput drop on that side. I will probably do the switch this weekend if possible and report back.
We did use inline mode a few years ago on XG-7100 but it wasn't stable enough and legacy mode solved all issues at the time. But there has been a lot of development since then.
-
@btspce said in Suricata blocking IPs on passlist, legacy mode blocking both:
We did use inline mode a few years ago on XG-7100 but it wasn't stable enough and legacy mode solved all issues at the time. But there has been a lot of development since then.
Yes, a lot of work has gone into the netmap device driver over the last couple of years, especially with regard to multiple host rings support in Suricata.
You will almost certainly want to change the Suricata Run Mode from AutoFP to workers on the INTERFACE SETTINGS tab in the Performance section. That will usually work much better with netmap on multi-core CPUs if you also have multi-queue NICs. But experiment with both modes. For a small handful of users AutoFP has performed better. Depends a lot on the particular NIC.
-
@bmeeks I'm now up and running in inline mode on two internal interfaces and in workers mode. One of the interfaces has vlans on it.
Hardware checksum offload is disabled and flow control is disabled for the relevant parent interfaces.
Everything works so far, except that both firewalls become carp master for the vlan interfaces only. No alerts on the interfaces. Any idea on this issue?
-
The issue is that vlan hardware tagging has to be disabled on the nic for suricata to be able to pass the vlan tags in inline mode.
In this case it was interface igc0 so I entered the below command in a shell on both firewalls and traffic and carp was instantly happy. Is there any way to set this as a system tunable ?
ifconfig igc0 -vlanhwtag
-
@stephenw10 Do you know if we can disable vlan hardware tagging on a nic through a system tunable or loader or is it something that can be added in the next release?
It is needed for suricata to work in inline mode on a parent interface with vlans. Vlan traffic will not work without it.
-
@btspce said in Suricata blocking IPs on passlist, legacy mode blocking both:
The issue is that vlan hardware tagging has to be disabled on the nic for suricata to be able to pass the vlan tags in inline mode.
In this case it was interface igc0 so I entered the below command in a shell on both firewalls and traffic and carp was instantly happy. Is there any way to set this as a system tunable ?
ifconfig igc0 -vlanhwtag
Not as a system tunable, but sort of in concert with @kiokoman's suggestion, you can use the earlyshellcmd options to do this at bootup. I formerly did this to make DHCP work properly with my fiber-to-the-home connection as the ISP's device used VLAN 0 and previous pfSense versions would not see that tag unless you turned off hardware VLAN tags.
Here is a link describing the process: https://docs.netgate.com/pfsense/en/latest/development/boot-commands.html. You can install the Shellcmd package to make this an easy GUI task. You can configure early shell commands to execute shortly after the firewall boots to turn off (or on) any special NIC hardware features.
Be advised that Inline IPS Mode and VLANs are not exactly best friends, but they do work okay so long as the Suricata instance is running on the parent physical interface. The package now sort of does that behind the scenes when it detects the interface is a VLAN. It configures Suricata to run on the parent physical interface in promiscuous mode. This VLAN coexistence is due to a limitation within the netmap kernel device and not something Suricata has control over. The current workaround is to use the physical parent interface.
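As a minimal sketch of what such a boot-time command could look like, the script below turns off VLAN hardware tagging only when ifconfig reports it as enabled. It assumes the FreeBSD ifconfig output lists VLAN_HWTAGGING in the options line when the feature is on. The function name vlan_hwtag_enabled and the idea of wrapping the check in a script are illustrative assumptions; the plain one-line ifconfig command quoted above works as an earlyshellcmd on its own.

```shell
#!/bin/sh
# Sketch: disable VLAN hardware tagging only if the NIC reports it enabled.
# The function name is hypothetical; pass the interface name as $1.

# Return 0 when the given ifconfig output lists VLAN_HWTAGGING.
vlan_hwtag_enabled() {
    case "$1" in
        *VLAN_HWTAGGING*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only touch the NIC when an interface name was supplied, e.g. igc0.
if [ -n "${1:-}" ]; then
    if vlan_hwtag_enabled "$(ifconfig "$1")"; then
        ifconfig "$1" -vlanhwtag
    fi
fi
```

Saved under a name of your choosing and called with igc0 as its argument, it does the same thing as the manual command but skips the change when tagging is already off.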
-
@kiokoman @bmeeks Thanks both of you!
I know suricata was not really made for running traffic with vlan tags so this is more of a nice to have on these vlans. All important/production traffic already had their own interface as we were planning on moving back to suricata inline when it became stable enough so that made this easier now. If there is any instability this suricata instance will be disabled.
-
I'm having this problem as well. Suricata is on the WAN interface operating in Legacy Mode (pfsense 23.09.1, suricata 7.0.2_3, custom hardware Supermicro C2758, IGB interfaces), and as soon as Suricata is enabled the Public IP on the WAN is instantaneously blocked. I have four additional CARP interfaces that get blocked as well. The CARP interfaces aren't 'real' failovers, but are used due to the goofy AT&T fiber setup. At any rate, the WAN interface is not a CARP and is getting blocked too. I've checked out the pass list file in the Suricata config folder and the correct IPs/Networks are showing up. The suricata.log is claiming these IP addresses are being added to the pass list. I've tried a custom pass list, I've disabled the hardware offloading for the interface, and I've verified there aren't multiple instances of Suricata running. Any advice?
-
@eldog Try moving it to LAN. That also has the advantages of alerting on LAN IPs instead of the NATted WAN, and also not bothering to scan any inbound traffic that will immediately be blocked by the pfSense firewall.
-
I might try that, but I actually prefer it on the WAN. There it can generically block IPs that are poking around and up to no good even if the traffic would never have reached my network.
-
Finally got some additional blocks on the other interfaces, and I can confirm my issue is now resolved. Disabling the hardware checksum offloading did the trick, as unlikely or inexplicable a solution as it may be. All my interfaces have now had alerts that only blocked the external IP. The IP listed in the default pass list was not blocked. Also, I'm not seeing the deleted IPs from the default pass list from previous interface flapping issues. The suricata.log on the various interfaces is just showing IPs being added, with no deletions.
Even though it may not make sense, hopefully this solution will help some others that come across it.
-
@sgnoc said in Suricata blocking IPs on passlist, legacy mode blocking both:
Finally got some additional blocks on the other interfaces, and I can confirm my issue is now resolved. Disabling the hardware checksum offloading did the trick, as unlikely or inexplicable a solution as it may be. All my interfaces have now had alerts that only blocked the external IP. The IP listed in the default pass list was not blocked. Also, I'm not seeing the deleted IPs from the default pass list from previous interface flapping issues. The suricata.log on the various interfaces is just showing IPs being added, with no deletions.
Even though it may not make sense, hopefully this solution will help some others that come across it.
Yes, we can take the win even if we don't fully understand why it works.
I'm still thinking it was the rapid interface IP deletions and additions that were at the root of the unwanted blocking problem. Maybe the hardware checksum thing was messing with some other part of Suricata's code (not the custom blocking module) and that was resulting in resets of the interface which was resulting in the long sequence of deleting and adding back the interface IP addresses.
One thing I have not cross-checked is if the number of IP deletion/addition sequences equals the number of worker threads spawned by Suricata. You have a lot of interfaces (with the VPNs and CARP), so there is a lot to digest. I have not checked this myself, but I'm curious whether, say, the WAN IP (to use just one interface as an example) was deleted and added back a total of 4 times during startup. Suricata spawned 4 worker threads in your setup, if I recall correctly. I wonder if each worker thread, when it launched and started a PCAP process on the interface, resulted in the interface being "cycled" by the kernel?
Later Edit: I went back and looked through the suricata.log file that @sgnoc posted earlier in this thread to check out my new hypothesis. I think my hypothesis is correct. Here is the whole story as I now see it, and my theory appears to jibe with the log evidence (and this also explains why disabling the hardware offloading "fixed" the problem for @sgnoc).
Around 6 months ago Suricata upstream made some changes to the PCAP module of Suricata. Those changes were released in Suricata 7.x. Those changes involved adding some system ioctl() calls to disable certain hardware offloading if it was found to be enabled when running with the PCAP packet capture method. Legacy Blocking Mode in Suricata on pfSense uses PCAP for acquisition. The system call to disable hardware offloading will happen thread-by-thread and interface-by-interface as Suricata starts up. The more interfaces and worker threads you have, the more pronounced this behavior will be. Worker thread count is controlled by the number of CPU cores. The ioctl() system call apparently caused the kernel to delete and then add back IP addresses on the interface specified for hardware offloading disable. This is likely due to the kernel "resetting" that interface's capabilities as requested. Since these changes were made in Suricata native code and not in the custom pfSense blocking module, I was not tracking them and was unaware of them. This is why I was initially puzzled that turning off hardware offloading "fixed" the problem.
To see if my theory was correct, I just went back and counted the number of worker threads launched by Suricata on @sgnoc's machine and compared that count to the number of times his WAN IP was deleted and added back on the ix0 interface. Both numbers were "4". There were four threads launched, and his WAN IP was deleted and added back four separate times during Suricata startup.
The new Suricata changes first test if offloading is already disabled, and if it is, then no ioctl() system call is made. Therefore the interface does not get reset to change the capabilities. The bounce would only happen if offloading was found to be enabled, because then the Suricata PCAP module will make a system ioctl() call attempting to disable the offloading.
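To check ahead of time whether an interface is likely to get bounced this way, a minimal sketch is to list the offload features that ifconfig still reports as enabled. The function name enabled_offloads and the flag list (RXCSUM, TXCSUM, TSO4, TSO6, LRO) are assumptions for illustration; exactly which capabilities the Suricata PCAP module tests for is not confirmed here.

```shell
#!/bin/sh
# Sketch: print the offload features that ifconfig reports as enabled.
# Assumption: the flag list below covers the features of interest.
enabled_offloads() {
    # $1 = text of `ifconfig <iface>`; prints matching flags, space separated.
    opts=$(printf '%s\n' "$1" |
        sed -n 's/^[[:space:]]*options=[0-9a-f]*<\(.*\)>.*/\1/p' |
        head -n 1)
    found=""
    for flag in RXCSUM TXCSUM TSO4 TSO6 LRO; do
        case ",$opts," in
            *",$flag,"*) found="$found $flag" ;;
        esac
    done
    # Strip the leading space left by the loop.
    printf '%s\n' "${found# }"
}

# Only query the NIC when an interface name was supplied, e.g. ix0.
if [ -n "${1:-}" ]; then
    enabled_offloads "$(ifconfig "$1")"
fi
```

An empty result suggests there is nothing left for Suricata to disable at startup. Anything that is listed can be turned off under System > Advanced > Networking in the pfSense GUI, followed by a reboot.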
-
@bmeeks That all makes a lot of sense from what I was seeing on my end. The only thing that doesn't make sense to me is why I was still getting internal blocking from alerts after the interface flapping stopped. Your explanation on the flapping made sense, with threads catching alerts between deleting and adding, thus blocking the internal IP in between those actions.
That part makes sense, but what I don't understand is why the blocks continued to happen once the interfaces finished starting and the flapping ceased. I would have thought that the IPs would then be added back and stable, and the internal IPs would no longer be blocked for future alerts, especially since there are no more deletions in the suricata.log after the initial interface startup.
The only thing I can guess, with my limited understanding of the behind the scenes interactions you're describing, is that maybe there is some kind of compatibility issue between the hardware and the way that the ioctl() calls disable hardware offloading? There must be some reason why the alerts continued to block internal IPs.
I know it doesn't matter much since this solution seems to have fixed everything, but it is still curious to me. It made it more of a challenge too because restarting the system caused the WAN to start working, and then internal interfaces to start blocking internal IPs. Either way, Suricata and the behind the scenes interaction with pfSense definitely do not like hardware offloading!
I'm glad you were able to make some sense of this. Thanks again for your time working with me previously. Happy New Year everyone!