Netgate Discussion Forum

    Suricata blocking IPs on passlist, legacy mode blocking both

    IDS/IPS
    • S
      sgnoc @bmeeks
      last edited by sgnoc

      @bmeeks That's great, thanks for the help. Hopefully there are only a limited number of people with this problem and not a larger portion of the Suricata community. Thanks for maintaining this package.

      So does anyone else monitoring this topic have any advice on trying to track this issue down? I see zero evidence in any log anywhere outside of Suricata of flapping interfaces, so any advice on where/how to locate some would be welcome.

      I also see no noticeable issues with pfSense or other packages, apart from IPs being blocked on internal interfaces where Suricata is configured in Legacy Blocking Mode. I have added an all-inclusive 10.10.0.0/16 to a manually created Pass List, along with all the default generated IPs, and am waiting for an alert to trigger to see if there is any difference. All the default pass lists include the subnets of these interfaces, yet internal IP addresses are still being blocked. For example, an interface on the subnet 10.10.33.0/29, with that subnet on the default pass list, is blocking IP 10.10.33.2.

      I have tried to enable pass list debugging for further troubleshooting detail, but every time I enable debugging, the interfaces start operating normally. This does not make sense to me, since debugging should only print additional lines to the log files, but I have tried it 3 separate times with the same result. I would simply leave debugging on, but I don't know how to get that generated log rotated with the others to keep my drive from filling up. Otherwise that would be a workaround, although each update of Suricata going forward would reset debugging back to no.

      Willing to try any additional troubleshooting options that anyone has ideas or advice on. Thanks.

      *** Edit *** Finally got an alert on the 10.10.33.0/29 interface. Even with 10.10.0.0/16 added to a custom pass list, the internal 10.10.33.2 IP is still blocked, which matches some previous posters' experience.
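
For anyone who wants to rule out a simple subnet-math mistake first, here is a minimal sketch using Python's standard `ipaddress` module. It only confirms that the blocked address falls inside the pass list networks quoted above; it says nothing about how Suricata performs its own lookup. The function name `covering_networks` is made up for illustration.

```python
import ipaddress

def covering_networks(ip, networks):
    """Return the pass list networks (as strings) that contain ip."""
    addr = ipaddress.ip_address(ip)
    return [n for n in networks if addr in ipaddress.ip_network(n)]

# Values taken from the post above.
pass_list = ["10.10.33.0/29", "10.10.0.0/16"]
print(covering_networks("10.10.33.2", pass_list))
# -> ['10.10.33.0/29', '10.10.0.0/16']
```

Both networks cover 10.10.33.2, so the blocks are not explained by a pass list entry that fails to include the address.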

      • S
        SteveITS Galactic Empire @sgnoc
        last edited by

        @sgnoc said in Suricata blocking IPs on passlist, legacy mode blocking both:

        every time I enable debugging, the interfaces start operating normally

        Some sort of threading or timing issue? As in, in code simulate the log write by adding a 10ms wait and see if it still happens? (totally brainstorming here)

        Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
        When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
        Upvote 👍 helpful posts!

        • S
          sgnoc @SteveITS
          last edited by

          @SteveITS Any idea if that is something I can do from my end, either in the GUI or in an SSH session?

          I'm willing to give anything a shot. It doesn't take me too long to restore the whole system with a base image if I have a bad enough failure. I also found in another topic a few years back where disabling hardware checksum offloading fixed the same issue that was popping up then. It only worked for a few, but I'm giving it a shot (especially since I thought I had it already disabled). I just checked to disable it and restarted the system.

          Interesting that after the restart, I had zero logs in any of my interfaces where an interface IP was deleted, so nothing caused any interface flapping at all.

          • S
            SteveITS Galactic Empire @sgnoc
            last edited by

            @sgnoc I’d expect it’d be in the code right where it’s writing the log file. Imagining:

            If (passlist log)
                write log
            Else If (passlist wait)
                sleep
            End

            Re checksum offload, I am pretty sure we’ve had that off per our setup doc from years ago because of false positives.

            • S
              sgnoc @SteveITS
              last edited by

              @SteveITS @btspce I've had some luck with my config now. I have disabled checksum offloading (System->Advanced->Networking->Check "Disable checksum offloading") and added an all-encompassing subnet (10.10.0.0/16 instead of all the /29 and /24 10.10.x.0 subnets) to a custom pass list. Last night I had multiple blocks on the interface that has been blocking my web server, and the web server address was not blocked this round.

              I'm now putting the pass list back to default and letting it run today to see if it was a combination of the two changes or just the disable of checksum offloading. I thought I had already disabled it (I know I had at some point in the past), but can't remember if I enabled it for one reason or another sometime later, but it hasn't been modified in at least a year or two.

              I found the hint about disabling checksum offloading in a previous topic from about 2020, where it helped some posters with an almost identical issue to what we are seeing here. They in turn found it in an even older post, so if it is the same issue, it has definitely occurred before. It seems to be something that only affects a portion of pfSense/Suricata users, and only periodically. This doesn't fix it for everyone, but it's worth a try if your setting is enabled (I know you said you have had yours off for years @SteveITS).

              And while I was writing this, I had some additional alerts on my interface, after setting it back to the default pass list. It appears that just disabling hardware checksum offloading has done the trick for me. I'll continue watching the interfaces, but I think my issue may now be resolved.

              It seems the instability was caused by the pfSense and Suricata updates combined with hardware checksum offloading being enabled. Again, I haven't modified the offloading settings in at least a year or two, so the updates changed something on my end that caused the compatibility issue on Netgate hardware (running XG-7100-1U).

              Hope this helps someone with similar circumstances! I'll post again if I notice anything more.

              • B
                btspce @sgnoc
                last edited by

                @sgnoc Great find! We are also running the Netgate 6100 pair with hardware checksum offloading as the only offload option enabled.
                My understanding was that hardware checksum offloading only has to be disabled when running inline (and it has been working until now through all versions).
                This could perhaps be a NIC driver issue in the newer version of pfSense. I will disable hardware checksum offloading later today and report back.

                • bmeeksB
                  bmeeks
                  last edited by

                  Here is the pertinent information about the various hardware offloading options available for NICs and how each impacts Suricata:

                  https://docs.suricata.io/en/suricata-7.0.2/performance/packet-capture.html#offloading

                  It's best if all offload options for NIC hardware are set to "off". Legacy Blocking Mode in Suricata (and Snort) uses PCAP for packet acquisition. Inline IPS Mode uses Netmap. Note in the link above that for both methods all hardware offloading options should be disabled.

                  • bmeeksB
                    bmeeks @SteveITS
                    last edited by

                    @SteveITS said in Suricata blocking IPs on passlist, legacy mode blocking both:

                    @sgnoc I’d expect it’d be in the code right where it’s writing the log file. Imagining:

                    If (passlist log)
                        write log
                    Else If (passlist wait)
                        sleep
                    End

                    Re checksum offload, I am pretty sure we’ve had that off per our setup doc from years ago because of false positives.

                    The Pass List debugging code is a per-thread thing. So, each worker thread does its own pass list debugging (when enabled). The file I/O is handled by built-in Suricata logging routines (the same ones that write the block.log and alerts.log files).

                    The amount of "wait time" is variable in the real case. Choosing a fixed interval would arbitrarily slow down all the packet processing threads.

                    • S
                      SteveITS Galactic Empire @bmeeks
                      last edited by

                      @bmeeks said in Suricata blocking IPs on passlist, legacy mode blocking both:

                      best if all offload options for NIC hardware are set to "off"

                      Thinking out loud again, is it possible for Suricata/Snort to check that config setting and show a warning?

                      @bmeeks said in Suricata blocking IPs on passlist, legacy mode blocking both:

                      Choosing a fixed interval would arbitrarily slow down all the packet processing threads

                      Yes but I was just trying to think of a way to show what about "extra logging" could affect packet processing. It wasn't meant to be a fix.

                      • bmeeksB
                        bmeeks @SteveITS
                        last edited by

                        @SteveITS said in Suricata blocking IPs on passlist, legacy mode blocking both:

                        Thinking out loud again, is it possible for Suricata/Snort to check that config setting and show a warning?

                        Both packages already do that for Inline IPS Mode, so it could be extended to cover all modes. I will add that to my future TODO list.
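
As a rough illustration of what such a check could look like, here is a minimal sketch that scans `ifconfig` output for offload capabilities that are still enabled. It is not how the Suricata or Snort packages implement their existing warning. The function name `enabled_offloads`, the choice of flags in `OFFLOAD_FLAGS`, and the sample text (including its hex value) are assumptions made for this example.

```python
import re

# Capability names as printed by FreeBSD ifconfig. Treating exactly these
# as the ones worth warning about is an assumption made for this sketch.
OFFLOAD_FLAGS = {"RXCSUM", "TXCSUM", "TSO4", "TSO6", "LRO",
                 "RXCSUM_IPV6", "TXCSUM_IPV6"}

def enabled_offloads(ifconfig_text):
    """Return the sorted offload flags found in the first options=...<...> entry."""
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_text)
    if not match:
        return []
    flags = set(match.group(1).split(","))
    return sorted(flags & OFFLOAD_FLAGS)

# Hand-written sample, not captured from a real box; the hex value is arbitrary.
sample = (
    "igc0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=4e427bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,TSO4,LRO>\n"
)
print(enabled_offloads(sample))
# -> ['LRO', 'RXCSUM', 'TSO4', 'TXCSUM']
```

An empty list would mean none of the listed offloads show up as enabled for that interface; anything else could be surfaced as a warning in any blocking mode.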

                        • S
                          sgnoc
                          last edited by

                          Just wanted to update, still running strong on my noisier interface with alerts only blocking external IPs. I haven't had another interface alert yet to confirm they are functioning properly, but I am hopeful this was the trick to get everything back on track. I'm curious to see if it works for @btspce once tested. I'll confirm my end for sure once other interfaces alert and operate as expected.

                          A warning flagging the setting as a conflict would be a nice addition. I remember reading about it as an issue for inline mode in the past, but honestly I thought I already had it disabled. I don't get into that settings screen in the GUI very often, so a warning would be welcome. Although I'm not likely to ever toggle it again at this point. It will stay disabled along with the other offloading settings.

                          • B
                            btspce @sgnoc
                            last edited by

                            @sgnoc @bmeeks Now testing with hardware checksum offloading off.
                            Disabled hardware checksum offloading on backup firewall and rebooted.
                            Enabled suricata on interfaces again from primary firewall.
                            Suricata started on backup firewall without any issues.
                            Disabled hardware checksum offloading on primary firewall and rebooted.
                            The WAN VIP plus other VIPs and IP addresses on WAN were instantly blocked by Suricata on the secondary firewall during the failover caused by the reboot of the primary firewall. (It blocked the WAN VIP and the primary firewall's WAN IP, among other things.)

                            Result: Primary firewall came back as master on all interfaces and secondary firewall was master on wan and backup on the other interfaces. Removed wan vip blocks on secondary firewall and traffic started to flow. Carp status went back to master/backup as it should when the block was removed.

                            So random addresses still have a chance of being blocked during failover in this newer version of pfSense and Suricata. @bmeeks explained the timing issue well, and it seems to hit us when there is enough traffic on WAN.
                            The solution will probably be to switch to inline when using CARP/HA to avoid this, as we never saw this timing issue on the older versions. The last working version was pfSense 23.05.1 with Suricata 6.

                            We had around 300Mbit of incoming traffic on wan during failover above.
                            No interface flapping in suricata.log and everything works for 1 hour now but failover will be an issue it seems if enough traffic is hammering the interface during failover.

                            • bmeeksB
                              bmeeks @btspce
                              last edited by bmeeks

                              @btspce:
                              I believe there are CARP configuration changes with respect to timeouts that might help your issue. Essentially you would want to lengthen the time CARP allows the "leader" to appear in an offline state before switching roles.

                              I can certainly see how CARP changing who is primary and who is secondary would cause the Suricata "flapping issue". And the resulting IP deletions and additions result in the timing windows that allow blocks to happen when you don't want them.

                              Perhaps one future solution is "sleeping" the interface monitoring thread in Suricata for some period of time before it begins changing out IP addresses in the Radix Tree. But then you could create a window where something like a VPN interface is brought up by the kernel AFTER Suricata had started and manually scanned the interface IPs. The VPN IP might not be in the Radix Tree at that point (because it was not present at Suricata startup) and get blocked much like what is happening with your WAN IP now. You might simply fix one issue and simultaneously create another one for other users.

                              In short, there is no easy solution on the Suricata side. The better way to address this would be to stretch out the CARP deadtime so that Suricata has a chance to get up and running on the interfaces BEFORE the CARP daemon decides the primary is down and switches to a secondary.
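
To make the timing window concrete, here is a toy model of the sequence described in this post. It is not Suricata's code: a plain Python set stands in for the Radix Tree, and the function `replay`, the event names (`"del"`, `"add"`, `"alert"`) and the address are invented for illustration.

```python
def replay(events, pass_list):
    """Replay interface and alert events in order; return the IPs that get blocked."""
    active = set(pass_list)        # stand-in for the pass list lookup structure
    blocked = []
    for kind, ip in events:
        if kind == "del":          # interface monitor saw the IP removed
            active.discard(ip)
        elif kind == "add":        # interface monitor saw the IP come back
            active.add(ip)
        elif kind == "alert" and ip not in active:
            blocked.append(ip)     # alert landed inside the window
    return blocked

wan_vip = "203.0.113.10"           # documentation address, not from the thread
events = [
    ("alert", wan_vip),            # before the flap: protected
    ("del", wan_vip),              # failover removes the IP
    ("alert", wan_vip),            # inside the window: blocked
    ("add", wan_vip),              # IP is added back
    ("alert", wan_vip),            # after the flap: protected again
]
print(replay(events, [wan_vip]))
# -> ['203.0.113.10']
```

Only the alert that arrives between the delete and the add produces a block, which is why more traffic during failover makes a hit more likely.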

                              • bmeeksB
                                bmeeks @btspce
                                last edited by bmeeks

                                @btspce said in Suricata blocking IPs on passlist, legacy mode blocking both:

                                Now testing with hardware checksum offloading off.
                                Disabled hardware checksum offloading on backup firewall and rebooted.

                                While @sgnoc says disabling checksum offloading worked for him, I don't see how it can actually impact what's happening. The checksum offloading results in Suricata (or anything monitoring on the kernel end of the network connection) seeing invalid packet checksums. It does not alter what IP addresses are or are not in the packets and how they would be found (or not found) in a Radix Tree search. I also don't see how it could cause an IP to be deleted from and then later added back to an interface.

                                I guess it is possible the checksum offloading is causing something funky to happen at the NIC hardware level. If that is the case, then the actual NIC driver might be cycling the interface down and back up, and something like that would cause IP addresses to be deleted and added back as the interface was cycled. But you would expect that behavior to be noted in the pfSense system log.

                                • B
                                  btspce @bmeeks
                                  last edited by

                                  @bmeeks I thought of that earlier and as a test raised the Base Advertising Frequency from 1 to 10 on wan only but it didn't help there. But that was with hardware checksum offloading enabled. So I will probably have to redo that test and maybe with an even higher base number to rule that out. But I think the next step will be to move this suricata instance from wan to an internal interface and switching to inline and see if it runs stable on these Netgate 6100.

                                  I'm not sure disabling hardware checksum offloading did anything in our case either.
                                  Perhaps the combination of running ET Pro rules (longer rule loading time), amount of traffic at the time of suricata starting and carp/ha makes this setup more likely to hit the issue.

                                  • bmeeksB
                                    bmeeks @btspce
                                    last edited by bmeeks

                                    @btspce said in Suricata blocking IPs on passlist, legacy mode blocking both:

                                    @bmeeks I thought of that earlier and as a test raised the Base Advertising Frequency from 1 to 10 on wan only but it didn't help there. But that was with hardware checksum offloading enabled. So I will probably have to redo that test and maybe with an even higher base number to rule that out. But I think the next step will be to move this suricata instance from wan to an internal interface and switching to inline and see if it runs stable on these Netgate 6100.

                                    I'm not sure disabling hardware checksum offloading did anything in our case either.
                                    Perhaps the combination of running ET Pro rules (longer rule loading time), amount of traffic at the time of suricata starting and carp/ha makes this setup more likely to hit the issue.

                                    I agree that the presence of CARP/HA is likely the cause of this problem. As I mentioned before, it's not a configuration I've ever tested with Suricata (nor Snort, for that matter). And the more traffic flowing over the interface, the more likely it is that a packet will trigger an alert while one of the interface IPs has been deleted from the Radix Tree (and before it gets added back to the tree).

                                    So, do you not run HA on the internal interfaces? I would think that wherever CARP/HA is in place (WAN, LAN, or elsewhere) that the interface flapping would happen.

                                    While Inline IPS Mode will eliminate permanent blocks of an interface IP, it can still result in traffic interruptions if a DROP rule triggers. But those interruptions should not impact packets associated with the CARP protocol unless a rule false positives on the traffic.

                                    Also be aware that Inline IPS Mode is not available for all NIC types, but it should be available and work for the NICs in the SG-6100 box.

                                    • B
                                      btspce @bmeeks
                                      last edited by

                                      @bmeeks We do run HA on internal interfaces as well. Moving the Suricata instance from WAN to one of the internal interfaces is simply to limit the traffic it sees when switching to inline, as the load will increase. But it's not perfect either, because now we have to rearrange or bypass some of the internal traffic which does not need to be scanned by Suricata to limit the throughput drop on that side. I will probably do the switch this weekend if possible and report back.

                                      We did use inline mode a few years ago on XG-7100 but it wasn't stable enough and legacy mode solved all issues at the time. But there has been a lot of development since then.

                                      • bmeeksB
                                        bmeeks @btspce
                                        last edited by

                                        @btspce said in Suricata blocking IPs on passlist, legacy mode blocking both:

                                        We did use inline mode a few years ago on XG-7100 but it wasn't stable enough and legacy mode solved all issues at the time. But there has been a lot of development since then.

                                        Yes, a lot of work has gone into the netmap device driver over the last couple of years, especially with regard to multiple host rings support in Suricata.

                                        You will almost certainly want to change the Suricata Run Mode from AutoFP to workers on the INTERFACE SETTINGS tab in the Performance section. That will usually work much better with netmap on multi-core CPUs if you also have multi-queue NICs. But experiment with both modes. For a small handful of users AutoFP has performed better. Depends a lot on the particular NIC.

                                        • B
                                          btspce @bmeeks
                                          last edited by

                                          @bmeeks I'm now up and running in inline mode on two internal interfaces and in workers mode. One of the interfaces has VLANs on it.
                                          Hardware checksum offload and flow control are disabled for the relevant parent interfaces.
                                          Everything works so far, except that both firewalls become CARP master for the VLAN interfaces only. No alerts on the interfaces. Any idea on this issue?

                                          • B
                                            btspce @btspce
                                            last edited by

                                            The issue is that VLAN hardware tagging has to be disabled on the NIC for Suricata to be able to pass the VLAN tags in inline mode.

                                            In this case it was interface igc0, so I entered the command below in a shell on both firewalls, and traffic and CARP were instantly happy. Is there any way to set this as a system tunable?

                                            ifconfig igc0 -vlanhwtag
                                            
                                            
                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.