Help with troubleshooting Suricata failure



  • Suricata is now running in Inline mode on our systems after some hardware changes.  We have only one lingering issue, and I hope people can give us some ideas on how to solve it.

    Peak traffic time for the campus is between 10PM and 1AM, pretty much every night.  Sometimes, between 10:30 and 11:00, all traffic stops flowing through the firewall.  The status screen shows the following:

    • 100% CPU utilization (normal max is ~30%),
    • the WAN interface (the only interface where we run Suricata in Inline blocking mode) goes from green, to yellow (latency/packet loss), then to red (down),
    • traffic rises from the normal peak of 950Mb/s-1.2Gb/s to 1.5-2Gb/s,
    • load averages rise into the mid-teens (the box has 40 cores, so this shouldn't be alarming).

    If I turn off Suricata on the WAN at this point, everything goes back to working properly (aside from traffic not being inspected).

    My question, with all of that background info, is this: how can I tell what is causing the Suricata instance on the WAN interface to freak out?  I've looked in all the logs I can think of.  Suricata never says it is out of memory (the box has 32GB of memory and we use about 30% of that, even during the times when Suricata freaks out), and I can't see a particular traffic pattern or alert that is consistently in the logs when this happens.

    Does anyone have any ideas on how to figure out the culprit?  I'm at a loss…

    Thanks in advance



  • I assume this only happens with inline mode?  Have you experimented with Legacy Blocking mode enabled just to verify it is associated only with inline IPS mode?

    I ask because inline uses the newer netmap technology in FreeBSD.  There are also some NIC driver tweaks having to do with buffers that have fixed some throughput issues for other users, unrelated to Suricata.  I think it affects mostly certain Intel NICs.  The symptoms are similar to yours, if I recall correctly.

    Stopping Suricata on the interface will sort of "tickle it" as the netmap pipe is torn down and the conventional NIC driver-to-kernel network stack pathway is established.  That might be enough of a tickle to reset these buffer issues if they are the culprit.  Just a guess at this point, though.  It might still be a problem with netmap or Suricata and netmap at the higher throughputs.

    One last place to check for ideas is over on the Suricata Redmine site.  Other folks may have experienced a similar problem.  Here is a link:  https://redmine.openinfosecfoundation.org/projects/suricata.

    Bill
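
As a rough way to check whether the lockup coincides with packet drops on the capture side, the sketch below reads Suricata's EVE JSON log and reports how the capture counters change between consecutive stats records. It assumes EVE stats output is enabled and that the records carry `stats.capture.kernel_packets` and `stats.capture.kernel_drops`; whether the netmap capture method fills in those counters on a given build is something to verify. The function name `drop_deltas` is made up for illustration.

```python
import json
from typing import Iterable, Iterator, Tuple


def drop_deltas(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (timestamp, new_packets, new_drops) between consecutive stats records."""
    prev = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        if not isinstance(event, dict) or event.get("event_type") != "stats":
            continue
        # Assumed layout: {"stats": {"capture": {"kernel_packets": N, "kernel_drops": M}}}
        capture = event.get("stats", {}).get("capture", {})
        current = (capture.get("kernel_packets", 0), capture.get("kernel_drops", 0))
        if prev is not None:
            d_pkts = current[0] - prev[0]
            d_drops = current[1] - prev[1]
            # Counters start over when Suricata restarts, so ignore negative deltas.
            if d_pkts >= 0 and d_drops >= 0:
                yield event.get("timestamp", ""), d_pkts, d_drops
        prev = current
```

To use it, open the interface's eve.json (the location depends on the install) and loop over `drop_deltas(fh)`. A run of records where the drop delta climbs while the packet delta stalls would point at the capture path rather than at a particular rule.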



  • Thanks, I will check that out.  I haven't tried switching back to Legacy mode, as we ran in Legacy mode all last fall with few issues (aside from having to build extensive pass lists) because our NICs were not fully supported in Inline mode.  We didn't experience anything like this.  As the new ones are still Intel NICs, I am definitely going to look into the buffers issue.

    I'll report back what I find.



  • I realize this is an old topic, but maybe someone out there has crossed this bridge and can shed some light on the issue.

    I am running into the same problem as the OP.  Suricata (Inline, Intel i211) effectively shuts down the WAN interface and runs the CPU up to 100%.  Nothing in the logs indicates a problem; the Suricata log just goes silent.  After a stop / start of Suricata, all is well again.

    The few times I've encountered this it did not seem to happen during times of high load on the interfaces.

    OP, did you have any luck in adjusting buffers?
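
Since the only visible symptom in the logs is that they go quiet, one low-tech way to catch the hang as it happens is to watch how long it has been since the log file was last written. The sketch below is a minimal example of that idea; the function names, the 120-second threshold, and whichever log path is passed in are all assumptions for illustration, not anything Suricata provides.

```python
import os
import time
from typing import Optional


def seconds_since_last_write(path: str, now: Optional[float] = None) -> float:
    """Return how many seconds ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def is_silent(path: str, max_age_seconds: float = 120.0,
              now: Optional[float] = None) -> bool:
    """True if the file has not been written to for longer than the threshold."""
    return seconds_since_last_write(path, now) > max_age_seconds
```

Run from cron every minute or so against a log that is normally busy, a `True` result gives a timestamp for when the instance stopped logging, which can then be lined up against traffic graphs and CPU load.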