Suricata frequent crashes
-
Running a Netgate SG-3100 here.
After booting, Suricata often crashes once it finishes parsing all the rules:
pid 92487 (suricata), uid 0: exited on signal 10 (core dumped)
I've been perplexed for a while. Restarting from the GUI results in another core dump. Only a reboot (sometimes) gets Suricata running again.
Where can I find logs to diagnose? What should I change to get this to work? I think everything is at defaults... Running mostly ETPro rules and not much else...
-
This is a known issue and is due to the way compiler optimization works in clang/llvm, the compiler suite FreeBSD uses for ARM cross-compiles. Signal 10 on FreeBSD is SIGBUS, a bus error, and in this case it is raised by an "unaligned memory access".
The problem is complex to explain as it requires a fairly detailed understanding of CPU and memory hardware design and data alignment in low-level programming code. Do some research on unaligned memory access to get a little bit of an idea what's going on.
The SG-3100 Netgate appliance uses an armv7 CPU. The compiler for that CPU type, when doing optimizations of the code, chooses to use machine instruction opcodes that do not support unaligned memory access. So when a particular set of conditions within the Suricata executable code are all true, the code generated by the compiler will throw a SIGNAL 10 error and quit. It takes a number of things to be true all at once for the fault to occur, thus it can appear to be a random event.
One potential fix is for the clang/llvm compiler to be told NOT to use optimizations when compiling. In that mode, it emits machine opcodes that tolerate unaligned access because the CPU will "auto fix-up" the access.
All Intel-based hardware will auto-fixup unaligned memory access issues, so things run fine on Intel hardware. The actual root cause of these Signal 10 errors is really within the C source code of programs such as Suricata. However, because the issue only surfaces when certain hardware is used (in this case, ARM hardware with the clang/llvm compiler), the problem is quite rare and thus the Suricata source code maintainers are not motivated to invest the time and energy to find and fix the areas in their code where unaligned access is happening.
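For anyone who wants a concrete picture of the source-level pattern described above, here is a minimal C sketch. It is not taken from Suricata; the helper names `is_aligned` and `read_u32_unaligned` are made up for illustration. The risky pattern is casting a byte pointer into the middle of a packet buffer to a wider type and dereferencing it. As far as I understand it, an optimizing compiler is allowed to assume such a pointer is properly aligned and may pick multi-word load instructions that fault on armv7 when it is not. Copying the bytes out with `memcpy` avoids making that promise to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if ptr is a multiple of `alignment` bytes. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/*
 * Risky pattern (shown only as a comment, do not use):
 *
 *     uint32_t v = *(const uint32_t *)(buf + offset);
 *
 * If buf + offset is not 4-byte aligned this is undefined behavior in C.
 * It usually appears to work on x86, but can end in SIGBUS on ARM.
 */

/*
 * Hypothetical helper: read a 32-bit value (in host byte order) from any
 * byte offset. memcpy makes no alignment assumption, so the compiler has
 * to generate code that is safe for unaligned addresses.
 */
uint32_t read_u32_unaligned(const uint8_t *buf, size_t len, size_t offset)
{
    uint32_t value;

    assert(offset + sizeof value <= len); /* stay inside the buffer */
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

A caller would use `read_u32_unaligned(packet, packet_len, 1)` in place of the cast. Compilers typically turn the small fixed-size `memcpy` into a plain load on hardware where that is safe, so the cost is usually negligible.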
-
What can I do to minimize the risk of these bus errors? Any memory settings? Rule sets to enable/disable?
-
No, nothing really unless you knew precisely which rule or rules were causing the problem. But to complicate things even more, it could be the data in a network packet that causes the issue. It will be pretty random.
I know it's not the answer you want, but for now the only way to reliably avoid the problem is to run Suricata on hardware with an Intel-based CPU. That can be an Atom, i5, i7, etc.