I bought a new Netgate 3100 and installed Suricata on it. Suricata is listening on both of my WANs (WAN1 50up, 30down; WAN2 rarely used, more for redundant connections and administrative purposes).
When I start Suricata, I first have to delete a lock file to make it run. After some time, Suricata stops working without any message or announcement. Any ideas how to fix this?
Here is the log:
bmeeks last edited by bmeeks
Look in the firewall's system log around the time you suspect Suricata crashed. Odds are you will find a Signal 10 Bus Error. This is the result of an unaligned access to memory. On ARM hardware such as the SG-3100, it affects at least Snort, Suricata and ClamAV. I've tried a fix for Snort and Suricata. The Suricata fix was successful for a time, but not anymore.
The issue is complicated to explain, but if you Google "unaligned memory access" you will find some info on what that is. This is only a problem on ARM hardware. Intel CPUs are immune as they will "auto-fixup" instructions that attempt unaligned access.
I looked into this in some detail and found that the main problem in FreeBSD is with the llvm compiler used to create armv6/armv7 binary code. That compiler chooses at times to use an ARM instruction pair that does not support unaligned access. The compiler could choose to use a different pair of instructions to accomplish the same thing, and that different pair will support unaligned access via "auto-fixup" just like Intel CPUs.
While it is technically true the issue lies within the C programming code of the binary packages that are failing on ARM hardware, the practical implication is that since the code works on Intel hardware without issue -- and because the ARM hardware base is very small -- the upstream projects for Snort, Suricata, clamAV and others have no great desire to invest the energy and effort required to find the bugs in the C code and fix them.
I worked with the pfSense team and we turned off compiler optimizations for these packages. It worked initially for Suricata, but now some change has crept into the upstream project and that change is causing unaligned access issues again on ARM hardware.
The problem will look like a random bug, because it takes a particular sequence of data, and the code path taken while processing that data, to reach the instruction that attempts the unaligned memory access. When the unaligned access occurs, it triggers a CPU hardware fault and the OS shuts down the application (and logs the Signal 10 bus error).
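To make "unaligned access" a bit more concrete, here is a minimal C sketch of the general pattern. It is not taken from the Snort, Suricata or ClamAV source; the function names `read_u32_host` and `read_u32_le` are hypothetical and only illustrate how code can read a multi-byte value from an arbitrary offset in a packet buffer without assuming alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Risky pattern (shown only as a comment, not compiled here):
 *
 *     uint32_t v = *(const uint32_t *)(buf + 1);
 *
 * buf + 1 is generally not 4-byte aligned. Dereferencing it as a uint32_t
 * is undefined behavior in C. It tends to go unnoticed on Intel CPUs, but
 * on ARM it can end in a bus error, depending on which load instructions
 * the compiler picks.
 */

/* Hypothetical helper: read 4 bytes in host byte order from any address.
 * memcpy makes no alignment assumption about the source pointer. */
uint32_t read_u32_host(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: read a little-endian 32-bit value one byte at a
 * time. Byte loads are always aligned, and the result does not depend on
 * the byte order of the host CPU. */
uint32_t read_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Either helper can be called with a pointer at any offset, for example `read_u32_le(buf + 1)`. Finding every place in a large code base where a cast like the one in the comment is used is the tedious part, which is the effort the upstream projects are reluctant to spend.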