Suricata process dying due to hyperscan problem
-
@bmeeks said in Suricata process dying due to hyperscan problem:
battery as good, but if power blips it drops the load
FWIW we see that a lot on older batteries, or I suppose defective ones. In our experience the UPS "self test" proactively alerts us the majority of the time, but a decent amount of the time the self test will trigger a power failure because the battery can't handle the load for the 2 seconds. :( And by "older" I mean over 4-5 years.
-
@SteveITS said in Suricata process dying due to hyperscan problem:
@bmeeks said in Suricata process dying due to hyperscan problem:
battery as good, but if power blips it drops the load
FWIW we see that a lot on older batteries, or I suppose defective ones. In our experience the UPS "self test" proactively alerts us the majority of the time, but a decent amount of the time the self test will trigger a power failure because the battery can't handle the load for the 2 seconds. :( And by "older" I mean over 4-5 years.
I suspect a defective battery at some level. It is a Tripp-Lite. My favorite is APC, and I think that's what I will go back to.
-
Again, I want to mention that Suricata works fine (on my system, at least) in IPS mode with the AC-BS pattern matcher instead of the default (Hyperscan). This may help the developers find the bug and help users stay protected in the meantime.
-
@chrysmon I am seeing the same thing in AC mode. It has yet to die since making the switch.
-
@ajohnson353 said in Suricata process dying due to hyperscan problem:
@chrysmon I am seeing the same thing in AC mode. It has yet to die since making the switch.
If I remember correctly, mine was not working in AC mode. Let it run longer to be sure.
-
Wonder if this might be the source of the mysterious Hyperscan bug we are seeing in Suricata?
https://www.freebsd.org/security/advisories/FreeBSD-EN-23:15.sanitizer.asc
If so, that would explain a lot of the weirdness. I will keep tabs on this. Thanks to @RobbieTT for the link in another thread unrelated to Suricata.
-
@bmeeks said in Suricata process dying due to hyperscan problem:
Wonder if this might be the source of the mysterious Hyperscan bug we are seeing in Suricata?
https://www.freebsd.org/security/advisories/FreeBSD-EN-23:15.sanitizer.asc
The two machines I posted about earlier are both running with the default Hyperscan enabled and with legacy blocking mode enabled. Neither machine has experienced a Suricata core dump since I disabled ASLR for the Suricata binary. It therefore seems increasingly plausible that the root of the issue is linked to ASLR, and the LLVM sanitizer advisory linked above could certainly explain why this has suddenly started happening.
-
@bmeeks
proccontrol -m aslr -s disable /usr/local/bin/suricata -i vmx2 -D -c /usr/local/etc/suricata/suricata_28559_vmx2/suricata.yaml --pidfile /var/run/suricata_vmx228559.pid
[1022863 - RX#01-vmx2] 2023-12-03 20:36:32 Info: pcap: vmx2: snaplen set to 1518
[600532 - Suricata-Main] 2023-12-03 20:36:32 Notice: threads: Threads created -> RX: 1 W: 8 FM: 1 FR: 1 Engine started.
[1022863 - RX#01-vmx2] 2023-12-03 20:36:35 Info: checksum: No packets with invalid checksum, assuming checksum offloading is NOT used
[1022865 - W#02] 2023-12-03 20:36:41 Error: spm-hs: Hyperscan returned fatal error -1.
[1022866 - W#03] 2023-12-03 20:36:41 Error: spm-hs: Hyperscan returned fatal error -1.
No luck for me.
-
I tried disabling ASLR on my system as a test, but the whole system became unresponsive and I had to force a power cycle and revert the change. Not sure what happened, since the logs only show Suricata coming back online on the interfaces and then nothing until the reboot. I shut down the Suricata processes before running "elfctl -e +noaslr /usr/local/bin/suricata" and then restarted the Suricata interfaces. So no luck on my end testing with ASLR disabled.
-
@bmeeks I noticed this in another thread:
@jimp said in Major DNS Bug 23.01 with Quad9 on SSL:
While we are likely to include the patch from that EN in future builds it isn't relevant to Unbound.
They only use those sanitizers for debug/test builds and not for normal/production builds.
-
Hello fellow Netgate community members,
I recently learned that the maintainer of Snort and Suricata does all the work you see here unpaid. I opened a ticket requesting a Wikipedia-style "donate to the maintainer" button on all third-party packages. I personally want to send some money to maintainers. If you feel the same way, please respond to this Redmine issue.
https://redmine.pfsense.org/issues/15056
Happy Holidays.
-
Hello, I also have the same problem. I tried all the recommendations proposed in this topic, but without success.
However, I noticed that the problem only occurs on the network interfaces with higher traffic and not on the interfaces with low traffic. I have pfSense 23.09 installed on a Hyper-V virtual machine, so no VLANs.
There are 6 network cards (without VLANs). Initially Suricata was active only on the WAN, and it stopped after anywhere between 1 hour and 7-8 hours.
For testing, I enabled Suricata on all interfaces, and I noticed that it now stops only on the WAN interface and on the LAN interface, where the network traffic is higher. Suricata never stops on the interfaces where the traffic is very low.
-
Hi all.
It seems I get the same bugger of an error, and Suricata stops running on (yes, you guessed it) one interface.
I've read through most of the posts here without finding a solution.
Is there a solution? So what is next? A patch? An update?
I'm on pfSense 2.7.1 and Suricata 7.0.2_1.
And yes, is there a way to turn off the Hyperscan system so that Suricata runs without it?
Paal B.
-
@tgm-its
use AC-BS instead of Auto
-
@tgm-its said in Suricata process dying due to hyperscan problem:
And yes, is there a way to turn off the Hyperscan system so that Suricata runs without it?
Currently no patch is available because the cause of the failure has yet to be determined. I cannot reproduce it in my limited test environment. At least I have not yet encountered it. It is a seemingly random sort of bug impacting some users and not others. Very difficult to troubleshoot a failure that you cannot reliably reproduce, but I'm still trying.
You can disable the use of the Hyperscan library by going to the INTERFACE SETTINGS page for an interface, scrolling down to the Pattern Matcher Algorithm parameter, and choosing something other than Auto or Hyperscan. Save the change and restart Suricata on the interface so that the underlying binary sees the new parameter choice.
-
@bmeeks
For me, the problem always manifests itself only on the network interfaces on which there is some traffic. The problem does not appear on network interfaces with low traffic.
The problem appears after an interval between 1-8 hours, probably the higher the traffic, the faster the problem appears.
-
@paulp said in Suricata process dying due to hyperscan problem:
@bmeeks
For me, the problem always manifests itself only on the network interfaces on which there is some traffic. The problem does not appear on network interfaces with low traffic.
The problem appears after an interval between 1-8 hours, probably the higher the traffic, the faster the problem appears.
Hmm... that does make some sense. I am testing with very low traffic in a limited VMware Workstation virtual machine setup. Perhaps hammering a test machine with iperf3 traffic might trigger the bug???
-
@kiokoman Thanks for the suggestion. I'll try that.
Paal B.
-
@bmeeks
idk if iperf3 can generate enough alerts to trigger the bug. Maybe it depends more on how many alerts there are than on how much data is on the wire.
-
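To probe the idea that alert volume matters more than raw throughput, alert-generating traffic can be sent instead of bulk iperf3 traffic. This is a minimal Python sketch, not a tested reproduction: it repeatedly fetches `http://testmynids.org/uid/index.html`, the test page used in Suricata's quickstart documentation, which should match the `GPL ATTACK_RESPONSE id check returned root` signature if that rule is enabled on the interface. The helper names, request count, and delay are hypothetical choices. Run it from a host whose traffic actually crosses the interface Suricata is monitoring.

```python
import time
import urllib.request

# Test page used in Suricata's quickstart documentation; with the matching
# GPL ATTACK_RESPONSE rule enabled, each fetch should raise an alert.
TEST_URL = "http://testmynids.org/uid/index.html"


def fetch(url):
    """Plain HTTP GET with a 10 s timeout; returns the response body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def generate_alerts(count, url=TEST_URL, delay=0.5, fetcher=fetch):
    """Request `url` `count` times, sleeping `delay` seconds between requests.

    Returns how many requests completed without raising OSError.
    """
    completed = 0
    for i in range(count):
        try:
            fetcher(url)
            completed += 1
        except OSError:
            # URLError, timeouts and connection resets are OSError subclasses;
            # in IPS mode a dropped request is an expected outcome here.
            pass
        if i + 1 < count:
            time.sleep(delay)
    return completed
```

For example, `generate_alerts(1000, delay=0.1)` sends 1000 requests about 0.1 s apart. The `fetcher` parameter only exists so the loop can be exercised without network access.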
@bmeeks I was trying to use the AC-KS pattern matcher as an alternative to Hyperscan, but they must be related in some manner. This is the log I get after just a few minutes of running AC-KS. Also, I'm getting the Hyperscan errors not only on my WAN interface, which stays busy, but also on this interface, which has little traffic and does not alert all that often, so it isn't just the more active interfaces on my network.
Here are the last few lines in suricata.log for the failed interface running with AC-KS:
[186263 - RX#01-ix1.15] 2023-12-07 09:37:41 Info: pcap: ix1.15: running in 'auto' checksum mode. Detection of interface state will require 1000 packets
[186263 - RX#01-ix1.15] 2023-12-07 09:37:41 Info: pcap: ix1.15: snaplen set to 1518
[100242 - Suricata-Main] 2023-12-07 09:37:41 Notice: threads: Threads created -> RX: 1 W: 4 FM: 1 FR: 1 Engine started.
[186263 - RX#01-ix1.15] 2023-12-07 09:41:10 Info: checksum: No packets with invalid checksum, assuming checksum offloading is NOT used
[186267 - W#04] 2023-12-07 09:46:00 Error: spm-hs: Hyperscan returned fatal error -1.
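The excerpt above gives a way to put numbers on the "1-8 hours" reports in other posts: measure the time from `Engine started` to the first `Hyperscan returned fatal error` on each interface, then compare busy and quiet interfaces. A minimal Python sketch, assuming one log entry per line in the `[tid - thread] YYYY-MM-DD HH:MM:SS Level: message` layout seen in this thread (inferred from these excerpts, not from a specification); `seconds_to_first_hs_error` is a hypothetical helper, not part of Suricata or pfSense.

```python
import re
from datetime import datetime

# "[tid - thread] YYYY-MM-DD HH:MM:SS Level: message", as in the excerpts
# in this thread (layout assumed from those excerpts, not from a spec).
LINE_RE = re.compile(
    r"^\[[^\]]*\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+\w+:\s+(.*)$"
)


def seconds_to_first_hs_error(lines):
    """Seconds from the most recent 'Engine started' to the next Hyperscan
    fatal error, or None if no such error follows an engine start."""
    started = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a log entry in the assumed layout
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        msg = m.group(2)
        if "Engine started" in msg:
            started = ts  # a restart resets the clock
        elif started is not None and "Hyperscan returned fatal error" in msg:
            return (ts - started).total_seconds()
    return None
```

With the five entries of the excerpt fed in as separate lines, it returns 499.0 (8 min 19 s). Since it accepts any iterable of lines, an open suricata.log file object can be passed directly.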
Here is the output from the core dump:
(gdb) bt
#0 0x0000000830c6834a in thr_kill () from /lib/libc.so.7
#1 0x0000000830be8344 in raise () from /lib/libc.so.7
#2 0x0000000830c8ca39 in abort () from /lib/libc.so.7
#3 0x0000000830cdbf30 in ?? () from /lib/libc.so.7
#4 0x0000000830ca8440 in ?? () from /lib/libc.so.7
#5 0x0000000830ca142c in ?? () from /lib/libc.so.7
#6 0x0000000830ca1159 in ?? () from /lib/libc.so.7
#7 0x00000000005ac55a in ConfNodeFree ()
#8 0x00000000005ac536 in ConfNodeFree ()
#9 0x00000000005ac536 in ConfNodeFree ()
#10 0x00000000005ad305 in ConfDeInit ()
#11 0x000000000058fa61 in ?? ()
#12 0x000000000058f03f in SuricataMain ()
#13 0x0000000830bbf75a in __libc_start1 () from /lib/libc.so.7
#14 0x000000000058bd40 in _start ()
(gdb) bt full
#0 0x0000000830c6834a in thr_kill () from /lib/libc.so.7
No symbol table info available.
#1 0x0000000830be8344 in raise () from /lib/libc.so.7
No symbol table info available.
#2 0x0000000830c8ca39 in abort () from /lib/libc.so.7
No symbol table info available.
#3 0x0000000830cdbf30 in ?? () from /lib/libc.so.7
No symbol table info available.
#4 0x0000000830ca8440 in ?? () from /lib/libc.so.7
No symbol table info available.
#5 0x0000000830ca142c in ?? () from /lib/libc.so.7
No symbol table info available.
#6 0x0000000830ca1159 in ?? () from /lib/libc.so.7
No symbol table info available.
#7 0x00000000005ac55a in ConfNodeFree ()
No symbol table info available.
#8 0x00000000005ac536 in ConfNodeFree ()
No symbol table info available.
#9 0x00000000005ac536 in ConfNodeFree ()
No symbol table info available.
#10 0x00000000005ad305 in ConfDeInit ()
No symbol table info available.
#11 0x000000000058fa61 in ?? ()
No symbol table info available.
#12 0x000000000058f03f in SuricataMain ()
No symbol table info available.
#13 0x0000000830bbf75a in __libc_start1 () from /lib/libc.so.7
No symbol table info available.
#14 0x000000000058bd40 in _start ()
No symbol table info available.
(gdb) info threads
Id Target Id Frame
* 1 LWP 102018 0x0000000830c6834a in thr_kill () from /lib/libc.so.7
(gdb) thread apply all bt
Thread 1 (LWP 102018):
#0 0x0000000830c6834a in thr_kill () from /lib/libc.so.7
#1 0x0000000830be8344 in raise () from /lib/libc.so.7
#2 0x0000000830c8ca39 in abort () from /lib/libc.so.7
#3 0x0000000830cdbf30 in ?? () from /lib/libc.so.7
#4 0x0000000830ca8440 in ?? () from /lib/libc.so.7
#5 0x0000000830ca142c in ?? () from /lib/libc.so.7
#6 0x0000000830ca1159 in ?? () from /lib/libc.so.7
#7 0x00000000005ac55a in ConfNodeFree ()
#8 0x00000000005ac536 in ConfNodeFree ()
#9 0x00000000005ac536 in ConfNodeFree ()
#10 0x00000000005ad305 in ConfDeInit ()
#11 0x000000000058fa61 in ?? ()
#12 0x000000000058f03f in SuricataMain ()
#13 0x0000000830bbf75a in __libc_start1 () from /lib/libc.so.7
#14 0x000000000058bd40 in _start ()
(gdb) thread apply all bt full
Thread 1 (LWP 102018):
#0 0x0000000830c6834a in thr_kill () from /lib/libc.so.7
No symbol table info available.
#1 0x0000000830be8344 in raise () from /lib/libc.so.7
No symbol table info available.
#2 0x0000000830c8ca39 in abort () from /lib/libc.so.7
No symbol table info available.
#3 0x0000000830cdbf30 in ?? () from /lib/libc.so.7
No symbol table info available.
#4 0x0000000830ca8440 in ?? () from /lib/libc.so.7
No symbol table info available.
#5 0x0000000830ca142c in ?? () from /lib/libc.so.7
No symbol table info available.
#6 0x0000000830ca1159 in ?? () from /lib/libc.so.7
No symbol table info available.
#7 0x00000000005ac55a in ConfNodeFree ()
No symbol table info available.
#8 0x00000000005ac536 in ConfNodeFree ()
No symbol table info available.
#9 0x00000000005ac536 in ConfNodeFree ()
No symbol table info available.
#10 0x00000000005ad305 in ConfDeInit ()
No symbol table info available.
#11 0x000000000058fa61 in ?? ()
No symbol table info available.
#12 0x000000000058f03f in SuricataMain ()
No symbol table info available.
#13 0x0000000830bbf75a in __libc_start1 () from /lib/libc.so.7
No symbol table info available.
#14 0x000000000058bd40 in _start ()
No symbol table info available.
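These traces carry no symbols for Suricata itself ("No symbol table info available."), but the named frames are still informative: the abort is raised inside libc while `ConfNodeFree` runs under `ConfDeInit`, which appears to be Suricata freeing its configuration tree during shutdown. One possible reading, not a confirmed diagnosis, is that this core records the teardown after the Hyperscan error rather than the failing Hyperscan call itself. The small Python sketch below (hypothetical helpers `frames` and `named_frames`) pulls frame numbers and function names out of gdb `bt` text, whether it is pasted one frame per line or flattened onto a single line, which makes such summaries quick to produce from larger dumps.

```python
import re

# One gdb frame with an address, e.g. "#7 0x00000000005ac55a in ConfNodeFree () ..."
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*\(")

# Unresolved, libc signal/abort, and startup frames that say little about
# where Suricata itself was executing.
NOISE = {"??", "thr_kill", "raise", "abort", "__libc_start1", "_start"}


def frames(bt_text):
    """Return [(frame_number, function_name), ...] in the order gdb printed them.

    Frames printed without an address (e.g. '#0 foo () at file.c:1') are skipped.
    """
    return [(int(num), func) for num, func in FRAME_RE.findall(bt_text)]


def named_frames(bt_text):
    """Function names from the trace, minus unresolved and libc/startup frames."""
    return [func for _, func in frames(bt_text) if func not in NOISE]
```

Applied to the `bt` output above, `named_frames` returns `['ConfNodeFree', 'ConfNodeFree', 'ConfNodeFree', 'ConfDeInit', 'SuricataMain']`.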
I'll try running this interface on Auto (Hyperscan) and capture the core dump to post the debug details here. Again, this is my smaller interface that has been failing with Suricata. If that doesn't prove useful, I'll work on getting my WAN interface to produce a core dump with Hyperscan to get you that debug info.
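Every error line quoted in this thread is tagged `spm-hs`. In upstream Suricata, single-pattern matching is chosen by its own `spm-algo` key (documented values `bm`, `hs` and `auto`, where `auto` uses `hs` when Hyperscan is available), separately from the `mpm-algo` key for multi-pattern matching. If the pfSense Pattern Matcher Algorithm setting only maps to `mpm-algo` (an assumption, not confirmed in this thread), that could explain why Hyperscan errors still appear with AC-KS selected. The hypothetical helpers below read an interface's generated suricata.yaml (for example `/usr/local/etc/suricata/suricata_28559_vmx2/suricata.yaml` from the command earlier) and report both keys; treating a missing `spm-algo` as `auto` is also an assumption.

```python
import re

# Top-level "key: value" lines only; indented or commented-out keys are ignored.
ALGO_RE = re.compile(r"""^(mpm-algo|spm-algo)\s*:\s*["']?([\w-]+)""")


def matcher_settings(yaml_text):
    """Return the mpm-algo/spm-algo values found in a suricata.yaml text."""
    settings = {}
    for line in yaml_text.splitlines():
        m = ALGO_RE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def spm_may_use_hyperscan(settings):
    """True if single-pattern matching could still resolve to Hyperscan."""
    # Assumption: a missing key behaves like the documented default "auto".
    return settings.get("spm-algo", "auto") in ("auto", "hs")
```

Usage: read the file with `open(path).read()`, pass the text to `matcher_settings`, and check the result with `spm_may_use_hyperscan`. A quick line-based match is used instead of a YAML parser so the sketch has no third-party dependencies.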