Suricata process dying due to hyperscan problem
-
I also have pfBlockerNG active. I will uninstall it to see if the problem still occurs.
@bmeeks: On one of my pfSense servers the problem manifests itself relatively quickly, within a few hours.
I would be happy if I could somehow help to identify the problem.
-
Not sure why, but since the upgrade I haven't had a single segfault kill my Suricata interfaces running Auto on my Netgate 7100 for the last few hours. Normally it would long since have hit the Hyperscan error and core dumped the process. I'm keeping an eye on it, but so far so good.
-
Same here with pattern match auto or blocking enabled, suricata still won't start.
pkg info | grep suricata
pfSense-pkg-suricata-7.0.2_2 pfSense package suricata
suricata-7.0.2_5 High Performance Network IDS, IPS and Security Monitoring engine
-
Output from my 8200
pkg info | grep suricata
pfSense-pkg-suricata-7.0.2_2 pfSense package suricata
suricata-7.0.2_5 High Performance Network IDS, IPS and Security Monitoring engine
-
I uninstalled pfBlockerNG, but the error persists on the WAN interface.
(It is a Hyper-V machine with pfSense 23.09.1-RELEASE with suricata-7.0.2_5)
-
@Bismarck said in Suricata process dying due to hyperscan problem:
Same here with pattern match auto or blocking enabled, suricata still won't start.
What error do you get on startup if you leave the Pattern Matcher set to AC-KS but enable only blocking? In other words, enable blocking but do not change the Pattern Matcher setting.
Show the output of the suricata.log file for the impacted interface (located under the LOGS VIEW tab) and also post anything relevant during the same time from the pfSense system log (under STATUS > SYSTEM LOGS).
-
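For anyone who prefers the shell to the GUI tabs, the two pieces of information requested above can be gathered roughly like this. This is a minimal POSIX sh sketch: the function name collect_suricata_logs is made up for illustration, and both file paths are supplied by the caller rather than assumed.

```sh
# Hypothetical helper (not part of the Suricata package): print the tail of
# an interface's suricata.log plus any Suricata-related system log lines.
# $1 = path to suricata.log, $2 = path to the system log.
collect_suricata_logs() {
    suricata_log=$1
    system_log=$2
    echo "== last 50 lines of $suricata_log =="
    tail -n 50 "$suricata_log"
    echo "== suricata entries in $system_log =="
    # grep exits non-zero when nothing matches, so report that explicitly.
    grep -i 'suricata' "$system_log" || echo "(none found)"
}
```

On pfSense the per-interface log normally lives somewhere under /var/log/suricata/ and the system log at /var/log/system.log, but treat both locations as assumptions and check them on your own box. Scrub IP addresses before posting the output.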
@bmeeks no error, "pattern match auto" or "blocking enabled" kills suricata. AC-KS or "disabling blocking" is fine.
-
@Bismarck said in Suricata process dying due to hyperscan problem:
@bmeeks no error, "pattern match auto" or "blocking enabled" kills suricata. AC-KS or "disabling blocking" is fine.
Help me understand this better...
I've gotten a lot of seemingly conflicting information from the posters in this long thread. My confusion is leading to frustration.
- So if you leave Pattern Matching at AC-KS and enable blocking, Suricata starts and runs with no issue. Is that correct?
- The only way to make it crash is to set Pattern Matcher to Auto with blocking enabled? Is that correct?
- And just to close the loop: if you set Pattern Matcher to Auto but disable blocking, Suricata starts without error. Is that correct?
For the crash configurations, what is the output of the suricata.log file? Is it always the "Hyperscan returned fatal error -1" message?
-
- yes
- yes
- yes
suricata.log stops at this line
[118101 - Suricata-Main] 2023-12-12 15:10:30 Error: detect: error parsing signature "alert tcp any any -> any $HTTP_PORTS (msg:"SERVER-WEBAPP Microsoft SharePoint OAuth authentication bypass attempt"; flow:to_server,established; content:"access_token="; fast_pattern; nocase; http_client_body; base64_decode:bytes 100,relative; base64_data; content:"|22|alg|22|"; nocase; content:"|22|none|22|"; within:50; nocase; content:"/_api/"; nocase; http_uri; pcre:"/\x2f_api\x2f(web\x2f|lists\x2f|Microsoft|SP\x2e|_vti_bin|_layouts|apps\x2f|search\x2f)/Ui"; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:cve,2023-29357; reference:url,portal.msrc.microsoft.com/en-us/security-guidance/advisory/CVE-2023-29357; classtype:attempted-admin; sid:62467; rev:1;)" from file /usr/local/etc/suricata/suricata_10547_igc2/rules/suricata.rules at line 33984
[118101 - Suricata-Main] 2023-12-12 15:10:30 Error: detect-parse: "http_uri" keyword seen with a sticky buffer still set. Reset sticky buffer with pkt_data before using the modifier.
[118101 - Suricata-Main] 2023-12-12 15:10:30 Error: detect: error parsing signature "alert tcp any any -> any $HTTP_PORTS (msg:"SERVER-WEBAPP Microsoft SharePoint OAuth authentication bypass attempt"; flow:to_server,established; content:"access_token="; nocase; http_client_body; base64_decode:bytes 100,relative; base64_data; content:"|22|alg|22|"; nocase; content:"|22|none|22|"; within:50; nocase; content:"/_layouts/15/"; fast_pattern:only; http_uri; metadata:policy balanced-ips drop, policy max-detect-ips drop, policy security-ips drop, service http; reference:cve,2023-29357; reference:url,portal.msrc.microsoft.com/en-us/security-guidance/advisory/CVE-2023-29357; classtype:attempted-admin; sid:62465; rev:1;)" from file /usr/local/etc/suricata/suricata_10547_igc2/rules/suricata.rules at line 33986
[118101 - Suricata-Main] 2023-12-12 15:10:30 Info: detect: 2 rule files processed. 34422 rules successfully loaded, 61 rules failed
[118101 - Suricata-Main] 2023-12-12 15:10:30 Warning: threshold-config: can't suppress sid 2210063, gid 1: unknown rule
[118101 - Suricata-Main] 2023-12-12 15:10:30 Info: threshold-config: Threshold config parsed: 1 rule(s) found
system log gives:
kernel pid 72109 (suricata), jid 0, uid 0: exited on signal 11 (core dumped)
/edit for suricata.log
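The "61 rules failed" count in the log above can be broken down by SID, which makes it easier to see which rules the parser rejected before the crash. A minimal POSIX sh sketch: the function name failed_sids is made up for illustration, and it assumes each log entry sits on its own line in the actual file.

```sh
# Hypothetical helper: list the SIDs of rules that Suricata failed to parse,
# taken from the "error parsing signature" entries in a suricata.log file.
# $1 = path to suricata.log
failed_sids() {
    grep 'error parsing signature' "$1" |
        # Keep only the number from the rule's "sid:NNNN;" option.
        sed -n 's/.*[; ]sid:\([0-9][0-9]*\);.*/\1/p' |
        sort -n -u
}
```

Note that a rule failing to parse is normally not fatal on its own; Suricata skips the rule and keeps loading, so the list is a starting point for investigation rather than proof of what caused the segfault.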
-
The best workaround for you guys is to select AC-CS (the normal default when Hyperscan is not present on a system) or one of the other AC-* pattern matcher settings. I have no idea what is wrong with Hyperscan at this point, and I have no reasonable hope of being able to fix it when I can't even reproduce it. It's like throwing darts at a target while blindfolded.
Hyperscan is eventually likely going to be removed from Suricata anyway. Intel seems to have begun a process of taking the library closed-source and making it proprietary. There is an open-source equivalent called VectorScan, but so far it runs only on non-Intel hardware because that's why it was created in the first place. Hyperscan was for Intel-only CPUs, so that left a number of other platforms (ARM, PowerPC, etc.) out. Thus some folks created the VectorScan project for those other CPUs. But that project was not originally created with Intel CPUs as its target, so it may take a bit for things to settle out. Here is a link about this: https://github.com/intel/hyperscan/issues/421. Follow the additional links in that GitHub discussion to learn more.
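To confirm which pattern matcher an interface is actually configured with after applying the workaround, you can look at the generated suricata.yaml. A minimal POSIX sh sketch: mpm-algo and spm-algo are the standard Suricata setting names for the multi-pattern and single-pattern matchers, but the function name show_matchers is made up, and the location of the generated file on pfSense is something you need to look up yourself.

```sh
# Hypothetical helper: print the multi-pattern (mpm-algo) and single-pattern
# (spm-algo) matcher settings found in a suricata.yaml file.
# $1 = path to suricata.yaml
show_matchers() {
    grep -E '^[[:space:]]*(mpm-algo|spm-algo):' "$1"
}
```

This only reads the file. On pfSense the YAML is regenerated from the GUI settings, so changes should be made on the interface settings page rather than by editing the file by hand.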
-
@Bismarck said in Suricata process dying due to hyperscan problem:
Thanks! Can you post the entire suricata.log file for an interface when it crashes? You can scrub or x-out any IP addresses if you need to. I want to examine the whole log to see what steps are completed.
And to clarify, have you ever seen the "Hyperscan returned fatal error -1" log message?
-
@bmeeks said in Suricata process dying due to hyperscan problem:
there you go
suricata.log crash (Auto)
https://pastebin.com/GAQn5CCy
suricata.log nocrash (AC-KS)
https://pastebin.com/kRcWVNDJ
And to clarify, have you ever seen the "Hyperscan returned fatal error -1" log message?
For me no, did a log search and only came up with signal 11 core dumped messages.
-
@bmeeks said in Suricata process dying due to hyperscan problem:
My pull request containing the anticipated fix for this Hyperscan error has been merged. An updated Suricata package has built and should appear as an available update for 2.7.2 CE and 23.09.1 Plus users.
Look for an update to version 7.0.2_2 for the Suricata package. When installed, the new package should pull in version 7.0.2_5 of the Suricata binary.
Fingers crossed this fixes the Hyperscan issue. But as I mentioned previously, since I could never reproduce the error in my small test environment, I can't say with 100% certainty the bug I found and fixed is the actual Hyperscan culprit.
Nearly 20 hours since updating to 7.0.2_2 on 23.09.1 Plus with custom bare metal setup and no Hyperscan crash yet. Pattern Match set to AUTO and Blocking Mode ENABLED. Using all VLANs that traverse a LAGG in my case just as a reminder.
Thanks, Bill!
-
@tylerevers said in Suricata process dying due to hyperscan problem:
Nearly 20 hours since updating to 7.0.2_2 on 23.09.1 Plus with custom bare metal setup and no Hyperscan crash yet. Pattern Match set to AUTO and Blocking Mode ENABLED. Using all VLANs that traverse a LAGG in my case just as a reminder.
Thanks, Bill!
That sounds encouraging. There definitely was a problem in the custom blocking plugin's code, but perhaps there are two different issues happening in this thread.
Some users are seeing the Hyperscan error, but others do not see that error and instead are getting Signal 11 segfaults from Suricata.
-
@bmeeks I'm over 18 hours updated (7.0.2_2 on 23.09.1) and still running stable with blocking enabled and pattern matcher set to Auto. No issues and everything seems stable to this point.
Previously I was only able to make it maybe 20 minutes before I got the Hyperscan error. I definitely think the update has fixed one of the potential causes for these errors.
Thanks for your support!
-
@bmeeks I agree there may be two different things going on. My problem is (and continues to be under 7.0.2_2) that the kernel kills Suricata with a "failed to reclaim memory" error. But that isn't limited to Hyperscan - it also happens using AC and AC-KS. Only AC-BS stays running for more than a few minutes. There's nothing in the Suricata log that catches my eye, since the process is killed by the kernel.
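Since at least three different failure modes have now been reported in this thread (the Hyperscan fatal error, a plain Signal 11 segfault, and the kernel killing the process for memory), it may help to state which one you are seeing when you post. A minimal POSIX sh sketch that checks for the three message strings quoted in this thread; the function name classify_crash and the labels it prints are made up for illustration.

```sh
# Hypothetical helper: report which failure mode the logs point to.
# $1 = path to the interface's suricata.log, $2 = path to the system log.
# Only the first matching condition is reported, even if several apply.
classify_crash() {
    suricata_log=$1
    system_log=$2
    if grep -q 'Hyperscan returned fatal error' "$suricata_log"; then
        echo hyperscan
    elif grep -q 'failed to reclaim memory' "$system_log"; then
        echo out-of-memory
    elif grep -q '(suricata).*exited on signal 11' "$system_log"; then
        echo segfault
    else
        echo unknown
    fi
}
```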
-
I spoke a little too soon. It appears the fix patched only one error, the one that was killing the interfaces sooner. Another problem is still killing them; they just last noticeably longer before the error.
Previously my WAN interface and one LAN interface would crash within about 20 minutes. Now my WAN lasted about 19 to 20 hours, and the LAN interface that was crashing is still running for now.
I'll work on getting you the full details from the core dump as soon as I can get back to my computer to see if any of it has changed.
[102796 - W#03] 2023-12-12 11:52:35 Error: spm-hs: Hyperscan returned fatal error -1.
-
Environment
- Running pfSense CE 2.7.2 with Suricata plugin 7.0.2_2 and Suricata package 7.0.2_5
- No changes to Suricata ASLR
- Pattern matcher = Auto
- Legacy blocking mode = Enabled
- Multiple VLANs on the LAN interface, but Suricata is only running on a single VLAN (interface is called PC)
Reproducing the issue (with logs)
When I start the Suricata service the WAN interface starts and continues to run without issue, but the PC interface dies immediately. I do not see the Hyperscan error in the Suricata logs. This is 100% reproducible with this VM.
WAN (works) Suricata log - https://pastebin.com/qRRa2P48
PC (crashes) Suricata log - https://pastebin.com/FNcRQnhU
System log excerpt showing that the PC Suricata instance dumps core.
Dec 12 10:05:51 kernel pid 10455 (suricata), jid 0, uid 0: exited on signal 11 (core dumped)
Dec 12 10:05:50 php 3903 [Suricata] Suricata START for PC(vtnet0.700)...
Dec 12 10:05:50 php 3903 [Suricata] Building new sid-msg.map file for PC...
Dec 12 10:05:50 php 3903 [Suricata] Enabling any flowbit-required rules for: PC...
Dec 12 10:05:50 php 3903 [Suricata] Updating rules configuration for: PC ...
Dec 12 10:05:49 php 3903 [Suricata] Building new sid-msg.map file for WAN...
Dec 12 10:05:49 php 3903 [Suricata] Enabling any flowbit-required rules for: WAN...
Dec 12 10:05:49 php 3903 [Suricata] Updating rules configuration for: WAN ...
Dec 12 10:05:49 php-fpm 13080 Starting Suricata on PC(vtnet0.700) per user request...
Workaround
This workaround does not require changing the pattern-matcher or disabling the legacy blocking mode. It works consistently across multiple hosts.
- Stop the Suricata service
- Go to Diagnostics --> Command Prompt
- Execute
elfctl -e +noaslr /usr/local/bin/suricata
- Start the Suricata service
In my case both interfaces start and continue to run without further crashes.
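The workaround steps above can be wrapped so they are easy to apply, inspect, and undo. A minimal POSIX sh sketch, with assumptions stated: the function names and the ELFCTL and SURICATA_BIN variables are made up for illustration, and only the "+noaslr" command comes from the steps above. The "-noaslr" form for clearing the flag and the bare invocation for showing the flags are based on my reading of the FreeBSD elfctl(1) manual page, so verify them there before relying on them.

```sh
# ELFCTL can be overridden for a dry run, e.g. ELFCTL="echo elfctl".
: "${ELFCTL:=elfctl}"
# Binary path as given in the workaround steps.
SURICATA_BIN=${SURICATA_BIN:-/usr/local/bin/suricata}

# Set the noaslr feature flag on the binary (the workaround itself).
aslr_workaround_on() { $ELFCTL -e +noaslr "$SURICATA_BIN"; }

# Clear the flag again to return to the stock behaviour.
aslr_workaround_off() { $ELFCTL -e -noaslr "$SURICATA_BIN"; }

# Show the feature flags currently set on the binary.
aslr_workaround_show() { $ELFCTL "$SURICATA_BIN"; }
```

Stop the Suricata service before calling aslr_workaround_on and start it again afterwards, as in the steps above. The flag is stored in the binary itself, so a package update that replaces the binary will presumably reset it.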
If you compare the failing PC interface suricata.log file with the working suricata.log file you can see where the process dumps core
PC (crashes) Suricata log - https://pastebin.com/FNcRQnhU
PC (working) Suricata log - https://pastebin.com/AE469T7m
The crashing instance fails immediately after attempting to parse a rule that it doesn't like. The working instance still sees that error, but continues to run.
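Comparing a crashing log against a working one is easier once the process ID, thread name, and timestamp at the start of each entry are stripped, since those differ on every run. A minimal POSIX sh sketch: the function names strip_prefix and compare_logs are made up for illustration, and the prefix pattern is based only on the log entries quoted in this thread.

```sh
# Hypothetical helper: drop the "[pid - thread] date time " prefix from
# every entry so that two runs can be compared line by line.
strip_prefix() {
    sed -E 's/^\[[0-9]+ - [^]]+\] [0-9-]+ [0-9:]+ //' "$1"
}

# Hypothetical helper: diff two suricata.log files with prefixes removed.
# $1 = crashing log, $2 = working log. Returns diff's exit status.
compare_logs() {
    a=$(mktemp /tmp/suricmp.XXXXXX)
    b=$(mktemp /tmp/suricmp.XXXXXX)
    strip_prefix "$1" > "$a"
    strip_prefix "$2" > "$b"
    rc=0
    diff "$a" "$b" || rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

Lines marked with ">" in the output exist only in the working log, so the first such line after the last common entry shows the step the crashing instance never reached.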
This system log excerpt shows that both interfaces start correctly
Dec 12 10:58:05 kernel vtnet0.700: promiscuous mode enabled
Dec 12 10:58:05 kernel vtnet0: promiscuous mode enabled
Dec 12 10:58:00 kernel vtnet1: promiscuous mode enabled
Dec 12 10:57:36 SuricataStartup 66406 Suricata START for PC(23822_vtnet0.700)...
Dec 12 10:57:35 SuricataStartup 65014 Suricata START for WAN(65037_vtnet1)...
Dec 12 10:57:08 SuricataStartup 98203 Suricata STOP for PC(23822_vtnet0.700)...
Next steps
I'm going to try removing the failing rule and then try starting up Suricata without the ASLR mitigation. I'll report back what I find.
-
@masons said in Suricata process dying due to hyperscan problem:
This is very intriguing data. Thank you for the research and for posting the results. This more or less jibes with my original hypothesis that ASLR may be involved here. One of the Netgate kernel developers did not think it was, because the currently documented ASLR bug is in the address sanitizer piece of the llvm compiler, and he said that was unlikely to be used outside of debug builds. The documentation for the sanitizer says it results in about a 2x slowdown in execution.
I also now doubt the documented address sanitizer bug in llvm is the likely cause, but your testing seems to imply that ASLR is at fault in some manner with this bug. However, other users experiencing the bug have tried disabling ASLR (as you did) and did not see any change in behavior.
-
I removed the offending rule (SID 26470), removed the ASLR change and restarted Suricata. The PC interface Suricata instance immediately dumps core with Signal 11 again.
Stopping the Suricata service, making the ASLR change and restarting Suricata, results in the PC interface Suricata instance coming up and staying up.
At least for me, across several VMs, this is very consistent behavior.