Suricata process dying due to hyperscan problem
-
@tylerevers said in Suricata process dying due to hyperscan problem:
I'm not quite sure how this happened. I woke up today and was just looking at systems to discover that one specific Suricata Interface had deleted itself.
There is no mechanism within the package code for that to happen autonomously.
I would check first the pfSense system log on the impacted system to see what may be logged there, then look in the configuration backup history of pfSense to see if someone took an action. The GUI code logs an appropriate message for all configuration changes saved to the config.xml file. You can find the entire configuration history under DIAGNOSTICS > BACKUP AND RESTORE.
-
Fingers crossed, but I think I found the bug.
I definitely found an errant double free() of memory when processing IPv4 addresses in a Pass List. When you execute a double free() of memory you will get random crashes.
Since I have been unable to reproduce the problem, I can't say for sure what I found will fix the Hyperscan issue, but I am hopeful based on the fact several of you have stated that turning off Legacy Blocking Mode (in other words, running in plain IDS mode) allows Suricata to run with no issue. The double free() was in the custom Legacy Blocking Module, and it was located in new code that was added with the first 7.0.0 Suricata update back when 23.09 Plus was still in development mode and 2.8 CE snapshots were active. Disabling Legacy Blocking Mode means this buggy portion of the module's code is not executed.
I will create a pull request and get this fix posted for Netgate to review and merge. That will not happen until the first of the coming week.
Update: the fix is posted at https://github.com/pfsense/FreeBSD-ports/pull/1333 for review and merge by the Netgate developer team.
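For readers unfamiliar with this class of bug, here is a minimal sketch of a double free() and one common defensive pattern. It is not the Legacy Blocking Module code: struct ip_entry, make_entry() and release_entry() are hypothetical names made up for illustration only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an address structure; not from the package source. */
struct ip_entry {
    unsigned char addr[4];
};

/* Allocate an entry holding a copy of a 4-byte IPv4 address. Returns NULL on failure. */
struct ip_entry *make_entry(const unsigned char addr[4])
{
    struct ip_entry *e = malloc(sizeof(*e));
    if (e != NULL)
        memcpy(e->addr, addr, sizeof(e->addr));
    return e;
}

/*
 * Free the entry and clear the caller's pointer.
 * Returns 1 if memory was released, 0 if the pointer was already NULL.
 */
int release_entry(struct ip_entry **ep)
{
    assert(ep != NULL);
    if (*ep == NULL)
        return 0;
    free(*ep);
    *ep = NULL;   /* a later release_entry() call is now a harmless no-op */
    return 1;
}

/*
 * The buggy shape, for contrast (do not run):
 *
 *     free(e);
 *     ...
 *     free(e);   // undefined behavior: the heap may be corrupted, and the
 *                // crash can surface later in unrelated code
 */
```

Calling release_entry(&e) twice is safe because the second call sees a NULL pointer. A plain second free(e) on the same address is undefined behavior, which is why the resulting crashes can look random and show up far from the faulty line.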
-
@bmeeks
Just out of curiosity, what's the difference between "free" and "SCFree"? I understand that free() deallocates the memory, but I can't find a reference for SCFree.
-
@kiokoman said in Suricata process dying due to hyperscan problem:
@bmeeks
just out of curiosity, what's the difference between "free" and "SCFree" ?
Not a thing currently. The upstream Suricata developers just wrap some common C functions with their own names in case they might ever want to customize them for some reason. Today the two are exactly the same. Here are the #define preprocessor definitions currently in use by upstream:
#define SCMalloc malloc
#define SCCalloc calloc
#define SCRealloc realloc
#define SCFree free
I just fixed up my code to stay in sync. It was an overlooked typo thing that I noticed while scrutinizing the code for any possible bug.
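Because these are plain preprocessor aliases, memory obtained through one name can be released through the other. A minimal sketch using two of the definitions quoted above; dup_label() and drop_label() are hypothetical helpers, not part of Suricata:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Two of the wrapper definitions quoted above. */
#define SCMalloc malloc
#define SCFree free

/* Hypothetical helper: returns a heap copy of s, or NULL on allocation failure. */
char *dup_label(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s) + 1;
    char *copy = SCMalloc(len);   /* expands to malloc(len) */
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Releases a label through the wrapper name; expands to free(label). */
void drop_label(char *label)
{
    SCFree(label);
}
```

With these definitions, a buffer returned by dup_label() could be released with either drop_label() or a direct free() call. That would stop being true if upstream ever replaced the aliases with a custom allocator, which is the reason to stay in sync with the SC names.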
Line 718 in the GitHub link I posted is where the errant double free() call happened. Notice the new revision deletes that line.
Line 676 was a misplaced continue statement that could result in a memory leak because it bypasses the free() call to dump the IPv4 address structure created and passed to us elsewhere in the code. Notice the continue statement was moved to be after the free() call.
The other changes are just cosmetic.
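The continue problem described above can be sketched in isolation. This is an illustration of the general pattern only, not the code from the pull request: alloc_item(), free_item(), count_non_negative() and the live_allocs counter are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocation counter so the effect is observable without extra tools. */
static int live_allocs = 0;

static int *alloc_item(int value)
{
    int *p = malloc(sizeof(*p));
    if (p != NULL) {
        *p = value;
        live_allocs++;
    }
    return p;
}

static void free_item(int *p)
{
    if (p != NULL) {
        free(p);
        live_allocs--;
    }
}

/*
 * Counts the non-negative values in vals[0..n-1]. Each iteration owns a
 * temporary allocation that must be released on every path through the loop.
 */
int count_non_negative(const int *vals, int n)
{
    assert(vals != NULL || n == 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        int *item = alloc_item(vals[i]);
        if (item == NULL)
            continue;            /* nothing was allocated, nothing to free */

        if (*item < 0) {
            /* Leaky shape: a bare `continue;` here would skip free_item(). */
            free_item(item);     /* fixed shape: release first ... */
            continue;            /* ... then skip the rest of the iteration */
        }

        count++;
        free_item(item);
    }
    return count;
}

/* Number of allocations not yet released; 0 means nothing leaked. */
int outstanding_allocations(void)
{
    return live_allocs;
}
```

With the free before the continue, outstanding_allocations() returns 0 after the loop. In the leaky shape, every skipped element would leave one allocation behind.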
-
@bmeeks
Now I understand, thanks for the explanation.
-
@bmeeks said in Suricata process dying due to hyperscan problem:
@tylerevers said in Suricata process dying due to hyperscan problem:
I'm not quite sure how this happened. I woke up today and was just looking at systems to discover that one specific Suricata Interface had deleted itself.
There is no mechanism within the package code for that to happen autonomously.
I would check first the pfSense system log on the impacted system to see what may be logged there, then look in the configuration backup history of pfSense to see if someone took an action. The GUI code logs an appropriate message for all configuration changes saved to the config.xml file. You can find the entire configuration history under DIAGNOSTICS > BACKUP AND RESTORE.
Thank you for your guidance. The configuration history indicated that my specific user made the change. My apologies for the red herring.
-
My pull request containing the anticipated fix for this Hyperscan error has been merged. An updated Suricata package has built and should appear as an available update for 2.7.2 CE and 23.09.1 Plus users.
Look for an update to version 7.0.2_2 for the Suricata package. When installed, the new package should pull in version 7.0.2_5 of the Suricata binary.
Fingers crossed this fixes the Hyperscan issue. But as I mentioned previously, since I could never reproduce the error in my small test environment, I can't say with 100% certainty the bug I found and fixed is the actual Hyperscan culprit.
-
@bmeeks said in Suricata process dying due to hyperscan problem:
My pull request containing the anticipated fix for this Hyperscan error has been merged. An updated Suricata package has built and should appear as an available update for 2.7.2 CE and 23.09.1 Plus users.
Look for an update to version 7.0.2_2 for the Suricata package. When installed, the new package should pull in version 7.0.2_5 of the Suricata binary.
For 23.09.1 I can confirm that it is available.
After the update I can see these packages:
pfSense-pkg-suricata-7.0.2_2 pfSense package suricata
suricata-7.0.2_5 High Performance Network IDS, IPS and Security Monitoring engine
Thank you
-
@bmeeks
tested and .....
not working..
[340341 - RX#01-vmx2] 2023-12-11 22:42:50 Info: checksum: No packets with invalid checksum, assuming checksum offloading is NOT used
[340346 - W#05] 2023-12-11 22:42:53 Error: spm-hs: Hyperscan returned fatal error -1.
[340347 - W#06] 2023-12-11 22:42:53 Error: spm-hs: Hyperscan returned fatal error -1.
-
@kiokoman said in Suricata process dying due to hyperscan problem:
@bmeeks
tested and .....
not working..
[340341 - RX#01-vmx2] 2023-12-11 22:42:50 Info: checksum: No packets with invalid checksum, assuming checksum offloading is NOT used
[340346 - W#05] 2023-12-11 22:42:53 Error: spm-hs: Hyperscan returned fatal error -1.
[340347 - W#06] 2023-12-11 22:42:53 Error: spm-hs: Hyperscan returned fatal error -1.
Well, crap! I had high hopes.
Does it still work if you disable blocking mode?
-
@bmeeks
Yes, it's running on an interface I have without blocking mode.
-
@kiokoman said in Suricata process dying due to hyperscan problem:
@bmeeks
yes, it's running on an interface i have without blocking mode
Please share the output of this command run from a shell prompt:
pkg info | grep suricata
Let's make sure you have the latest binary. It should show suricata-7.0.2_5.
-
Shell Output - pkg info | grep suricata
pfSense-pkg-suricata-7.0.2_2 pfSense package suricata
suricata-7.0.2_5 High Performance Network IDS, IPS and Security Monitoring engine
This is pfSense 2.7.2.
The strange part is that I have another pfSense 23.09.1-RELEASE running on VMware with the same packages and more VLANs, but it has no trouble.
The only difference is that 23.09.1 is running with 4 CPUs and 2.7.2 with 8 CPUs.
Tomorrow I can try lowering the CPU count to see if there is any difference.
-
Still broken here also..... :-(
[177766 - RX#01-ix0] 2023-12-12 06:48:02 Info: checksum: More than 1/10th of packets have an invalid checksum, assuming checksum offloading is used (193/1000)
[177768 - W#02] 2023-12-12 07:17:29 Error: spm-hs: Hyperscan returned fatal error -1.
-
After upgrade, the problem with the hyperscan error still occurs for me as well.
-
Well, sorry but I'm fresh out of ideas at this point. I have no clue what it could be. That fix was my last best hope.
Since I cannot reproduce the problem, it makes it practically impossible to troubleshoot and debug.
-
@bmeeks I think I mentioned it somewhere else, too, but the problem only occurs for me (Netgate 2100) when I have DNSBL (pfBlocker) running also. Have you tried that combo to reproduce? (Or at least occurs much faster - I've only run it for a few days in a row with DNSBL disabled, but when both are running, Suricata dies within 5-15 minutes.)
-
I should have also mentioned I'm running a Netgate 8200.
-
I also have pfBlockerNG active. I will uninstall it to see if the problem still occurs.
@bmeeks: On one of my pfSense servers the problem manifests itself relatively quickly, within a few hours.
I would be happy if I could somehow help to identify the problem.
-
Not sure why, but I haven't had a single segfault kill my suricata interfaces running auto on my netgate 7100 for the last few hours, after the upgrade. It would have normally long since had the hyperscan error and core dumped the process. I'm keeping an eye on it, but so far so good.