Suricata process dying due to hyperscan problem
-
@btspce Interesting… not seeing that on our 6100 pair. Only after a boot, not a stop/start? (Not sure I restarted again after the Suricata install.)
-
@SteveITS The webGUI hangs after install and during the start of Suricata, or after a boot of the 6100 when Suricata starts. CPU goes to 80-100%, and after about 10 seconds the webGUI hangs. No more updates. Killing the Suricata process and restarting php-fpm restores access to the webGUI, where I can see that both the primary and secondary firewalls are now MASTER for the same interface and no traffic passes.
No issues when running 23.05.1
-
I never mentioned it here, but I'm running in a CARP environment. Are we all running CARP, or are there people affected without CARP?
-
@kiokoman
Not running CARP here.
-
Same problems after upgrading to 23.09.1
Don't run CARP.
-
I'm encountering issues with starting Suricata on one of my pfSense installations, and I'm seeking assistance and sharing my observations with the community.
I currently have two similar setups with High Availability (HA) and CARP configurations, both running as virtual machines in Proxmox. However, one of them is working perfectly on pfSense 2.7.2, while the other encountered issues after updating to the same version.
Working System (referred to as "System A"):
- 8 Cores (HOST) without NUMA.
- 16GB of RAM.
- Virtio trunk interface with 8 queues.
- Two PF VMs running on two nodes within a cluster.
- Proxmox hardware with 2 x Xeon(R) Gold 6230 CPUs and 2 dual-port 100GB Mellanox cards.
Non-working System (referred to as "System B"):
- 6 Cores (HOST) with NUMA.
- 12GB of RAM.
- Dual-port Mellanox 25Gb/s NICs passed through to the PF VM and configured as LAGG.
- Two PF VMs running on two nodes within a cluster.
- Proxmox hardware with 2 x Intel(R) Xeon(R) CPU E5-2620.
On November 14th, I upgraded Suricata from version 6.x to 7.0.2 on one of the nodes in System B, while still running pfSense 2.7.0. This upgrade caused the GUI to crash, and I had to restart it from the console. I encountered a PHP error in the log, which I unfortunately don't have access to anymore. After this incident, I was unable to start Suricata on the interface.
Due to time constraints, I reverted to a snapshot to get it running again. During this period, I also tested Suricata installations on other systems for comparison, both with and without HA. Surprisingly, I couldn't reproduce the problem on any of these test systems, all running pfSense 2.7.2. This led me to believe that the issue might have been fixed.
Subsequently, I upgraded System A to pfSense 2.7.2 without any issues with Suricata or other components. However, when I upgraded the backup node of System B, the upgrade completed successfully, but Suricata failed to start once again.
Upon examining the Suricata log file, I found no errors; the last line indicated that it had started. However, the system.log displayed the following entries:
- "php-fpm[27588]: Starting Suricata on WAN_GC(lagg0.10) per user request..."
- "php[13124]: [Suricata] Updating rules configuration for: WAN_GC ..."
- "php[13124]: [Suricata] Enabling any flowbit-required rules for: WAN_GC..."
- "php[13124]: [Suricata] Building new sid-msg.map file for WAN_GC..."
- "php[13124]: [Suricata] Suricata START for v10(lagg0.10)..."
- "kernel: pid 16716 (suricata), jid 0, uid 0: exited on signal 11 (core dumped)"
It appears that Suricata crashes immediately upon startup.
Here are the steps I've taken and my findings:
- Uninstalling the Suricata package with the "retain config" option unchecked and reinstalling it did not resolve the issue.
- Disabling all rules allowed the interface to start.
- Enabling a single rule also allowed it to start, and a different single rule on its own worked as well, but enabling both together prevented Suricata from starting.
- Increasing values in "WAN - Flow" and "Stream" did not make a difference.
- Trying various system tunables suggested in forum posts had no effect.
- Disabling "Block offenders" allowed Suricata to start.
- Discovering the issue with Hyperscan: setting "Pattern Matcher Algorithm" to "AC" instead of "Auto" allowed Suricata to start and work with all rules active (see the verification note after this list).
- Disabling NUMA in Proxmox did not help.
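Note: as far as I can tell, the GUI's "Pattern Matcher Algorithm" choice is written into the per-interface suricata.yaml as the standard upstream mpm-algo option, where "Auto" lets Suricata pick Hyperscan when the library is available. The exact mapping is my assumption, but a quick check along these lines should show which matcher an instance ended up with:
grep -E '^(mpm|spm)-algo' /usr/local/etc/suricata/suricata_*/suricata.yaml
# with the "AC" workaround applied, expect:  mpm-algo: ac
# with "Auto", Suricata selects Hyperscan ("hs") when it was compiled in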
Both System A and B have very similar configurations, including mostly the same packages, with the main difference being the network interface cards. System B uses a dual 25G Mellanox card passed through, while System A uses a virtio NIC. However, I also attempted to start Suricata on an unused virtio card in System B, and it also refused to start. Additionally, System A has a newer CPU. System B employs LAGG with multiple VLANs, while System A has its LAGG managed by Proxmox and only sees a single virtio card with a VLAN trunk. System A has 29 VLAN interfaces, while System B has only 12.
Both systems were running Suricata PF pkg 7.0.2_2 and suricata-7.0.2_5, with all other packages up to date.
I have since reverted System B to a previous state, preventing me from conducting further tests. Nevertheless, I hope this information proves useful to the developers in diagnosing and resolving the issue.
Thank you.
-
upgraded Suricata from version 6.x to 7.0.2 on one of the nodes in System B, while still running pfSense 2.7.0. This upgrade caused the GUI to crash
It's generally expected to have problems if you upgrade packages before upgrading pfSense itself… you need to change your update branch to Previous first. Otherwise it may pull in later libraries, etc.
-
Thanks. I don't remember exactly what happened back then, but the newer version of Suricata was available in the package list while no new pfSense update was listed, so I believed it belonged to my current pfSense version. I reverted to a snapshot, and all was good after that.
The problem I have now is not related to the above, since this time pfSense was updated to 2.7.2 and Suricata was updated automatically along with it.
-
@safe yeah there’s a separate bug/issue that hides the upgrade, what fun.
https://docs.netgate.com/pfsense/en/latest/releases/2-7-1.html#troubleshooting
-
@kiokoman, @masons , and any others wishing to try a debug Suricata version:
Guys, I was finally able to get my pfSense 2.7.2 CE package builder repaired and updated so that I can resume building test packages.
If you would like to try and help isolate and identify this Hyperscan bug, read on to see how you can assist --
I have produced a debug-enabled version of Suricata 7.0.2. It was compiled for pfSense CE 2.7.2, because that's all I have a builder for. But I suspect it will also load and run okay on pfSense Plus 23.09.1.
You will need to install the debug-enabled package manually using the CLI at a shell prompt on the firewall. You can obtain the shell prompt either directly on the console locally or via a remote SSH connection. Obviously this should be done on a firewall where you are currently experiencing the Hyperscan crash.
This test involves replacing only the binary portion of the Suricata package. The GUI component (pfSense-pkg-suricata-7.0.2_2) will not be altered.
WARNING: you should only try this on a test machine or one that you can quickly recover should something cause a major crash!!
Install the Debug-Enabled Suricata Binary Package
The debug-enabled Suricata binary package file is hosted on my Google Drive account here: https://drive.google.com/file/d/10lD0R907A1yQpn-aIewH8_GfPJiuVcIm/view?usp=sharing.
1. To begin, download the suricata-7.0.2_6.pkg file and transfer it to your firewall, placing it in the /root directory. IMPORTANT: make sure you transfer the file in binary (unaltered) form! So, if using WinSCP for the transfer from a Windows PC, choose "Binary" for the transfer type (an scp alternative for other platforms follows step 4).
2. Stop all running Suricata instances by executing this command from a shell prompt on the firewall:
/usr/local/etc/rc.d/suricata.sh stop
3. Install the updated debug version of the Suricata binary using the command below at a shell prompt on the firewall:
pkg-static install -f /root/suricata-7.0.2_6.pkg
4. If successful, restart Suricata by returning to the GUI and using the icons on the INTERFACES tab, or you can run the following command from the shell prompt:
/usr/local/etc/rc.d/suricata.sh start
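Two optional helpers for the steps above; the workstation-side address is a placeholder and the expected pkg output is from memory, so treat both as assumptions:
# step 1 alternative: copy the package from a Linux/macOS workstation
# (replace 192.0.2.1 with your firewall's address):
scp suricata-7.0.2_6.pkg root@192.0.2.1:/root/
# after step 3: confirm the debug revision actually got installed
# (should report suricata-7.0.2_6):
pkg-static info suricata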
If you experience a crash while running this debug build of Suricata, you can quickly grab some useful data by running this command on the core file from a shell prompt:
gdb /usr/local/bin/suricata /root/suricata.core
After it loads, type bt and press ENTER to see a back trace. Post the displayed results back here. You can also run bt full to produce a more detailed back trace. When finished, type exit to quit the debugger.
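If you prefer to capture everything non-interactively, gdb's standard batch options can write both traces straight to a file (a convenience sketch, not part of the official steps above):
gdb -batch -ex bt -ex 'bt full' /usr/local/bin/suricata /root/suricata.core > /root/suricata_bt.txt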
To Restore Your Original Setup
1. From the GUI, remove the Suricata package. This will remove the GUI package but may not remove the updated debug-enabled binary. The next step ensures the debug binary is also removed.
2. When the package deletion from the GUI completes, exit to a shell prompt and delete the debug version of the binary using this command:
pkg-static delete suricata-7.0.2_6
3. Now return to the GUI and reinstall Suricata from the SYSTEM > PACKAGE MANAGER menu. This will reinstall the original Suricata GUI and binary versions from the official pfSense repository.
-
@bmeeks
There is already a debug txt a couple of posts before this. I can still try your version and let you know.
-
[101255 - Suricata-Main] 2023-12-17 23:41:11 Notice: threads: Threads created -> RX: 1 W: 8 FM: 1 FR: 1 Engine started.
[368779 - RX#01-vmx2] 2023-12-17 23:41:11 Info: checksum: No packets with invalid checksum, assuming checksum offloading is NOT used
[368805 - W#04] 2023-12-17 23:41:37 Error: spm-hs: Hyperscan returned fatal error -1.
[368782 - W#01] 2023-12-17 23:41:37 Error: spm-hs: Hyperscan returned fatal error -1.
Crashed without generating /root/suricata.core.
-
@kiokoman said in Suricata process dying due to hyperscan problem:
@bmeeks
There is already a debug txt a couple of posts before this. I can still try your version and let you know.
Sorry, I missed that earlier. It is helpful, as it points to a problem somewhere in Hyperscan itself and not so much in the custom blocking module. Curious how things work when the custom blocking module is disabled, though.
Nothing in your back trace results seems to have any relationship to the Legacy Blocking Module.
-
@kiokoman said in Suricata process dying due to hyperscan problem:
[101255 - Suricata-Main] 2023-12-17 23:41:11 Notice: threads: Threads created -> RX: 1 W: 8 FM: 1 FR: 1 Engine started.
[368779 - RX#01-vmx2] 2023-12-17 23:41:11 Info: checksum: No packets with invalid checksum, assuming checksum offloading is NOT used
[368805 - W#04] 2023-12-17 23:41:37 Error: spm-hs: Hyperscan returned fatal error -1.
[368782 - W#01] 2023-12-17 23:41:37 Error: spm-hs: Hyperscan returned fatal error -1.
Crashed without generating /root/suricata.core.
I was sort of afraid that might happen -- no core dump.
Refresh my memory on your NIC types (real or virtual) and which rule categories you are using. I want to try once more to duplicate the crash. I really should be able to since users are reporting it with a variety of NIC types (real and virtual), so I'm thinking the NIC is not important to the crash.
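One thing worth checking when no core file appears, assuming stock FreeBSD defaults still apply on pfSense, is whether the kernel is allowed to write cores at all:
sysctl kern.coredump kern.corefile
# expected defaults: kern.coredump: 1 and kern.corefile: %N.core
# (a relative %N.core pattern drops the dump in the process's working directory)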
If you could post your suricata.yaml file for one of the crashing interfaces, that might help as well. I can import it directly into a virtual machine to see if I get the crash then. The file will be in /usr/local/etc/suricata/suricata_xxxx_yyyy/suricata.yaml, where the "xxxx" and "yyyy" parts are the physical NIC name and a random UUID. And even better would be also posting the actual active rules file. It will be in /usr/local/etc/suricata/suricata_xxxx_yyyy/rules/suricata.rules.
One other test you can try is changing the Run Mode for Suricata. The default is AutoFP. Try Workers and see if there is any change. Or if already using Workers, swap to AutoFP and test. This parameter is on the INTERFACE SETTINGS tab in the Performance section. Any change made requires Suricata to be restarted so it will see the change.
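For reference, the Run Mode choice should be visible in the same generated file, assuming the GUI maps straight onto the standard upstream runmode option:
grep -E '^runmode' /usr/local/etc/suricata/suricata_*/suricata.yaml
# upstream values are autofp, workers, or single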
-
@bmeeks
I was thinking the same; I see no relationship to the Legacy Blocking Module. The only reference in gdb to alert-pf is in Thread 2:
Thread 2 (LWP 179039 of process 2511 "IM#01"):
#0 0x00000008029807ea in _read () from /lib/libc.so.7
#1 0x00000008021f4a13 in ?? () from /lib/libthr.so.3
#2 0x0000000000d0198d in AlertPfMonitorIfaceChanges (args=0x803394ef0) at alert-pf.c:1058
But the one throwing the error is Thread 8:
Thread 8 (LWP 187487 of process 2511 "W#05"):
#0 0x00000008029a4454 in exit () from /lib/libc.so.7
#1 0x0000000000e9bbb9 in HSScan (ctx=<optimized out>, thread_ctx=<optimized out>, haystack=0x8338f7800 "CONNECT wfbssvc65.icrc.trendmicro.com:443 HTTP/1.1\r\nHost: wfbssvc65.icrc.trendmicro.com:443\r\n\r\n", haystack_len=<optimized out>) at util-spm-hs.c:156
#2 0x0000000000c8319e in AppLayerProtoDetectPMMatchSignature (s=0x80322d4e0, tctx=0x832d22080, f=0x806648a80, buf=0x8338f7800 "CONNECT wfbssvc65.icrc.trendmicro.com:443 HTTP/1.1\r\nHost: wfbssvc65.icrc.trendmicro.com:443\r\n\r\n", buflen=95, flags=<optimized out>, searchlen=<optimized out>, rflow=<optimized out>) at app-layer-detect-proto.c:215
#3 PMGetProtoInspect (tctx=0x832d22080, pm_ctx=0x1f12c80 <alpd_ctx>, mpm_tctx=<optimized out>, f=0x806648a80, buf=0x8338f7800 "CONNECT wfbssvc65.icrc.trendmicro.com:443 HTTP/1.1\r\nHost: wfbssvc65.icrc.trendmicro.com:443\r\n\r\n", buflen=buflen@entry=95, flags=5 '\005', pm_results=0x7fffdf3f7a00, rflow=0x7fffdf3f7b0f) at app-layer-detect-proto.c:296
#4 0x0000000000c795c8 in AppLayerProtoDetectPMGetProto (tctx=<optimized out>, f=f@entry=0x806648a80, buf=<optimized out>, buflen=buflen@entry=95, flags=flags@entry=5 '\005', pm_results=pm_results@entry=0x7fffdf3f7a00, rflow=0x7fffdf3f7b0f) at app-layer-detect-proto.c:344
#5 0x0000000000c78731 in AppLayerProtoDetectGetProto (tctx=<optimized out>, f=f@entry=0x806648a80, buf=0x8338f7800 "CONNECT wfbssvc65.icrc.trendmicro.com:443 HTTP/1.1\r\nHost: wfbssvc65.icrc.trendmicro.com:443\r\n\r\n", buflen=95, ipproto=ipproto@entry=6 '\006', flags=flags@entry=5 '\005', reverse_flow=0x7fffdf3f7b0f) at app-layer-detect-proto.c:1433
#6 0x0000000000c69296 in TCPProtoDetect (tv=tv@entry=0x80d8e0600, ra_ctx=ra_ctx@entry=0x832a00020, app_tctx=app_tctx@entry=0x832d21100, p=p@entry=0x838c33200, f=f@entry=0x806648a80, ssn=ssn@entry=0x8338d5d80, stream=0x7fffdf3f7c68, data=0x8338f7800 "CONNECT wfbssvc65.icrc.trendmicro.com:443 HTTP/1.1\r\nHost: wfbssvc65.icrc.trendmicro.com:443\r\n\r\n", data_len=95, flags=5 '\005', dir=UPDATE_DIR_OPPOSING) at app-layer.c:371
#7 0x0000000000c68c6d in AppLayerHandleTCPData (tv=tv@entry=0x80d8e0600, ra_ctx=ra_ctx@entry=0x832a00020, p=p@entry=0x838c33200, f=0x806648a80, ssn=ssn@entry=0x8338d5d80, stream=stream@entry=0x7fffdf3f7c68, data=0x8338f7800 "CONNECT wfbssvc65.icrc.trendmicro.com:443 HTTP/1.1\r\nHost: wfbssvc65.icrc.trendmicro.com:443\r\n\r\n", data_len=95, flags=5 '\005', dir=UPDATE_DIR_OPPOSING) at app-layer.c:709
#8 0x0000000000b62905 in ReassembleUpdateAppLayer (tv=0x80d8e0600, ra_ctx=0x832a00020, ssn=0x8338d5d80, stream=0x7fffdf3f7c68, p=0x838c33200, dir=UPDATE_DIR_OPPOSING) at stream-tcp-reassemble.c:1328
#9 StreamTcpReassembleAppLayer (tv=tv@entry=0x80d8e0600, ra_ctx=ra_ctx@entry=0x832a00020, ssn=ssn@entry=0x8338d5d80, stream=stream@entry=0x8338d5e20, p=p@entry=0x838c33200, dir=dir@entry=UPDATE_DIR_OPPOSING) at stream-tcp-reassemble.c:1391
#10 0x0000000000b64879 in StreamTcpReassembleHandleSegmentUpdateACK (tv=0x80d8e0600, ra_ctx=0x832a00020, ssn=0x8338d5d80, stream=0x8338d5e20, p=0x838c33200) at stream-tcp-reassemble.c:1949
#11 StreamTcpReassembleHandleSegment (tv=0x80d8e0600, ra_ctx=0x832a00020, ssn=0x8338d5d80, stream=0x8338d5d90, p=0x838c33200) at stream-tcp-reassemble.c:1997
#12 0x0000000000b9c789 in HandleEstablishedPacketToClient (tv=0x82e14bc4, tv@entry=0x80d8e0600, ssn=0x0, ssn@entry=0x8338d5d80, p=0x0, p@entry=0x838c33200, stt=0xe50a5969d84bc43d, stt@entry=0x832d60000) at stream-tcp.c:2811
#13 0x0000000000b7aa4d in StreamTcpPacketStateEstablished (tv=0x80d8e0600, p=0x838c33200, stt=0x832d60000, ssn=0x8338d5d80) at stream-tcp.c:3223
#14 StreamTcpStateDispatch (tv=tv@entry=0x80d8e0600, p=p@entry=0x838c33200, stt=stt@entry=0x832d60000, ssn=ssn@entry=0x8338d5d80, state=<optimized out>) at stream-tcp.c:5236
#15 0x0000000000b766c0 in StreamTcpPacket (tv=tv@entry=0x80d8e0600, p=p@entry=0x838c33200, stt=stt@entry=0x832d60000, pq=<optimized out>) at stream-tcp.c:5433
#16 0x0000000000b82781 in StreamTcp (tv=tv@entry=0x80d8e0600, p=p@entry=0x838c33200, data=0x832d60000, pq=pq@entry=0x832d18030) at stream-tcp.c:5745
#17 0x0000000000d53774 in FlowWorkerStreamTCPUpdate (tv=0x1, tv@entry=0x80d8e0600, fw=fw@entry=0x832d18000, p=p@entry=0x838c33200, detect_thread=detect_thread@entry=0x8338d7000, timeout=false) at flow-worker.c:391
#18 0x0000000000d52f4a in FlowWorker (tv=0x80d8e0600, p=0x838c33200, data=0x832d18000) at flow-worker.c:607
#19 0x0000000000e33b07 in TmThreadsSlotVarRun (tv=0x80d8e0600, p=0x838c33200, slot=0x8066db440) at tm-threads.c:135
#20 TmThreadsSlotVar (td=0x80d8e0600) at tm-threads.c:471
#21 0x00000008021e8d25 in ?? () from /lib/libthr.so.3
#22 0x0000000000000000 in ?? ()
Anyway, give me a sec and I'll give you the content of the yaml.
-
@kiokoman said in Suricata process dying due to hyperscan problem:
Anyway, give me a sec and I'll give you the content of the yaml.
Thanks! If you can post the suricata.rules file from the interface, that would be useful, too.
I would really love to be able to reproduce the crash, but if not, I can compile a 4.2.0 Hyperscan library that you can try just for giggles.
The error returned by Hyperscan is actually quite specific. Here is the relevant code with its associated comments:
hs_error_t err = hs_scan(sctx->db, (const char *)haystack, haystack_len, 0,
                         scratch, MatchEvent, &match_offset);
if (err != HS_SUCCESS && err != HS_SCAN_TERMINATED) {
    /* An error value (other than HS_SCAN_TERMINATED) from hs_scan()
     * indicates that it was passed an invalid database or scratch region,
     * which is not something we can recover from at scan time. */
    SCLogError("Hyperscan returned fatal error %d.", err);
    exit(EXIT_FAILURE);
}
This indicates to me that Suricata is passing Hyperscan either an invalid database or an invalid scratch memory area. What is strange, though, is that no other users on Linux are reporting this kind of issue. At least I have not found such a report.
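For reference, the return codes involved are defined in Hyperscan's hs_common.h; the values below are copied from the upstream header (worth re-checking against the library version the package actually links):
/* from Hyperscan's hs_common.h */
#define HS_SUCCESS           0    /* the engine completed normally */
#define HS_INVALID          (-1)  /* a parameter, e.g. the database or scratch region, was invalid */
#define HS_NOMEM            (-2)  /* a memory allocation failed */
#define HS_SCAN_TERMINATED  (-3)  /* the match callback stopped the scan; treated as non-fatal above */
So "fatal error -1" maps to HS_INVALID, which lines up with the invalid database or scratch theory.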
-
@bmeeks
https://drive.google.com/drive/folders/1-ag4lFYM0I15IlHX3kxHoNNPV5LPX6QR?usp=sharing
-
@kiokoman said in Suricata process dying due to hyperscan problem:
@bmeeks
https://drive.google.com/drive/folders/1-ag4lFYM0I15IlHX3kxHoNNPV5LPX6QR?usp=sharing
Got them! Thanks!
-
@kiokoman:
Imported your suricata.yaml configuration and suricata.rules file into my virtual machine. Only edited the interface names to reflect em0, which is what I use in my virtual machine at the moment.
Suricata starts up and runs. No error yet. Will let it run for a while to see if a crash occurs. I suspect my little VM is not seeing the same amount of packets (traffic) as your machine, though.
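If the VM simply never sees enough traffic to trigger it, one other standard upstream option is replaying a packet capture offline through the same config and rules. A sketch only; the paths are placeholders, and I have not verified how the pfSense-patched binary behaves in offline mode with the custom blocking module enabled:
mkdir -p /tmp/suricata-test
suricata -c /usr/local/etc/suricata/suricata_em0_xxxx/suricata.yaml -r /root/sample.pcap -l /tmp/suricata-test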