Suricata v7.0.7_5 abruptly stops
-
@SteveITS There is not much I could find about which service is causing this, except that a pfBlockerNG cron update is happening at the same time.
The cron update happens after these are disabled. I had removed the 'Service Watchdog', which eventually failed to restart the DNS resolver (unbound).
Next Event
Dec 15 04:00:01 php 55448 [pfBlockerNG] Starting cron process.
Dec 15 02:00:01 php 37998 [pfBlockerNG] Starting cron process.
Dec 15 00:01:36 kernel mvneta1: promiscuous mode disabled
Dec 15 00:01:36 kernel pid 11897 (unbound), jid 0, uid 59, was killed: failed to reclaim memory
Dec 15 00:01:36 kernel pid 23736 (suricata), jid 0, uid 0, was killed: failed to reclaim memory
Dec 15 00:01:24 php 38071 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Dec 15 00:00:33 php-cgi 39142 [Suricata] The Rules update has finished.
Dec 15 00:00:33 php-cgi 39142 [Suricata] Suricata signalled with SIGUSR2 for LAN (mvneta1)...
Earlier Event
Dec 14 22:00:01 php 20743 [pfBlockerNG] Starting cron process.
Dec 14 21:58:43 kernel mvneta1: promiscuous mode disabled
Dec 14 21:58:43 kernel pid 5252 (php-fpm), jid 0, uid 0, was killed: failed to reclaim memory
Dec 14 21:58:43 kernel pid 75593 (suricata), jid 0, uid 0, was killed: failed to reclaim memory
Dec 14 21:57:50 php-fpm 5252 [Suricata] Suricata signalled with SIGUSR2 for LAN (mvneta1)...
Does this mean I would be better off with more RAM?
-
@anishkgt which lists are you using in pfBlocker? Filling 4 GB implies a lot, or big ones. That may be a question for the pfBlocker forum...
If you're using the UT1 adult list for example that is gigantic. There may be better solutions like Cloudflare Family DNS.
-
@anishkgt said in Suricata v7.0.7_5 abruptly stops:
Dec 15 00:01:24 php 38071 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload
Dec 15 00:00:33 php-cgi 39142 [Suricata] The Rules update has finished.
The order of these two log entries tells me that the Suricata rules update had just completed, then the pfBlockerNG update cron task kicked off about a minute later. So, at this point Suricata was happy and running again with its updated rules.
Next, the two following log entries indicate to me that the pfBlockerNG cron task exhausted system RAM, so the OOM Reaper process kicked off and killed the two largest users of contiguous RAM -- Suricata and unbound (the DNS Resolver). My suspicion is a large DNSBL list was being updated by the pfBlockerNG update job. That will involve the Python module of unbound, causing that process to balloon in its memory footprint. Thus it would become a target of the OOM Reaper process, as would Suricata, because both would likely be the largest consumers of RAM at that point.
Dec 15 00:01:36 kernel pid 11897 (unbound), jid 0, uid 59, was killed: failed to reclaim memory
Dec 15 00:01:36 kernel pid 23736 (suricata), jid 0, uid 0, was killed: failed to reclaim memory
Like @SteveITS mentioned, your choices of pfBlockerNG lists can matter a lot. Some of the available choices are frankly just too large and other options such as using Cloudflare might be a better solution. Don't know the specifics of which lists you are using, but it's clear from the system logging that whatever you have chosen is "too much data" for the 4 GB of RAM in your firewall.
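To make the timing argument above easier to check against a longer log, here is a minimal Python sketch. It scans system log lines for the kernel's "was killed: failed to reclaim memory" entries and lists any [pfBlockerNG] or [Suricata] entries logged shortly before each kill. The function name oom_context, the 120-second window, and the default year are my own assumptions for illustration (syslog timestamps carry no year); the sample lines are copied from the log posted earlier in this thread.

```python
import re
from datetime import datetime

# Kernel OOM-kill lines in the format seen in this thread.
OOM_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+kernel\s+pid\s+(?P<pid>\d+)\s+"
    r"\((?P<name>[^)]+)\).*was killed: failed to reclaim memory"
)
# Any line that starts with a syslog-style timestamp.
TS_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s")

PACKAGE_TAGS = ("[pfBlockerNG]", "[Suricata]")


def parse_ts(text, year):
    # The year is not in the log line, so the caller has to supply one.
    return datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")


def oom_context(lines, window_s=120, year=2024):
    """For each OOM kill, collect package log lines from the preceding window_s seconds."""
    entries = []
    for raw in lines:
        line = raw.strip()
        m = TS_RE.match(line)
        if m:
            entries.append((parse_ts(m.group(1), year), line))

    report = []
    for kill_ts, line in entries:
        m = OOM_RE.match(line)
        if not m:
            continue
        context = [
            other
            for ts, other in entries
            if 0 <= (kill_ts - ts).total_seconds() <= window_s
            and any(tag in other for tag in PACKAGE_TAGS)
        ]
        report.append(
            {"time": kill_ts, "pid": int(m.group("pid")),
             "name": m.group("name"), "context": context}
        )
    return report


# Sample lines copied from the log posted above (newest first, as the GUI shows them).
SAMPLE = [
    "Dec 15 02:00:01 php 37998 [pfBlockerNG] Starting cron process.",
    "Dec 15 00:01:36 kernel pid 11897 (unbound), jid 0, uid 59, was killed: failed to reclaim memory",
    "Dec 15 00:01:36 kernel pid 23736 (suricata), jid 0, uid 0, was killed: failed to reclaim memory",
    "Dec 15 00:01:24 php 38071 [pfBlockerNG] No changes to Firewall rules, skipping Filter Reload",
    "Dec 15 00:00:33 php-cgi 39142 [Suricata] The Rules update has finished.",
]

for event in oom_context(SAMPLE):
    print(event["time"], event["name"], "killed; recent package activity:")
    for ctx in event["context"]:
        print("   ", ctx)
```

This only shows which package jobs ran just before a kill. It cannot prove which process actually exhausted the RAM.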
-
@bmeeks Turns out, on this model pfBlockerNG and Suricata cannot run together. I had several DNSBL lists, but even with just two I can see that Suricata is being killed when pfBlockerNG runs an update.
I will get a Protectli.
-
@anishkgt said in Suricata v7.0.7_5 abruptly stops:
on this model pfBlockerNG and Suricata cannot run together
It totally depends on the pfBlocker and/or Suricata settings. As I said I've run them both together on a 2100. Check how big those lists are to download, and they will likely require some additional RAM to process while updating because that is done in PHP. PHP as I recall is limited to 512 MB by default on pfSense.
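As a rough way to check how big the lists are once downloaded, here is a minimal Python sketch that reports the byte size and line count of every file in a directory, largest first. The function name list_sizes is hypothetical, and the directory is left as an argument because the exact location of the downloaded DNSBL files on your install is something you should verify yourself (I believe it is under /var/db/pfblockerng/, but treat that path as an assumption).

```python
import os


def list_sizes(directory):
    """Return (filename, size_in_bytes, line_count) tuples, largest file first."""
    rows = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        # Count lines in binary mode so odd encodings in a list cannot raise errors.
        with open(path, "rb") as fh:
            line_count = sum(1 for _ in fh)
        rows.append((name, os.path.getsize(path), line_count))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


def print_report(directory):
    for name, size, line_count in list_sizes(directory):
        print(f"{size / (1024 * 1024):8.2f} MiB  {line_count:>9} lines  {name}")


# Example call; the path below is an assumption, adjust it to your system:
# print_report("/var/db/pfblockerng/dnsbl")
```

The size on disk is only a lower bound on what an update needs, since processing a list takes additional RAM on top of the file itself.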
-
I don't think the OP is hitting a PHP memory limit. That results in a different type of "crash" with a logged message in the GUI and would never result in the OOM Reaper being activated.
The problem is, I suspect, with the Python module of unbound and it trying to process a large DNSBL file. That would be a process that interacts directly with the kernel for memory allocation and management. It would not use PHP.
-
@bmeeks We're in agreement, I just meant, processing files with PHP takes up memory.
My point was, saying these two packages "cannot run together" on any 2100 [or presumably OP meant, any hardware with 4 GB RAM] is inaccurate by itself.
Ultimately it sounds like OP needs more RAM with the lists/settings being used.
-
@SteveITS I have the 2100 Max model which comes with 8GB RAM.
I don't see any pfBlockerNG update killing suricata here
Dec 16 16:13:31 kernel mvneta1: promiscuous mode disabled
Dec 16 16:13:31 kernel pid 61266 (suricata), jid 0, uid 0, was killed: failed to reclaim memory
Dec 16 16:12:21 php-fpm 89212 [Suricata] Suricata signalled with SIGUSR2 for LAN (mvneta1)...
Dec 16 16:12:18 php-fpm 89212 [Suricata] Building new sid-msg.map file for LAN...
Dec 16 16:12:17 php-fpm 89212 [Suricata] Enabling any flowbit-required rules for: LAN...
Dec 16 16:12:10 php-fpm 89212 [Suricata] Updating rules configuration for: LAN ...
Moreover, I have the update frequency of the DNSBL groups, as seen in the previous post, set to 'Every 2 hrs' and 'Every 3 hrs'.
-
@anishkgt said in Suricata v7.0.7_5 abruptly stops:
2100 Max model which comes with 8GB RAM
4 GB: https://shop.netgate.com/products/2100-max-pfsense
I don't see any pfBlockerNG update killing suricata here
As noted above, "pid 61266 (suricata), jid 0, uid 0, was killed: failed to reclaim memory" means pfSense is desperately trying to free up RAM.
Ultimately we aren't going to be able to help much further without more info, such as how big the lists you've chosen are, and what exactly is using how much RAM at the time the process is killed (i.e. "top" output or Diagnostics > System Activity).
Overall, you're running out of RAM and when that happens processes start to crash. All we will be able to say is, "that process there is using a lot of memory."
If your settings need more than 4 GB to update then I suppose that is the answer for you.
-
@SteveITS Oh Yea my bad. 4GB it is.
As of now, this is what I have running
-
@SteveITS said in Suricata v7.0.7_5 abruptly stops:
Ultimately we aren't going to be able to help much further without more info, such as how big the lists you've chosen are, and what exactly is using how much RAM at the time the process is killed (i.e. "top" output
Where can I find the "top" output?
-
@SteveITS
The SSH output
-
@anishkgt OK but with 2.6 GB free there's not a problem now. You will need to look while Suricata is being killed.
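Since nobody is likely to be watching the screen at the exact moment of the kill, one option is to record the biggest memory users to a file every few seconds and read the file afterwards. Below is a minimal Python sketch of that idea. It assumes Python 3 is available on the firewall and that `ps -axo pid,rss,comm` prints PID, resident set size (in KiB, as I understand FreeBSD's ps) and command name. The function names, the 5-second interval, and the sample numbers in the comments are all made up for illustration.

```python
import subprocess
import time


def parse_ps(text, n=5):
    """Return the n largest processes as (rss_kib, pid, command) from ps output."""
    procs = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip the header row and anything that is not "pid rss command".
        if len(parts) < 3 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        procs.append((int(parts[1]), int(parts[0]), parts[2].strip()))
    procs.sort(reverse=True)
    return procs[:n]


def sample_once(n=5):
    # Assumed ps invocation; check that it behaves the same on your system.
    out = subprocess.run(
        ["ps", "-axo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, n)


def log_loop(path, interval_s=5, n=5):
    """Append the top-n memory users to path every interval_s seconds, until interrupted."""
    while True:
        stamp = time.strftime("%b %d %H:%M:%S")
        # Reopen the file on each pass so every sample is written out before sleeping.
        with open(path, "a") as fh:
            for rss_kib, pid, comm in sample_once(n):
                fh.write(f"{stamp} pid={pid} rss_kib={rss_kib} {comm}\n")
        time.sleep(interval_s)


# Example (hypothetical log path); start it before the next scheduled update:
# log_loop("/root/mem_samples.log")
```

After the next kill, compare the "was killed" timestamp in the system log with the samples recorded just before it to see which process was growing.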
-
@anishkgt:
Your problem is a dynamic one. That means the issue occurs when certain things align in terms of high memory utilization. It appears from circumstantial evidence that it happens when your pfBlockerNG automated cron task runs coincident with Suricata operating.
Once the problem is triggered and the Out-of-Memory Reaper process in FreeBSD's kernel kills Suricata and/or unbound to free up RAM, the problem is gone until the next time the stars align in the software. So, you can run top or any other diagnostic tool all you want during the interim, but you will not see the problem unless you are actively running the tool and watching the screen output at the instant the OOM Reaper process activates and begins killing user processes to reclaim RAM. But even if you catch it I doubt you will learn much from it. The problem is simply an exhaustion of available RAM, and that triggers FreeBSD's kernel code to go into a self-protection mode where it will kill user processes in an attempt to free up RAM for use by the more critical kernel processes.
Here is a thread from the FreeBSD Forums explaining memory management in FreeBSD: https://forums.freebsd.org/threads/understanding-memory-management.84695/. The thread in turn has some links to additional information. Read through this thread and follow some of the links in it to gain a better understanding of how FreeBSD handles memory allocation, recovery, and protection. This should help you understand that your issue is a dynamic one and is triggered only when a certain set of conditions come together. The root cause is processing too much data, but to find out which thing exactly is responsible will require a lengthy process of elimination by disabling all DNSBL lists and then slowly adding them back one by one to see which breaks the firewall. Processes that use Python or Perl regex pattern matching can be real memory hogs -- especially if the search algorithm is poorly implemented.
What I see a lot of pfBlockerNG and DNSBL users misunderstand here on the forum is that something that works just fine for a few hundred items does not necessarily scale to 10s of thousands of items (or even more!). These users fail to grasp that a list with 10,000 or 50,000 or even 100,000 items is not going to process as efficiently as a list with only 100 or 200 items. And with regex pattern matching, the growth in memory is not necessarily linear (meaning twice the number of list lines does not necessarily equal twice the required RAM -- it could be 4, 8 or 16 times more RAM).
This quote from an earlier reply hints that you are using some lists that need a lot of memory resources to handle:
I remember changing the 'Firewall Maximum States' from 338,000 to 500,000 and 'Firewall Maximum Table Entries' from default to 800000.
-
I have a 2100 and I used to have memory issues during Snort updates. I installed a swap partition on a dedicated external HDD that was designed for heavy use, and it fixed all my update resource issues. Long story short, you have to free up memory or have a plan for when it is used the most. Do not rely on swap all the time, but I admit I rely on it for ClamAV updates and Snort updates should they happen at the same time. Murphy's law: what can go wrong will go wrong. Sometimes my blacklist, Snort and ClamAV all attempt to update at the same time. It is very rare, sometimes on reboots or package reinstalls, but you have to plan for it. The 2100 should have an 8 GB RAM option to function perfectly; again, nothing is perfect, so we have to roll with it. Do a flash drive and set it up as swap.
The SSD manufacturer had this to say about me using it like this...
"Hi Jonathan, This will damage the drive, it is not safe. Moreover, the response speed and read and write speed are far inferior to RAM. We recommend you not to use it this way, it will probably cause the SSD to become defective."
I really want to use something long term, as I am limited in what I can do with this box: it has fixed RAM with no way to add or remove any. The NVMe drive is the only solution outside of a USB-based HDD; however, that, like you said, is very slow.
Yes, ZFS is a concern with the drive; however, it shows with gpart as FreeBSD.
I triggered a panic and it works with crash dumps also. I had the Netgate forum and the FreeBSD forum help me with this. I am thinking I should use an actual USB-based HDD in the long run to abuse it with swap use; however, with a firewall that would really slow it down.
Check out ada0s3
Shell Output - gpart list -a
Warning: Do not use your internal SSD for swap.
Ref:
https://forums.freebsd.org/threads/resolved-usb-based-swap.93362/#post-654423
This FreeBSD forum research, along with the Netgate forum, is what got me going, if you want to set up swap.