2.5.1-RELEASE - Dashboard's "Firewall logs" widget kills the CPU and effectively DoS's the GUI
-
@stephenw10 My guess is that the code reading the logs is tightly coupled with the PHP web render thread, effectively crippling the GUI. If it weren't, I'd have no problem with logs loading slowly, but now in the best case it's a looong delay, in the worst case it's a HTTP 504 displayed.
-
@johnpoz "Log firewall default blocks" is enabled. Wonder if it's there enabled by default, the UI doesn't tell. <-- point to improve the interface perhaps.
Only a few reject and deny rules tied to pfBlocker's Feodo abuse list are logged, however those caught in total less than 10 MB since last box reboot nine hours ago.
No pass rules traffic is logged. It would cripple the storage, terabytes a day pass through pf.I guess your second definition of spam applies here, assisted by the snort2c table populated by Suricata.
Both the slowest and fastest box push 50 to 300 Mbps constantly, so that'd be the culprit for the logs' size.
However, my main concern is that coupling between log reading and webpage generation or loading, in both the dashboard widget and the system logs -> firewall page.
-
@e-1-1 said in 2.5.1-RELEASE - Dashboard's "Firewall logs" widget kills the CPU and effectively DoS's the GUI:
Wonder if it's there enabled by default
Yeah logging default block is on by default.
-
@e-1-1 :
Yeah logging default block is on by default.
Being DOSsed means (to me) : instead of using the firewall widget, have the logs send to a 'remote' log server which is more capable of handling 'huge' quantities of log lines.
The firewall widget is nice, but as soon as things heat up, it will slow down the access to the default login page of pfsense = the dashboard. Not good ...
The widget is nice, for the days everything is calm and nothing is happening.And of course, uncheck the default block log :
as it will only learn you that IPv4 goes from 1.1.1.1 to 254.254.254.254 ;)
-
https://docs.netgate.com/pfsense/en/latest/monitoring/logs/settings.html#log-rotation-settings
Read the notes, especially on the compression settings.
-
@jimp Thanks for linking that.
Looks like "none" is indeed the way forward to improve page load performance, current compress ratio is beautiful:
[2.5.1-RELEASE][root@hostname]/var/log: zfs get compression,compressratio zroot/var NAME PROPERTY VALUE SOURCE zroot/var compression lz4 inherited from zroot zroot/var compressratio 8.32x -
Due to job "conditioning", let's say, that pushes me to get rid of root causes and find error avoidance methods, I gotta get the following paragraphs out of my mind and here, "on paper":
I'd hereby push for a GUI hint towards this "hey, don't choose any compression, you are already using a ZFS dataset with compression enabled and it's x.y% efficient, no need for further CPU cycles to be wasted."
As well, I'd push for this "Log Compression" setting to be disabled by default when an install is done on a ZFS filesystem/volume manager, if it's not already.
And for a "best practice" guide (one can dream, something like Cisco's wireless best practice page in their wireless controllers' web interface) directly in the GUI, after an upgrade a message could pop-up saying "hey we noticed you use ZFS, but have log compression enabled. It's not ideal. Wanna change to none? warning - all previously logs will be deleted. Yes/No".
To conclude, even with "none" set, neither the dashboard, nor the firewall log section load as fast as I'd like.
Yes, @Gertjan, log shipping to a SIEM is nice, however due to resource constraints not every box/cluster that I manage benefits from this. Therefore the need to at least be able to view the system logs -> firewall tab, unfortunately with blocked entries for statistics. Now that I got to this, another thing comes to mind - if the dashboard widget and system logs are set to retrieve the last, say, 250 log lines, why is the webpage generating code trying to read all the logs, including the older ones? If it starts to read the current filter.log and it has reached the line limit, it should stop there and then imo. Ah, another one for the PHP logic.From the first post, my wish was a decoupling between the page load code and the log filter load code. Hope this is possible to do, sometime in a later release.
-
@e-1-1 said in 2.5.1-RELEASE - Dashboard's "Firewall logs" widget kills the CPU and effectively DoS's the GUI:
zfs get compression,compressratio zroot/var
Your getting a better ratio than me.. I would assume that is because your logs are bigger? Lots more info that can be compressed for better overall ratio.
[21.05-RELEASE][admin@sg4860.local.lan]/: zfs get compression,compressratio zroot/var NAME PROPERTY VALUE SOURCE zroot/var compression lz4 inherited from zroot zroot/var compressratio 3.45x - [21.05-RELEASE][admin@sg4860.local.lan]/:
Or maybe you have other logs that compress better? I am not running any sort of ips, so there are none of those logs, etc.
-
@johnpoz I guess that's the case, filter.log reached a bigger size today.
/var/log: ls -alh filter* -rw------- 1 root wheel 378M Jun 8 01:41 filter.log
Not sure why it's not rotating though. Meh.. maybe after next reboot it sorts itself out.
Edit - or maybe I know?!
From one of my previous posts:
"Log Rotation Size (Bytes)" is default at 512000.
(can't find quote format button)
Actually, that field has no value, value 512000 is faintly visible there as a grey, washed out text. Do you think setting something there will trigger a log rotation sometime in the future via a built-in cron job or something similar?
-
The 'cron' package shows :
so every minute the command /usr/sbin/newsyslog is run.
It reads its default config file /etc/newsyslog.conf in which its told to read /var/etc/newsyslog.conf.d/pfSense.conf
In that file you can find :/var/log/userlog root:wheel 600 3 1024 * B /var/log/utx.log root:wheel 644 3 1024 * B /var/log/system.log root:wheel 600 7 1024 * JC /var/log/filter.log root:wheel 600 7 1024 * JC /var/log/dhcpd.log root:wheel 600 7 1024 * JC /var/log/vpn.log root:wheel 600 7 1024 * JC /var/log/poes.log root:wheel 600 7 1024 * JC /var/log/l2tps.log root:wheel 600 7 1024 * JC /var/log/openvpn.log root:wheel 600 7 1024 * JC /var/log/portalauth.log root:wheel 600 7 1024 * JC /var/log/ipsec.log root:wheel 600 7 1024 * JC /var/log/ppp.log root:wheel 600 7 1024 * JC /var/log/wireless.log root:wheel 600 7 1024 * JC /var/log/nginx.log root:wheel 600 7 1024 * JC /var/log/ntpd.log root:wheel 600 7 1024 * JC /var/log/gateways.log root:wheel 600 7 1024 * JC /var/log/resolver.log root:wheel 600 7 1024 * JC /var/log/routing.log root:wheel 600 7 1024 * JC /var/log/auth.log root:wheel 600 7 1024 * JC
This corresponds with :
where 1024 x 1024 (K) = 1 048 576, so each file will be approximatively 1 mega byte on my pfSense.
The default is probably 512 Kbytes, as the help text states. -
We had already discussed internally about what to do by default for ZFS cases on fresh installs. Initially we didn't opt to change anything because ZFS is smart enough to not compress already compressed data (it tries first, but if it doesn't save enough, it skips it) but in practice it appears to be slower anyhow, so it's worth disabling when we know it's not ideal. I made https://redmine.pfsense.org/issues/12011 to look into that.
I also made https://redmine.pfsense.org/issues/12012 for improving the GUI help text.
As for how the logs are parsed, I know it's inefficient but it's not an easily solvable optimization. This is done for some logs like the filter log to ensure that even if the user wants 50 lines, that they actually get 50 parsed lines and not an empty log if there were 50 suppressed entries or entries which could not be parsed. This is also necessary for searching because the user wants N results, not to search for items within the N lines returned. I'm not sure if going file by file or chunks of files is going to end up being better given all the extra overhead involved in fetching parts and iterating. I made https://redmine.pfsense.org/issues/12013 for that but it's not likely to get any traction in the near future given the nature of the problem.
Most users who need that kind of efficiency and large log sizes are better served by pushing logs to a more appropriate central log server. I know a lot of people rely on the firewall for that, but they really shouldn't. It isn't reliable for log retention, a log server must be used, especially in cases where legalities/regulations are involved.
-
@gertjan Aha! Roger that.
So it looks like an empty general "Log Rotation Size (Bytes)" actually leads to a whooping
var/log: less ../etc/newsyslog.conf.d/pfSense.conf | egrep filter /var/log/filter.log root:wheel 600 7 2097151.3671875 * C
How it ended up like that, might've found out below
.
I set the general parameter in the GUI to 512000, now pfSense.conf has still (?!) that "2097151.3671875" value for /var/log/filter.log. Set it to 1048576 like in your example, still "2097151.3671875".
Digging deeper, in the firewall log's own "Log Rotation Size (Bytes)" parameter, found value "2147483000". Culprit found, value deleted, "1048576" is now shown as a fading grey text.
Saved, waited a couple of minutes, pfSense.conf still showing "2097151.3671875". Manually set to 512000, saved => pfSense.conf shows "500". OK so far.Then cleared "Log Rotation Size (Bytes)", leaving the washed out default value. Save=> pfSense.conf still shows "500". It gets this from the general setting, right? Then I guess "1048576" should not be shown at all greyed out in the firewall tab, but a hint be present that if it's not set, it gets the general default or user-defined setting.
-
@jimp Appreciate it.
As for the "quick mode" log parsing, if I may add a point - the "read only X parsable lines" for me is applicable only in two cases:
- the dashboard firewall widget that by default gets the last few logs, sorted by time from the newest. Zero need afaik to parse more than those
- the default (freshly opened, no filters) firewall log in both normal and dynamic view.
For the rest, parse away :|.
On the efficiency, log volume and "central log server" topic I agree. It's a waste of time to use anything else than dedicated distributed log aggregation, indexing and searching clusters, for example. However, in regulated and air gapped environments, I can't use pfSense, no matter how much I would like to. It's simply not "enterprise-y" enough. Left some feedback last year on this subject in the web form, I hope it sparked some discussions internally. Won't detail here, it's simply too many features to list and it's off-topic.
-
The firewall log tosses out log entries which are very spammy and useless in some cases (like some IPv6 multicast entries which get dropped by default rules and are practically worthless), so it has to fetch significantly more than we really want to ensure there is enough data.
Guess too low and we get a ton of user complaints about "why is it only showing me 20 entries when I set it to 50" and so on.
-
@jimp thanks for all your hard work. Have you thought about Wireguard concept (less is more). This concept could be a game-changer for pfSense dev team.
-
@akegec said in 2.5.1-RELEASE - Dashboard's "Firewall logs" widget kills the CPU and effectively DoS's the GUI:
Have you thought about Wireguard concept
That's an understatement.
I guess jimp has thought a lot about Wireguard recently. -
@gertjan it would be wise if pfsense team apologize and involve Jason A. Donenfeld to the project 2.6. To behave like a gentleman could bring back the trust of former pfsense fans/users back.
-
I have no idea what "less is more" might mean for logging, but the rest of that is off topic for this thread.
We already try to take a minimalist approach to support for logging. There are numerous logging features like reporting, collation, alerts, etc which we won't add because it's not the right place for them. Those things belong on a log server, no matter how many people try to convince us they belong on a firewall because they don't want to setup a log server.
-
@akegec
Why ?
To who ?
Some one was blessed, hurt ? At risk ?
For trying to add something new ?
And finding out that "when it's done", people start looking at it, and the 'code' just starts its actual "life cycle of updates, upgrades, changes" span. Nothing's new here neither.Maybe, pfSense and TNSR should use very conservative design rules, but, its (free) software, so they can put things in, and take things out if needed, when they see fit.
Yes, I've read the story. saw the blogs, Youtube video's and others.
All I know, is that I know close to nothing about it. So, finally, what I "think" is meaningless, and stays in my head.One thing for sure, I didn't lost 'trust', not for a moment.
Anyway, this has nothing to do with "Dashboard's "Firewall logs" widget kills the CPU and effectively DoS's the GUI"
-
@gertjan said in 2.5.1-RELEASE - Dashboard's "Firewall logs" widget kills the CPU and effectively DoS's the GUI:
Anyway, this has nothing to do with "Dashboard's "Firewall logs" widget kills the CPU and effectively DoS's the GUI"
Exactly..
One thing for sure, I didn't lost 'trust', not for a moment.
Agree 110% here..
Could things of been handled better in the public eye. Maybe.. I don't really concern myself with squabbles like that.. All kinds of stuff can get said in the heat of the discussion.. On both sides.. But just at a loss to what that has to do with the topic at hand anyway..
-
@johnpoz From my side, this topic can be closed. Root causes of the issues were found and several improvements in business logic and documentation are on the public redmine backlog.
-