Firewall effectively DOS'd by bad DHCP client
-
Hi all - first post newbie here! Thank you to Netgate for all your work on pfSense.
I am running pfSense 2.5.2. I noticed latency spikes when browsing the web and tried to log in to the firewall to diagnose the problem. The login screen took 20-30 seconds to appear. After a couple of failed attempts to login, I eventually managed to get to the dashboard where it became apparent that all CPU was being consumed by DHCPd log files being rotated (see output from top below)
The log files in question were 80-100MB, and they had built up very rapidly, hence my quad-core 2Ghz thin client (an HP T620 plus) effectively melting down as it carried out highly CPU-intensive activity across every core (the screenshot was after 1 of the log compressions had finished, hence why only 3 appear in the list and I was able to log in as a core had been free'd up).
The cause of the huge log files turned out to be a couple of wireless devices that had constantly requested a DHCP lease thousands of times a second, filling up the logs:
Aug 9 21:36:00 pfSense dhcpd[48529]: DHCPACK on 192.168.0.xxx to 44:7f:77:xx:yy:zz (HiveView) via igb1 Aug 9 21:36:00 pfSense dhcpd[48529]: reuse_lease: lease age 591 (secs) under 25% threshold, reply with unaltered, existing lease for 192.168.0.xxx Aug 9 21:36:00 pfSense dhcpd[48529]: DHCPREQUEST for 192.168.0.xxx from 44:7f:77:xx:yy:zz (HiveView) via igb1 Aug 9 21:36:00 pfSense dhcpd[48529]: DHCPACK on 192.168.0.xxx to 44:7f:77:xx:yy:zz (HiveView) via igb1 Aug 9 21:36:00 pfSense dhcpd[48529]: reuse_lease: lease age 591 (secs) under 25% threshold, reply with unaltered, existing lease for 192.168.0.xxx Aug 9 21:36:00 pfSense dhcpd[48529]: DHCPREQUEST for 192.168.0.xxx from 44:7f:77:xx:yy:zz (HiveView) via igb1 Aug 9 21:36:00 pfSense dhcpd[48529]: DHCPACK on 192.168.0.xxx to 44:7f:77:xx:yy:zz (HiveView) via igb1 Aug 9 21:36:00 pfSense dhcpd[48529]: reuse_lease: lease age 591 (secs) under 25% threshold, reply with unaltered, existing lease for 192.168.0.xxx Aug 9 21:36:00 pfSense dhcpd[48529]: DHCPREQUEST for 192.168.0.xxx from 44:7f:77:xx:yy:zz (HiveView) via igb1
I believe this was because I was updating the firmware on my (unrelated) Wifi mesh system - seemingly this somehow upset either the wireless devices or the firewall into some kind of tight DHCP reqiuest loop which resolved itself after ~5 minutes. The system had been running without issue prior to this so it seems to much of a coincidence otherwise.
Getting to the point
I don't really know (or care) whether it was a naughty DHCP client in my wireless devices, or DHCPd itself that caused the logs to fill up. What I do care about is that the firewall was essentially inaccessible temporarily due to it.
I'm wondering if the log compression worker should be limited to ([number of cores] - 1) or some similar user-configurable parallelism limit to ensure that even if there is a queue of logs to compress, the firewall is still responsive. Alternatively, log rotation/compression could be done at a lower priority than system processes so it doesn't steal all available CPU (this may not work as well as the first suggestion though).
This could probably apply as a general case to log rotation, not just DHCPd - otherwise any activity which causes rapid log generation can cause the same issue.
Just my $0.02