High load Netgate 6100
-
I currently have 4 Netgate 6100s deployed, and 1 of them has abnormally high load averages / CPU usage.
Each 6100 has a site-to-site IPsec VPN (128-bit GCM, with QAT enabled), and there's about 50-80 Mbit/s travelling over that VPN at any given time.
The 3 6100s without the issue sit at a load average of around 0.75-1.5, but the 1 with the issue sits at a load average of 4-5 and 80-95% CPU.
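For context on those figures, a load average is easier to judge once it is divided by the number of CPU cores. Here is a minimal Python sketch, assuming a 4-core CPU. The core count is an assumption for illustration and is not stated in this thread (`sysctl hw.ncpu` reports the real value on the box), and `load_per_core` is a hypothetical helper name.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return load_average / cores


# ASSUMPTION: 4 cores, chosen for illustration only; check `sysctl hw.ncpu`.
ASSUMED_CORES = 4

# Load averages quoted in the post above.
samples = [
    ("healthy 6100, low end", 0.75),
    ("healthy 6100, high end", 1.5),
    ("affected 6100, low end", 4.0),
    ("affected 6100, high end", 5.0),
]

for label, load in samples:
    per_core = load_per_core(load, ASSUMED_CORES)
    # A value at or above 1.0 means there is at least as much runnable
    # work as there are cores to run it.
    state = "saturated" if per_core >= 1.0 else "has headroom"
    print(f"{label}: {per_core:.2f} per core ({state})")
```

Under that assumption the healthy units stay well below 1.0 per core, while the affected unit is at or above it, which would be consistent with the box eventually becoming unresponsive.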
This appears to be causing significant instability for us: the firewall becomes completely unresponsive every 1-3 days and requires a reboot to return to normal operation.
This is a top run on the 6100 with the issue:
And this is top run on one of the other 6100s without the issue:
The configs of all 4 are effectively identical. They differ only in the WAN settings, and the 6100 with the issue uses pfBlocker sync to keep the other 6100s' pfBlocker settings identical (so we only have to whitelist something once).
The only other difference I can think of is that the 6100 with the issue has a 300/300 connection and a backup PPPoE connection (that no traffic flows over unless there is a WAN failure), while the other 6100s are on 1000/1000 connections. However, the total WAN activity on the 6100 with the issue is under 150 Mbit/s and the IPsec traffic never goes above 80 Mbit/s.
I'm at a total loss as to what the issue is, to be honest. I have tried:
- lowering the IPsec encryption from 256-bit to 128-bit
- removing all traffic shaping and limiters
- factory resetting and restoring the config
- removing hn ALTQ support (this made no difference)
- setting net.inet.ip.intr_queue_maxlen to 3000 (without this, sysctl net.inet.ip.intr_queue_drops reports a positive value)
Could someone help me / give me some hints as to what I could try next?
-
Were both of those 6100s passing 80Mbps of IPsec traffic when that top output was taken?
Is that traffic always in one direction?
Steve
-
@eria211 said in High load Netgate 6100:
pfblocker
What version of pfBlocker? The last three -devel versions have a bug related to changes in pfSense 22.05. If that's the case, there's an easy fix: change a ) to a space:
https://redmine.pfsense.org/issues/13154
-
Yes, good point! It's probably that.
-
@stephenw10 Yes, give or take, they were at 50-80 Mbit/s when the screenshot was taken. It's a series of TrueNAS snapshot replications, so it's 5-8 datasets replicating at a maximum of 1 MiB/s apiece.
The IPsec traffic always flows in that direction, from the low load average 6100 to the high load average 6100.
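As a rough sanity check on those replication figures, the sketch below converts the per-dataset cap into an aggregate line rate. It counts only the payload, so IPsec and protocol overhead are not included, and the function names are hypothetical helpers for illustration.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def mib_per_s_to_mbit_per_s(mib_per_s: float) -> float:
    """Convert MiB/s to Mbit/s (decimal megabits, 10**6 bits)."""
    return mib_per_s * MIB * 8 / 1_000_000


def aggregate_mbit_per_s(datasets: int, mib_per_s_each: float = 1.0) -> float:
    """Payload rate when `datasets` streams each run at the given cap."""
    return mib_per_s_to_mbit_per_s(datasets * mib_per_s_each)


# 5-8 datasets at a maximum of 1 MiB/s each, as described in the post above.
for count in (5, 8):
    print(f"{count} datasets: {aggregate_mbit_per_s(count):.1f} Mbit/s payload")
```

That puts the payload at roughly 42-67 Mbit/s, which is in the same range as the 50-80 Mbit/s seen on the tunnel once overhead and other traffic are taken into account.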
-
@steveits I am absolutely stunned. As soon as I edited the file and reloaded pfBlocker, the CPU dropped to 33% and the load average went from 5.2 down to 2.45.
Thank you for your help - I will make this change on the other 6100s.
-
@eria211 said in High load Netgate 6100:
@steveits I am absolutely stunned. As soon as I edited the file and reloaded pfBlocker, the CPU dropped to 33% and the load average went from 5.2 down to 2.45.
Thank you for your help - I will make this change on the other 6100s.
This will also fix the IP blocking stats and reporting. I finally made this same change to my 6100 yesterday.