High CPU USAGE IN 2.7.0-RELEASE
-
We are not using the captive portal. Around 50 devices connect on the LAN.
The interrupt load increases as the number of devices connected on the LAN increases.
"LAN in" traffic reaches up to 300M as the CPU interrupt load increases. Then we need to reboot the firewall.
-
@stephenw10 said in High CPU USAGE IN 2.7.0-RELEASE:
What NICs do you have in that box? I can see re0 there.
Is it all Realtek NICs?
-
We are using three NICs: two WANs and one LAN.
WAN1 is an Intel adapter
LAN is an Intel adapter
WAN2 is a Realtek adapter
-
Are you load-balancing traffic between those WANs?
-
No, we are not load-balancing, as the secondary WAN has less bandwidth.
FYI: we have now updated to version 2.7.2.
-
What does vmstat -i show is creating the interrupt load?
-
This was taken before the issue occurred.
-
Ok nothing too huge there. What does it show when the issue happens?
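One way to answer that is to take two snapshots of the counters and compare them, since the totals alone only show what has accumulated since boot. A rough sketch, assuming the usual FreeBSD vmstat -i layout (interrupt name, total, rate); the irq_delta helper and the /tmp file names are made up for illustration:

```shell
# irq_delta BEFORE AFTER
# Prints how much each interrupt counter grew between two saved
# `vmstat -i` outputs, largest increase first.
irq_delta() {
    awk '
        # Skip the header line and the Total line in both files.
        $1 == "interrupt" || $1 == "Total" { next }
        {
            # Names can contain spaces ("irq256: igb0:rxq0"), so take the
            # total as the second-to-last field and strip the two trailing
            # numeric columns to recover the name.
            total = $(NF - 1)
            name = $0
            sub(/[ \t]+[0-9]+[ \t]+[0-9]+[ \t]*$/, "", name)
        }
        # First file: remember the earlier total for each interrupt.
        NR == FNR { before[name] = total; next }
        # Second file: print the increase since the first snapshot.
        { printf "%12d  %s\n", total - before[name], name }
    ' "$1" "$2" | sort -rn
}

# Usage on the firewall (left commented out here):
#   vmstat -i > /tmp/irq_before.txt
#   sleep 10
#   vmstat -i > /tmp/irq_after.txt
#   irq_delta /tmp/irq_before.txt /tmp/irq_after.txt | head
```

If the igb0 queue lines dominate the top of that list while the issue is building up, that would point at LAN traffic rather than something else on the box.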
-
Our network has IPsec VPN and OpenVPN. Normally I can access the firewall via the web GUI and SSH, from the IPsec network, and from the OpenVPN client.
Once the issue occurs, I can't access the firewall via SSH, the web GUI, the IPsec network, or the OpenVPN client, but all existing devices continue to reach the internet without any issue.
-
Can you connect to the console to check that? You captured the top output above.
-
I couldn't access the console, it froze just like SSH. After rebooting, I was able to access the console.
-
How did you get the output from top then?
-
A few seconds after the above output, I noticed that the console froze.
-
Hmm, you might try entering 'Ctrl+T' at the console. That can respond when nothing else does. It should show what process is using the CPU cycles.
-
I tried everything, but I was unable to get any output from the console.
I also looked at the bandwidth usage. The images below (bandwidth and CPU interrupt load screenshots) were taken before the issue occurred. When the issue occurs, "LAN in" bandwidth reaches a sustained maximum of 300M, whereas the image shows 170 MB.
-
Ah, maybe it really is just slamming the firewall with traffic from LAN then. If you disconnect the LAN when that's happening does the console become responsive again?
Is LAN igb0?
-
Yes LAN is igb0.
I didn't disconnect the LAN and check the console. I will check next time.
How can I monitor LAN traffic for each host? Which log should be checked for this issue? Please let me know
-
I would expect the system log to show something after you reboot.
The igb0 NIC is multithreaded, so it can try to use all available CPU cycles on all cores. em0 is single queue, so it could not. You might switch those assignments so you can still connect and investigate when this happens.
-
-
em NICs are single queue. Only one CPU core can service the incoming and outgoing traffic queues. That means that on a 4 core CPU like you have it can never load all the cores.
igb NICs are multiqueue, and here igb0 is attaching with 4 queues. That is enough to load all the CPU cores sufficiently to prevent other services running.
You could override that by setting:
dev.igb.0.iflib.override_nrxqs=1
dev.igb.0.iflib.override_ntxqs=1
Or you could try to set a lower max interrupt rate like maybe:
hw.em.max_interrupt_rate=2000
But just swapping the WAN and LAN NIC assignments so LAN is em0 is probably easier. Unless you're not local to the box.
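If you do go the tunable route, here is a rough sketch of making the queue override persistent. It assumes these are loader tunables that need to be in place at boot, and that /boot/loader.conf.local is where custom ones go on pfSense; neither is confirmed above, so treat both as assumptions. The lines that touch the firewall are left commented out:

```shell
# The two overrides from above, in loader.conf syntax (values quoted).
tunables='dev.igb.0.iflib.override_nrxqs="1"
dev.igb.0.iflib.override_ntxqs="1"'

# Show what would be added.
printf '%s\n' "$tunables"

# On the firewall, as root (assumed file location), then reboot:
#   printf '%s\n' "$tunables" >> /boot/loader.conf.local
#
# After the reboot, check what the driver actually attached with:
#   sysctl dev.igb.0.iflib.override_nrxqs dev.igb.0.iflib.override_ntxqs
#   dmesg | grep igb0
```

With a single RX and TX queue, igb0 should behave like the em NIC in this respect: only one core services its traffic, which should leave the others free for SSH, the GUI and the console.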