High CPU Usage, Low Throughput
-
Hello! I'm deploying a new HA firewall pair. Now that they are installed, I am seeing much lower throughput than expected and maxed-out CPU. The two firewalls are XG-7100s, each connected to a Cisco Nexus switch with its 10G ports bonded via LACP. The WAN and LAN are VLANs on the LAGG. The two firewalls are also connected directly to each other on a 1G port for sync traffic. There are no major packages or services running on the firewalls. I was expecting to be able to push a minimum of 8 Gbps through the firewall, maybe all the way up to 20 Gbps, but I'm running into trouble much sooner. When I start to put load on the firewall, I hit a wall at around 1-2 Gbps. At that point the CPU is completely maxed, to the point that I can't reach the WebUI. I started going through the CPU troubleshooting guide, but it doesn't include any info on how to interpret the data or act on it. Here is the output from top -aSH, but let me know if there is more info I can provide.
last pid: 94110; load averages: 10.33, 8.55, 5.66 up 72+19:07:55 10:28:09
345 processes: 10 running, 274 sleeping, 61 waiting
CPU: 2.9% user, 0.0% nice, 6.6% system, 75.0% interrupt, 15.4% idle
Mem: 54M Active, 126M Inact, 681M Wired, 98M Buf, 7019M Free
Swap: 1024M Total, 1024M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -92 - 0K 1040K CPU2 2 18.0H 55.87% [intr{irq300: ix1:q2}]
12 root -92 - 0K 1040K WAIT 0 17.9H 54.39% [intr{irq298: ix1:q0}]
12 root -92 - 0K 1040K CPU1 1 17.6H 53.88% [intr{irq299: ix1:q1}]
12 root -92 - 0K 1040K WAIT 3 18.6H 50.08% [intr{irq301: ix1:q3}]
12 root -92 - 0K 1040K WAIT 3 440:13 21.39% [intr{irq296: ix0:q3}]
12 root -92 - 0K 1040K WAIT 0 440:11 20.57% [intr{irq293: ix0:q0}]
29965 root 52 0 6600K 2864K bpf 3 41.5H 18.87% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filter
12 root -92 - 0K 1040K RUN 1 434:35 18.78% [intr{irq294: ix0:q1}]
12 root -92 - 0K 1040K WAIT 2 432:12 17.62% [intr{irq295: ix0:q2}]
11 root 155 ki31 0K 64K RUN 3 1670.2 14.12% [idle{idle: cpu3}]
11 root 155 ki31 0K 64K RUN 1 1669.3 13.75% [idle{idle: cpu1}]
11 root 155 ki31 0K 64K RUN 2 1669.9 13.46% [idle{idle: cpu2}]
11 root 155 ki31 0K 64K RUN 0 1677.9 13.44% [idle{idle: cpu0}]
12 root -72 - 0K 1040K RUN 2 173:45 12.43% [intr{swi1: pfsync}]
26295 root 25 0 6404K 2560K select 2 18.9H 8.36% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -
22777 root 43 0 88576K 36728K RUN 3 1:54 3.11% php-fpm: pool nginx (php-fpm)
8 root -16 - 0K 16K e6000s 3 101.7H 2.14% [e6000sw tick kproc]
0 root -92 - 0K 944K - 2 5:28 0.66% [kernel{ix1:q0}]
12 root -92 - 0K 1040K WAIT 3 8:23 0.49% [intr{irq306: ix2:q3}]
12 root -92 - 0K 1040K WAIT 2 9:00 0.46% [intr{irq305: ix2:q2}]
12 root -92 - 0K 1040K WAIT 1 8:38 0.45% [intr{irq304: ix2:q1}]
12 root -92 - 0K 1040K WAIT 0 8:36 0.42% [intr{irq303: ix2:q0}]
21 root -16 - 0K 16K - 1 38:37 0.39% [rand_harvestq]
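For anyone trying to read output like this, one rough way to summarize it is to total the WCPU column per interrupt source. The sketch below does that for a handful of rows copied from the output above. The names `SAMPLE`, `ROW`, and `wcpu_by_source` are made up for illustration, and the regular expression assumes the `[intr{irqNNN: dev:qN}]` and `[intr{swiN: name}]` row shapes seen here, so it may need adjusting for other systems.

```python
import re
from collections import defaultdict

# A few interrupt-thread rows copied from the top -aSH output in this post.
SAMPLE = """\
12 root -92 - 0K 1040K CPU2 2 18.0H 55.87% [intr{irq300: ix1:q2}]
12 root -92 - 0K 1040K WAIT 0 17.9H 54.39% [intr{irq298: ix1:q0}]
12 root -92 - 0K 1040K CPU1 1 17.6H 53.88% [intr{irq299: ix1:q1}]
12 root -92 - 0K 1040K WAIT 3 18.6H 50.08% [intr{irq301: ix1:q3}]
12 root -92 - 0K 1040K WAIT 3 440:13 21.39% [intr{irq296: ix0:q3}]
12 root -92 - 0K 1040K WAIT 0 440:11 20.57% [intr{irq293: ix0:q0}]
12 root -92 - 0K 1040K RUN 1 434:35 18.78% [intr{irq294: ix0:q1}]
12 root -92 - 0K 1040K WAIT 2 432:12 17.62% [intr{irq295: ix0:q2}]
12 root -72 - 0K 1040K RUN 2 173:45 12.43% [intr{swi1: pfsync}]
"""

# Capture the WCPU percentage and the device (or software interrupt) name
# from rows that describe interrupt threads; all other rows are ignored.
ROW = re.compile(r"(\d+\.\d+)%\s+\[intr\{(?:irq\d+|swi\d+): ([a-z]+\d*)")


def wcpu_by_source(text):
    """Return {source name: summed WCPU percent} for interrupt threads."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            totals[match.group(2)] += float(match.group(1))
    return dict(totals)


if __name__ == "__main__":
    ranked = sorted(wcpu_by_source(SAMPLE).items(), key=lambda kv: -kv[1])
    for name, pct in ranked:
        print(f"{name}: {pct:.2f}%")
```

WCPU in thread mode is a percentage of one core, so on this 4-core box the ceiling is 400%. Summed this way, the ix1 queues account for a bit over half of that, with ix0 and pfsync making up most of the rest, which lines up with the 75% interrupt figure in the header. It only tells you where the time is going, not why.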
-
Was that top -aSH output captured during a transfer test?
Maybe try following this advice: https://docs.netgate.com/pfsense/en/latest/interfaces/low-throughput-troubleshooting.html
https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html
-
Yes, this was during a transfer. I looked at that first link, but it doesn't really explain anything: I can dump all of that info, but I can't interpret it. Is there anything specific I should look for, or should I just apply all of the tunables?
-
There is a section for ix cards:
https://docs.netgate.com/pfsense/en/latest/hardware/tuning-and-troubleshooting-network-cards.html#intel-ix-4-cards