CPU Usage high on one core. Reload didn't fix [SOLVED]
-
OK, so I have been trying to figure out why my CPU usage on one core has been stuck at 100%. I reinstalled all my packages, thinking maybe something wasn't working correctly, and all that did was load up both cores at 100% and cause my 160GB hard drive to fill up rapidly. So much so that I had to reload, because I don't know how to figure out where the source of the problem is, as I'm still learning FreeBSD.
So here are my system specs....
Pentium D 3.0GHz
2GB of ECC RAM
Dual GB onboard NICs (bge0=LAN, bge1=Not in use)
1 X 10/100 PCI NIC (WAN)
1 X Linksys Wireless BG PCI NIC (Not in use currently)
Motorola Cable modem plugged directly into the PfSense box
Linksys SLM2024 24-port gigabit switch for the network.
I have the following installed and running:
arpwatch
bandwidthd
darkstat
dhcpd
havp
ntpd
snort
squid
squidGuard
Squid's cache is set up to cache as much as possible. I figured since I have a 160GB hard drive, why not.
After my reload I have one core that is stuck at 100% again. I know this because my CPU usage is stuck at 50% total, and my RRD graphs indicate that either system or interrupts are using the CPU, but I can't tell which because the colors are so close to the same. I would attach a graph, but I don't see any way to attach one to the forum.
My processes are sitting at about 155, states 144/195000, MBUF usage 1266/25600. Memory is currently at 27%, swap 0.
Does anyone have any ideas or can anyone help me troubleshoot this?
-
You should take a look at "Diagnostics -> System Activity" and see which process is using so much CPU.
It could be a problem with a very large squid hard disk cache. When squid starts, this can cause high CPU usage.
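If you want to see how big that cache has actually grown, a quick check from the shell should tell you (this assumes the usual pfSense squid cache location; adjust the path if yours is configured elsewhere):
du -sh /var/squid/cache    # total on-disk size of squid's cache directory (path is an assumption)
df -h                      # overall disk usage, to see what is actually filling the 160GB drive
A cache that large can keep squid busy for quite a while after each restart while it rescans its cache directories, which would line up with the startup CPU spike mentioned above.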
-
The pfSense shell command top -S -H will give you more information on processes consuming CPU.
The pfSense shell command vmstat -i will give you interrupt rates.
I suggest you post the output in a reply. In the pfSense web GUI you can execute a shell command on the Diagnostics -> Command Prompt page.
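For example, run from an SSH session (or the Command Prompt page), something along these lines should put the busiest threads at the top; the sort flag is per FreeBSD's top(1):
top -S -H -o cpu    # -S include system processes, -H show kernel threads, sort by CPU usage
vmstat -i           # per-device interrupt counts and rates since boot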
-
top gives me…..
19 root 171 ki-6 0K 16K CPU1 1 743:13 98.29% idlepoll
11 root 171 ki31 0K 32K RUN 0 588:56 94.19% {idle: cpu0}
11 root 171 ki31 0K 32K RUN 1 71:41 6.79% {idle: cpu1}
8516 proxy 44 0 99M 83504K kqread 0 3:17 0.20% {initial thread}
20748 root 49 0 102M 25692K accept 0 0:11 0.20% php
for the top processes, and vmstat gives me.....
interrupt total rate
irq1: atkbd0 5 0
irq14: ata0 78 0
irq16: bge0 dc0+ 180 0
irq19: ral0 uhci1+ 734832 16
irq23: uhci0 ehci0 2 0
cpu0: timer 89821972 1999
cpu1: timer 89821869 1999
Total 180378938 4015
-
I managed to find an image paste bin, so here's what my system RRD graphs look like....
In the second graph, the rise of light red is where I reinstalled all my packages. The gap between the rise and the drop is where I reloaded my system from scratch and then uploaded my config back to the system.
I also just now went through and stopped each package one at a time while watching top via PuTTY, then restarted the package and moved on to the next one, repeating until I had gone through them all. It didn't seem like stopping any package made a difference in CPU usage.
-
Try turning off polling from
System -> Advanced -> Networking
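If you'd rather look at it from the shell first, something along these lines should show (and temporarily flip) the polling settings on a FreeBSD build with DEVICE_POLLING; bge0 here is only an example interface, and changes made this way don't persist the way the GUI setting does:
sysctl kern.polling.idle_poll    # 1 means the idle loop keeps polling NICs (the idlepoll thread)
ifconfig bge0 | grep options     # POLLING in the options flags means polling is enabled on that NIC
ifconfig bge0 -polling           # turn polling off for that interface (the GUI setting is what persists)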
-
Huh, and I guess you figured that out by looking at the idlepoll usage there in the list. I figured that was just an idle process. That definitely did it though, my CPUs went down to idle.
-
irq19: ral0 uhci1+ 734832 16
ral0 = your non-active wifi network card.
uhci1+ = some USB subsystem.
Strange to see that kind of activity on these two devices, knowing that one of them, ral0, isn't used, and USB business on a firewall is more the exception (maybe a UPS, that's about it).
Remove the wifi NIC card to get more details.
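If you want to confirm what actually shares that interrupt line before pulling the card, grepping the boot messages should list every device that attached with irq 19 (assuming those messages are still in the message buffer):
dmesg | grep "irq 19"    # each device that probed on interrupt line 19 at boot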
edit: ok, saw your post above; it seems that polling the unemployed ral0 device was responsible for this.
-
Ahhh, so because the NIC wasn't in use and it was being polled, it caused that additional load because the system was trying to poll something that wasn't there?
-
Ahhh, so because the NIC wasn't in use and it was being polled, it caused that additional load because the system was trying to poll something that wasn't there?
No, idlepoll is an optional FreeBSD procedure which polls the network interfaces instead of using interrupts. Under some types of load this polling can give higher throughput than interrupt-driven NICs, because polling reduces the overhead of interrupt servicing.
The vmstat figures for irq19 would be entirely for uhci1+. 16 interrupts a second is probably not a big load, but at least three device interrupt service routines (ral0, uhci1 and unspecified, the "+") are called on every irq19 interrupt. That ral0 was unused would mean the idlepoll thread was calling the ral0 service routine unnecessarily.
idlepoll will effectively result in one CPU being always busy.
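Once polling is off, a couple of quick checks should confirm the change took effect (again, bge0 is just an example interface name):
ifconfig bge0 | grep options    # POLLING should no longer appear in the flags
vmstat -i                       # NIC interrupt rates should climb again now that interrupts are back in use
top -S -H                       # the idlepoll thread should drop out of the top CPU consumers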
-
Ahh, well thank you, gentlemen. I have learned even more!