CPU Usage high on one core. Reload didn't fix [SOLVED]



  • OK, so I have been trying to figure out why CPU usage on one core has been stuck at 100%. I reinstalled all my packages, thinking maybe something wasn't working correctly, but all that did was load both cores to 100% and cause my 160GB hard drive to fill up rapidly. It got so bad that I had to reload, because I'm still learning FreeBSD and don't know how to track down the source of the problem.
    So here are my system specs:

    Pentium D 3.0 GHz
    2GB of ECC RAM
    Dual GB onboard NICs (bge0=LAN bge1=Not in use)
    1 X 10/100 PCI NIC (WAN)
    1 X Linksys Wireless BG PCI NIC (Not in use currently)
    Motorola Cable modem plugged directly into the pfSense box
    Linksys SLM2024 24 port gigabit switch for the network.

    I have the following installed and running:
    arpwatch
    bandwidthd
    darkstat
    dhcpd
    havp
    ntpd
    snort
    squid
    squidGuard

    Squid's cache is set up to cache as much as possible. I figured since I have a 160GB hard drive, why not.

    After my reload, one core is stuck at 100% again. I know this because my total CPU usage is stuck at 50%, and my RRD graphs indicate that either system or interrupts are using the CPU. I can't tell which it is because the colors are so close to the same. I would attach a graph, but I don't see any way to attach one on the forum.

    My processes are sitting at about 155, states 144/195000, MBUF Usage 1266/25600. Memory currently 27%, swap 0

    Does anyone have any ideas or can anyone help me troubleshoot this?



  • You should take a look at "DIAGNOSTICS -> System Activity" and see which process is using so much CPU.

    It could be a problem with a very large Squid hard disk cache. When Squid starts, this can cause high CPU usage.



  • pfSense shell command top -S -H will give you more information on processes consuming CPU.

    pfSense shell command vmstat -i will give you interrupt rates.

    I suggest you post the output in a reply. In the pfSense web GUI you can execute a pfSense shell command on page Diagnostics -> Command Prompt.



  • top gives me:

       19 root   171 ki-6    0K    16K CPU1   1 743:13 98.29% idlepoll
       11 root   171 ki31    0K    32K RUN    0 588:56 94.19% {idle: cpu0}
       11 root   171 ki31    0K    32K RUN    1  71:41  6.79% {idle: cpu1}
     8516 proxy   44    0   99M 83504K kqread 0   3:17  0.20% {initial thread}
    20748 root    49    0  102M 25692K accept 0   0:11  0.20% php

    for the top processes, and vmstat gives me:

    interrupt                          total       rate
    irq1: atkbd0                           5          0
    irq14: ata0                           78          0
    irq16: bge0 dc0+                     180          0
    irq19: ral0 uhci1+                734832         16
    irq23: uhci0 ehci0                     2          0
    cpu0: timer                     89821972       1999
    cpu1: timer                     89821869       1999
    Total                          180378938       4015
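
One way to read a `vmstat -i` table like the one above is to parse it and rank the sources by rate. The following is only a minimal sketch: the sample text is copied from this post, and the helper names `parse_vmstat_i` and `busiest_devices` are hypothetical, not part of pfSense or FreeBSD.

```python
# Sample copied from the vmstat -i output posted in this thread.
VMSTAT_SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                           5          0
irq14: ata0                           78          0
irq16: bge0 dc0+                     180          0
irq19: ral0 uhci1+                734832         16
irq23: uhci0 ehci0                     2          0
cpu0: timer                     89821972       1999
cpu1: timer                     89821869       1999
Total                          180378938       4015
"""


def parse_vmstat_i(text):
    """Return (name, total, rate) tuples, skipping the header and Total rows."""
    rows = []
    for line in text.splitlines():
        # The last two whitespace-separated fields are total and rate;
        # everything before them is the interrupt source name.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue  # header line
        name = name.strip()
        if name == "Total":
            continue
        rows.append((name, int(total), int(rate)))
    return rows


def busiest_devices(rows):
    """Device interrupts (irqN lines only), highest rate first."""
    devices = [r for r in rows if r[0].startswith("irq")]
    return sorted(devices, key=lambda r: r[2], reverse=True)


rows = parse_vmstat_i(VMSTAT_SAMPLE)
for name, total, rate in busiest_devices(rows):
    print(f"{rate:>6}/s  {total:>10}  {name}")
```

On this sample the only device line with a nonzero rate is irq19, at 16 interrupts per second. The two timer lines run at about 1999 per second each, which is far more than any device interrupt here.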



  • I managed to find an image pastebin, so here's what my system RRD graphs look like.


    In the second graph, the rise of light red is where I reinstalled all my packages. The gap between the rise and the drop is where I reloaded my system from scratch and then uploaded my config back to the system.

    I also just went through and stopped each package one at a time while watching top via PuTTY, restarting each one before stopping the next. Stopping any given package didn't seem to make a difference in CPU usage.



  • Try turning off polling from

    System -> Advanced -> Networking



  • Huh, and I guess you figured that out by looking at the idlepoll usage there in the list. I figured that was just an idle process. That definitely did it though; my CPUs went down to idle.



  • @Visseroth:

    irq19: ral0 uhci1+                734832         16

    ral0 = your non-active wifi network card.
    uhci1+ = some USB subsystem.

    Strange to see that kind of activity on these two devices, knowing that one, ral0,  isn't used, and the USB business on a firewall is more an exception (maybe a UPS, that's it).

    Remove the wifi NIC card to get more details.

    edit: OK, saw your post above; it seems that polling the unemployed ral0 device was responsible for this.



  • ahhh, so because the NIC wasn't in use and it was being polled it caused that additional load because the system was trying to poll at something that wasn't there?



  • @Visseroth:

    ahhh, so because the NIC wasn't in use and it was being polled it caused that additional load because the system was trying to poll at something that wasn't there?

    No, idlepoll is an optional FreeBSD kernel thread which polls the network interfaces instead of relying on interrupts. Under some types of load, polling can give higher throughput than interrupt-driven NICs because it reduces the overhead of interrupt servicing.

    The vmstat figures for irq19 would be entirely for uhci1+. 16 interrupts a second is probably not a big load but at least three device interrupt service routines (ral0, uhci1 and unspecified, the "+") are called on every irq19 interrupt. That ral0 was unused would mean the idlepoll thread was calling the ral service routine unnecessarily.

    idlepoll will effectively result in one CPU being always busy.
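
The "one CPU always busy" effect can be checked against the top output posted earlier in this thread: each `{idle: cpuN}` thread reports how idle that core is, so 100 minus that figure approximates how busy it is. The following is a rough sketch only. The sample lines are copied from the thread, the helper names are hypothetical, and top's percentages are sampled values, so the result is approximate.

```python
import re

# Lines copied from the `top -S -H` output posted earlier in this thread.
TOP_SAMPLE = """\
19 root    171 ki-6    0K    16K CPU1    1 743:13 98.29% idlepoll
11 root    171 ki31    0K    32K RUN    0 588:56 94.19% {idle: cpu0}
11 root    171 ki31    0K    32K RUN    1  71:41  6.79% {idle: cpu1}
"""

# Matches e.g. "94.19% {idle: cpu0}" and captures the percentage and CPU number.
IDLE_RE = re.compile(r"(\d+(?:\.\d+)?)%\s+\{idle: cpu(\d+)\}")


def busy_per_cpu(top_text):
    """Map CPU number -> approximate busy percentage (100 - idle)."""
    busy = {}
    for match in IDLE_RE.finditer(top_text):
        idle_pct = float(match.group(1))
        cpu = int(match.group(2))
        busy[cpu] = 100.0 - idle_pct
    return busy


def overall_busy(busy):
    """Average busy percentage across all CPUs found."""
    return sum(busy.values()) / len(busy)


busy = busy_per_cpu(TOP_SAMPLE)
for cpu in sorted(busy):
    print(f"cpu{cpu}: about {busy[cpu]:.2f}% busy")
print(f"overall: about {overall_busy(busy):.2f}% busy")
```

With these figures cpu0 is roughly 6% busy and cpu1 roughly 93% busy, which averages to just under 50%. That is consistent with the total CPU usage of 50% reported in the first post, and with idlepoll showing 98.29% on CPU 1.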



  • Ahh, well, thank you gentlemen. I have learned even more!

