High CPU-usage with a Intel Core i3?



  • Hi!

    I have a fresh 1.2.3-RELELASE installation and some troubles. The CPU-usage is around 85-95% with a brand new hardware setup. I've had alot older systems that had alot lower CPU-usage. Totally in traffic i only have around 600 GB for the last month with an average traffic of 1.73 Mb/s. I installed PhpSysInfo and the Load Averages was only 0.77 0.35 0.16. I guess the traffic isnt the problem, but why is the CPU working so hard? I only have PhpSysInfo when it comes to packages and i just have a couple of firewall rules and no Captive Portal activated. In total there are 4 wired interfaces (Adaptec AIC-6915 and NatSemi DP8381) and 1 wireless (Ralink RT2561S-based).

    The hardware is following: Intel Core i3 540 3.06 GHz, Intel DH55HC motherboard (H55 chipset), 2 GB DDR2-memory, Seagate 80 GB 2.5-inch harddrive (ST980811AS).

    Tip for checking what is eating all the CPU-power?

    A chart:



  • UNCHECK  |–-> System ---> Advanced ---> Networking ---> Enable device polling


  • Rebel Alliance Developer Netgate

    Looks like you have polling on, as ericab was hinting at. Polling uses all spare CPU cycles to check for input from the network instead of waiting for interrupts.

    On an i3 I doubt you need polling, it probably wouldn't help you much at all.

    If it's not polling, we'd need to see a screen or so of output from "top -SH" from the shell to see what processes are using the CPU.



  • Device polling wasnt enabled. Strange. I checked it and unchecked it just to be safe, but the CPU-usage is still very high.

    This is the output from the top -sh:

    _24434 root        1  8  20  6564K  4844K wait    9:01  0.00% sh
      442 root        1 -58    0 21076K 18936K bpf      2:31  0.00% tcpdump
      420 root        1  44    0  3268K  1152K select  2:01  0.00% syslogd
      443 root        1  -8    0  3156K  784K piperd  1:05  0.00% logger
      988 root        1  8  20  3156K  796K nanslp  1:04  0.00% check_reload_status
      613 root        1  4    0  6168K  3128K kqread  0:30  0.00% lighttpd
    24256 root        1  44    0  3312K  1540K select  0:10  0.00% hostapd
    55879 nobody      1  64  20  3156K  1336K select  0:05  0.00% dnsmasq
      965 root        1  8    0  3240K  1272K nanslp  0:04  0.00% cron
      875 _ntp        1  44    0  3156K  1228K select  0:04  0.00% ntpd
    55906 dhcpd      1  64  20  3156K  2036K select  0:02  0.00% dhcpd
      890 root        1  44    0  3156K  1244K select  0:01  0.00% ntpd
      997 root        1  8    0  3156K  900K nanslp  0:01  0.00% minicron
    45213 root        1  -8    0 40712K 17136K piperd  0:01  0.00% php
      635 root        1  4    0 44808K 20296K accept  0:01  0.00% php
    22507 dhcp      1  44    0  3156K  1320K select  0:01  0.00% dhclient
      625 root        1  4    0 45832K 20104K accept  0:01  0.00% php
      618 root        1  4    0 45832K 18992K accept  0:00  0.00% php


  • Rebel Alliance Developer Netgate

    top -SH
    

    (capital s, capital h)

    Also be sure to include the header lines that list the summary of cpu/ram/etc usage.



  • Ops, sorry. Here's the full output:

    _$ top -SH
    last pid: 59280;  load averages:  0.22,  0.11,  0.04  up 42+20:33:39    19:09:33
    84 processes:  2 running, 63 sleeping, 19 waiting

    Mem: 70M Active, 26M Inact, 38M Wired, 92K Cache, 31M Buf, 1729M Free
    Swap: 4096M Total, 4096M Free

    PID USERNAME PRI NICE  SIZE    RES STATE    TIME  WCPU COMMAND
      34 root    -68    -    0K    8K WAIT  823.4H 96.00% irq19: sf1 atapci1
      11 root    171 ki31    0K    8K RUN    189.3H 10.79% idle: cpu0
      25 root    -68    -    0K    8K WAIT  380:59  0.00% irq18: sf0 atapci0
      26 root    -68    -    0K    8K WAIT  202:23  0.00% irq16: sf3 ehci0
      15 root      44    -    0K    8K -      125:28  0.00% yarrow
      12 root    -32    -    0K    8K WAIT    52:33  0.00% swi4: clock sio
      33 root    -68    -    0K    8K WAIT    17:49  0.00% irq22: ral0
    24434 root      8  20  6564K  4844K wait    9:03  0.00% sh
      442 root    -58    0 21076K 18960K bpf      2:31  0.00% tcpdump
      420 root      44    0  3268K  1148K select  2:01  0.00% syslogd
        4 root      -8    -    0K    8K -        1:57  0.00% g_up
      20 root      8    -    0K    8K -        1:39  0.00% thread taskq
        5 root      -8    -    0K    8K -        1:16  0.00% g_down
      443 root      -8    0  3156K  784K piperd  1:05  0.00% logger
      988 root      8  20  3156K  796K nanslp  1:04  0.00% check_reload_status
      48 root      20    -    0K    8K syncer  0:46  0.00% syncer
      613 root      4    0  6168K  3128K kqread  0:30  0.00% lighttpd
        3 root      -8    -    0K    8K -        0:25  0.00% g_event_

    Seems to be problems with an IRQ, maybe from the Ralink or SiS-nics?


  • Rebel Alliance Developer Netgate

    grep atapci1 /var/log/dmesg.boot

    It's probably an HDD controller, so something is making the HDD chew up lots of CPU.

    You might try adjusting any IDE/SATA options you have in the BIOS.



  • Very strange. SF1 is one of the server NIC:s with four ports.

    Sorry to take your time, but i'm terrible with FreeBSD :-/ The grep command gave me this:

    atapci1: <intel ata="" controller="">port 0xf090-0xf097,0xf080-0xf083,0xf070-0xf077,0xf060-0xf063,0xf050-0xf05f,0xf040-0xf04f irq 19 at device 31.2 on pci0
    atapci1: [ITHREAD]
    ata4: <ata 0="" channel="">on atapci1
    ata5: <ata 1="" channel="">on atapci1</ata></ata></intel>

    Could it be the HDD that has bad sectors etc?


  • Rebel Alliance Developer Netgate

    Is the drive actually on that controller? Is anything?

    atacontrol list
    


  • ATA channel 0:
        Master:      no device present
        Slave:      no device present
    ATA channel 1:
        Master:      no device present
        Slave:      no device present
    ATA channel 2:
        Master:      no device present
        Slave:      no device present
    ATA channel 3:
        Master:      no device present
        Slave:      no device present
    ATA channel 4:
        Master:  ad8 <st980811as 3.alb="">SATA revision 1.x
        Slave:      no device present
    ATA channel 5:
        Master: acd0 <optiarc dvd="" rw="" ad-5240s="" 1.03="">SATA revision 1.x
        Slave:      no device present</optiarc></st980811as>


  • Rebel Alliance Developer Netgate

    So it could be either the HDD or the DVD

    You could try moving the DVD drive to another controller, see if the problem stays the same or follows the DVD drive.



  • Thanks! I'll try to change SATA-port and unplug the DVD. Very strange problem since the controller is the standard ICH11 from Intel.



  • This line

    34 root    -68    -    0K    8K WAIT  823.4H 96.00% irq19: sf1 atapci1

    shows its the irq19 handler thats chewing up most of the CPU. It shows that AT LEAST devices sf1 and atapci1 use irq19. There may be others.

    One possible explanation is that the BIOS has either incorrectly calculated the IRQs or has provided incorrect information to FreeBSD such that FreeBSD thinks that some device which really uses IRQ19 uses (say IRQ17). If this is happening then the handler for such device would be on IRQ17 and when the device interrupts on IRQ19 there is no handler there to clear the interrupt condition and so on exit from the interrupt there is an immediate interrupt again because the interrupt condition hasn't been cleared.

    Another possible explanation is that there is a bug in the handler for some device on IRQ19 and in some circumstances the interrupt condition is not cleared.

    Perhaps this problem will go away if you try a pfSense 2.0 snapshot build; it has newer device drivers than pfSense 1.2.3 and your motherboard has a pretty new chipset.

    If you can't go to pfSense 2.0 or it doesn't help you could try removing devices to see what makes a difference. Can you put one or more of your disks on atapci1? Can you replace (at least for test) the 4 port NIC?


  • Rebel Alliance Developer Netgate

    vmstat -i
    

    Will show what devices are on an IRQ (to a point..)

    I didn't catch the sf1 in there earlier, but that is an Adaptec ethernet controller… I don't think I've ever seen one of those before, not sure what kind of quality it is (or the driver for that matter) so as wallabybob mentioned it could be that as well.

    Moving the devices around a bit may still prove helpful, at least in narrowing down exactly which is to blame.

    It might be worth relocating both of those drives to the primary sata controller instead of ports 4/5, they are obviously connected through a different internal path. sf1 may behave itself if it doesn't have to fight for the bus.



  • I seems to have the exact same problem on my new 1.2.3 install on i3 540 and thru the Web GUI it showed a constant 20+% CPU.
    When view thru "top -SH" command I get the following which showed "irq19: atapci1+" taking up 89% on CPU0

    top -SH

    last pid: 29754;  load averages:  0.00,  0.01,  0.00                                up 0+05:13:43  06:09:48
    87 processes:  6 running, 66 sleeping, 15 waiting
    CPU:  0.0% user,  0.0% nice,  0.1% system, 23.0% interrupt, 76.8% idle
    Mem: 23M Active, 9280K Inact, 37M Wired, 280K Cache, 16M Buf, 2864M Free
    Swap: 8192M Total, 8192M Free

    PID USERNAME PRI NICE  SIZE    RES STATE  C  TIME  WCPU COMMAND
      11 root    171 ki31    0K    8K CPU3  3 311:54 100.00% idle: cpu3
      13 root    171 ki31    0K    8K CPU1  1 311:45 100.00% idle: cpu1
      12 root    171 ki31    0K    8K RUN    2 308:51 100.00% idle: cpu2
      38 root    -64    -    0K    8K CPU0  0 264:54 89.79% irq19: atapci1+
      14 root    171 ki31    0K    8K RUN    0  47:24 11.87% idle: cpu0

    The "atacontrol list" command output I got:

    atacontrol list

    ATA channel 0:
        Master:      no device present
        Slave:      no device present
    ATA channel 1:
        Master:      no device present
        Slave:      no device present
    ATA channel 2:
        Master:      no device present
        Slave:      no device present
    ATA channel 3:
        Master:  ad6 <st3500514ns sn11="">SATA revision 2.x
        Slave:      no device present
    ATA channel 4:
        Master: acd0 <hl-dt-st dvdram="" gh24ns50="" xp02="">SATA revision 1.x
        Slave:      no device present
    ATA channel 5:
        Master:      no device present
        Slave:      no device present
    ATA channel 6:
        Master:      no device present
        Slave:      no device present

    and "vmstat -i" output I got:

    vmstat -i

    interrupt                          total      rate
    irq16: ehci0                        159          0
    irq19: atapci1+              1281400417      67541
    irq23: ehci1                          55          0
    cpu0: timer                    37942915      1999
    irq256: re0                      364280        19
    irq257: re1                      353607        18
    cpu2: timer                    37933878      1999
    cpu3: timer                    37933877      1999
    cpu1: timer                    37933878      1999
    Total                        1433863066      75577

    So I'm wondering should I also try to relocate my SATA Hard disk to ATA channel 1 and DVD rom to ATA channel 2?

    Any suggestion ?

    Thanks.</hl-dt-st></st3500514ns>



  • Please post the dmesg output so we can see more about your configuration, particularly what devices are using irq19.

    If you have the option to do so, i would go into the BIOS, record the mode of all the disk controllers and change them all to legacy rather than raid or AHCI and see if that makes a difference. Or try a snapshot build of pfSense 2.0.



  • This is very interesting after I poke around i can actually see that the CPU hog for IRQ19 is used by "Intel ATA Controller" as i can see it during the system bootup and it scroll by the screen.
    then I check my system BIOS and my on-board SATA controller is already using IDE mode (non AHCI mode). So i went and disable just about everything I can find under the BIOS related to AHCI or disk mode and access, but nothing will make the CPU hog of IRQ19 drop. So finally I decided to change my SATA controller from IDE mode to AHCI mode, then after a reboot, my CPU usage is down to about 0% now. So this is interesting that it should be set to AHCI mode on this hardware combination.

    This has been a good learning experience.


Log in to reply