High CPU usage when downloading with multi-WAN config



  • Basically here's my config:

    Version 2.1-RELEASE (amd64)
    built on Wed Sep 11 18:17:37 EDT 2013
    FreeBSD 8.3-RELEASE-p11

    CPU Type AMD Sempron™ 140 Processor

    When I trigger an HTTP download (I'm using a dual-WAN config), my CPU usage spikes to 100%, which is weird. Looking at System Activity, I noticed that the command /usr/local/sbin/check_reload takes around 50% CPU. I searched this forum first and found that this is usually just a symptom of another problem. So how do I start troubleshooting this? I think my CPU is more than enough for my purpose, right?



  • What I also don't understand is how the CPU usage can reach 100% when the idle command in System Activity never drops below 50%. Common sense says CPU usage shouldn't climb above 50% if the idle thread stays at 50%, right?


  • Netgate Administrator

    How are you looking at the system activity?
    Try using 'top -SH' at the command line.

    What other threads have you looked at that implied this was a symptom?
    What packages are you running?

    Steve



  • I just look at it by going to Diagnostics -> System Activity; isn't that the same as "top -SH"?

    I've looked at this thread: http://forum.pfsense.org/index.php/topic,59996.0.html

    I don't use any packages at all.



  • BUMP!


  • Netgate Administrator

    I see nothing about check_reload in the thread you linked to.

    Is that a typo? Do you actually mean check_reload_status?

    Did you try running top -SH at the CLI? What does the system idle show for each processor?
    The webgui screen doesn't show the per-type CPU usage breakdown line:

    CPU:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
    

    You might have a high interrupt load accounting for your high CPU usage.
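
    As an aside, that breakdown line is easy to slice up at the CLI if you only want one figure out of it. A throwaway sketch (plain sh + awk; the sample line is hard-coded here, and piping in live top output is only an assumption):

```shell
# Extract just the interrupt share from top's CPU breakdown line.
# The sample line is hard-coded for illustration; on a live box you
# could feed it `top -SHb | grep '^CPU:'` instead (assumption:
# FreeBSD top with batch mode available).
line='CPU:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle'
echo "$line" | awk -F'[ ,]+' '{
    for (i = 1; i < NF; i++)
        if ($(i + 1) == "interrupt")
            print "interrupt load:", $i
}'
# prints: interrupt load: 0.0%
```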

    Steve



  • @stephenw10: "Is that a typo? Do you actually mean check_reload_status? … You might have a high interrupt load accounting for your high CPU usage."

    Oops, yeah, I meant check_reload_status. Sorry, I didn't notice that.

    Yes I did, and it showed approximately the same idle percentage as the one in the webgui. I'll have to double-check when I get back home.

    If I do have a high interrupt load, what can I do about it?



  • I tried running top -SH again and the results are EXACTLY the same as the webgui's. I thought top -SH was supposed to show more detail? I've already forgotten, but I think my CPU is just single-core.


  • Netgate Administrator

    Hmm, interesting. On my test box here the Diagnostics: System Activity: screen does not show the interrupt load. It's running Nano 32 bit but I can't see why that would make any difference.

    If you do have a high interrupt load it could be linked to the problem.
    Try running vmstat -i from the console to see more interrupt info.

    Steve



  • Well, I don't see the interrupt load in either the GUI or the command line. We're using the same version of pfSense, right?

    This is what I have with vmstat -i:

    $ vmstat -i
    interrupt                          total      rate
    irq14: ata0                      271172          1
    irq18: fxp2                    35396773        184
    irq19: fxp3                    23563272        123
    cpu0: timer                    382836731      1999
    irq256: em0                    56883303        297
    Total                          498951251      2606
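
    For picking the outlier out of output like this, a throwaway sketch (plain sh + awk, reusing the figures above; the 1000/s cutoff is arbitrary, chosen just for illustration):

```shell
# Flag interrupt sources with an unusually high rate in saved
# `vmstat -i` output; the here-doc holds the figures pasted above.
awk 'NR > 1 && $1 != "Total" && $NF + 0 > 1000 {
    printf "high rate: %s %s at %s/s\n", $1, $2, $NF
}' <<'EOF'
interrupt                          total      rate
irq14: ata0                      271172          1
irq18: fxp2                    35396773        184
irq19: fxp3                    23563272        123
cpu0: timer                    382836731      1999
irq256: em0                    56883303        297
Total                          498951251      2606
EOF
# prints: high rate: cpu0: timer at 1999/s
```

    That ~2000/s timer rate is the number the next reply compares against.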


  • Netgate Administrator

    Like I say, I'm running the 32bit NanoBSD version, which is probably different from yours.

    
    [2.1-RELEASE][root@pfsense.localdomain]/root(7): top -SH
    last pid: 28780;  load averages:  0.05,  0.05,  0.01   up 16+00:13:51  22:00:38
    144 processes: 3 running, 105 sleeping, 36 waiting
    CPU:  0.0% user,  0.0% nice,  0.4% system,  0.2% interrupt, 99.4% idle
    Mem: 74M Active, 18M Inact, 113M Wired, 1228K Cache, 77M Buf, 745M Free
    Swap:
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
       10 root     171 ki31     0K    16K RUN     0 381.6H 100.00% idle{idle: cpu0}
       10 root     171 ki31     0K    16K CPU1    1 381.6H 100.00% idle{idle: cpu1}
       11 root     -32    -     0K   288K WAIT    1  89:32  0.00% intr{swi4: clock}
    32487 root      44    0  6280K  6300K select  0  56:51  0.00% ntpd
       11 root     -64    -     0K   288K WAIT    0  24:50  0.00% intr{irq23: uhci0
       14 root     -44    -     0K   168K -       1  21:11  0.00% usb{usbus0}
    52329 nobody    74  r30  3316K  1364K nanslp  1  10:33  0.00% LCDd
    65162 root      64   20  3300K  1208K nanslp  0   4:06  0.00% lcdproc
       11 root     -44    -     0K   288K WAIT    1   3:59  0.00% intr{swi1: netisr
    32131 root      76   20  3644K  1468K wait    1   2:39  0.00% sh
    26623 root      44    0  3264K  1232K select  0   2:23  0.00% apinger
      296 root      76   20  3352K  1180K kqread  1   1:37  0.00% check_reload_stat
        0 root     -16    0     0K   152K sched   0   1:32  0.00% kernel{swapper}
       49 root      -8    -     0K     8K mdwait  0   1:09  0.00% md1
       11 root     -32    -     0K   288K WAIT    0   1:00  0.00% intr{swi4: clock}
    29864 dhcpd     44    0 11456K  7916K select  1   0:32  0.00% dhcpd
    [2.1-RELEASE][root@pfsense.localdomain]/root(8): uname -a
    FreeBSD pfsense.localdomain 8.3-RELEASE-p11 FreeBSD 8.3-RELEASE-p11 #0: Wed Sep 11 19:13:36 EDT 2013
    root@snapshots-8_3-i386.builders.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_wrap.8.i386  i386
    
    

    That does seem like quite a high rate on cpu0: timer. Here's my home box for comparison:

    [2.1-RELEASE][root@pfsense.fire.box]/root(1): vmstat -i
    interrupt                          total       rate
    irq4: uart0                          942          0
    irq14: ata0                      1692946          0
    irq16: fxp3 uhci0                7461649          1
    irq17: fxp2 fxp6                58819745          8
    irq18: em0 ath0++              438389933         60
    irq19: fxp0 fxp4+                      4          0
    irq23: ehci0                           1          0
    irq26: em1                      59344136          8
    irq27: em2                     132075810         18
    cpu0: timer                   2905412614        400
    Total                         3603197780        496
    
    

    What traffic load was it handling when those figures were taken?

    This might be a complete distraction from the cause.  ::)

    Steve

    Edit: Are you using device polling? (it's off by default)
    Has this install always done this or has something changed?



  • @stephenw10: "Like I say I'm running the 32bit NanoBSD version … What traffic load was it handling when those figures were taken? … Are you using device polling? (it's off by default) Has this install always done this or has something changed?"

    Weird then. I hope someone who knows the difference can chime in on this thread.

    When those figures were taken, I had initiated an Nvidia driver download in Firefox. I'll see if those values change when I use Internet Download Manager (multi-threaded downloading) so that the two WAN interfaces max out.

    If device polling is off by default, then no, I'm not using it.

    I'm not really sure whether it behaved this way before, because the CPU bar in the dashboard only updates every 10 seconds, so I never noticed any 100% usage until now that I've monitored it closely. And no, nothing has changed from the very start, really.


  • Netgate Administrator

    Hmm, what speed are your two WANs? It could be just loading the cpu correctly, though that seems unlikely without any packages unless you have very fast WANs.

    Steve



  • @stephenw10: "Hmm, what speed are your two WANs? It could be just loading the cpu correctly, though that seems unlikely without any packages unless you have very fast WANs."

    Well, when I tested, my WANs reached 13-15 Mbps down each, and I was downloading with IDM at around 3 MBps. What do you think?
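
    Worth noting the units here: IDM reports megabytes per second while the links are rated in megabits. The conversion, as a quick awk sketch:

```shell
# Convert the observed IDM rate (3 MB/s) to megabits per second and
# compare with the combined WAN ceiling of 26-30 Mbit/s (13-15 each).
awk 'BEGIN {
    mbytes_per_s = 3                # rate as reported by IDM, MB/s
    mbits_per_s  = mbytes_per_s * 8 # 1 byte = 8 bits
    printf "%d Mbit/s of a 26-30 Mbit/s combined ceiling\n", mbits_per_s
}'
# prints: 24 Mbit/s of a 26-30 Mbit/s combined ceiling
```

    So the download really was using most of the combined WAN capacity.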


  • Netgate Administrator

    2X15Mbps is well within the capabilities of that CPU even if you were running a load of packages.

    I'm pretty sure the high check_reload_status load is a symptom, not a cause, but I can't think what the cause might be.  :-\

    Anything in the syslogs?

    Steve


  • Netgate Administrator

    Are you using IPv6 at all? This may have some relevance: https://redmine.pfsense.org/issues/2555

    Steve



  • I think I know what's causing it now: dynDNS. I had it set up for my first WAN from the beginning, back when my ISP used to give me public IP addresses. Now that my ISP uses carrier-grade NAT (my modem is given a private IP), dynDNS keeps trying to update; I noticed that in the system logs. One weird thing in the syslogs is that dynDNS is also updating my second WAN, even though I never set it up to do so and there are no entries under dynDNS anymore. Why is this?
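
    One way to confirm the carrier-grade NAT theory is to check which range the address the WAN gets actually falls in. A sketch (plain sh + awk; the ranges are RFC 1918 private space plus the RFC 6598 CGN block 100.64.0.0/10, and the sample address is made up):

```shell
# Return success (exit 0) if an IPv4 address is in RFC 1918 private
# space or the RFC 6598 carrier-grade NAT block 100.64.0.0/10;
# either way, dyndns can never learn a usable public address from
# that interface.
is_not_public() {
    echo "$1" | awk -F. '
        $1 == 10                              { exit 0 }
        $1 == 172 && $2 >= 16 && $2 <= 31     { exit 0 }
        $1 == 192 && $2 == 168                { exit 0 }
        $1 == 100 && $2 >= 64 && $2 <= 127    { exit 0 }  # CGN block
        { exit 1 }'
}
# Hypothetical modem address, for illustration only:
is_not_public 100.72.10.5 && echo "CGN/private: dyndns updates are futile"
# prints: CGN/private: dyndns updates are futile
```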


  • Netgate Administrator

    This may also be a symptom of something else causing the interface settings to be reloaded. The dyndns client is triggered and checks to see whether an update is necessary.

    Steve



  • @stephenw10: "This may also be a symptom of something else causing the interface settings to be reloaded. The dyndns client is triggered and checks to see whether an update is necessary."

    Yeah, I noticed that it does this when the alternative gateway set for WAN 2 goes into a "packet latency" or "packet loss" status. But why would it update dyndns for WAN 2 when I don't have an entry for it?

