High CPU usage when downloading with multi-WAN config



  • Basically here's my config:

    Version 2.1-RELEASE (amd64)
    built on Wed Sep 11 18:17:37 EDT 2013
    FreeBSD 8.3-RELEASE-p11

    CPU Type AMD Sempron™ 140 Processor

    When I trigger an HTTP download (I'm using a dual-WAN config), my CPU usage spikes to 100%, which is weird. Looking at System Activity, I noticed that the command /usr/local/sbin/check_reload takes around 50% CPU. I searched this forum first and found that this is usually just a symptom of another problem. So how do I start troubleshooting this? I think my CPU is more than enough for my purpose, right?



  • What I also don't understand is how the CPU usage can reach 100% when the idle command in System Activity never drops below 50%. Common sense says CPU usage shouldn't climb above 50% if the idle thread stays at 50%, right?


  • Netgate Administrator

    How are you looking at the system activity?
    Try using 'top -SH' at the command line.

    What other threads have you looked at that implied this was a symptom?
    What packages are you running?

    Steve



  • I just look at it by going to Diagnostics -> System Activity; isn't that the same as "top -SH"?

    I've looked at this thread: http://forum.pfsense.org/index.php/topic,59996.0.html

    I don't use any packages at all.



  • BUMP!


  • Netgate Administrator

    I see nothing about check_reload in the thread you linked to.

    Is that a typo? Do you actually mean check_reload_status?

    Did you try running top -SH at the CLI? What does the system idle show for each processor?
    The webgui screen doesn't show the per-type CPU usage breakdown line:

    CPU:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
    

    You might have a high interrupt load accounting for your high CPU usage.
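
    As an aside, that breakdown line is easy to slice up at the CLI if you only want one figure out of it. A throwaway sketch (plain sh + awk; the sample line is hard-coded here, and piping in live top output is only an assumption):

```shell
# Extract just the interrupt share from top's CPU breakdown line.
# The sample line is hard-coded for illustration; on a live box you
# could feed it `top -SHb | grep '^CPU:'` instead (assumption:
# FreeBSD top with batch mode available).
line='CPU:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle'
echo "$line" | awk -F'[ ,]+' '{
    for (i = 1; i < NF; i++)
        if ($(i + 1) == "interrupt")
            print "interrupt load:", $i
}'
# prints: interrupt load: 0.0%
```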

    Steve



  • @stephenw10: "Is that a typo? Do you actually mean check_reload_status? … You might have a high interrupt load accounting for your high CPU usage."

    Oops, yeah, I meant check_reload_status. Sorry, I didn't notice that.

    Yes I did, and it showed approximately the same idle percentage as the one in the webgui. I'll have to double-check when I get back home.

    If I do have a high interrupt load, what can I do about it?



  • I tried running top -SH again and the results are EXACTLY the same as the webgui's. I thought top -SH was supposed to show more detail? I've already forgotten, but I think my CPU is just single-core.


  • Netgate Administrator

    Hmm, interesting. On my test box here the Diagnostics: System Activity: screen does not show the interrupt load. It's running Nano 32 bit but I can't see why that would make any difference.

    If you do have a high interrupt load it could be linked to the problem.
    Try running vmstat -i from the console to see more interrupt info.

    Steve



  • Well, I don't see the interrupt load in either the GUI or the command line. We're using the same version of pfSense, right?

    This is what I have with vmstat -i:

    $ vmstat -i
    interrupt                          total      rate
    irq14: ata0                      271172          1
    irq18: fxp2                    35396773        184
    irq19: fxp3                    23563272        123
    cpu0: timer                    382836731      1999
    irq256: em0                    56883303        297
    Total                          498951251      2606
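
    For picking the outlier out of output like this, a throwaway sketch (plain sh + awk, reusing the figures above; the 1000/s cutoff is arbitrary, chosen just for illustration):

```shell
# Flag interrupt sources with an unusually high rate in saved
# `vmstat -i` output; the here-doc holds the figures pasted above.
awk 'NR > 1 && $1 != "Total" && $NF + 0 > 1000 {
    printf "high rate: %s %s at %s/s\n", $1, $2, $NF
}' <<'EOF'
interrupt                          total      rate
irq14: ata0                      271172          1
irq18: fxp2                    35396773        184
irq19: fxp3                    23563272        123
cpu0: timer                    382836731      1999
irq256: em0                    56883303        297
Total                          498951251      2606
EOF
# prints: high rate: cpu0: timer at 1999/s
```

    That ~2000/s timer rate is the number the next reply compares against.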


  • Netgate Administrator

    Like I say, I'm running the 32bit NanoBSD version, which is probably different from yours.

    
    [2.1-RELEASE][root@pfsense.localdomain]/root(7): top -SH
    last pid: 28780;  load averages:  0.05,  0.05,  0.01   up 16+00:13:51  22:00:38
    144 processes: 3 running, 105 sleeping, 36 waiting
    CPU:  0.0% user,  0.0% nice,  0.4% system,  0.2% interrupt, 99.4% idle
    Mem: 74M Active, 18M Inact, 113M Wired, 1228K Cache, 77M Buf, 745M Free
    Swap:
    
      PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
       10 root     171 ki31     0K    16K RUN     0 381.6H 100.00% idle{idle: cpu0}
       10 root     171 ki31     0K    16K CPU1    1 381.6H 100.00% idle{idle: cpu1}
       11 root     -32    -     0K   288K WAIT    1  89:32  0.00% intr{swi4: clock}
    32487 root      44    0  6280K  6300K select  0  56:51  0.00% ntpd
       11 root     -64    -     0K   288K WAIT    0  24:50  0.00% intr{irq23: uhci0
       14 root     -44    -     0K   168K -       1  21:11  0.00% usb{usbus0}
    52329 nobody    74  r30  3316K  1364K nanslp  1  10:33  0.00% LCDd
    65162 root      64   20  3300K  1208K nanslp  0   4:06  0.00% lcdproc
       11 root     -44    -     0K   288K WAIT    1   3:59  0.00% intr{swi1: netisr
    32131 root      76   20  3644K  1468K wait    1   2:39  0.00% sh
    26623 root      44    0  3264K  1232K select  0   2:23  0.00% apinger
      296 root      76   20  3352K  1180K kqread  1   1:37  0.00% check_reload_stat
        0 root     -16    0     0K   152K sched   0   1:32  0.00% kernel{swapper}
       49 root      -8    -     0K     8K mdwait  0   1:09  0.00% md1
       11 root     -32    -     0K   288K WAIT    0   1:00  0.00% intr{swi4: clock}
    29864 dhcpd     44    0 11456K  7916K select  1   0:32  0.00% dhcpd
    [2.1-RELEASE][root@pfsense.localdomain]/root(8): uname -a
    FreeBSD pfsense.localdomain 8.3-RELEASE-p11 FreeBSD 8.3-RELEASE-p11 #0: Wed Sep 11 19:13:36 EDT 2013
    root@snapshots-8_3-i386.builders.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_wrap.8.i386  i386
    
    

    That does seem like quite a high rate on cpu0: timer. Here's my home box for comparison:

    [2.1-RELEASE][root@pfsense.fire.box]/root(1): vmstat -i
    interrupt                          total       rate
    irq4: uart0                          942          0
    irq14: ata0                      1692946          0
    irq16: fxp3 uhci0                7461649          1
    irq17: fxp2 fxp6                58819745          8
    irq18: em0 ath0++              438389933         60
    irq19: fxp0 fxp4+                      4          0
    irq23: ehci0                           1          0
    irq26: em1                      59344136          8
    irq27: em2                     132075810         18
    cpu0: timer                   2905412614        400
    Total                         3603197780        496
    
    

    What traffic load was it handling when those figures were taken?

    This might be a complete distraction from the cause.  ::)

    Steve

    Edit: Are you using device polling? (it's off by default)
    Has this install always done this or has something changed?



  • @stephenw10: "Like I say I'm running the 32bit NanoBSD version … What traffic load was it handling when those figures were taken? … Are you using device polling? (it's off by default) Has this install always done this or has something changed?"

    Weird then. I hope someone who knows the difference can chime in on this thread.

    When those figures were taken, I had initiated an Nvidia driver download in Firefox. I'll see if those values change when I use Internet Download Manager (multi-threaded downloading) so that the two WAN interfaces max out.

    If device polling is off by default, then no, I'm not using it.

    I'm not really sure whether it behaved this way before, because the CPU bar in the dashboard only updates every 10 seconds, so I never noticed any 100% usage until now that I've monitored it closely. And no, nothing has changed from the very start, really.


  • Netgate Administrator

    Hmm, what speed are your two WANs? It could be just loading the cpu correctly, though that seems unlikely without any packages unless you have very fast WANs.

    Steve



  • @stephenw10: "Hmm, what speed are your two WANs? It could be just loading the cpu correctly, though that seems unlikely without any packages unless you have very fast WANs."

    Well, when I tested, my WANs reached 13-15 Mbps down each, and I was downloading with IDM at around 3 MBps. What do you think?
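
    Worth noting the units here: IDM reports megabytes per second while the links are rated in megabits. The conversion, as a quick awk sketch:

```shell
# Convert the observed IDM rate (3 MB/s) to megabits per second and
# compare with the combined WAN ceiling of 26-30 Mbit/s (13-15 each).
awk 'BEGIN {
    mbytes_per_s = 3                # rate as reported by IDM, MB/s
    mbits_per_s  = mbytes_per_s * 8 # 1 byte = 8 bits
    printf "%d Mbit/s of a 26-30 Mbit/s combined ceiling\n", mbits_per_s
}'
# prints: 24 Mbit/s of a 26-30 Mbit/s combined ceiling
```

    So the download really was using most of the combined WAN capacity.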


  • Netgate Administrator

    2X15Mbps is well within the capabilities of that CPU even if you were running a load of packages.

    I'm pretty sure the high check_reload_status load is a symptom, not a cause, but I can't think what the cause might be.  :-\

    Anything in the syslogs?

    Steve


  • Netgate Administrator

    Are you using IPv6 at all? This may have some relevance: https://redmine.pfsense.org/issues/2555

    Steve



  • I think I know what's causing it now: dynDNS. I had it set up for my first WAN from the beginning, back when my ISP used to give me public IP addresses. Now that my ISP uses carrier-grade NAT (my modem is given a private IP), dynDNS keeps trying to update; I noticed that in the system logs. One weird thing in the syslogs is that dynDNS is also updating my second WAN, even though I never set it up to do so and there are no entries under dynDNS anymore. Why is this?
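
    One way to confirm the carrier-grade NAT theory is to check which range the address the WAN gets actually falls in. A sketch (plain sh + awk; the ranges are RFC 1918 private space plus the RFC 6598 CGN block 100.64.0.0/10, and the sample address is made up):

```shell
# Return success (exit 0) if an IPv4 address is in RFC 1918 private
# space or the RFC 6598 carrier-grade NAT block 100.64.0.0/10;
# either way, dyndns can never learn a usable public address from
# that interface.
is_not_public() {
    echo "$1" | awk -F. '
        $1 == 10                              { exit 0 }
        $1 == 172 && $2 >= 16 && $2 <= 31     { exit 0 }
        $1 == 192 && $2 == 168                { exit 0 }
        $1 == 100 && $2 >= 64 && $2 <= 127    { exit 0 }  # CGN block
        { exit 1 }'
}
# Hypothetical modem address, for illustration only:
is_not_public 100.72.10.5 && echo "CGN/private: dyndns updates are futile"
# prints: CGN/private: dyndns updates are futile
```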


  • Netgate Administrator

    This may also be a symptom of something else causing the interface settings to be reloaded. The dyndns client is triggered and checks to see whether an update is necessary.

    Steve



  • @stephenw10: "This may also be a symptom of something else causing the interface settings to be reloaded. The dyndns client is triggered and checks to see whether an update is necessary."

    Yeah, I noticed that it does this when the alternative gateway set for WAN 2 goes into a "packet latency" or "packet loss" status. But why would it update dyndns for WAN 2 when I don't have an entry for it?

