VMWare Pentest lab: Extremely high CPU on host

Fmstrat

Hi all,

I'm having an odd issue with a pfSense 2.0 install in a pentest lab. I'm running VMware Workstation 7 on an i7 920 with 16GB of RAM, and the pfSense instance has plenty of RAM and access to two of the processors.

The VM session has two network interfaces, one bridged to the local network through the on-board 1000Mb NIC, the other bridged to what we'll call the "external" network through a PCI 100Mb NIC.

There is another VM instance running Backtrack that is bridged to the local network through the on-board 1000Mb NIC as well. When running nmap on the Backtrack instance against another machine on the "external" network (which is really just a remote lab machine), the pfSense instance, which is routing the traffic, spikes the host CPU up to 125% CPU usage (1.25 processors). The Backtrack instance is barely using any CPU at this point. CPU usage INSIDE the pfSense VM is low, perhaps 10%.

I've also noticed high utilization, like 80% CPU, just from running a number of concurrent downloads from any hardware or virtual machine I route through the pfSense VM.

Any ideas?

Thanks,
B.

tommyboy180

My first reaction would be to run top and see what is actually using that CPU. Did you select the multiprocessing kernel at install?

Fmstrat

@tommyboy180:

My first reaction would be to run top and see what is actually using that CPU. Did you select the multiprocessing kernel at install?

TOP on the client shows no CPU use (2%). You bring up a good point: I used the precreated VMware image, and I'm not sure its default kernel supports multiprocessing.

Output of uname -a is: FreeBSD pfsense.coronium 8.1-RELEASE-p4 FreeBSD 8.1-RELEASE-p4 #0: Tue Jun 21 16:48:23 EDT 2011 sullrich@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.8 i386

If this isn't what I should have, is it possible to change the kernel post install?

tommyboy180

It looks like you do have the multiprocessor kernel installed. Is your CPU still spiking for a long period of time?

Fmstrat

@tommyboy180:

It looks like you do have the multiprocessor kernel installed. Is your CPU still spiking for a long period of time?

Yes, it's very repeatable. All I need to do is fire up a few downloads or uploads, or run a portscan or anything that makes a lot of connections and CPU on the host OS shoots up while the guest OS (pfsense) CPU appears low.

Thanks.

NetJunkie

I came here looking for a solution to the same problem. I'm running pfSense 2.0 under VMware vSphere 5.0. At idle it's fine, but under load the CPU use shown by vCenter spikes way up. Inside the guest (pfSense) the load is almost nothing, maybe 5% on CPU. Load average is well under 1. In vCenter it'll be 80% - 90% of a single vCPU; two vCPUs cut that in half, four cut it to a quarter. I've switched network cards from e1000 to VMXNET to VMXNET2 with the same results.
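The scaling described above (halved with two vCPUs, quartered with four) is what you would expect if the absolute amount of CPU work stays constant and only the denominator changes. A minimal sketch of that arithmetic follows; the 0.85 figure is an assumption (the midpoint of the 80% - 90% reported), and the function name is made up for illustration.

```python
def percent_of_allocation(busy_vcpu_equivalents, vcpu_count):
    """Express a fixed amount of CPU work as a percentage of all assigned vCPUs."""
    if vcpu_count < 1:
        raise ValueError("vcpu_count must be at least 1")
    return 100.0 * busy_vcpu_equivalents / vcpu_count


if __name__ == "__main__":
    # Assumption: roughly 0.85 of one vCPU is busy, regardless of how many are assigned.
    busy = 0.85
    for vcpus in (1, 2, 4):
        print(f"{vcpus} vCPU(s): {percent_of_allocation(busy, vcpus):.2f}% of allocation")
```

If the percentage really does shrink in proportion like this, adding vCPUs is only changing how the same load is displayed, not reducing it.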

RootWyrm

Confirmed with NetJunkie via Twitter; I'm also seeing unusually high CPU utilization even at low loads as well with 2.0-RELEASE. Averaging >10% in esxtop at <100KB/s combined with systat -vmstat disagreeing vehemently: <3% total CPU utilization.
I thought it was pf itself not reporting or under-reporting CPU, but it's not. I'm on ESXi 4.1U1, 2 vCPU, 1GB, with decently large reservation. I'm not seeing exceptionally high INTR loading either; it's more or less exactly where I'd expect it with em(4)'s. I switched to POLLING, gave it a swift reboot to the rear, and relative CPU utilization is MUCH worse than expected - 50% system reported by systat, and ESXi reporting one core at 80%, one at 75%, one at 70% and one at 20% - constant on both. Never below 50%. This is at <20KB/s of traffic, as well.

Something is definitely broken here.

EDIT: How weirdly broken? Try this interesting setup: two em0 interfaces, enable POLLING, reboot. CPU utilization is insane, no? Now, disable POLLING, apply but do not reboot. Suddenly, the CPU utilization appears to be much, much better. The difference here was narrowed to pfSense reporting <1% and ESXi reporting <4%.
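Since the checkbox state and the interface state can apparently disagree, one way to check what the interface itself reports is to decode the hex value that ifconfig prints after options= (for example the options=db shown in the ifconfig output further down this thread). This is only a sketch: the bit values below are recalled from FreeBSD's net/if.h capability flags and should be treated as an assumption to verify against your own source tree, and the helper names are made up.

```python
from typing import List

# Assumed IFCAP_* bit values (recalled from FreeBSD's net/if.h; verify locally).
IFCAP_FLAGS = {
    0x01: "RXCSUM",
    0x02: "TXCSUM",
    0x04: "NETCONS",
    0x08: "VLAN_MTU",
    0x10: "VLAN_HWTAGGING",
    0x20: "JUMBO_MTU",
    0x40: "POLLING",
    0x80: "VLAN_HWCSUM",
}


def decode_options(hex_mask: str) -> List[str]:
    """Turn the hex value after 'options=' into a list of flag names."""
    mask = int(hex_mask, 16)
    return [name for bit, name in sorted(IFCAP_FLAGS.items()) if mask & bit]


def polling_enabled(hex_mask: str) -> bool:
    """True if the POLLING capability bit is set in the mask."""
    return "POLLING" in decode_options(hex_mask)


if __name__ == "__main__":
    print(decode_options("db"))
    print("polling on:", polling_enabled("db"))
```

With these assumed values, db decodes to the same six flags ifconfig lists next to it, POLLING included, which is a reasonable sanity check on the table.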

tester_02

Running 2.0 release (64 bit) on VMware Server. No CPU load issue.
Squid/squidguard/snort installed and 2 NICs.

sullrich

From a shell, post the output of:

top -SH

billm

I'm not seeing this on my ESXi 4.1.0 install with pfSense 2.1-development (upgraded right after the v6 branch was merged in, so this is 2.0 w/ v6). The VM is configured as FreeBSD 64-bit, running the AMD64 release of pfSense. I handed off a single CPU to pfSense but am running an SMP kernel. I ran 8mbit of small frames through the firewall and only saw host CPU usage a hair over what pfSense reported (25% in guest, 30% of one core in host).

Are you running the open-vm-tools package? Also, paste the output of

sysctl kern.timecounter.choice kern.timecounter.hardware kern.hz

Thanks

–Bill

RootWyrm

@sullrich:

From a shell, post the output of:

top -SH

last pid: 11421; load averages: 0.07, 0.03, 0.01 up 0+22:57:12 15:50:45
96 processes: 3 running, 77 sleeping, 16 waiting
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 40M Active, 73M Inact, 131M Wired, 88K Cache, 110M Buf, 740M Free
Swap: 4096M Total, 4096M Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 16K CPU0 0 22.7H 100.00% {idle: cpu0}
11 root 171 ki31 0K 16K RUN 1 22.6H 100.00% {idle: cpu1}
0 root 76 0 0K 64K sched 1 1045.8 0.00% {swapper}
21 root 76 ki-6 0K 8K pollid 1 14:27 0.00% idlepoll
12 root -44 - 0K 128K WAIT 0 8:07 0.00% {swi1: netisr 0}
12 root -32 - 0K 128K WAIT 0 3:50 0.00% {swi4: clock}
12 root -32 - 0K 128K WAIT 1 0:37 0.00% {swi4: clock}
31198 root 64 20 4524K 3032K bpf 1 0:30 0.00% arpwatch
14 root -16 - 0K 8K - 1 0:20 0.00% yarrow
22066 root 44 0 4948K 2520K select 1 0:18 0.00% syslogd
32469 nobody 64 20 3572K 2344K select 0 0:16 0.00% darkstat
13799 root 64 20 3316K 1348K select 1 0:15 0.00% apinger
21140 root 76 20 3656K 1508K wait 0 0:13 0.00% sh
53332 root 44 0 26140K 5012K select 1 0:12 0.00% vmtoolsd
23900 root 44 0 3316K 924K piperd 0 0:09 0.00% logger
23696 root 44 0 6936K 3708K bpf 1 0:06 0.00% tcpdump
27742 root 44 0 3352K 1352K select 1 0:05 0.00% miniupnpd

Looks pretty normal, right? Right. So here's the interesting part.

2 users Load 0.01 0.02 0.00 Oct 14 15:52

Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 57064 20596 298256 56456 757432 count
All 83272 25284 3511012 76448 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt cow 800 total
43 496 4 256 3133 zfod atkbd0 1
ozfod fdc0 irq6
0.1%Sys 0.2%Intr 0.0%User 0.0%Nice 99.8%Idle %ozfod ata1 irq15
| | | | | | | | | | | daefr mpt0 irq17
prcfr 400 cpu0: time
28 dtbuf totfr 400 cpu1: time
Namei Name-cache Dir-cache 69211 desvn react
Calls hits % hits % 835 numvn pdwak
7 7 100 65 frevn pdpgs
intrn
Disks da0 md0 pass0 134544 wire
KB/t 16.00 0.00 0.00 41080 act
tps 0 0 0 75208 inact
MB/s 0.00 0.00 0.00 92 cache
%busy 0 0 0 757340 free

Notice something missing? Yup. This is with polling disabled by the checkbox.

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=db<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,POLLING,VLAN_HWCSUM>
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=db<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,POLLING,VLAN_HWCSUM>

Notice a problem here? Yes. POLLING is still enabled. Checkbox in pfSense is UNCHECKED, but POLLING is on. Here's what happens when you check that POLLING box again.

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=db<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,POLLING,VLAN_HWCSUM>
em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=db<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,POLLING,VLAN_HWCSUM>

last pid: 29327; load averages: 0.87, 0.31, 0.12 up 0+23:07:15 16:00:48
96 processes: 4 running, 76 sleeping, 16 waiting
CPU: 0.0% user, 0.0% nice, 49.7% system, 0.0% interrupt, 50.3% idle
Mem: 40M Active, 76M Inact, 130M Wired, 92K Cache, 110M Buf, 738M Free
Swap: 4096M Total, 4096M Free

PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
21 root 171 ki-6 0K 8K CPU1 1 16:15 98.97% idlepoll
11 root 171 ki31 0K 16K RUN 0 22.8H 96.97% {idle: cpu0}
11 root 171 ki31 0K 16K RUN 1 22.7H 9.96% {idle: cpu1}
0 root 76 0 0K 64K sched 1 1045.8 0.00% {swapper}
12 root -44 - 0K 128K WAIT 0 8:09 0.00% {swi1: netisr 0}
12 root -32 - 0K 128K WAIT 0 3:51 0.00% {swi4: clock}
12 root -32 - 0K 128K WAIT 1 0:37 0.00% {swi4: clock}
31198 root 64 20 4524K 3032K bpf 0 0:31 0.00% arpwatch
14 root -16 - 0K 8K - 0 0:20 0.00% yarrow
22066 root 44 0 4948K 2520K select 0 0:18 0.00% syslogd
32469 nobody 64 20 3572K 2344K select 0 0:16 0.00% darkstat
13799 root 64 20 3316K 1348K select 0 0:15 0.00% apinger
21140 root 76 20 3656K 1508K wait 1 0:13 0.00% sh
53332 root 44 0 26140K 5012K select 0 0:12 0.00% vmtoolsd
23900 root 44 0 3316K 924K piperd 0 0:09 0.00% logger
23696 root 44 0 6936K 3708K bpf 0 0:06 0.00% tcpdump
27742 root 44 0 3352K 1352K select 0 0:05 0.00% miniupnpd

2 users Load 0.96 0.45 0.18 Oct 14 16:01

Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 57188 20676 298332 56460 755756 count
All 83408 25364 3511088 76452 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt cow 805 total
1 42 3M 7 259 5 3107 3 3 zfod atkbd0 1
ozfod fdc0 irq6
50.0%Sys 0.0%Intr 0.0%User 0.0%Nice 50.0%Idle %ozfod ata1 irq15
| | | | | | | | | | | daefr 5 mpt0 irq17
========================= prcfr 400 cpu0: time
8 dtbuf 2 totfr 400 cpu1: time
Namei Name-cache Dir-cache 69211 desvn react
Calls hits % hits % 890 numvn pdwak
11 11 100 65 frevn pdpgs
intrn
Disks da0 md0 pass0 133376 wire
KB/t 17.19 0.00 0.00 41368 act
tps 5 0 0 77764 inact
MB/s 0.09 0.00 0.00 92 cache
%busy 1 0 0 755664 free

8:01:24pm up 78 days 3:19, 200 worlds; CPU load average: 0.29, 0.16, 0.08
PCPU USED(%): 3.5 3.0 22 14 69 1.2 2.5 4.1 AVG: 15
PCPU UTIL(%): 3.6 3.2 22 12 66 1.2 2.4 2.7 AVG: 14
CORE UTIL(%): 6.7 34 67 5.0 AVG: 28

ID GID NAME NWLD %USED %RUN %SYS %WAIT %RDY
1 1 idle 8 273.89 800.00 0.00 0.00 800.00
1537396 1537396 earthmother - p 5 102.58 97.93 0.04 380.97 0.07
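The AVG column in the esxtop header can hide the fact that one physical CPU is carrying most of the load. A small sketch that parses the three header lines from the capture above, recomputes the mean, and reports the busiest single CPU; the parsing regex and function names are illustrative assumptions, not part of esxtop.

```python
import re

# Header lines copied from the esxtop capture above.
ESXTOP_HEADER = """\
PCPU USED(%): 3.5 3.0 22 14 69 1.2 2.5 4.1 AVG: 15
PCPU UTIL(%): 3.6 3.2 22 12 66 1.2 2.4 2.7 AVG: 14
CORE UTIL(%): 6.7 34 67 5.0 AVG: 28
"""

LINE_RE = re.compile(r"^(\w+ \w+)\(%\):\s+(.*?)\s+AVG:\s+([\d.]+)$")


def parse_esxtop_header(text):
    """Return {label: (per-CPU values, reported average)} for each header line."""
    rows = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            label, values, avg = match.groups()
            rows[label] = ([float(v) for v in values.split()], float(avg))
    return rows


def summarize(rows):
    """Return {label: (computed mean, busiest single value)}."""
    return {
        label: (sum(values) / len(values), max(values))
        for label, (values, _reported) in rows.items()
    }


if __name__ == "__main__":
    rows = parse_esxtop_header(ESXTOP_HEADER)
    for label, (mean, peak) in summarize(rows).items():
        reported = rows[label][1]
        print(f"{label}: mean {mean:.1f}%, busiest {peak:.1f}%, reported AVG {reported:.0f}%")
```

For this capture the recomputed means agree with the reported AVG values once rounded, while the busiest PCPU sits far above the average.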

This is with OpenVM Tools 313025. Timecounter looks like this:
kern.timecounter.choice: TSC(-100) ACPI-safe(850) i8254(0) dummy(-1000000)
kern.timecounter.hardware: ACPI-safe
kern.hz: 100

Pretty much exactly as expected; all other FreeBSD guests are exactly the same. ACPI-safe over TSC, no stepwarnings, and frequency 3579545 - no exceptions on any of them. (They're all 8.1-RELEASE currently.) This is on 32-bit, too, I forgot to mention.
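For anyone comparing their own guests, the sysctl output above can be checked mechanically: each entry in kern.timecounter.choice carries a quality number in parentheses, and the active counter would normally be the one with the highest quality unless it has been overridden. A rough sketch, using the exact output posted above as input; the helper names are made up.

```python
import re

# sysctl output as posted above.
SYSCTL_OUTPUT = """\
kern.timecounter.choice: TSC(-100) ACPI-safe(850) i8254(0) dummy(-1000000)
kern.timecounter.hardware: ACPI-safe
kern.hz: 100
"""


def parse_sysctl(text):
    """Split 'name: value' lines into a dict."""
    return dict(line.split(": ", 1) for line in text.splitlines() if ": " in line)


def timecounter_qualities(choice):
    """Map each timecounter name to its integer quality."""
    return {name: int(q) for name, q in re.findall(r"([^\s(]+)\((-?\d+)\)", choice)}


def best_timecounter(choice):
    """Name of the timecounter with the highest quality value."""
    qualities = timecounter_qualities(choice)
    return max(qualities, key=qualities.get)


if __name__ == "__main__":
    values = parse_sysctl(SYSCTL_OUTPUT)
    best = best_timecounter(values["kern.timecounter.choice"])
    active = values["kern.timecounter.hardware"]
    print(f"highest quality: {best}, active: {active}, match: {best == active}")
```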

timotl

I have been seeing this under ESXi 5 also.
I installed the vendor supplied tools and am using a single trunked E1000.

After trying all of the nic settings, I happened to disable powerd and the CPU usage went down by more than half.
Can anyone else confirm this?

-timotl

loftyDan

I too have this issue, using ESXi 5 (and previously on 4.1). Changing the powerd settings did not resolve the issue for me. I've tried 2.0 i386, my primary config, 2.1 i386 and 2.1 AMD64. For both dev builds I tried with my config backup, and a clean install, and the results were always the same. pfSense reports 16-20% CPU load, while ESXi reports a 62% load (on a Xeon X3440 @ 2.53GHz). This is with a download speed of about 3.6 MB/sec (29 Mb/sec). In every case Open-VM-Tools has been installed and I've been using the E1000 NIC. Speeds directly connected to the modem yield 31 Mb/sec.

If there is anything else I can test, or any more information I can provide, please let me know. I'd love for this problem to get resolved.
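The throughput figures in this post mix megabytes and megabits per second, so a quick conversion helps when comparing them with numbers elsewhere in the thread. A minimal sketch using only the values quoted above (3.6 MB/sec through pfSense, 31 Mb/sec directly on the modem); the function names are made up for illustration.

```python
def megabytes_to_megabits(mb_per_s):
    """Convert MB/s to Mb/s (8 bits per byte)."""
    return mb_per_s * 8


def fraction_of_direct(through_firewall_mbit, direct_mbit):
    """Share of the direct-connection speed achieved through the firewall."""
    return through_firewall_mbit / direct_mbit


if __name__ == "__main__":
    routed = megabytes_to_megabits(3.6)
    print(f"routed: {routed:.1f} Mb/s")
    print(f"share of direct speed: {fraction_of_direct(routed, 31):.0%}")
```

So the routed throughput is close to the direct-to-modem speed; the complaint in this post is the host CPU cost of getting there, not lost bandwidth.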

Veni

@loftyDan:

I too have this issue, using ESXi 5 (and previously on 4.1). Changing the powerd settings did not resolve the issue for me. I've tried 2.0 i386, my primary config, 2.1 i386 and 2.1 AMD64. For both dev builds I tried with my config backup, and a clean install, and the results were always the same. pfSense reports 16-20% CPU load, while ESXi reports a 62% load[…]

I'm seeing the same thing but on a single x5650 @ 2.67 GHz.

If I try to limit the CPU usage from the vSphere client then I don't get the performance I'm after (approx. 150 Mbps). Instead I get around 20-22 Mbps.
So it sounds as if the usage is real somehow; otherwise why would I see performance issues when giving pfSense a maximum of 1-1.5 GHz?

kkrauth

Just to chime in on this thread, as I'm seeing the same issues. I'm running the following release:
[2.0.1-RELEASE][root@pfSense.localdomain]/root(7): uname -a
FreeBSD pfSense.localdomain 8.1-RELEASE-p6 FreeBSD 8.1-RELEASE-p6 #0: Mon Dec 12 18:15:35 EST 2011 root@FreeBSD_8.0_pfSense_2.0-AMD64.snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_SMP.8 amd64

within ESXi 5. I installed open-vm-tools and VMware's provided drivers for the VMXNET3 adapter. Both internal/external NICs are running with the VMXNET3 driver. The problem was exactly the same using E1000 drivers.

The attached screenshot shows what is happening when the network is pretty much idle. During load, this spikes up even higher, even though pfSense top reports almost no usage whatsoever. I tried both with powerd turned on and off.

[Attached screenshot: pfsense.png]

marsboer

Same issue on a fresh pfSense 2.0.1 install running on KVM (Proxmox VE) with the SMP kernel. With only a couple of Mbit/s of traffic, CPU usage on the physical host increases massively (above 50%), with the guest running on a single virtual CPU and 512 MB RAM.

pfSense does not support virtio (the paravirtualized devices for KVM), so I thought using emulated NICs was the main reason for the bad CPU performance even under light load, but now I am starting to think that this may be a more generic problem with pfSense in virtualized setups.

clayton_ross

I too am having the same problem: pfSense 2.0 64-bit, ESXi 5.0, 2 cores, 2 NICs, vmtools.

iFloris

As most others on this thread, I too have run into this problem.
Something that is not clear to me is if using e1000 is the source of such increased cpu usage on esx.
And if that is the case, does switching to another adapter, such as flexible or vmxnet 2/3 help in reducing load for any of you?

kkrauth

@iFloris:

As most others on this thread, I too have run into this problem.
Something that is not clear to me is if using e1000 is the source of such increased cpu usage on esx.
And if that is the case, does switching to another adapter, such as flexible or vmxnet 2/3 help in reducing load for any of you?

I tried all three virtual adapters and the behaviour was the same.

Mattofsweden

I'm seeing the same issues here on a DELL PowerEdge R310 Quad Core Xeon:
Using ESXi 4.1 and pfSense 2.0, 2.0.1, old-2.1-dev in i386/amd64 flavors
Using ESXi 5.0 and pfSense 2.0.1 and 2.1-dev in i386/amd64 flavors from feb/march/april.

Same results on other host hardware also (Two DELL Servers with virtualized environment at home for testing purposes.)

Have not tried the VMXNET due to others not seeing any performance gain, only been using virtualized E1000 so far.

What I'm using a lot is VLANs, which might be a contributing culprit for some of us? Assigning VLANs directly in switch configuration in vSphere, or natively in pfSense has had "largely" the same results.

I absolutely love pfSense, now that I've got the hang of it, and have deployed quite a few in different scenarios over the past few months. But, not to sound negative here, there's got to be something we can do about these high loads in virtualized environments. I had to switch over to bare metal, on slightly aged HW, on our lab network, which is a bit unsatisfying. I lose a bit of my redundancy (if one VM or host fails, just fire up the copy or use HA Sync).

I suppose it's an underlying FreeBSD issue?
I don't really know how to set up something similar in any of the *BSD flavors, and honestly can't find the time to learn currently, but surely one of you guys could test a simple routing setup using FreeBSD/OpenBSD/NetBSD and see if there's the same performance issue? (Maybe with/without VLAN incl. trunking/non-native.)