Netgate Discussion Forum

    VMWare Pentest lab: Extremely high CPU on host

    • F
      Fmstrat

      Hi all,

      I'm having an odd issue with a pfSense 2.0 install in a pentest lab. I'm running VMware Workstation 7 on an i7 920 with 16GB of RAM, and the pfSense instance has plenty of RAM and access to two of the processor cores.

      The VM session has two network interfaces, one bridged to the local network through the on-board 1000Mb NIC, the other bridged to what we'll call the "external" network through a PCI 100Mb NIC.

      There is another VM instance running Backtrack that is bridged to the local network through the on-board 1000Mb NIC as well. When running nmap on the Backtrack instance against another machine on the "external" network (which is really just a remote lab machine), the pfSense instance, which is routing the traffic, spikes the host CPU up to 125% CPU usage (1.25 processors). The Backtrack instance is barely using any CPU at this point. CPU usage INSIDE the pfSense VM is low, perhaps 10%.

      I've also noticed high utilization, like 80% CPU, just from running a number of concurrent downloads from any hardware or virtual machine I route through the PFSense VM.

      Any ideas?

      Thanks,
      B.

      • T
        tommyboy180

        My first reaction would be to run top and see what is actually using that CPU. Did you select the multiprocessing kernel at install?

        -Tom Schaefer
        SuperMicro 1U 2X Intel pro/1000 Dual Core Intel 2.2 Ghz - 2 Gig RAM

        Please support pfBlocker | File Browser | Strikeback

        • F
          Fmstrat

          @tommyboy180:

          My first reaction would be to run top and see what is actually using that CPU. Did you select the multiprocessing kernel at install?

          TOP on the client shows no CPU use (2%). You bring up a good point: I used the pre-created VMware image, and I'm not sure its default kernel supports multiprocessing.

          Output of uname -a is: FreeBSD pfsense.coronium 8.1-RELEASE-p4 FreeBSD 8.1-RELEASE-p4 #0: Tue Jun 21 16:48:23 EDT 2011    sullrich@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj.pfSense/usr/pfSensesrc/src/sys/pfSense_SMP.8  i386

          If this isn't what I should have, is it possible to change the kernel post install?

          • T
            tommyboy180

            It looks like you do have the multiprocessor kernel installed. Is your CPU still spiking for a long period of time?

            -Tom Schaefer
            SuperMicro 1U 2X Intel pro/1000 Dual Core Intel 2.2 Ghz - 2 Gig RAM

            Please support pfBlocker | File Browser | Strikeback

            • F
              Fmstrat

              @tommyboy180:

              It looks like you do have the multiprocessor kernel installed. Is your CPU still spiking for a long period of time?

              Yes, it's very repeatable. All I need to do is fire up a few downloads or uploads, or run a portscan or anything that makes a lot of connections and CPU on the host OS shoots up while the guest OS (pfsense) CPU appears low.

              Thanks.

              • N
                NetJunkie

                I came here looking for a solution to the same problem. I'm running pfSense 2.0 under VMware vSphere 5.0. At idle it's fine, but under load the CPU use shown by vCenter spikes way up. Inside the guest (pfSense) the load is almost nothing, maybe 5% on CPU. Load average is well under 1. In vCenter it'll be 80% - 90% of a single vCPU; two vCPUs cut that in half, and four cut it to a quarter. I've switched network cards from e1000 to VMXNET to VMXNET2 with the same results.
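
The scaling described here is what one would expect if the absolute amount of host CPU consumed stays roughly constant and is simply averaged over more vCPUs. A minimal arithmetic sketch, assuming 85% as a midpoint of the reported 80% - 90% range (the function name is illustrative, not part of any VMware tooling):

```python
def per_vcpu_percent(total_core_percent: float, vcpus: int) -> float:
    """Average utilization per vCPU when a fixed amount of work is spread evenly."""
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    return total_core_percent / vcpus


if __name__ == "__main__":
    total = 85.0  # assumption: midpoint of the reported 80% - 90% of one vCPU
    for n in (1, 2, 4):
        print(f"{n} vCPU(s): {per_vcpu_percent(total, n):.2f}% each")
```

If that reading is right, adding vCPUs only changes how the number is displayed; the host still spends close to one core on the VM.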

                • R
                  RootWyrm

                  Confirmed with NetJunkie via Twitter; I'm also seeing unusually high CPU utilization even at low loads with 2.0-RELEASE. esxtop averages >10% at <100KB/s, while systat -vmstat disagrees vehemently: <3% total CPU utilization.
                  I thought it was pf itself not reporting or under-reporting CPU, but it's not. I'm on ESXi 4.1U1, 2 vCPU, 1GB, with decently large reservation. I'm not seeing exceptionally high INTR loading either; it's more or less exactly where I'd expect it with em(4)'s. I switched to POLLING, gave it a swift reboot to the rear, and relative CPU utilization is MUCH worse than expected - 50% system reported by systat, and ESXi reporting one core at 80%, one at 75%, one at 70% and one at 20% - constant on both. Never below 50%. This is at <20KB/s of traffic, as well.

                  Something is definitely broken here.

                  EDIT: How weirdly broken? Try this interesting setup: two em0 interfaces, enable POLLING, reboot. CPU utilization is insane, no? Now, disable POLLING, apply but do not reboot. Suddenly, the CPU utilization appears to be much, much better. The difference here was narrowed to pfSense reporting <1% and ESXi reporting <4%.

                  • T
                    tester_02

                    Running 2.0 release (64-bit) on VMware Server. No CPU load issue.
                    Squid/squidguard/snort installed and 2 NICs.

                    • S
                      sullrich

                      From a shell post the output of:

                      top -SH

                      • B
                        billm

                        I'm not seeing this on my ESXi 4.1.0 install with pfSense 2.1-development (upgraded right after the v6 branch was merged in, so this is 2.0 w/ v6). The VM is configured as FreeBSD 64-bit, running the AMD64 release of pfSense. I handed off a single CPU to pfSense but am running an SMP kernel. I ran 8 Mbit of small frames through the firewall and only saw host CPU usage a hair over what pfSense reported (25% in guest, 30% of one core in host).

                        Are you running the open-vm-tools package?  Also, paste the output of

                        sysctl kern.timecounter.choice kern.timecounter.hardware kern.hz
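
For anyone comparing several guests, the `kern.timecounter.choice` line can be read mechanically. The sketch below is only an illustration: the helper names are made up, and the sample line is the one RootWyrm posts further down in this thread. It assumes the `NAME(quality)` format shown there and that, left alone, the kernel prefers the highest-quality counter.

```python
import re
from typing import Dict

# Sample taken from the sysctl output posted later in this thread.
SAMPLE = "kern.timecounter.choice: TSC(-100) ACPI-safe(850) i8254(0) dummy(-1000000)"


def parse_timecounter_choice(line: str) -> Dict[str, int]:
    """Map each timecounter name to its quality value."""
    head, sep, tail = line.partition(":")
    value = tail if sep else head  # accept the line with or without the sysctl name
    pairs = re.findall(r"([\w-]+)\((-?\d+)\)", value)
    return {name: int(quality) for name, quality in pairs}


def best_timecounter(choices: Dict[str, int]) -> str:
    """Return the name with the highest quality value."""
    return max(choices, key=choices.get)


if __name__ == "__main__":
    choices = parse_timecounter_choice(SAMPLE)
    print(choices)
    print("highest quality:", best_timecounter(choices))
```

Comparing that result against `kern.timecounter.hardware` shows whether the guest is using the counter one would expect.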

                        Thanks

                        –Bill

                        pfSense core developer
                        blog - http://www.ucsecurity.com/
                        twitter - billmarquette

                        • R
                          RootWyrm

                          @sullrich:

                          From a shell post the output of:

                          top -SH

                          last pid: 11421;  load averages:  0.07,  0.03,  0.01    up 0+22:57:12  15:50:45
                          96 processes:  3 running, 77 sleeping, 16 waiting
                          CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
                          Mem: 40M Active, 73M Inact, 131M Wired, 88K Cache, 110M Buf, 740M Free
                          Swap: 4096M Total, 4096M Free

                          PID USERNAME PRI NICE  SIZE    RES STATE  C  TIME  WCPU COMMAND
                            11 root    171 ki31    0K    16K CPU0    0  22.7H 100.00% {idle: cpu0}
                            11 root    171 ki31    0K    16K RUN    1  22.6H 100.00% {idle: cpu1}
                              0 root      76    0    0K    64K sched  1 1045.8  0.00% {swapper}
                            21 root      76 ki-6    0K    8K pollid  1  14:27  0.00% idlepoll
                            12 root    -44    -    0K  128K WAIT    0  8:07  0.00% {swi1: netisr 0}
                            12 root    -32    -    0K  128K WAIT    0  3:50  0.00% {swi4: clock}
                            12 root    -32    -    0K  128K WAIT    1  0:37  0.00% {swi4: clock}
                          31198 root      64  20  4524K  3032K bpf    1  0:30  0.00% arpwatch
                            14 root    -16    -    0K    8K -      1  0:20  0.00% yarrow
                          22066 root      44    0  4948K  2520K select  1  0:18  0.00% syslogd
                          32469 nobody    64  20  3572K  2344K select  0  0:16  0.00% darkstat
                          13799 root      64  20  3316K  1348K select  1  0:15  0.00% apinger
                          21140 root      76  20  3656K  1508K wait    0  0:13  0.00% sh
                          53332 root      44    0 26140K  5012K select  1  0:12  0.00% vmtoolsd
                          23900 root      44    0  3316K  924K piperd  0  0:09  0.00% logger
                          23696 root      44    0  6936K  3708K bpf    1  0:06  0.00% tcpdump
                          27742 root      44    0  3352K  1352K select  1  0:05  0.00% miniupnpd

                          Looks pretty normal, right? Right. So here's the interesting part.

                          2 users    Load  0.01  0.02  0.00                  Oct 14 15:52

                          Mem:KB    REAL            VIRTUAL                      VN PAGER  SWAP PAGER
                                  Tot  Share      Tot    Share    Free          in  out    in  out
                          Act  57064  20596  298256    56456  757432  count
                          All  83272  25284  3511012    76448          pages
                          Proc:                                                            Interrupts
                            r  p  d  s  w  Csw  Trp  Sys  Int  Sof  Flt        cow    800 total
                                      43      496    4  256      3133            zfod        atkbd0 1
                                                                                    ozfod      fdc0 irq6
                          0.1%Sys  0.2%Intr  0.0%User  0.0%Nice 99.8%Idle        %ozfod      ata1 irq15
                          |    |    |    |    |    |    |    |    |    |    |      daefr      mpt0 irq17
                                                                                    prcfr  400 cpu0: time
                                                                  28 dtbuf          totfr  400 cpu1: time
                          Namei    Name-cache  Dir-cache    69211 desvn          react
                            Calls    hits  %    hits  %      835 numvn          pdwak
                                7      7 100                    65 frevn          pdpgs
                                                                                    intrn
                          Disks  da0  md0 pass0                            134544 wire
                          KB/t  16.00  0.00  0.00                            41080 act
                          tps      0    0    0                            75208 inact
                          MB/s  0.00  0.00  0.00                                92 cache
                          %busy    0    0    0                            757340 free

                          Notice something missing? Yup. This is with polling disabled by the checkbox.

                          em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                                  options=db<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,POLLING,VLAN_HWCSUM>
                          em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
                                  options=db<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,POLLING,VLAN_HWCSUM>

                          Notice a problem here? Yes. POLLING is still enabled. The checkbox in pfSense is UNCHECKED, but POLLING is on. Here's what happens when you check that POLLING box again.

                          em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
                                  options=db<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,POLLING,VLAN_HWCSUM>
                          em1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
                                  options=db<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,POLLING,VLAN_HWCSUM>

                          last pid: 29327;  load averages:  0.87,  0.31,  0.12    up 0+23:07:15  16:00:48
                          96 processes:  4 running, 76 sleeping, 16 waiting
                          CPU:  0.0% user,  0.0% nice, 49.7% system,  0.0% interrupt, 50.3% idle
                          Mem: 40M Active, 76M Inact, 130M Wired, 92K Cache, 110M Buf, 738M Free
                          Swap: 4096M Total, 4096M Free

                          PID USERNAME PRI NICE  SIZE    RES STATE  C  TIME  WCPU COMMAND
                            21 root    171 ki-6    0K    8K CPU1    1  16:15 98.97% idlepoll
                            11 root    171 ki31    0K    16K RUN    0  22.8H 96.97% {idle: cpu0}
                            11 root    171 ki31    0K    16K RUN    1  22.7H  9.96% {idle: cpu1}
                              0 root      76    0    0K    64K sched  1 1045.8  0.00% {swapper}
                            12 root    -44    -    0K  128K WAIT    0  8:09  0.00% {swi1: netisr 0}
                            12 root    -32    -    0K  128K WAIT    0  3:51  0.00% {swi4: clock}
                            12 root    -32    -    0K  128K WAIT    1  0:37  0.00% {swi4: clock}
                          31198 root      64  20  4524K  3032K bpf    0  0:31  0.00% arpwatch
                            14 root    -16    -    0K    8K -      0  0:20  0.00% yarrow
                          22066 root      44    0  4948K  2520K select  0  0:18  0.00% syslogd
                          32469 nobody    64  20  3572K  2344K select  0  0:16  0.00% darkstat
                          13799 root      64  20  3316K  1348K select  0  0:15  0.00% apinger
                          21140 root      76  20  3656K  1508K wait    1  0:13  0.00% sh
                          53332 root      44    0 26140K  5012K select  0  0:12  0.00% vmtoolsd
                          23900 root      44    0  3316K  924K piperd  0  0:09  0.00% logger
                          23696 root      44    0  6936K  3708K bpf    0  0:06  0.00% tcpdump
                          27742 root      44    0  3352K  1352K select  0  0:05  0.00% miniupnpd

                          2 users    Load  0.96  0.45  0.18                  Oct 14 16:01

                          Mem:KB    REAL            VIRTUAL                      VN PAGER  SWAP PAGER
                                  Tot  Share      Tot    Share    Free          in  out    in  out
                          Act  57188  20676  298332    56460  755756  count
                          All  83408  25364  3511088    76452          pages
                          Proc:                                                            Interrupts
                            r  p  d  s  w  Csw  Trp  Sys  Int  Sof  Flt        cow    805 total
                            1          42        3M    7  259    5 3107    3      3 zfod        atkbd0 1
                                                                                    ozfod      fdc0 irq6
                          50.0%Sys  0.0%Intr  0.0%User  0.0%Nice 50.0%Idle        %ozfod      ata1 irq15
                          |    |    |    |    |    |    |    |    |    |    |      daefr    5 mpt0 irq17
                          =========================                                prcfr  400 cpu0: time
                                                                  8 dtbuf        2 totfr  400 cpu1: time
                          Namei    Name-cache  Dir-cache    69211 desvn          react
                            Calls    hits  %    hits  %      890 numvn          pdwak
                                11      11 100                    65 frevn          pdpgs
                                                                                    intrn
                          Disks  da0  md0 pass0                            133376 wire
                          KB/t  17.19  0.00  0.00                            41368 act
                          tps      5    0    0                            77764 inact
                          MB/s  0.09  0.00  0.00                                92 cache
                          %busy    1    0    0                            755664 free

                          8:01:24pm up 78 days  3:19, 200 worlds; CPU load average: 0.29, 0.16, 0.08
                          PCPU USED(%): 3.5 3.0  22  14  69 1.2 2.5 4.1 AVG:  15
                          PCPU UTIL(%): 3.6 3.2  22  12  66 1.2 2.4 2.7 AVG:  14
                          CORE UTIL(%): 6.7      34      67    5.0    AVG:  28

                          ID    GID NAME            NWLD  %USED    %RUN    %SYS  %WAIT    %RDY
                                1      1 idle                8  273.89  800.00    0.00    0.00  800.00
                          1537396 1537396 earthmother - p    5  102.58  97.93    0.04  380.97    0.07

                          This is with OpenVM Tools 313025. Timecounter looks like this:
                          kern.timecounter.choice: TSC(-100) ACPI-safe(850) i8254(0) dummy(-1000000)
                          kern.timecounter.hardware: ACPI-safe
                          kern.hz: 100

                          Pretty much exactly as expected; all other FreeBSD guests are exactly the same. ACPI-safe over TSC, no stepwarnings, and frequency 3579545 - no exceptions on any of them. (They're all 8.1-RELEASE currently.) This is on 32-bit, too, I forgot to mention.
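
The `options=db` value in the ifconfig output above can be checked independently of the GUI checkbox by decoding it as a bitmask. This is a rough sketch: the bit values are the low eight interface capability flags as I understand them from FreeBSD's `<net/if.h>`, six of which are confirmed by the flag names ifconfig prints above; the function names are illustrative only.

```python
from typing import List

# Low eight IFCAP_* capability bits (assumed from FreeBSD's <net/if.h>).
IFCAP_BITS = {
    0x01: "RXCSUM",
    0x02: "TXCSUM",
    0x04: "NETCONS",
    0x08: "VLAN_MTU",
    0x10: "VLAN_HWTAGGING",
    0x20: "JUMBO_MTU",
    0x40: "POLLING",
    0x80: "VLAN_HWCSUM",
}


def decode_options(hex_value: str) -> List[str]:
    """Return the capability names whose bits are set in an ifconfig options= value."""
    value = int(hex_value, 16)
    return [name for bit, name in sorted(IFCAP_BITS.items()) if value & bit]


def polling_enabled(hex_value: str) -> bool:
    """True when the POLLING bit (0x40) is set."""
    return "POLLING" in decode_options(hex_value)


if __name__ == "__main__":
    print(decode_options("db"))
    print("polling enabled:", polling_enabled("db"))
```

With polling genuinely off and everything else unchanged, the value would be expected to read `9b` rather than `db`.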

                          • T
                            timotl

                            I have been seeing this under ESXi 5 also.
                            I installed the vendor supplied tools and am using a single trunked E1000.

                            After trying all of the nic settings, I happened to disable powerd and the CPU usage went down by more than half.
                            Can anyone else confirm this?

                            -timotl

                            • L
                              loftyDan

                              I too have this issue, using ESXi 5 (and previously on 4.1).  Changing the powerd settings did not resolve the issue for me.  I've tried 2.0 i386, my primary config, 2.1 i386 and 2.1 AMD64.  For both dev builds I tried with my config backup, and a clean install, and the results were always the same.  pfSense reports 16-20% CPU load, while ESXi reports a 62% load (on a Xeon X3440 @ 2.53GHz).  This is with a download speed of about 3.6 MB/sec (29 Mb/sec).  In every case Open-VM-Tools has been installed and I've been using the E1000 NIC.  Speeds directly connected to the modem yield 31 Mb/sec.
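
To put the figures above side by side, here is a small arithmetic sketch. The helper names are made up for illustration, and the ratio is only a rough indicator, since the guest and ESXi may normalize their percentages differently.

```python
def mbytes_to_mbits(mbytes_per_sec: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_sec * 8


def overhead_ratio(host_percent: float, guest_percent: float) -> float:
    """How many times larger the host-reported load is than the guest-reported load."""
    if guest_percent <= 0:
        raise ValueError("guest_percent must be positive")
    return host_percent / guest_percent


if __name__ == "__main__":
    print(f"3.6 MB/s is about {mbytes_to_mbits(3.6):.1f} Mb/s")
    # Guest reports 16-20%, host reports 62%.
    for guest in (16.0, 20.0):
        print(f"guest {guest:.0f}% vs host 62%: {overhead_ratio(62.0, guest):.1f}x")
```

By these numbers the host reports roughly three to four times the load that pfSense itself sees.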

                              If there is anything else I can test, or any more information I can provide, please let me know.  I'd love for this problem to get resolved.

                              • V
                                Veni

                                @loftyDan:

                                I too have this issue, using ESXi 5 (and previously on 4.1).  Changing the powerd settings did not resolve the issue for me.  I've tried 2.0 i386, my primary config, 2.1 i386 and 2.1 AMD64.  For both dev builds I tried with my config backup, and a clean install, and the results were always the same.  pfSense reports 16-20% CPU load, while ESXi reports a 62% load[…]

                                I'm seeing the same thing but on a single x5650 @ 2.67 GHz.

                                If I try to limit the CPU usage from the vSphere client then I don't get the performance I'm after (approx. 150 Mbps). Instead I get around 20-22 Mbps.
                                So it sounds as if the usage is real somehow; otherwise why would I see performance issues when giving pfSense a maximum of 1-1.5 GHz?

                                • K
                                  kkrauth

                                  Just to chime in on this thread, as I'm seeing the same issues. I'm running the following release:
                                  [2.0.1-RELEASE][root@pfSense.localdomain]/root(7): uname -a
                                  FreeBSD pfSense.localdomain 8.1-RELEASE-p6 FreeBSD 8.1-RELEASE-p6 #0: Mon Dec 12 18:15:35 EST 2011    root@FreeBSD_8.0_pfSense_2.0-AMD64.snaps.pfsense.org:/usr/obj./usr/pfSensesrc/src/sys/pfSense_SMP.8  amd64

                                  within ESXi 5. I installed open-vm-tools and VMware's provided drivers for the VMXNET3 adapter. Both internal/external NICs are running with the VMXNET3 driver. The problem was exactly the same using E1000 drivers.

                                  The attached screenshot shows what is happening when the network is pretty much idle. During load, this spikes up even higher, even though pfSense top reports almost no usage whatsoever. I tried both with powerd turned on and off.

                                  pfsense.png

                                  • M
                                    marsboer

                                    Same issue on a fresh pfSense 2.0.1 install running on KVM (Proxmox VE) with the SMP kernel. With only a couple of Mbit/s of traffic, CPU usage on the physical host increases massively (above 50%), running on a single virtual CPU and 512 MB RAM.

                                    pfSense does not support virtio (the paravirtualized devices for KVM), so I thought using emulated NICs was the main reason for the bad CPU performance even under light load, but now I am starting to think that this may be a more generic problem with pfSense in virtualized setups.

                                    • C
                                      clayton_ross

                                      I too am having the same problem: pfSense 2.0 64-bit, ESXi 5.0, 2 cores, 2 NICs, vmtools.

                                      • I
                                        iFloris

                                        Like most others on this thread, I too have run into this problem.
                                        Something that is not clear to me is whether using e1000 is the source of the increased CPU usage on ESX.
                                        And if that is the case, does switching to another adapter, such as flexible or vmxnet 2/3, help reduce the load for any of you?

                                        one layer of information
                                        removed

                                        • K
                                          kkrauth

                                          @iFloris:

                                          Like most others on this thread, I too have run into this problem.
                                          Something that is not clear to me is whether using e1000 is the source of the increased CPU usage on ESX.
                                          And if that is the case, does switching to another adapter, such as flexible or vmxnet 2/3, help reduce the load for any of you?

                                          I tried all three virtual adapters and the behaviour was the same.

                                          • M
                                            Mattofsweden

                                            I'm seeing the same issues here on a DELL PowerEdge R310 Quad Core Xeon:
                                            Using ESXi 4.1 and pfSense 2.0, 2.0.1, old-2.1-dev in i386/amd64 flavors
                                            Using ESXi 5.0 and pfSense 2.0.1 and 2.1-dev in i386/amd64 flavors from February/March/April.

                                            Same results on other host hardware too (two DELL servers with a virtualized environment at home for testing purposes).

                                            I have not tried VMXNET, since others saw no performance gain; I have only used the virtualized E1000 so far.

                                            I use VLANs a lot, which might be a contributing culprit for some of us? Assigning VLANs directly in the switch configuration in vSphere, or natively in pfSense, has had "largely" the same results.

                                            I absolutely love pfSense, now that I've got the hang of it, and have deployed quite a few in different scenarios over the past few months. But, not to sound negative here, there has got to be something we can do about these high loads in virtualized environments. I had to switch over to bare metal, on slightly aged hardware, on our lab network, which is a bit unsatisfying. I lose a bit of my redundancy (if one VM or host fails, just fire up the copy or use HA Sync).

                                            I suppose it's an underlying FreeBSD issue?
                                            I don't really know how to set up something similar in any of the *BSD flavors, and honestly can't find the time to learn currently, but surely one of you guys could test a simple routing setup using FreeBSD/OpenBSD/NetBSD and see if there's the same performance issue? (Maybe with/without VLAN incl. trunking/non-native.)

                                            Regards,
                                            Mattias

                                            IT Teacher & Networking Consultant

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.