VMWare Pentest lab: Extremely high CPU on host
-
I'm quite sure that it's the OS.
Why would the NICs use so much CPU when there is no traffic, and therefore no offloading?
-
Running VMware ESXi 5.0.0-469512-standard
pfSense 2.0.1-Release (amd64) 12/12/2011.
Packages:
Backup System 0.1.5
bandwidthd System 2.0.1.3
Cron Services 0.1.5
darkstat Network Management 3.0.714
Lightsquid Network Report 1.8.2 pkg v.2.32
mailreport Network Management 1.2
ntop Network Management 4.1.0_3 v2.3
Open-VM-Tools Services 8.7.0.3046 (build-313025)
OpenVPN Client Export Utility Security 0.24
pfBlocker Firewall 1.0.2
Sarg Network Report 2.3.2 pkg v.0.6.1
squid Network 2.7.9 pkg v.4.3.1
vnstat2 Network Management 1.10_2
widescreen Enhancements 0.2
Physical hardware is a Dell PowerEdge 2950:
1x quad-core Xeon E5345 (2.33 GHz)
8 GB RAM
Dual built-in Dell NICs (2x Broadcom BCM5708 Gigabit Ethernet)
Running 3 virtual machines, only one of which is pfSense.
pfSense virtual machine:
2x E1000 NICs
LSI Logic SAS.
1 GB RAM
20 GB Disk Space.
There are no resource allocation limits configured in VMware.
pfSense has a couple of VPN clients connected and two gateways, each 2xT1. There are 8 routing policies to direct outbound traffic via the gateways depending on source and destination, plus some route failover and load balancing. Traffic levels are almost maxed out on both gateways during the business day.
Both pfSense TOP and VMware Utilization show 15-25% and are very close to each other in readings.
Basically, mine works great.
-
To clarify my previous post: I did a little more checking.
FWIW - I'm running the Open VMtools installed via pfSense Config package page.
The actual cpu usage is different than I thought as I misread the chart.
Currently, running only two virts, the second is basically 100% idle.
pfSense is configured for 1GB RAM as above and the resources are set to shared.
From vSphere Client:
CPU is 775 MHz (out of 9308 MHz system capacity), host memory used is 1175 MB, guest memory is 18%.
From the pfSense web interface, Diagnostics > System Activity:
last pid: 22805; load averages: 0.07, 0.11, 0.08 up 0+13:48:42 14:55:10
153 processes: 5 running, 122 sleeping, 8 zombie, 18 waiting
Mem: 655M Active, 93M Inact, 186M Wired, 22M Cache, 110M Buf, 18M Free
Swap: 2048M Total, 196M Used, 1852M Free, 9% Inuse
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 64K CPU2 2 811:38 100.00% {idle: cpu2}
11 root 171 ki31 0K 64K RUN 1 801:14 100.00% {idle: cpu1}
11 root 171 ki31 0K 64K CPU0 0 793:16 100.00% {idle: cpu0}
11 root 171 ki31 0K 64K CPU3 3 813:53 96.97% {idle: cpu3}
0 root 76 0 0K 128K sched 0 1024.8 0.00% {swapper}
0 root -68 0 0K 128K - 1 7:21 0.00% {em1 taskq}
0 root -68 0 0K 128K - 3 6:21 0.00% {em0 taskq}
23518 proxy 45 0 731M 612M kqread 2 6:06 0.00% squid
9567 root 44 0 6924K 852K select 1 3:10 0.00% powerd
2655 root 44 0 28456K 8008K select 1 2:33 0.00% bsnmpd
12 root -32 - 0K 288K WAIT 0 2:20 0.00% {swi4: clock}
4 root -8 - 0K 16K - 3 1:50 0.00% g_down
7633 root 76 20 8292K 544K wait 2 0:45 0.00% sh
12 root -40 - 0K 288K WAIT 1 0:37 0.00% {swi2: cambio}
12 root -64 - 0K 288K WAIT 2 0:36 0.00% {irq18: mpt0}
3 root -8 - 0K 16K - 0 0:29 0.00% g_up
8950 nobody 44 0 8164K 1740K select 1 0:23 0.00% darkstat
24 root 44 - 0K 16K syncer 0 0:22 0.00% syncer
In all, roughly 10% of the physical CPU is used, and inside the virt it appears to be 10% as well. So, lower CPU than I originally thought, but the numbers still agree.
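For anyone who wants to compare their own numbers the same way, here is a minimal sketch of the arithmetic behind the vSphere figures quoted above (775 MHz used out of 9308 MHz capacity on a 4-core host). The function names are made up for illustration; only the three input numbers come from the post.

```python
def utilization_percent(used_mhz, capacity_mhz):
    """Return used/capacity as a percentage."""
    if capacity_mhz <= 0:
        raise ValueError("capacity_mhz must be positive")
    return 100.0 * used_mhz / capacity_mhz


def core_equivalents(used_mhz, capacity_mhz, cores):
    """Express the used MHz as a number of fully busy physical cores."""
    return used_mhz / (capacity_mhz / cores)


# Figures quoted from the vSphere Client in the post above.
host_pct = utilization_percent(775, 9308)
busy_cores = core_equivalents(775, 9308, cores=4)

print(f"host utilization: {host_pct:.1f}%")
print(f"equivalent busy cores: {busy_cores:.2f}")
```

That works out to a bit over 8% of the host, which is in line with the "roughly 10%" estimate.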
I'm getting ready to install a third virt that will actually have significant CPU/disk access, so it may get interesting.
-
@Veni: Sorry I was away for a bit. To clarify this, are you using direct access for the USB device or are you routing it through the host OS?
In general it would be helpful for everybody to list:
- whether there are any devices accessed directly
- which IRQs the devices causing the high CPU use, and which IRQs are used by other devices (check by drilling down from "vmstat 2" and "mpstat 2", then run "sar -I XALL 2 10" and look up what the high interrupts resolve to in "cat /proc/interrupts")
I would point my finger at IRQs, as it seems that the host system is the one with the high CPU. So check your host's NIC (and possibly USB) IRQs, and if possible give them fixed, separate IRQs. Furthermore, try to enable adaptive moderation on your NICs (check the current setting, e.g. "Adaptive RX: off TX: off", with "ethtool -c ethX"), or simply try to increase the MTU to a high value (on host and guest, something like "ifconfig eth2 mtu 9000") to see whether your CPU usage goes down.
As I use the faster, free product called oVirt I (un)fortunately don't have the means to test this on an ESXi system :D oops, sorry... make love not war!!!! ;)
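To make the "which interrupts are the busy ones" step above a bit more concrete, here is a rough sketch for a Linux host. It compares two snapshots in the /proc/interrupts layout and ranks the IRQs by interrupts per second. The two snapshots below are invented sample data, not output from anybody's machine in this thread, and the function names are made up for illustration.

```python
def parse_interrupts(text):
    """Map IRQ label -> (total count summed over CPUs, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = [int(f) for f in fields[:ncpu] if f.isdigit()]
        table[label.strip()] = (sum(counts), " ".join(fields[ncpu:]))
    return table


def busiest(before, after, interval_s):
    """Return (irq, interrupts/sec, description), highest rate first."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    rates = [
        (irq, (total - old.get(irq, (0, ""))[0]) / interval_s, desc)
        for irq, (total, desc) in new.items()
    ]
    return sorted(rates, key=lambda r: r[1], reverse=True)


# Hypothetical snapshots taken 2 seconds apart (made-up numbers).
BEFORE = """\
           CPU0       CPU1
 16:       1000        200   IO-APIC-fasteoi   eth0, uhci_hcd:usb1
 17:        500        500   IO-APIC-fasteoi   eth1
NMI:         10         10   Non-maskable interrupts
"""
AFTER = """\
           CPU0       CPU1
 16:      25000       1200   IO-APIC-fasteoi   eth0, uhci_hcd:usb1
 17:        600        520   IO-APIC-fasteoi   eth1
NMI:         10         10   Non-maskable interrupts
"""

top = busiest(BEFORE, AFTER, 2.0)
for irq, rate, desc in top:
    print(f"{irq:>4}  {rate:10.1f}/s  {desc}")
```

On a real host you would read /proc/interrupts twice with a sleep in between instead of using the sample strings. An IRQ line that lists more than one device (like 16 in the sample) is a shared one.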
-
No problem :). It's always nice to keep the thread going so that we can reach a conclusion. At this point this is a home machine, so basically nobody is losing anything because of this problem.
- I'm running the USB phone routed through ESXi to the guest.
- I've tried (see my last post) several guest installations without the USB phone, just basic pfSense installations from scratch.
- I've checked with vmstat -i that I'm not using any shared IRQs in the guest OSes mentioned in my last post.
- My real pfSense 2.0.1 installation (the one with the USB phone) is using shared IRQs (em3 shared with uhci0+ at rate 6, and em0 shared with ehci0 at rate 1186).
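For others who want to run the same shared-IRQ check in their guests, here is a small sketch that picks the shared lines out of "vmstat -i" style output. The sample text is made up: only the device pairs and the rates 6 and 1186 come from the post above, while the IRQ numbers, totals and other rows are invented. It also assumes that a trailing "+" on a device name means more handlers are attached than fit in the name.

```python
def shared_irqs(vmstat_output):
    """Return {irq: [devices]} for IRQ lines that list more than one device."""
    shared = {}
    for line in vmstat_output.strip().splitlines():
        if not line.startswith("irq"):
            continue  # skip the header, cpu timer rows and the Total row
        irq, _, rest = line.partition(":")
        parts = rest.split()
        devices = parts[:-2]  # the last two columns are total and rate
        # Assumption: a trailing "+" marks additional, unnamed handlers.
        if len(devices) > 1 or any(d.endswith("+") for d in devices):
            shared[irq] = devices
    return shared


# Hypothetical sample in the layout of FreeBSD "vmstat -i".
SAMPLE = """\
interrupt                          total       rate
irq16: em3 uhci0+                  31842          6
irq17: em1                        904112        181
irq18: em0 ehci0                 5912345       1186
cpu0: timer                      9961421       1999
Total                           16809720       3372
"""

shared = shared_irqs(SAMPLE)
for irq, devices in shared.items():
    print(irq, "is shared by", ", ".join(devices))
```

To use it on a real guest, paste the output of vmstat -i into the function instead of SAMPLE.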
Update:
I forgot that I run one more FreeBSD-based guest, FreeNAS (FreeBSD 7.3-RELEASE-p7), and it shows up at 26 MHz according to vSphere Client while top shows 100% idle.
It sounds like something in FreeBSD 8+ and some hardware in our boxes are not working together as they should. We would need a FreeBSD guru who knows what differs between FreeBSD 7 and 8, or somebody who can find the common denominator among our boxes.
-
I'm wondering if what you guys are seeing is the load generated from the vSwitches.
These are certainly not CPU cost free, and also are not known by the guest OS, and as such would not register as a load in the guest.
To someone's point above, has anyone experiencing this issue tested using DirectPath I/O to pass the NICs directly to the pfSense guest, to see whether the extra load goes away?
-
New update with the following:
FreeBSD 8.1 i386: 26 MHz.
FreeBSD 8.2 i386: 26 MHz.
FreeBSD 9.0 i386: 0 MHz.
OpenBSD 5.1 i386: 26 MHz.
Values given by vSphere Client when clients are idle.
So at first glance it does not sound like the BSD part is the problem, but I'm not familiar with how to export the NAT and rules from pfSense to packet filter on these platforms. I would like to try to put some real load on them, but for that to be effective I need inbound NAT rules.
Hi mattlach. That one slipped by me. I have not been taking into account that the vSwitches do take up some CPU cycles, but why would there be such a big difference between pfSense 1.2.3 and 2.0.x?
-
Can you see realtime performance on the ESXi host?? That is where the high CPU use is and not in the client….
-
I'm seeing the high CPU usage in vSphere Client when running a pfSense 2.0.x guest. I feel that is enough to show that the high CPU usage is caused by the guest.
-
Hello,
I'm having similar problems with pfSense (2.0 and 2.1) and VMware. I'm not sure which VMware version I'm using since it's in the "cloud", but the cloud management console says "powered by VMware". We used to run pfSense on a very old PC and it worked like a charm, but now that we have moved to the "cloud" and got more CPU and RAM resources for pfSense, it's suddenly not enough. The pfSense management console says it's only using a few percent of CPU, but the cloud administrator tells me we're using 100% and that's the reason our internet connection is so slow. I tried reinstalling and even installed 2.1 RC1, but it didn't help much. Does anyone have any idea what can be done, or should I look for another firewall?
Not good :(
-
Have you tried pfSense 1.2.3? I'm about to revert from 2.0.1 to 1.2.3 this month due to high CPU usage on 2.0.x. I will be canceling the 3G backup route so 1.2.3 will be more than enough.
-
That's one of the reasons I haven't upgraded yet.
It showed the same on both of my physical servers when running in a VM.
1.2.3 works great!
-
Thank you for the quick response. I haven't tried 1.2.3 yet. Is it secure? I mean, it's an older release, so there might be known security bugs, or aren't there?
-
Not as far as I am aware.
-
I can't say I have had this issue yet, but then again I cannot get it configured so that I can reach the LAN from the WAN side. I am running it in VirtualBox; not sure if that would make a difference. If you have a guide on setup, I could use that. I will try to redo it and test with VMware this coming weekend.
-
There are XSS and CSRF vulnerabilities in 1.x's web interface. Though if you follow general best practices for managing any web-administered device (use a different browser from the one you use for general Internet browsing), that's a non-issue. Every web-managed device has had some XSS and CSRF issues, and many commercial security-related products have a number of known unpatched XSS and CSRF issues. Some have released updates fixing them. There isn't anything imminently exploitable in any pfSense version, but I wouldn't recommend running anything prior to the latest stable release.
We and many, many others run most or all our production firewalls on ESX. This very site runs behind firewalls in ESX, and can route gigabit wire speed between internal VLANs, without any excessive CPU usage on the host. All of our production colos run their firewalls in ESX without any issues at all, and they're pushing significant loads. Why a minority of people see this, I don't know, but it's something we plan to investigate post-2.1 when time permits. It may be something that just goes away when we get to a newer FreeBSD base.
Note you do need to make sure you're on the latest ESX (5.0U1 or 5.1 should be fine). While I'm not aware of any ESX issues exactly along these lines, they have patched several bugs related to FreeBSD guests over the years, and there is at least one ugly one in 5.0 pre-Update 1.
-
[…]I'm about to revert from 2.0.1 to 1.2.3 this month due to high CPU usage on 2.0.x. I will be canceling the 3G backup route so 1.2.3 will be more than enough.
Reverted last month. Runs perfectly smoothly. Right now 42 Mbps makes the guest drive the clock up to 506 MHz (out of 2.66 GHz) on the host. Perfect!
You guys that run your large setups on ESXi without any CPU utilization issues, what type of motherboard, pCPU and pNIC are you using?
-
Using an IBM X3550 M4 with Intel 10GbE X520-T2 cards.
-
If you are running a 10 Gbit/s uplink, do you use DirectPath I/O to pass the pNICs to pfSense, or do you virtualize them for pfSense?
Otherwise the platform is a more current generation than mine.