VMWare Pentest lab: Extremely high CPU on host
-
- /etc/sysctl.conf: append kern.timecounter.hardware=TSC at the end
- /boot/loader.conf: append kern.hz="100" at the end
...restart; so far CPU usage is OK ;)
Thanks for the tip :).
But no-go on my end. Still the same: running at 1800-2400 MHz (according to ESXi) for only 20 Mbps of traffic.
pfSense shows only 4% CPU usage. It sounds to me like the problem must be close to the network path between the guest (pfSense newer than 1.2.3) and the host,
because my 1.2.3 guest, which carries only approx. 1.2 Mbps of traffic, shows approx. 29 MHz according to ESXi.
I am running a Sony Ericsson USB-connected UMTS cellular phone as a backup WAN and have not excluded it as the culprit. The UMTS backup is the reason why I have a pfSense 2.x running:
1.2.3 did not support this kind of connection, so it takes care of a different part of the network.
-
I would try "harder" to eliminate the VMware CPU throttling effects… did you try something like this: http://communities.vmware.com/thread/87794
...reading:
Add in /boot/loader.conf:
# Disable CPU frequency/voltage throttling control
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
# Disable local APIC timers (FreeBSD 8+)
hint.apic.0.clock=0
# Reduce interrupt rate (at the cost of slightly increased response time)
kern.hz=100
# Saves 128 interrupts per second per core at the cost of reduced scheduling precision
hint.atrtc.0.clock=0
Add in /etc/rc.conf:
# Turn off all CPU core clocks on idle
performance_cx_lowest="C2"
economy_cx_lowest="C2"
# Disable background fsck at boot
background_fsck="NO"
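To double-check that the loader tunables listed above actually made it into the file, something like the following could be used. It is only a minimal sketch: the sample file contents are hypothetical, the parser is deliberately naive (it treats any # as the start of a comment), and the helper names are made up for illustration.

```python
# Tunables suggested in this post for /boot/loader.conf.
WANTED = {
    "hint.p4tcc.0.disabled": "1",
    "hint.acpi_throttle.0.disabled": "1",
    "hint.apic.0.clock": "0",
    "kern.hz": "100",
    "hint.atrtc.0.clock": "0",
}

# Hypothetical loader.conf contents, used only to exercise the check.
SAMPLE = '''
# hypothetical /boot/loader.conf
kern.hz="100"
hint.apic.0.clock=0
hint.atrtc.0.clock=1   # still enabled
'''


def parse_loader_conf(text):
    """Parse key=value lines, dropping comments and surrounding quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment stripping
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_or_different(text, wanted=WANTED):
    """Return {tunable: (current, wanted)} for every tunable not set as wanted."""
    have = parse_loader_conf(text)
    return {k: (have.get(k), v) for k, v in wanted.items() if have.get(k) != v}


if __name__ == "__main__":
    for key, (current, wanted) in missing_or_different(SAMPLE).items():
        print(f"{key}: have {current!r}, want {wanted!r}")
```

On a real box the text would come from reading /boot/loader.conf instead of SAMPLE.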
Also, are you getting the high CPU only on traffic or also when there is zero activity?
Did I get that right that you forward the USB modem to the guest?
Cheers,
Chris
-
Will give it a try in the morning (in about ten hours).
also, are you getting the high CPU only on traffic or also when there is zero activity?
Don't know. Will have to pull the plug to the primary, secondary and tertiary routes to get zero activity. Will give it a try in the morning.
Did I get that right that you forward the USB- modem to the guest?
Correct. Using it as a tertiary WAN for my personal network part.
-
No-go. Same result. (I did not set background_fsck="NO".)
also, are you getting the high CPU only on traffic or also when there is zero activity?
pfSense 2.0.1: 346-373 MHz when there is zero activity, and about 4905 MHz while the download client (a Windows guest on the same host) uses only 1706 MHz. To get zero activity I pressed the disconnect button under Interfaces to disconnect the USB WWAN connection. pfSense 1.2.3: 0 MHz when there is zero activity.
Both guests are on the same host. They share the same physical interfaces for primary WAN, secondary WAN and LAN. pfSense 2.0.1 also uses a physical interface for the WLAN network, passed through to Captive Portal.
-
Imagine how expensive this would be if you were running in a cloud environment and paying for CPU usage.
-
I have the same problem on ESXi 5.0.1 with both pfSense 2.0.1 and pfSense 2.1.
m0n0 and others work great; it's just pfSense. Sometimes it uses 100% CPU and stops responding, and the network goes down as well.
-
Can anybody test the same OS that pfSense is built on, running standalone in a VM, to see whether it's the OS or something specific to pfSense?
-
I just checked one of our ESX 5 boxes that houses not only our builder VMs but a batch of test pfSense VMs as well - at the moment, they're all idle.
FreeBSD 8.1 amd64 host - 81MHz
FreeBSD 8.1 i386 host - 86MHz
FreeBSD 8.3 amd64 host - 79MHz
FreeBSD 8.3 i386 host - 83MHz
pfSense 1.2.3 - 17MHz
pfSense 2.0.1 amd64 - 36MHz
pfSense 2.0.2 amd64 - 38MHz
pfSense 2.0.2 i386 - 51MHz
pfSense 2.1 amd64 - 41MHz
pfSense 2.1 i386 - 49MHz
The builders are running open-vm-tools-nox11, and at the moment the pfSense firewalls do not have tools installed.
So while the 1.2.3 VM is using less, it's not significantly less. I would still hesitate to call this a general issue. There must be something about the hardware (real or virtual)/config/etc bringing it out.
-
Can people pls. make a list of what packages they run as well??
-
Can people pls. make a list of what packages they run as well??
Packages:
- AutoConfigBackup
- Open-VM-Tools
- Shellcmd
- squid
- squidGuard
Hardware:
- Running a USB phone as tertiary route (only pfSense 2.0.1 is using it, not 1.2.3).
- Supermicro X8DTi-LNF4 board with Intel Dual 82576 Dual-Port Gigabit Ethernet (all four NICs are used by ESXi and shared between the two pfSense guests inside the box)
- 2 x Xeon X5650
- LSI 9280 24i4e with BBU and SafeStore.
- Some serial ports and USB ports are used, but except for the above, none of them are used by any pfSense.
-
Can people pls. make a list of what packages they run as well??
Packages:
- Cron 0.1.5
- Dashboard Widget: Snort 0.3.2
- mailreport 1.2
- mtr-nox11 0.82
- NRPE v2 2.12_3 v2.1
- ntop 4.1.0_3 v2.3
- Open-VM-Tools-8.8.1 528969
- OpenVPN Client Export Utility 0.24
- RRD Summary 1.1
- snort 2.9.2.3 pkg v. 2.5.1
- Unbound 1.4.14_01
VMware version:
- ESXi 5.0.0, 469512
Hardware:
- P5K-E board
- 6 GB RAM
- Intel 82574L gigabit card
- Intel 82576 dual-port gigabit card
- Core 2 Duo E6750 @ 2.66 GHz
- iSCSI connection to Synology NAS
-
I've now tried the following with these results:
- pfSense 1.2.3: ~110 % off (pfSense CPU usage versus vCenter-reported CPU usage)
- pfSense 2.0.1: ~710 % off (pfSense CPU usage versus vCenter-reported CPU usage)
- pfSense 2.1.0: ~720 % off (pfSense CPU usage versus vCenter-reported CPU usage)
All default installs, 32-bit. Only one LAN and one WAN. No packages.
Tried using the ATA VM drives instead of LSI Parallel SCSI on 2.0.1. No measurable difference.
I'm starting to wonder if it could be the pNICs.
Is anyone able to test with NICs other than the 82576? Basically disabling/not using the 82576 inside ESXi.
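For anyone who wants to compare the guest and host figures in the same way, here is one possible way to put a number on the mismatch. It is a sketch under assumptions: the posts do not say how the "% off" values were derived, so the ratio below is just one plausible definition, and the 2660 MHz core clock and single vCPU are assumed values, not measurements from this thread. The 4 % and 1800 MHz inputs are the figures reported earlier for the 2.x guest.

```python
def host_percent(host_mhz, core_mhz, vcpus=1):
    """Host-reported usage as a percentage of what the guest's vCPUs could consume."""
    if core_mhz <= 0 or vcpus <= 0:
        raise ValueError("core_mhz and vcpus must be positive")
    return 100.0 * host_mhz / (core_mhz * vcpus)


def discrepancy_factor(guest_percent, host_mhz, core_mhz, vcpus=1):
    """How many times larger the host-side figure is than the guest-side one."""
    if guest_percent <= 0:
        raise ValueError("guest_percent must be positive")
    return host_percent(host_mhz, core_mhz, vcpus) / guest_percent


if __name__ == "__main__":
    # Assumed: one vCPU on a core with a nominal clock of 2660 MHz.
    print(round(host_percent(1800, 2660), 1))
    print(round(discrepancy_factor(4, 1800, 2660), 1))
```

A factor close to 1 would mean guest and host agree; the larger it gets, the more CPU time the host spends on work the guest does not account for.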
-
I'm very sure that it's the OS.
Why should the NICs use a lot of CPU when there is no traffic and therefore no offloading occurs?
-
Running VMWare ESXi 5.0.0-469512-standard
pfSense 2.0.1-Release (amd64) 12/12/2011.
Packages (name, category, version):
Backup                         System              0.1.5
bandwidthd                     System              2.0.1.3
Cron                           Services            0.1.5
darkstat                       Network Management  3.0.714
Lightsquid                     Network Report      1.8.2 pkg v.2.32
mailreport                     Network Management  1.2
ntop                           Network Management  4.1.0_3 v2.3
Open-VM-Tools                  Services            8.7.0.3046 (build-313025)
OpenVPN Client Export Utility  Security            0.24
pfBlocker                      Firewall            1.0.2
Sarg                           Network Report      2.3.2 pkg v.0.6.1
squid                          Network             2.7.9 pkg v.4.3.1
vnstat2                        Network Management  1.10_2
widescreen                     Enhancements        0.2
Physical hardware is a Dell PowerEdge 2950:
1x Quad Core Xeon E5345 (2.33 GHz)
8 GB RAM
Dual Ethernet standard built-in Dell NICs (2x Broadcom BCM5708 Gig-Eth)
Running 3 virtual machines, only one is pfSense.
pfSense virtual machine:
2x E1000 NICs
LSI Logic SAS
1 GB RAM
20 GB disk space
There are no resource allocation limits configured in VMware.
pfSense has a couple of VPN clients connected and two gateways that are 2xT1 each. 8 routing policies direct outbound traffic via the gateways depending upon source and destination, plus some route failover and load balancing. Traffic levels are almost maxed out on both gateways during the business day.
Both pfSense top and VMware utilization show 15-25% and are very close to each other in readings.
Basically, mine works great.
-
To clarify my previous post: I did a little more checking.
FWIW, I'm running the Open VM Tools installed via the pfSense package page.
The actual CPU usage is different from what I thought, as I misread the chart.
Currently, running only two virts, the second is basically 100% idle.
pfSense is configured for 1 GB RAM as above and the resources are set to shared.
From vSphere Client:
CPU is 775 MHz (out of 9308 MHz system capacity), host memory used is 1175 MB, guest memory is 18%.
From pfSense Web - Diagnostics, System Activity:
last pid: 22805; load averages: 0.07, 0.11, 0.08 up 0+13:48:42 14:55:10
153 processes: 5 running, 122 sleeping, 8 zombie, 18 waiting
Mem: 655M Active, 93M Inact, 186M Wired, 22M Cache, 110M Buf, 18M Free
Swap: 2048M Total, 196M Used, 1852M Free, 9% Inuse
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 64K CPU2 2 811:38 100.00% {idle: cpu2}
11 root 171 ki31 0K 64K RUN 1 801:14 100.00% {idle: cpu1}
11 root 171 ki31 0K 64K CPU0 0 793:16 100.00% {idle: cpu0}
11 root 171 ki31 0K 64K CPU3 3 813:53 96.97% {idle: cpu3}
0 root 76 0 0K 128K sched 0 1024.8 0.00% {swapper}
0 root -68 0 0K 128K - 1 7:21 0.00% {em1 taskq}
0 root -68 0 0K 128K - 3 6:21 0.00% {em0 taskq}
23518 proxy 45 0 731M 612M kqread 2 6:06 0.00% squid
9567 root 44 0 6924K 852K select 1 3:10 0.00% powerd
2655 root 44 0 28456K 8008K select 1 2:33 0.00% bsnmpd
12 root -32 - 0K 288K WAIT 0 2:20 0.00% {swi4: clock}
4 root -8 - 0K 16K - 3 1:50 0.00% g_down
7633 root 76 20 8292K 544K wait 2 0:45 0.00% sh
12 root -40 - 0K 288K WAIT 1 0:37 0.00% {swi2: cambio}
12 root -64 - 0K 288K WAIT 2 0:36 0.00% {irq18: mpt0}
3 root -8 - 0K 16K - 0 0:29 0.00% g_up
8950 nobody 44 0 8164K 1740K select 1 0:23 0.00% darkstat
24 root 44 - 0K 16K syncer 0 0:22 0.00% syncer
In all, roughly 10% of the physical CPU is used, and inside the virt it appears to be 10% as well. So the CPU usage is lower than I originally thought, but the numbers still agree.
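As a quick cross-check of the "roughly 10%" figure, the vSphere numbers quoted above can be turned into a share of host capacity. This is just a sketch of the arithmetic; the function name is made up for illustration.

```python
def host_share_percent(used_mhz, capacity_mhz):
    """Share of the host's total CPU capacity consumed, in percent."""
    if capacity_mhz <= 0:
        raise ValueError("capacity_mhz must be positive")
    return 100.0 * used_mhz / capacity_mhz


if __name__ == "__main__":
    # 775 MHz used out of 9308 MHz system capacity, as reported by vSphere Client.
    print(round(host_share_percent(775, 9308), 1))  # prints 8.3
```

So the host-side figure works out to a bit over 8%, which is in line with the rough 10% estimate.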
I'm getting ready to install a third virt that will actually have significant CPU/disk access, so it may get interesting.
-
@Veni: Sorry I was away for a bit. To clarify this, are you using direct access for the USB device or are you routing it through the host OS?
In general it would be helpful for everybody to list:
- whether there are any devices accessed directly
- which IRQs the devices causing the high CPU use, and which IRQs are in use by other devices (check that by digging down from "vmstat 2" and "mpstat 2", then go to "sar -I XALL 2 10" and check what the high interrupts resolve to in "cat /proc/interrupts")
I would point my finger at IRQs, as it seems that it is the host system that actually has the high CPU. So check your host's NIC (and possibly USB) IRQs, and if possible give them fixed, separate IRQs. Furthermore, try to enable adaptive interrupt moderation on your NICs if it is off (check with "ethtool -c ethX", which then shows "Adaptive RX: off TX: off"), or simply try to increase the MTU to a high value (on host and guest, something like "ifconfig eth2 mtu 9000") to see whether your CPU usage goes down.
As I use the faster, free product called oVirt I (un)fortunately don't have the means to test this on an ESXi system :D oops, sorry... make love not war!!!! ;)
-
@Veni: Sorry I was away for a bit. To clarify this, are you using direct access for the USB device or are you routing it through the host OS?
No problem :). Always nice to keep the thread going so that we can come to a conclusion. At this point this is a home machine, so basically nobody is losing anything because of this problem.
- I'm running the USB phone routed through ESXi to the guest.
- I've tried (see my last post) several guest installations without the USB phone, just basic pfSense installations from scratch.
- I've checked with vmstat -i that I'm not using any shared IRQs in the guest OSes mentioned in my last post.
- My real pfSense 2.0.1 installation (the one with the USB phone) is using shared IRQs (em3 shared with uhci0+ at rate 6, and em0 shared with ehci0 at rate 1186).
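For anyone who wants to repeat the shared-IRQ check on their own guests, the vmstat -i output can be scanned for lines that list more than one handler. The snippet below is a minimal sketch: the sample text is hypothetical (device names and rates mirror the ones mentioned above, the totals are invented), and it assumes that a trailing + on a name means further handlers are attached but not shown.

```python
import re

# Hypothetical sample in the style of FreeBSD `vmstat -i`; totals are made up.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                          12          0
irq16: em3 uhci0+                  41230          6
irq18: em0 ehci0                 8150000       1186
irq19: em1                        902114        131
cpu0: timer                     13745000       2000
Total                           22838356       3323
"""

# irqN: <one or more handler names> <total> <rate>
LINE = re.compile(r"^(irq\d+):\s+(.+?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(text):
    """Return {irq: (handlers, rate)} for IRQ lines with more than one handler."""
    shared = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # header, per-CPU timer and Total lines
        irq, names, _total, rate = match.groups()
        handlers = names.split()
        if len(handlers) > 1 or names.endswith("+"):
            shared[irq] = (handlers, int(rate))
    return shared


if __name__ == "__main__":
    for irq, (handlers, rate) in sorted(shared_irqs(SAMPLE).items()):
        print(f"{irq}: {' + '.join(handlers)} at {rate} interrupts/s")
```

On a live system the text would come from running vmstat -i in the guest rather than from SAMPLE.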
Update:
I forgot that I run one more FreeBSD-based guest, FreeNAS (FreeBSD 7.3-RELEASE-p7), and it shows up at 26 MHz according to vSphere Client while top shows 100% idle.
It sounds like something in FreeBSD 8+ and some hardware in our boxes are not working together as they should.
We would need a FreeBSD guru who knows what differs between FreeBSD 7 and 8, or somebody who can find the common denominator among our boxes.
-
I'm wondering if what you guys are seeing is the load generated from the vSwitches.
These are certainly not CPU cost free, and also are not known by the guest OS, and as such would not register as a load in the guest.
To someone's point above, has anyone experiencing this issue tested using DirectPath I/O to pass the NICs directly to the pfSense guest, to see whether the extra load goes away or not?
-
New update with the following:
FreeBSD 8.1 i386: 26 MHz.
FreeBSD 8.2 i386: 26 MHz.
FreeBSD 9.0 i386: 0 MHz.
OpenBSD 5.1 i386: 26 MHz.
Values given by vSphere Client when clients are idle.
So at first glance it does not sound like the BSD part is the problem. However, I'm not familiar with how to export the NAT and rules from pfSense to packet filter on these platforms. I would like to put some real load on them, but for that to be effective I need inbound NAT rules.
I'm wondering if what you guys are seeing is the load generated from the vSwitches.
Hi mattlach. That one slipped by me. I have not been taking into account that the vSwitches do take up some CPU cycles, but why would there be such a big difference between pfSense 1.2.3 and 2.0.x?
-
Can you see realtime performance on the ESXi host? That is where the high CPU use is, not in the client.