pfSense 2.1 on VMware: high host CPU usage
-
Is it possible that this is related to VMware virtual machine monitor mode?
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1036775
For example:
datetime| vmx| MONITOR MODE: allowed modes : BT HV HWMMU
datetime| vmx| MONITOR MODE: user requested modes : BT HV HWMMU
datetime| vmx| MONITOR MODE: guestOS preferred modes: BT HWMMU HV
datetime| vmx| MONITOR MODE: filtered list : BT HWMMU HV
Where:
allowed modes – Refers to the mode that the underlying hardware is capable of.
user requested modes – Refers to the setting defined in the virtual machine configuration.
guestOS preferred modes – Refers to the default values for the selected guest operating system.
filtered list – Refers to the actual monitor modes acceptable for use by the Hypervisor, with the left-most mode being utilized.
I have "automatic" for my pfSense VM and it reads like this:
datetime| vmx| I120: MONITOR MODE: allowed modes : BT32 HV HWMMU
datetime| vmx| I120: MONITOR MODE: user requested modes : BT32 HV HWMMU
datetime| vmx| I120: MONITOR MODE: guestOS preferred modes: HWMMU HV BT32
datetime| vmx| I120: MONITOR MODE: filtered list : HWMMU HV BT32
Therefore it is using the hardware MMU and hardware instruction-set virtualization (HWMMU). I can't change it right now, but can someone test with different settings and post the results?
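For anyone who wants to experiment, the KB article above mentions the .vmx options that control this. As a rough sketch (option names as I remember them from that KB; double-check the exact values for your ESXi version), shutting the VM down and adding something like this to the .vmx should force hardware instruction virtualization with the software (shadow page table) MMU instead of EPT/RVI:

monitor.virtual_exec = "hardware"
monitor.virtual_mmu = "software"

Setting both back to "automatic" (or removing the lines) restores the default behaviour, and the mode that was actually chosen shows up in the vmware.log lines quoted above.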
Also, is it possible that it is related to using a distributed vSwitch? Can someone test a regular vSwitch vs. a distributed vSwitch (again, I can't change my environment to a regular vSwitch right now)?
-
Distributed vSwitch is just an abstraction for many standard vSwitches.
-
FYI: there is even more overhead when you go from 1 vCPU to 2 vCPUs … I had 2.3 GHz CPU usage when my pfSense VM was configured with 1 vCPU (approaching the 2.67 GHz limit), then I reconfigured the VM to use 2 vCPUs and now I have 3.0 GHz CPU usage (probably from CPU thread thrashing) … this is really annoying, and I really don't want to go back to physical hardware.
-
I've been following this thread but didn't think it was affecting me. Then I took a look. pfSense tells me it's using 2% CPU. VMware tells me it's using almost nothing. ESXTop tells me 20%…
My config:
Dell Powervault NX3000 (8 x L5520 @ 2.26 GHz)
ESXi 5.5U1
pfSense 2.1.3 i386
2 x vCPU, 2GB RAM
VM version 8 hardware
Intel E1000 vNICs
-
I have 2 pfSense VMs running (one 2.1.3 [143 MHz used] and the other 2.2-Alpha [95 MHz used]) and both are running with very low CPU, as reported by both ESXi and pfSense.
Are you guys running powerd? I am.
Edit:
Meant to add my config:
ASUS Server
AMD Opteron 12 core processor
Intel NICs on 2.1.3
VMXNET3 on 2.2
1024MB of memory
VMTools package is installed
No other packages running.
-
KOM and podilarius,
Under what sort of network load?
My CPU numbers are also low - until I start a heavy (120 Mb/s) download, at which point they diverge very quickly. The ESXi/esxtop numbers go through the roof while pfSense sees little change.
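For anyone comparing numbers, here is roughly how I read them in esxtop (keystrokes from memory, so verify against the esxtop documentation):

esxtop      (from the ESXi shell or over SSH)
c           switch to the CPU panel
V           show only virtual machines
e           expand a VM's group to see its individual worlds (vCPUs, vmx threads)

The %USED and %SYS columns there are what diverge from the CPU figure the pfSense dashboard shows.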
-
I fired up a few torrents to try to saturate my 90/90 link and could only manage about 10 Mb/s. Even then, esxtop showed pfSense taking 50-103% %USED.
I don't use powerd.
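On the powerd question: if I remember right, pfSense 2.1 exposes it as the "Use PowerD" checkbox under System > Advanced > Miscellaneous. From a shell you can check whether it is actually running and whether the guest even sees frequency control with something like:

ps ax | grep '[p]owerd'
sysctl dev.cpu.0.freq dev.cpu.0.freq_levels

Inside a VM those dev.cpu sysctls often don't exist at all, since the hypervisor normally doesn't expose frequency scaling to the guest, in which case powerd shouldn't make much difference either way.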
-
I am about to load test my 2.2 so I will let you know.
-
I am about to load test my 2.2 so I will let you know.
Okay, so pushing 500 Mbit/s in and out of pfSense 2.2, I am getting 5987 MHz and esxtop is showing 225-266% CPU usage for just the pfSense 2.2 VM. I hope that is out of 1000%.
The VM itself is showing 89-91% usage. I have an AMD 6234 at 2.4 GHz per core, with 12 cores.
As you all know, if you are running the 2.1 series or earlier, pf (and I think ipfw) is giant-locked and will only use one processor.
To me it seems that 2.2 is giant-locked to 2 CPUs. This could be because I have 2 NICs involved, so I am not sure if it is one CPU per NIC or just locked to dual CPUs. I have asked in another thread with no answer.
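One way to get a hint about whether it is per-NIC (just a diagnostic idea, not a confirmed explanation): from a pfSense shell, watch the kernel threads and the interrupt counters and see whether each NIC's taskq sits on its own core while the rest stay idle, e.g.:

top -aSH
vmstat -i

If exactly one taskq thread per NIC is pegged, that would point at one CPU per NIC rather than a hard two-CPU limit.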
I am trying to get to 10Gb speeds, but I seem to be locked to 1GbE; alas, that is another issue for another topic.
-
Hi,
Is this problem solved or still present?
I want to move my pfSense to ESXi on newly purchased hardware, but now I'm not really sure about it…
-
I seem to have the same problem. 1.5 GHz in ESX, ~14% in pfSense, about 1 Mbit (!) of traffic… :(
Happens to both VMs of an HA pair. Using Intel NICs, E1000 vNIC.
-
- What is the ESXi host machine and processor? Supermicro X8DTU / E5620
- Which version of pfSense and whether 32 or 64-bit? 64-bit
- How many vCPUs have you allocated to the VM? 1
- How much memory have you allocated to the VM? 1 GB
- Have you installed the pfSense packaged VM tools or the VMware-supplied tools? Open-VM-Tools
- Are you using the e1000 adapter type or something else? E1000
-
I have the same problem with pfsense 2.1.5 running on KVM on Ubuntu Server 14.04.
See attached screenshot with pfsense running top on the right and the host machine running the VMs on the left.
- What is the ESXi host machine and processor? Thinkserver TS140 / Intel Xeon CPU E3-1225 v3 @ 3.20GHz
- Which version of pfSense and whether 32 or 64-bit? pfsense 2.1.5 - 32-bit
- How many vCPUs have you allocated to the VM? 4
- How much memory have you allocated to the VM? 2 GB
- Have you installed the pfSense packaged VM tools or the VMware-supplied tools? No
- Are you using the e1000 adapter type or something else? E1000
-
I have the same issue running pfSense 2.1.5 under Proxmox (KVM) virtualization.
- What is the ESXi host machine and processor? PC Engines APU (AMD G-T40E, 2x1 GHz, 4 GB RAM), running Proxmox 3.2 on Debian 7
- Which version of pfSense and whether 32 or 64-bit? pfsense 2.1.5 - 64-bit
- How many vCPUs have you allocated to the VM? 2
- How much memory have you allocated to the VM? 2 GB
- Are you using the e1000 adapter type or something else? Tested all kinds of virtual NICs, including virtio
pfSense idle: while pfSense reports less than 10% on both CPUs, the host sees about 50-60% on both cores.
pfSense busy: while pfSense reports about 30% on both CPUs, the host sees about 70-80% on both cores. Throughput is limited to approx. 80 Mbit/s.
Other guests, like a Debian installation, consume only 1-2% of host CPU while idle.
I also tried the latest 2.2 snapshot. The CPU consumption decreased to 20-30%, which is still too much but much better than 2.1.5; however, the throughput was limited to ~40 Mbit/s, so this is not an option since my internet connection is 100 Mbit/s.
Another issue is that I have to emulate the CPU as a qemu64 CPU, because using "host" causes pfSense to crash during bootup (other guests are fine with the "host" option). I also had to turn off all kinds of checksum offloading to reach these throughputs; with checksum offloading enabled, the throughput is less than 1 Mbit/s.
I have no packages installed.
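For anyone who wants to try the same offload workaround, here is a rough sketch of the two ways I know (GUI option names from memory; vtnet0 is just an example interface name for a virtio NIC):

- In the pfSense GUI: System > Advanced > Networking, check "Disable hardware checksum offload" (and, if needed, the TSO/LRO options), then reboot.
- Or temporarily from a shell: ifconfig vtnet0 -rxcsum -txcsum -tso -lro

The ifconfig change does not survive a reboot, so the GUI setting is the one to keep.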
-
Just to help.
- What is the ESXi host machine and processor? Supermicro H8DCL / AMD 4386 / ESX 5.5.0 2068190
- Which version of pfSense and whether 32 or 64-bit? 32-bit - 2.1.5 (no tweak)
- How many vCPUs have you allocated to the VM? 1
- How much memory have you allocated to the VM? 1 GB
- Have you installed the pfSense packaged VM tools or the VMware-supplied tools? Open-VM-Tools
- Are you using the e1000 adapter type or something else? E1000
Idle: 371 MHz / 10% in the VMware Performance tab / 0% in the pfSense dashboard
High load (download at full speed): 5857 MHz / 100% in the VMware Performance tab / 98% in the pfSense dashboard
-
More to help
- I have the same issue on several VMs on our ESXi 5.5 cluster.
- 100% CPU and only 3-5 Mbit of traffic.
- I also have a problem with OpenVPN under heavy load (>200 Mbit): the IP stack crashes and I get DUP! ICMP pings.
- Today I am installing a new pfSense 2.1.5 with a clean config and e1000 NICs.
- I will swap the VM in this evening.
- All VMs have 2 GB RAM and 1 vCPU.
- The new one is our border router with BGP, which should route 1000 Mbit.
- I will report the results next week.
- We have ESXi 5.5 U1.
Regards, Alexander
-
So here are the results of my test.
The current pfSense 2.1.5 definitely has a bug under VMware 5.5 U1.
In one test the failure occurred 2 minutes after the restart; I think the reason was the high traffic load (400 Mbit).
After some time something crashes and I get DUP! replies when I ping.
The throughput gets cut to about 100 Mbit on each interface.
I tested it with 8 cores, then with 4 cores. My VM has 4 NICs, all Intel e1000.
Here is the ping:
PING 193.84.xxx.xxx (193.84.178.161) 56(84) bytes of data.
64 bytes from 193.84.xxx.xxx: icmp_seq=1 ttl=64 time=8.10 ms
64 bytes from 193.84.xxx.xxx: icmp_seq=1 ttl=64 time=8.10 ms (DUP!)
64 bytes from 193.84.xxx.xxx: icmp_seq=2 ttl=64 time=8.38 ms
64 bytes from 193.84.xxx.xxx: icmp_seq=2 ttl=64 time=8.38 ms (DUP!)
After a reboot of the pfSense VM everything is OK.
Now, after one day of testing, I have news.
I tested several configurations: with 8 cores, with 6 cores, with 4 cores, but only 2 cores are stable.
Now, with 1 socket and 2 cores, no error has occurred in 6 hours.
The performance is poor but there are no errors. The CPU is constantly at 70%.
last pid: 12575; load averages: 0.10, 0.17, 0.19 up 0+06:28:28 16:14:20
98 processes: 3 running, 78 sleeping, 17 waiting
Mem: 64M Active, 25M Inact, 111M Wired, 408K Cache, 24M Buf, 7698M Free
Swap: 2048M Total, 2048M Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
11 root 171 ki31 0K 32K CPU0 0 351:45 82.96% [idle{idle: cpu0}]
11 root 171 ki31 0K 32K RUN 1 343:38 73.00% [idle{idle: cpu1}]
0 root -68 0 0K 224K - 1 40:29 13.96% [kernel{em0 taskq}]
0 root -68 0 0K 224K - 0 30:51 10.99% [kernel{em4 taskq}]
12 root -32 - 0K 272K WAIT 0 0:02 1.95% [intr{swi4: clock}]
12 root -32 - 0K 272K WAIT 0 0:58 0.00% [intr{swi4: clock}]
14 root -16 - 0K 16K - 1 0:37 0.00% [yarrow]
0 root -16 0 0K 224K sched 1 0:36 0.00% [kernel{swapper}]
0 root -68 0 0K 224K - 1 0:24 0.00% [kernel{em3 taskq}]
256 root 76 20 6908K 1380K kqread 1 0:17 0.00% /usr/local/sbin/check_reload_status
79750 root 44 0 59596K 6756K select 1 0:06 0.00% /usr/local/bin/vmtoolsd -c /usr/local/share
20203 root 44 0 24232K 5420K kqread 0 0:06 0.00% /usr/local/sbin/lighttpd -f /var/etc/lighty
15152 root 44 0 5784K 1464K select 0 0:05 0.00% /usr/local/sbin/apinger -c /var/etc/apinger
0 root -68 0 0K 224K - 0 0:05 0.00% [kernel{em1 taskq}]
86756 root 52 0 150M 38940K piperd 1 0:05 0.00% /usr/local/bin/php
79831 root 44 0 146M 33480K accept 1 0:04 0.00% /usr/local/bin/php
16 root -16 - 0K 16K pftm 1 0:02 0.00% [pfpurge]
53737 root 44 0 6960K 1652K select 1 0:02 0.00% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var
Really great stuff … if I find no solution I will have to say bye-bye to pfSense :(
-
Has one of the minority who's seeing this actually contacted VMware support? It only happens to a tiny fraction. The VM isn't using that much CPU, if the hypervisor is…guess whose fault that is? More than likely not ours or FreeBSD's.
-
@cmb:
Has one of the minority who's seeing this actually contacted VMware support? It only happens to a tiny fraction. The VM isn't using that much CPU, if the hypervisor is…guess whose fault that is? More than likely not ours or FreeBSD's.
As somebody already noted in one of the previous posts, it is likely that the majority is hitting this issue but is simply unaware of it, because under low or modest load you never notice the high CPU usage in VMware unless you are explicitly monitoring the VM from the VMware side (and many users only monitor load inside pfSense).
If there are any users who are not seeing this issue (make sure you actually look at the VMware virtual machine CPU usage under modest load), please post your configuration or a description of your VMware environment.
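If it helps with collecting comparable numbers, esxtop can also log to a CSV in batch mode (flags from memory, so check esxtop's help output); something like this captures a dozen 5-second samples that can be opened in a spreadsheet or perfmon:

esxtop -b -d 5 -n 12 > pfsense-cpu.csv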
-
Just today I'm being hit with 90+ Mbps of external junk traffic. I started getting VMware alarms telling me that my pfSense CPU usage was excessive. When I check via the pfSense dashboard, CPU is at ~45%. When I check the VMware performance chart, CPU is at 90+%. See image.