pfSense high CPU usage on KVM (Unraid)



  • Hi, I am trying to figure out a good setup for all my virtual servers.
    Right now I have put all my VMs on a virtual network (vibr0) and added pfSense to it as the firewall for all of them.

    The issue I'm having is that CPU usage is extremely high when running transfers or speed tests through the firewall, or even from the firewall's own terminal.

    Although I sometimes nearly reach the speed I'm supposed to get (250 Mbit/s download), it comes with 100% CPU usage.

    I ran a check where I used speedtest-cli from the pfSense command line while watching CPU usage in another window with top -S -H. This shows the following:
    [screenshot: top -S -H output during speedtest-cli]
    The speed I got in this test was 150 Mbit/s download.

    And according to Unraid, CPU usage on the cores was around 80%, all of it used by the pfSense VM.

    I tried:
    - Switching the virtual NIC (I started with an emulated Intel NIC, but get the same results with a VMware network card (vmxnet3)).
    - Shutting down all other VMs during testing: this gave better results, but CPU usage was still high.

    Does anyone have an idea what might cause this or how to fix it?
    FYI: I only have one physical NIC in my server, which is bridged to my pfSense VM for the network connection. All the other VMs and pfSense are connected to vibr0, with static IPs.

    If anyone knows how to fix this or can help, I would really appreciate it.


  • Netgate Administrator

    What CPU actually is it? What speed is it running at?

    If it was an older CPU stuck at, say, 800MHz you might see that sort of usage.

    Steve



  • @stephenw10 I'm running an (old-school) FX-8350 at stock speeds, water-cooled, running at 4 GHz max (nearly always at maximum). I pass through 2 of the 8 cores, so I figured 2 × 4000 MHz would be enough.


  • Netgate Administrator

    Hmm, yeah if that's what it's really getting it should be far more than what is needed for 250Mbps.
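    As a rough sanity check on that claim, here is a minimal Python sketch of the per-packet CPU budget. It takes the 250 Mbit/s and 2 × 4 GHz figures from this thread; the 1500-byte packet size is an assumption (full-size frames, Ethernet framing overhead ignored), and per_packet_cycle_budget is a hypothetical name used only for illustration.

```python
def per_packet_cycle_budget(link_mbit, packet_bytes, vcpus, clock_ghz):
    """Return (packets per second, CPU cycles available per packet).

    Assumes every packet is `packet_bytes` long and that all `vcpus`
    run at `clock_ghz` and are fully available to the guest.
    """
    packets_per_sec = link_mbit * 1e6 / (packet_bytes * 8)
    cycles_per_sec = vcpus * clock_ghz * 1e9
    return packets_per_sec, cycles_per_sec / packets_per_sec


# Figures from this thread: 250 Mbit/s, 2 vCPUs at 4 GHz.
# The 1500-byte packet size is an assumption (full-size Ethernet frames).
pps, budget = per_packet_cycle_budget(250, 1500, 2, 4.0)
print(f"{pps:,.0f} packets/s, ~{budget:,.0f} cycles per packet")
```

    Even with worst-case 64-byte packets, per_packet_cycle_budget(250, 64, 2, 4.0) still leaves about 16,000 cycles per packet, which supports the point that raw clock speed should not be the limit at this rate.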

    What is the output of sysctl hw.clockrate or sysctl dev.cpu.0 ?

    Steve



  • @stephenw10 Here's the output:
    [screenshot: sysctl output]

    The default clock of an FX-8350 is 3.6 GHz. Keep in mind that this is a virtual machine. Unraid config here:
    [screenshot: Unraid VM configuration]
    [screenshot: Unraid VM configuration]

    During a speed test on pfSense (speedtest-cli, 150 Mbit/s download), these are the clock rates in Unraid (8-core CPU, so 8 values):
    [screenshot: per-core clock rates in Unraid]



  • Also, here is how it looks in the pfSense webGUI with the firewall idle and during a speed test:
    [screenshot: pfSense webGUI, idle]
    [screenshot: pfSense webGUI, during speed test]
    top -S -H during a speed test:
    [screenshot: top -S -H output]



  • From what I've found so far, I think this is because I'm using a virtual NIC rather than a physical one. Can someone confirm this?


  • Netgate Administrator

    It should not, in and of itself. There are many people running virtualised and not seeing that, including in KVM.

    Something about Unraid's setup perhaps? I've never run that personally.

    Steve



  • Indeed, I'm using KVM on my Ubuntu server and I don't have this. I don't know what Unraid is, so I can't be of any help.



  • Maybe I should just try reinstalling it. Shouldn't be that hard to do. I'll post more after some more testing.



  • A reinstall made no difference; CPU usage went up on one of the cores. During this test I even gave it 8 CPU cores (4.0 GHz) and 4 GB of RAM. Download speed was 150 Mbit/s. So I have no idea what else it could be, other than the virtual NIC or something like that...
    Sadly I don't have any other NICs available to test with. Any suggestions on a step I might try?

    Thanks!


  • Netgate Administrator

    With vmx NICs you will need to add the following line to /boot/loader.conf.local to get multiple queue support:
    hw.pci.honor_msi_blacklist=0

    Reboot to apply that. Check the output of vmstat -i to be sure it's creating multiple queues.

    Be sure all hardware offloading support is disabled in Sys > Adv > Networking.

    Steve



  • @stephenw10

    Hi, thanks for your reply.

    I tried to find the /boot/loader.conf.local file but could only find /boot/loader.conf.
    I tried adding the line there (hw.pci.honor_msi_blacklist=0), but still no change.
    It did do something, though, because the line moved up in the file.

    During a speed test I get these results from vmstat -i:
    [screenshot: vmstat -i output]
    And top -S -H still shows the same results.

    Any other suggestions?

    Thanks!



  • You need to create the file /boot/loader.conf.local if it's missing, and put this line inside:
    hw.pci.honor_msi_blacklist=0
    Then save and reboot.
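    For reference, the same create-if-missing, append-once step as a minimal Python sketch. It assumes a Python 3 interpreter is available (it may not be on a stock pfSense install; the shell or a text editor works just as well), and ensure_line is a hypothetical helper name. The loader only reads this file at boot, so the reboot is still needed.

```python
import tempfile
from pathlib import Path


def ensure_line(path, line):
    """Create `path` if it does not exist and append `line` unless an
    identical (whitespace-stripped) line is already present.
    Returns True if the file was modified."""
    p = Path(path)
    text = p.read_text() if p.exists() else ""
    if line.strip() in (existing.strip() for existing in text.splitlines()):
        return False  # already configured; avoid duplicate entries
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new entry on its own line
    p.write_text(text + line.strip() + "\n")
    return True


if __name__ == "__main__":
    # Demo against a throwaway file; on the firewall the path would be
    # /boot/loader.conf.local instead.
    demo = Path(tempfile.mkdtemp()) / "loader.conf.local"
    print(ensure_line(demo, "hw.pci.honor_msi_blacklist=0"))  # file created
    print(ensure_line(demo, "hw.pci.honor_msi_blacklist=0"))  # already present
    print(demo.read_text(), end="")
```

    Running it twice against the same file leaves a single copy of the tunable, which is the point of checking before appending.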


  • Netgate Administrator

    Yup create the file if it doesn't exist. If you put it in loader.conf it may get overwritten.

    However that will only do anything for vmx NICs. You have em NICs there currently.

    Steve



  • @stephenw10 All right, I'll set them to VMXNET3, reboot, create the file with that line, and report back if anything changes.

    Thanks for the help @kiokoman & @stephenw10 !

    Creating the config file:
    [screenshot: creating /boot/loader.conf.local]



  • Okay, further testing will follow later, but for now I seem to reach my provider's maximum speed on my Linux server behind the firewall:
    [screenshot: download speed on the LAN server]

    BUT it did drop back down to 14.4 MB/s (about 115 Mbit/s) and kept going up and down:
    [screenshot: fluctuating download speed]
    CPU usage seems to have settled a bit:
    [screenshot: CPU usage]

    Using SMB, moving a file from WAN to LAN gives me this:
    [screenshot: SMB transfer speed]

    Its 2 virtual cores are running at nearly full load (CPUs 6 and 7; CPU 4 is used by the server on the LAN side):
    [screenshot: per-core CPU usage]

    I don't know if this is just a performance bug, but speeds seem to have increased, although CPU usage is still high compared to pfSense's hardware requirements.

    Changing to a quad-core virtual processor did not change much either; CPU usage stays high on 2 cores:
    [screenshot: per-core CPU usage with 4 vCPUs]

    I wish I could put my finger on the issue.


  • Netgate Administrator

    I still only see one tx queue and one rx queue on each NIC. Does vmstat -i show more?

    I assume you created that file in /boot.

    Steve



  • @stephenw10

    Yep, it's at /boot/loader.conf.local:
    [screenshot: /boot/loader.conf.local contents]

    vmstat -i during a speed test from the server on the LAN side:
    [screenshot: vmstat -i output]



  • I actually don't know how to read the vmstat -i output, but I hope you know more, @stephenw10.



  • One queue:

    vmx0: tq0 (transmit queue 0)
    vmx0: rq0 (receive queue 0)

    With multiple queues you should see tq0, tq1, and so on.
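    To make that concrete, here is a minimal Python sketch that counts the queue interrupts per NIC in pasted vmstat -i output. The name patterns (vmx "tq0"/"rq0", igb "que 0", ix "q0") are assumptions based only on the names quoted in this thread, other drivers may differ, and the vmx sample lines use made-up IRQ numbers and counts.

```python
import re
from collections import defaultdict

# Interrupt names seen in this thread: "vmx0:tq0"/"vmx0:rq0" (vmx),
# "igb0:que 0" (igb) and "ix0:q0" (ix). Other drivers may use other names.
QUEUE_RE = re.compile(r"^irq\d+:\s+([a-z]+\d+):(rq|tq|que |q)(\d+)\b")


def queues_per_nic(vmstat_output):
    """Map each NIC to {queue kind: number of distinct queues} from `vmstat -i`."""
    found = defaultdict(lambda: defaultdict(set))
    for raw in vmstat_output.splitlines():
        m = QUEUE_RE.match(raw.strip())
        if m:
            nic, kind, index = m.groups()
            found[nic][kind.strip()].add(int(index))
    return {nic: {kind: len(idx) for kind, idx in kinds.items()}
            for nic, kinds in found.items()}


# Hypothetical vmx lines (made-up numbers) plus igb lines shaped like Steve's example.
SAMPLE = """\
irq256: vmx0:tq0      120345   55
irq257: vmx0:rq0      240871  110
irq264: igb0:que 0     68630    1
irq265: igb0:que 1     68630    1
irq268: igb0:link          3    0
"""

for nic, kinds in queues_per_nic(SAMPLE).items():
    multi = any(count > 1 for count in kinds.values())
    print(nic, kinds, "multi-queue" if multi else "single queue")
```

    Link and non-NIC interrupts (igb0:link, uart0, timers) are deliberately skipped, so a count of 1 per kind is the "one tx queue and one rx queue" situation described here.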


  • Netgate Administrator

    Yeah, that. Though I don't have anything vmx to test against right now.
    I think it probably is working, since you are seeing the high-numbered IRQs that MSI uses.
    Try removing that line or commenting it out and rebooting. Do you see any change?

    On other NICs you might see something like:

    [2.4.4-RELEASE][root@5100.stevew.lan]/root: vmstat -i
    interrupt                          total       rate
    irq7: uart0                          432          0
    irq16: sdhci_pci0                    536          0
    cpu0:timer                      68688188       1001
    cpu3:timer                       1069435         16
    cpu2:timer                       1060293         15
    cpu1:timer                       1086989         16
    irq264: igb0:que 0                 68630          1
    irq265: igb0:que 1                 68630          1
    irq266: igb0:que 2                 68630          1
    irq267: igb0:que 3                 68630          1
    irq268: igb0:link                      3          0
    irq269: igb1:que 0                 68630          1
    irq270: igb1:que 1                 68630          1
    irq271: igb1:que 2                 68630          1
    irq272: igb1:que 3                 68630          1
    irq273: igb1:link                      1          0
    irq274: ahci0:ch0                   4473          0
    irq290: xhci0                         85          0
    irq291: ix0:q0                    216643          3
    irq292: ix0:q1                     47933          1
    irq293: ix0:q2                    325480          5
    irq294: ix0:q3                    514752          7
    irq295: ix0:link                       2          0
    irq301: ix2:q0                     74629          1
    irq302: ix2:q1                       507          0
    irq303: ix2:q2                      1703          0
    irq304: ix2:q3                     89446          1
    irq305: ix2:link                       1          0
    irq306: ix3:q0                     70295          1
    irq307: ix3:q1                      4985          0
    irq308: ix3:q2                    186433          3
    irq309: ix3:q3                    413486          6
    irq310: ix3:link                       1          0
    Total                           74405771       1084
    

    https://www.freebsd.org/cgi/man.cgi?query=vmx#MULTIPLE_QUEUES

    Steve



  • Try adding this to your loader.conf.local:

    hw.vmx.txnqueue="4"
    hw.vmx.rxnqueue="4"
    


  • @kiokoman & @stephenw10

    I added the lines
    hw.vmx.txnqueue="4"
    hw.vmx.rxnqueue="4"

    I did not see any change whatsoever in vmstat -i:
    [screenshot: vmstat -i output]

    and commenting out the first line (hw.pci.honor_msi_blacklist=0) didn't change anything either:
    [screenshot: vmstat -i output]

    Edit:

    Even when downloading on a server on the LAN and watching top -S -H, I get this:
    [screenshot: top -S -H output]


  • Netgate Administrator

    You are seeing load on all CPUs there and none is at 100% so it's not CPU limited at that point.
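    The reasoning here is that a CPU limit shows up as at least one core pinned near 100%, not as load spread across all cores. A minimal Python sketch of that check on per-core busy percentages (for example read off top -S -H), assuming an arbitrary 95% cutoff; cpu_bound_check is a hypothetical name and the sample numbers are made up.

```python
def cpu_bound_check(busy_per_core, threshold=95.0):
    """Return (average busy %, busiest core %, saturated?) for one sample.

    `threshold` is an illustrative cutoff, not a pfSense setting.
    """
    average = sum(busy_per_core) / len(busy_per_core)
    busiest = max(busy_per_core)
    return average, busiest, busiest >= threshold


# Load spread over all cores, none pegged: not CPU-limited by this test.
print(cpu_bound_check([55.0, 60.0, 48.0, 62.0]))
# One core pegged while the average looks modest: likely a single-thread bottleneck.
print(cpu_bound_check([99.0, 12.0, 10.0, 15.0]))
```

    Looking at the busiest core rather than the average matters because a single receive queue is typically serviced on one core at a time, so one pegged core can cap throughput while the overall figure looks fine.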



  • @stephenw10 I had already increased it to 4 cores at 4 GHz before. Right now I don't know what to do at all :( I really like how easy pfSense is to work with, but I don't know what else to investigate, because CPU usage skyrockets at 250 Mbit/s.


  • Netgate Administrator

    Yes, there is something significantly wrong with your virtualisation setup there. You can pass 250Mbps with something ancient and slow like a 1st gen APU at 1GHz.

    Steve



  • @stephenw10 Too bad for me, then. I'll see if I can try some other things with this setup.



  • To me, the problem should be investigated on the VM side more than from inside pfSense. From what I see on Google, people tend to bridge the interface instead of using passthrough on Unraid.
    Personally, for example, I was never able to make pfSense work reliably under VirtualBox and had to switch the VM to QEMU/KVM.