Netgate Discussion Forum

    Performance issue on virtualised pfSense

    General pfSense Questions · 24 Posts · 3 Posters · 3.3k Views
    shshs:

      Greetings guys, I'd be grateful for your help with the following issue.

      We have an HA pair of pfSense (2.6.0-RELEASE (amd64) built on Mon Jan 31 19:57:53 UTC 2022, FreeBSD 12.3-STABLE) running on KVM with 4 vCPUs and 4 GB of RAM. Both units use a 10G NIC that is presented to pfSense via the VirtIO driver. There is nothing wrong with the cluster itself, but recently we started to observe high CPU utilisation (in the GUI) and packet drops/delays on all traffic passing through this vNIC. pfSense here is a "router on a stick": a 10G VLAN trunk runs between the hardware servers (where pfSense is hosted) and an L2 switch.

      CPU utilisation used to be below 10%; now it's 40-50%:

      $ top -aSH
      last pid: 66579;  load averages:  1.71,  1.73,  1.70                                                                                                       up 0+01:15:58  09:27:44
      348 threads:   7 running, 309 sleeping, 32 waiting
      CPU:  0.2% user,  0.0% nice, 17.8% system, 26.5% interrupt, 55.5% idle
      Mem: 13M Active, 51M Inact, 128M Laundry, 510M Wired, 392M Buf, 127M Free
      Swap: 1638M Total, 26M Used, 1612M Free, 1% Inuse
      
        PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
         11 root        155 ki31     0B    64K CPU2     2  55:57  82.14% [idle{idle: cpu2}]
         11 root        155 ki31     0B    64K RUN      1  52:00  72.31% [idle{idle: cpu1}]
          0 root        -92    -     0B   704K CPU3     3  36:06  70.17% [kernel{vtnet1 rxq 0}]
         11 root        155 ki31     0B    64K CPU0     0  37:09  61.09% [idle{idle: cpu0}]
         12 root        -72    -     0B   528K CPU0     0  21:26  32.30% [intr{swi1: netisr 0}]
         12 root        -92    -     0B   528K WAIT     0  31:50  28.41% [intr{irq262: virtio_pci3}]
         11 root        155 ki31     0B    64K RUN      3  45:53  27.30% [idle{idle: cpu3}]
         12 root        -72    -     0B   528K WAIT     2   9:57  16.08% [intr{swi1: pfsync}]
         12 root        -92    -     0B   528K WAIT     2   3:52   6.96% [intr{irq265: virtio_pci4}]
      50536 root         20    0    11M  2280K select   2   0:52   0.88% /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf -b 10.0.11.253
      64581 root         20    0    12M  2536K bpf      0   0:51   0.79% /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
      76202 unbound      20    0    61M    19M kqread   0   0:07   0.24% /usr/local/sbin/unbound -c /var/unbound/unbound.conf{unbound}
         21 root        -16    -     0B    16K -        2   0:06   0.15% [rand_harvestq]
      77844 root         20    0    14M  3908K CPU2     2   0:00   0.10% top -aSH
         12 root        -60    -     0B   528K WAIT     0   0:06   0.10% [intr{swi4: cloc
      

      You can see that the [kernel{vtnet1 rxq 0}] kernel thread eats 60-70% of a core. vtnet1 is the trunk carrying most of the pfSense traffic. On the L2 switch itself we see 50-100 Mbit/s of traffic in each direction, which looks normal.
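If multiqueue were active, additional rxq/txq kernel threads would appear alongside rxq 0. A sketch of how to confirm the negotiated queue count on the guest (the vtnet unit number follows this thread; adjust for your system):

```shell
# On the pfSense guest: per-queue counters for vtnet1. With a single
# negotiated queue pair, only rxq0/txq0 entries will exist:
sysctl dev.vtnet.1 | grep -E '(rxq|txq)[0-9]+\.'

# Each virtio queue gets its own interrupt line; compare against the
# irq262/irq265 virtio_pci entries visible in the top output above:
vmstat -i | grep -i virtio
```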

      Interrupt counters:

      /root: systat -vmstat 1
      
          2 users    Load  1.63  1.61  1.56                  Jul  7 10:21
         Mem usage:  97%Phy  2%Kmem                           VN PAGER   SWAP PAGER
      Mem:       REAL            VIRTUAL                      in   out     in   out
              Tot   Share      Tot    Share    Free  count     
      Act 176940K  28560K 1266228K   75572K 137088K  pages     
      All 178320K  29832K 1302784K  103512K                     ioflt  Interrupts
      Proc:                                                     cow    3954 total
        r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt        zfod        atkbd0 1
                  211       26K    3   1K   3K  15K             ozfod       uhci1 uhci
                                                               %ozfod   212 cpu0:timer
      16.3%Sys  25.8%Intr  0.2%User  0.0%Nice 57.8%Idle         daefr   198 cpu1:timer
      |    |    |    |    |    |    |    |    |    |    |       prcfr   208 cpu2:timer
      ========+++++++++++++                                     totfr   158 cpu3:timer
                                             172 dtbuf          react     7 virtio_pci
      Namei     Name-cache   Dir-cache    145602 desvn          pdwak       virtio_pci
         Calls    hits   %    hits   %      2910 numvn        5 pdpgs     1 virtio_pci
           559     559 100                  1337 frevn          intrn   243 virtio_pci
                                                           510M wire      1 virtio_pci
      Disks vtbd0                                        15508K act    2926 virtio_pci
      KB/t  32.00                                        45904K inact       virtio_pci
      tps       1                                          127M laund
      MB/s   0.03                                          134M free
      %busy     0                                          392M buf
      

      Memory buffers:

      root: netstat -m
      3046/4244/7290 mbufs in use (current/cache/total)
      1805/2559/4364/249786 mbuf clusters in use (current/cache/total/max)
      1805/1737 mbuf+clusters out of packet secondary zone in use (current/cache)
      0/4/4/124893 4k (page size) jumbo clusters in use (current/cache/total/max)
      0/0/0/37005 9k jumbo clusters in use (current/cache/total/max)
      0/0/0/20815 16k jumbo clusters in use (current/cache/total/max)
      4371K/6195K/10566K bytes allocated to network (current/cache/total)
      0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
      0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
      0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
      0/0/0 requests for jumbo clusters denied (4k/9k/16k)
      0 sendfile syscalls
      0 sendfile syscalls completed without I/O request
      0 requests for I/O initiated by sendfile
      0 pages read by sendfile as part of a request
      0 pages were valid at time of a sendfile request
      0 pages were valid and substituted to bogus page
      0 pages were requested for read ahead by applications
      0 pages were read ahead by sendfile
      0 times sendfile encountered an already busy page
      0 requests for sfbufs denied
      0 requests for sfbufs delayed
      
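The counters above show zero denied/delayed requests, so mbuf starvation is not the bottleneck here. A small sketch (with sample lines copied from the output above) of what to look for in `netstat -m`:

```python
# Sanity check on `netstat -m` output: any non-zero "denied" or
# "delayed" counter would point at mbuf exhaustion. Sample lines are
# copied from the output above.
sample = """\
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
"""

def mbuf_problems(netstat_m: str) -> list[str]:
    """Return the denied/delayed lines whose counters are not all zero."""
    bad = []
    for line in netstat_m.splitlines():
        if "denied" in line or "delayed" in line:
            counts = line.split()[0].split("/")
            if any(c != "0" for c in counts):
                bad.append(line.strip())
    return bad

print(mbuf_problems(sample))  # → [] here: no mbuf starvation
```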

      Hardware info from the KVM host:

      $ lspci
      82:00.0 Ethernet controller [0200]: Solarflare Communications SFC9020 10G Ethernet Controller [1924:0803]
      
      $ ethtool -i enp130s0f0
      driver: sfc
      version: 4.1
      firmware-version: 3.3.2.1000
      expansion-rom-version: 
      bus-info: 0000:82:00.0
      supports-statistics: yes
      supports-test: yes
      supports-eeprom-access: no
      supports-register-dump: yes
      supports-priv-flags: no
      

      What steps should be taken for further troubleshooting? Thanks.

      stephenw10 (Netgate Administrator):

        Do you have pfBlocker running?

        shshs:

          @stephenw10, no I don't. And while one unit reboots, utilisation on the active one stays the same, so the problem persists.

          stephenw10 (Netgate Administrator):

            Is there actually traffic present? Some sort of network loop?

            shshs:

              @stephenw10, there are no loops; traffic is 50-100 Mbit/s in both directions, as I mentioned.

              I'm just curious: is it normal for one core to run at 40-70% for the system thread called [kernel{vtnet1 rxq 0}]? How much traffic can I pass through pfSense on my 10Gb NIC without performance degradation? Does it have something to do with the vNIC queue size or the KVM host's Linux settings?

              stephenw10 (Netgate Administrator):

                Hmm, I mean I'd expect a single core to pass a lot more than that, but it will obviously pass a lot more with multiple cores/multiqueue NICs.

                Do you have multiqueue disabled in the sysctls?

                [2.7.0-DEVELOPMENT][admin@cedev.stevew.lan]/root: sysctl hw.vtnet
                hw.vtnet.rx_process_limit: 512
                hw.vtnet.mq_max_pairs: 8
                hw.vtnet.mq_disable: 0
                hw.vtnet.lro_disable: 0
                hw.vtnet.tso_disable: 0
                hw.vtnet.csum_disable: 0
                

                Or in KVM

                Do those NICs show as multiqueue in the boot logs?

                What changed between the 10% and 50% use situations?
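For reference, on the KVM side multiqueue for a virtio NIC is requested in the libvirt domain XML. This fragment is illustrative (the bridge name and queue count are placeholders, not taken from this thread), and the guest driver must also negotiate the queues for it to take effect:

```xml
<interface type='bridge'>
  <source bridge='br0'/>                 <!-- placeholder bridge name -->
  <model type='virtio'/>
  <driver name='vhost' queues='4'/>      <!-- request 4 queue pairs -->
</interface>
```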

                shshs:

                  @stephenw10, nothing changed from the pfSense perspective when we noticed the performance degradation.

                  Regarding multiqueue capability, on pfSense guest:

                  /root: sysctl hw.vtnet
                  hw.vtnet.rx_process_limit: 512
                  hw.vtnet.mq_max_pairs: 8
                  hw.vtnet.mq_disable: 0
                  hw.vtnet.lro_disable: 0
                  hw.vtnet.tso_disable: 0
                  hw.vtnet.csum_disable: 0
                  

                  On linux host:

                  # ethtool -S enp130s0f0
                  NIC statistics:
                       rx_noskb_drops: 0
                       rx_nodesc_trunc: 0
                       tx_bytes: 117663001808396
                       tx_good_bytes: 117663001808396
                       tx_bad_bytes: 0
                       tx_packets: 394346355386
                       tx_bad: 0
                       tx_pause: 0
                       tx_control: 0
                       tx_unicast: 393533426208
                       tx_multicast: 778204848
                       tx_broadcast: 34724330
                       tx_lt64: 0
                       tx_64: 133721490546
                       tx_65_to_127: 77339988519
                       tx_128_to_255: 62782083547
                       tx_256_to_511: 62501260602
                       tx_512_to_1023: 18919528903
                       tx_1024_to_15xx: 21005284882
                       tx_15xx_to_jumbo: 18076718387
                       tx_gtjumbo: 0
                       tx_collision: 0
                       tx_single_collision: 0
                       tx_multiple_collision: 0
                       tx_excessive_collision: 0
                       tx_deferred: 0
                       tx_late_collision: 0
                       tx_excessive_deferred: 0
                       tx_non_tcpudp: 0
                       tx_mac_src_error: 0
                       tx_ip_src_error: 0
                       rx_bytes: 221848242143190
                       rx_good_bytes: 221848242143190
                       rx_bad_bytes: 0
                       rx_packets: 452802863050
                       rx_good: 452802863050
                       rx_bad: 0
                       rx_pause: 0
                       rx_control: 0
                       rx_unicast: 452695437804
                       rx_multicast: 49781922
                       rx_broadcast: 57643324
                       rx_lt64: 0
                       rx_64: 325328389
                       rx_65_to_127: 130124419365
                       rx_128_to_255: 98720637028
                       rx_256_to_511: 95056512114
                       rx_512_to_1023: 32474881787
                       rx_1024_to_15xx: 96101084367
                       rx_15xx_to_jumbo: 0
                       rx_gtjumbo: 0
                       rx_bad_gtjumbo: 0
                       rx_overflow: 0
                       rx_false_carrier: 0
                       rx_symbol_error: 0
                       rx_align_error: 0
                       rx_length_error: 0
                       rx_internal_error: 0
                       rx_nodesc_drop_cnt: 0
                       tx_merge_events: 290566742
                       tx_tso_bursts: 0
                       tx_tso_long_headers: 0
                       tx_tso_packets: 0
                       tx_tso_fallbacks: 0
                       tx_pushes: 3305249032
                       tx_pio_packets: 0
                       tx_cb_packets: 1701382
                       rx_reset: 0
                       rx_tobe_disc: 0
                       rx_ip_hdr_chksum_err: 0
                       rx_tcp_udp_chksum_err: 11869
                       rx_inner_ip_hdr_chksum_err: 0
                       rx_inner_tcp_udp_chksum_err: 0
                       rx_outer_ip_hdr_chksum_err: 0
                       rx_outer_tcp_udp_chksum_err: 0
                       rx_eth_crc_err: 0
                       rx_mcast_mismatch: 26
                       rx_frm_trunc: 0
                       rx_merge_events: 0
                       rx_merge_packets: 0
                       tx-0.tx_packets: 394346403017
                       tx-1.tx_packets: 2
                       tx-2.tx_packets: 0
                       tx-3.tx_packets: 0
                       tx-4.tx_packets: 0
                       tx-5.tx_packets: 0
                       tx-6.tx_packets: 0
                       tx-7.tx_packets: 0
                       tx-8.tx_packets: 1
                       tx-9.tx_packets: 0
                       tx-10.tx_packets: 2
                       tx-11.tx_packets: 3
                       rx-0.rx_packets: 35352224928
                       rx-1.rx_packets: 35097722216
                       rx-2.rx_packets: 34532357156
                       rx-3.rx_packets: 50495387611
                       rx-4.rx_packets: 36002566213
                       rx-5.rx_packets: 34079572593
                       rx-6.rx_packets: 38382266621
                       rx-7.rx_packets: 38362390639
                       rx-8.rx_packets: 33880795280
                       rx-9.rx_packets: 48521808392
                       rx-10.rx_packets: 34180770870
                       rx-11.rx_packets: 33915051801
                  
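One thing the host-side counters above show: receive traffic is spread fairly evenly across all 12 hardware queues, while virtually all transmit goes through tx-0 (consistent with a single-queue guest NIC). A quick share calculation using the rx figures copied from above:

```python
# Per-queue receive counters copied from the `ethtool -S` output above.
rx = {
    0: 35352224928, 1: 35097722216, 2: 34532357156, 3: 50495387611,
    4: 36002566213, 5: 34079572593, 6: 38382266621, 7: 38362390639,
    8: 33880795280, 9: 48521808392, 10: 34180770870, 11: 33915051801,
}
total = sum(rx.values())
for q, pkts in sorted(rx.items()):
    # Queue 3 peaks around 11% -- no single hot queue on the host side.
    print(f"rx-{q}: {100 * pkts / total:5.1f}%")
```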
                   stephenw10 (Netgate Administrator):

                    Hmm, actually the ALTQ changes in pfSense prevent that (discovered after some playing with settings!).

                    That load is where the actual pf load shows up. Has anything changed there? More packages? Longer rule lists?

                    Steve

                      shshs:

                      @stephenw10 said in Performance issue on virtualised pfSense:

                      That load is where the actual pf load shows up. Has anything changed there? More packages? Longer rule lists?

                      We don't use shaping on pfSense.

                       stephenw10 (Netgate Administrator):

                        That doesn't matter: multiqueue is disabled for vtnet(4) in the pfSense build to allow ALTQ to run on it, whether or not shaping is actually used.

                        chrcoluk:

                          @stephenw10 Yeah, multiqueue doesn't work for vtnet on pfSense. I requested it on Redmine since I noticed they added a toggle for the Hyper-V net driver, but the response was that because it's a compile-time-only flag they won't be able to add a toggle.

                          pfSense CE 2.8.0

                          shshs:

                            @chrcoluk, @stephenw10, guys, could you please explain how multiqueue capability impacts performance? Are there ways to mitigate it, for example by adding more CPU/RAM, or is it just a limit of a virtualised appliance's routing capability?

                            stephenw10 (Netgate Administrator):

                              The NICs can only have one Rx and one Tx queue, which means they are only serviced by one CPU core. So to go faster you need that CPU core to run faster; adding more CPU cores doesn't help.
                              That's particularly true here where you are running as 'router on a stick' so only have one NIC/queue-pair doing all the work.

                              Nothing has changed in that respect though. There's nothing that should have suddenly increased the load for the same throughput.

                              Steve
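As a rough illustration of the single-queue ceiling Steve describes (the packets-per-second figure below is an assumed round number, not a measurement from this thread; real limits depend on CPU, hypervisor and ruleset):

```python
# Back-of-envelope: with a single rx queue, one core does all receive
# work. If that core tops out at ~500k packets/s (an assumed figure),
# the throughput ceiling scales with average packet size and queues:
PPS_PER_CORE = 500_000          # assumption, not measured

def ceiling_gbps(avg_packet_bytes: int, queues: int = 1) -> float:
    """Approximate forwarding ceiling in Gbit/s."""
    return PPS_PER_CORE * queues * avg_packet_bytes * 8 / 1e9

print(ceiling_gbps(64))              # small packets: ~0.26 Gbit/s
print(ceiling_gbps(1500))            # full-size frames: ~6 Gbit/s
print(ceiling_gbps(1500, queues=4))  # 4 queue pairs: ~24 Gbit/s
```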

                              shshs:

                                @stephenw10, ok, thanks for the explanation.
                                If we used smart NICs like the Intel X710, do you think the network could perform better under the same conditions? Or is one dedicated CPU core simply the limit for processing network traffic at a ~100 Mbit/s rate? Can a smart NIC process traffic in hardware instead of software? Or is it better to set up the pfSense cluster on separate bare-metal servers?

                                stephenw10 (Netgate Administrator):

                                  Any multiqueue NIC can spread the load across multiple CPU cores for most traffic types. That includes other NIC types in KVM, like vmxnet.

                                  Other hardware offloading is not usually of much use in pfSense, or any router, where the router is not the end point for TCP connections. So 'TCP off-loading' is not supported.

                                  Steve

                                  shshs:

                                    @stephenw10, how can I know if my Ethernet card will work under the vmxnet driver in KVM with multiqueue capability? Actually, I was expecting performance degradation only at speeds above ~1 Gbit/s. Do you think running pfSense on a bare-metal server could give me performance close to 10 Gbit firewalling capacity?

                                    Say I keep the same hardware I'm hosting pfSense on now: will it help if I set it up natively, without KVM? Will I get multiqueue capability for my 10Gb NICs?

                                    stephenw10 (Netgate Administrator):

                                      It doesn't matter what the hardware is as long as the hypervisor supports it. Unless you are using PCI pass-through the hypervisor presents the NIC type to the VM with whatever you've configured it as. I'm using Proxmox here, which is KVM, and vmxnet is one of the NIC types it can present.
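For what it's worth, in a libvirt/virsh-managed KVM guest the presented NIC type is just the `model type` in the domain XML. An illustrative fragment (the bridge name is a placeholder):

```xml
<interface type='bridge'>
  <source bridge='br0'/>       <!-- placeholder bridge name -->
  <model type='vmxnet3'/>      <!-- present a vmxnet3 device instead of virtio -->
</interface>
```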

                                      shshs:

                                        @stephenw10, is there any chance of changing the network driver from virtio to vmxnet in virsh and getting a multiqueue NIC? Changing such settings is a big deal in our environment, which is why I'm asking. If it were in my lab I could easily test, but if I do it now I may lose the virtual appliance and access to it. Is it worth even trying to change virtio to vmxnet?

                                        stephenw10 (Netgate Administrator):

                                          @shshs said in Performance issue on virtualised pfSense:

                                          Is it worth even trying to change virtio to vmxnet?

                                          Only if you're not seeing the throughput you need IMO. High CPU use on one core is not a problem until it hits 100% and you need more.

                                          Steve

                                            shshs:

                                              @stephenw10, thanks a lot! How can I verify whether a NIC is multiqueue, other than checking for the vmxnet driver?
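A sketch of ways to check from the FreeBSD/pfSense guest whether a NIC actually came up multiqueue, regardless of driver (commands assume a standard pfSense shell):

```shell
# A multiqueue NIC shows a separate interrupt line per queue pair:
vmstat -i

# Per-queue sysctl counters: one rxq/txq set per negotiated queue
# ("vtnet" here is an example; a vmxnet3 device appears as "vmx"):
sysctl -a | grep -E 'vtnet.*(rxq|txq)[0-9]'

# Some drivers log the negotiated queue count at attach time:
grep -i queue /var/run/dmesg.boot
```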

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.