Netgate Discussion Forum
    2.4.5-p1 HA carp setup on Hyper-V, high hvevent0 CPU usage

    General pfSense Questions
    11 Posts 2 Posters 713 Views
    stephenw10 Netgate Administrator

      I would check vmstat -i and ps -auxwwd to see if anything is showing high there.

      It may not though. That seems to be load servicing the hv_vmbus.

      Steve
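
      For reference, a sketch of how those checks fit together (flags as documented for the FreeBSD base that pfSense 2.4.x is built on; `top -aSH` appears later in this thread and is what exposes kernel threads such as hvevent0):

```shell
# Per-source interrupt counts and rates; Hyper-V channel interrupts
# show up as cpuN:hyperv rows.
vmstat -i

# Full process tree (-d), wide output (-ww), including kernel threads.
ps -auxwwd

# top with system processes (-S) and per-thread rows (-H);
# plain `top -a` hides kernel threads like hvevent0.
top -aSH
```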

        rgijsen

        This post is deleted!
          stephenw10 Netgate Administrator

          Yes, that's exactly what I would try; fail over to the secondary and see if the load appears there when it is master.

          Steve

            rgijsen

            As stated before, when I do a CARP failover to the second node, or even shut down the primary node, the second node shows practically no load at all. However, with the secondary node active, the primary still shows that load on the hvevent process.
            So I guess my next step will be to mount the pfSense disk in a new VM with the same NIC config and see if anything is wrong from that perspective. If that fails, I can still try rebuilding the whole VM.
            Or are there any other clues?

              rgijsen

              I had to delete my second post as it turns out I forgot to mask the IP addresses 😖

              I'll post it again below, but note that it is actually a response to stephenw10's first reply.

              vmstat -i

              interrupt                          total       rate
              cpu0:timer                      18539648        160
              cpu1:timer                      14713803        127
              cpu2:timer                      15964083        138
              cpu3:timer                      15701489        136
              cpu0:hyperv                    478580870       4130
              cpu1:hyperv                     99633883        860
              cpu2:hyperv                     99907288        862
              cpu3:hyperv                     90973736        785
              Total                          834014800       7198
              

              ps -auxwwd

              USER    PID  %CPU %MEM    VSZ   RSS TT  STAT STARTED       TIME COMMAND
              root      0  88.8  0.1      0   800  -  DLs  Wed01    499:07.14 [kernel]
              root     11 292.9  0.0      0    64  -  RNL  Wed01   6830:15.44 - [idle]
              root     12   1.0  0.0      0   224  -  WL   Wed01      8:17.97 - [intr]
              root      1   0.0  0.1   5016   848  -  ILs  Wed01      0:00.01 - /sbin/init --
              www   32077  16.0  0.8  22016 11932  -  Ss   Wed01    124:37.03 |-- /usr/local/sbin/haproxy -f /var/etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -D
              root    344   0.0  1.7  94900 25024  -  Ss   Wed01      0:02.65 |-- php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm)
              root  10455   0.0  2.6  99124 39452  -  S    23:31      0:41.73 | |-- php-fpm: pool nginx (php-fpm)
              root  55168   0.0  2.6  99124 39256  -  S    23:34      0:41.27 | |-- php-fpm: pool nginx (php-fpm)
              root  98139   0.0  2.6  99124 39256  -  S    23:08      0:42.40 | `-- php-fpm: pool nginx (php-fpm)
              root    359   0.0  0.2   6756  2640  -  INs  Wed01      0:00.02 |-- /usr/local/sbin/check_reload_status
              root    361   0.0  0.2   6756  2396  -  IN   Wed01      0:00.00 | `-- check_reload_status: Monitoring daemon of check_reload_status
              root    414   0.0  0.3   9160  4984  -  Is   Wed01      0:00.06 |-- /sbin/devd -q -f /etc/pfSense-devd.conf
              root   7792   0.0  0.4  14460  6796  -  Is   Wed01      0:00.00 |-- /usr/local/sbin/mpd5 -b -d /var/etc/l2tp-vpn -p /var/run/l2tp-vpn.pid -s l2tps l2tps
              root   8312   0.0  0.2   6380  2376  -  Is   Wed01      0:00.18 |-- /usr/sbin/cron -s
              root  13350   0.0  0.4  12472  5948  -  Ss   Wed01      0:08.50 |-- /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
              proxy 14438   0.0  0.1   6268  2212  -  Ss   Wed01      0:02.25 |-- /usr/sbin/ftp-proxy -a <masked ip> -v
              root  46084   0.0  0.5  21636  7396  -  Is   11:02      0:00.00 |-- nginx: master process /usr/local/sbin/nginx -c /var/etc/nginx-webConfigurator.conf (nginx)
              root  46259   0.0  0.6  23684  8360  -  S    11:02      0:14.40 | |-- nginx: worker process (nginx)
              root  46466   0.0  0.5  23684  8280  -  I    11:02      0:03.23 | `-- nginx: worker process (nginx)
              root  49364   0.0  0.1   6208  2220  -  Is   11:07      0:04.66 |-- /usr/sbin/hv_kvp_daemon
              root  49818   0.0  0.1   6196  2004  -  Is   11:07      0:00.00 |-- /usr/sbin/hv_vss_daemon
              root  51968   0.0  0.2   6976  2612  -  IN   00:01      0:00.00 |-- /bin/sh /etc/rc.update_pkg_metadata
              root  52083   0.0  0.1   4144  1824  -  INC  00:01      0:00.00 | `-- sleep 78926
              root  60998   0.0  0.2   8168  3736  -  Is   Wed01      0:00.00 |-- /usr/local/libexec/ipsec/starter --daemon charon
              root  61228   0.0  1.1  55900 17208  -  Is   Wed01      0:10.75 | `-- /usr/local/libexec/ipsec/charon --use-syslog
              root  73334   0.0  0.5  12676  7344  -  Is   11:03      0:00.00 |-- /usr/sbin/sshd
              root   7545   0.0  0.5  12968  7828  -  Ss   09:53      0:00.27 | `-- sshd: <user>@pts/0 (sshd)
              root  37777   0.0  0.2   6976  2640  0  Is   09:53      0:00.00 |   `-- /bin/sh /etc/rc.initial
              root  48222   0.0  0.2   7284  3484  0  S    09:53      0:00.03 |     `-- /bin/tcsh
              root   2424   0.0  0.2   6824  2672  0  R+   10:00      0:00.00 |       `-- ps -auxwwd
              root  77295   0.0  0.2   6908  2344  -  Is   Wed01      0:15.63 |-- /usr/local/bin/dpinger -S -r 0 -i <gateway> -B <masked ip> -p /var/run/dpinger_<gateway>~<masked ip>~<masked ip>.pid -u /var/run/dpinger_<gateway>~<masked ip>~<masked ip>.sock -C /etc/rc.gateway_alarm -d 1 -s 500 -l 2000 -t 60000 -A 1000 -D 500 -L 20 <masked ip>
              root  86712   0.0  0.4  10312  5692  -  Ss   Wed01    208:14.88 |-- /usr/local/sbin/openvpn --config /var/etc/openvpn/server1.conf
              root  88850   0.0  0.2   6968  2808  -  Ss   Wed01      0:52.38 |-- /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid
              root  90697   0.0  0.2   6412  2476  -  Ss   Wed01      0:40.06 |-- /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -l /tmp/haproxy_chroot/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf
              root  43415   0.0  0.2   6976  2488  -  Is   10:59      0:00.00 | `-- /bin/sh /usr/local/sbin/sshguard -i /var/run/sshguard.pid
              root  43635   0.0  0.1   6196  1876  -  I    10:59      0:00.00 |   |-- /bin/cat
              root  43907   0.0  0.3  12016  5184  -  IC   10:59      0:00.00 |   |-- /usr/local/libexec/sshg-parser
              root  44184   0.0  0.2   6536  2372  -  IC   10:59      0:00.05 |   |-- /usr/local/libexec/sshg-blocker
              root  44469   0.0  0.2   6976  2488  -  I    10:59      0:00.00 |   `-- /bin/sh /usr/local/sbin/sshguard -i /var/run/sshguard.pid
              root  44518   0.0  0.2   6976  2476  -  I    10:59      0:00.00 |     `-- /bin/sh /usr/local/libexec/sshg-fw-pf
              root  91192   0.0  0.1   6192  1896  -  Is   Wed01      0:00.00 |-- /usr/local/bin/minicron 240 /var/run/ping_hosts.pid /usr/local/bin/ping_hosts.sh
              root  91342   0.0  0.1   6192  1912  -  I    Wed01      0:00.04 | `-- minicron: helper /usr/local/bin/ping_hosts.sh  (minicron)
              root  91357   0.0  0.1   6192  1896  -  Is   Wed01      0:00.00 |-- /usr/local/bin/minicron 3600 /var/run/expire_accounts.pid /usr/local/sbin/fcgicli -f /etc/rc.expireaccounts
              root  91586   0.0  0.1   6192  1912  -  I    Wed01      0:00.00 | `-- minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.expireaccounts  (minicron)
              root  91798   0.0  0.1   6192  1896  -  Is   Wed01      0:00.00 |-- /usr/local/bin/minicron 86400 /var/run/update_alias_url_data.pid /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data
              root  92231   0.0  0.1   6192  1912  -  I    Wed01      0:00.00 | `-- minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data  (minicron)
              root  95054   0.0  1.9 241920 28616  -  Is   Wed01      0:13.11 |-- /usr/local/sbin/filterdns -p /var/run/filterdns.pid -i 300 -c /var/etc/filterdns.conf -d 1
              root  47601   0.0  0.2   6976  2536 v0- SN   11:02      0:15.75 |-- /bin/sh /var/db/rrd/updaterrd.sh
              root  91893   0.0  0.1   4144  1824  -  SNC  10:00      0:00.00 | `-- sleep 60
              root  69685   0.0  0.1   6316  2032 v0  Is+  11:26      0:00.00 |-- /usr/libexec/getty Pc ttyv0
              root  61754   0.0  0.1   6316  2032 v1  Is+  Wed01      0:00.00 |-- /usr/libexec/getty Pc ttyv1
              root  62014   0.0  0.1   6316  2032 v2  Is+  Wed01      0:00.00 |-- /usr/libexec/getty Pc ttyv2
              root  62204   0.0  0.1   6316  2032 v3  Is+  Wed01      0:00.00 |-- /usr/libexec/getty Pc ttyv3
              root  62552   0.0  0.1   6316  2032 v4  Is+  Wed01      0:00.00 |-- /usr/libexec/getty Pc ttyv4
              root  62819   0.0  0.1   6316  2032 v5  Is+  Wed01      0:00.00 |-- /usr/libexec/getty Pc ttyv5
              root  62865   0.0  0.1   6316  2032 v6  Is+  Wed01      0:00.00 |-- /usr/libexec/getty Pc ttyv6
              root  63124   0.0  0.1   6316  2032 v7  Is+  Wed01      0:00.00 `-- /usr/libexec/getty Pc ttyv7
              root      2   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [crypto]
              root      3   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [crypto returns 0]
              root      4   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [crypto returns 1]
              root      5   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [crypto returns 2]
              root      6   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [crypto returns 3]
              root      7   0.0  0.0      0    32  -  DL   Wed01      0:00.00 - [cam]
              root      8   0.0  0.0      0    16  -  DL   Wed01      0:00.02 - [soaiod1]
              root      9   0.0  0.0      0    16  -  DL   Wed01      0:00.02 - [soaiod2]
              root     10   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [audit]
              root     13   0.0  0.0      0    64  -  DL   Wed01      0:00.00 - [ng_queue]
              root     14   0.0  0.0      0    48  -  DL   Wed01      0:00.03 - [geom]
              root     15   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [sequencer 00]
              root     16   0.0  0.0      0    16  -  DL   Wed01      0:00.02 - [soaiod3]
              root     17   0.0  0.0      0    16  -  DL   Wed01      0:00.01 - [soaiod4]
              root     18   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [sctp_iterator]
              root     19   0.0  0.0      0    16  -  DL   Wed01      0:26.85 - [pf purge]
              root     20   0.0  0.0      0    16  -  DL   Wed01      0:40.00 - [rand_harvestq]
              root     21   0.0  0.0      0    48  -  DL   Wed01      0:02.03 - [pagedaemon]
              root     22   0.0  0.0      0    16  -  DL   Wed01      0:00.00 - [vmdaemon]
              root     23   0.0  0.0      0    16  -  DNL  Wed01      0:00.00 - [pagezero]
              root     24   0.0  0.0      0    32  -  DL   Wed01      0:02.41 - [bufdaemon]
              root     25   0.0  0.0      0    16  -  DL   Wed01      0:00.54 - [bufspacedaemon]
              root     26   0.0  0.0      0    16  -  DL   Wed01      0:04.46 - [syncer]
              root     27   0.0  0.0      0    16  -  DL   Wed01      0:00.60 - [vnlru]
              root     66   0.0  0.0      0    16  -  DL   Wed01      0:00.09 - [md0]
              

              No obvious culprits that I can see; just the kernel, and specifically hvevent0, being high.
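
              To make that imbalance explicit, the per-CPU rates above can be summed per source. A minimal POSIX awk sketch over the rows pasted above (sample data inlined, so it runs anywhere):

```shell
#!/bin/sh
# Sum the `vmstat -i` rate column (last field) per interrupt source.
# With the rows above, hyperv interrupts outpace timer interrupts
# roughly 12x (6637/s vs 561/s).
awk '
  /:timer/  { timer  += $NF }
  /:hyperv/ { hyperv += $NF }
  END { printf "timer=%d/s hyperv=%d/s\n", timer, hyperv }
' <<'EOF'
cpu0:timer                      18539648        160
cpu1:timer                      14713803        127
cpu2:timer                      15964083        138
cpu3:timer                      15701489        136
cpu0:hyperv                    478580870       4130
cpu1:hyperv                     99633883        860
cpu2:hyperv                     99907288        862
cpu3:hyperv                     90973736        785
EOF
```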

              The only obvious difference between the two pfSense VMs is that the primary one runs on a dedicated HPe DL360 Gen8 which, even with the latest firmware and BIOS, reports 'Current PTI status: Enabled' via auto-detect. The second node runs on a Gen10 with loads of other VMs, but that one has PTI disabled. That shouldn't make too much difference, especially as it's the Hyper-V services that get loaded, rather than actual BSD OS related stuff.

              I'll try moving the passive box to the same host as the primary, and see what happens.

                rgijsen

                I've had no success so far. In the past I had a VM with something of a corrupted configuration (from the hypervisor's perspective), which made it very slow to live-migrate to another host. In the end I removed that VM, created a new one with the same disk / network config, and that issue was gone.
                I did the same with this pfSense VM, but alas, no cigar. The next thing I'm going to try is moving this first node to a Gen10 host, like the second node. While pfSense shouldn't be that CPU intensive, who knows, there may be something of a hardware issue, even if virtualized. If THAT doesn't work, I guess I'll reinstall it from scratch and restore the config.
                It'll take a while though; I can't do a live failover of pfSense, unfortunately, as most connections go through haproxy, which doesn't sync states.

                  stephenw10 Netgate Administrator

                  It could well be something like that. That load appears to be entirely hypervisor-generated; it's unclear what could be causing it.
                  https://www.freebsd.org/cgi/man.cgi?query=hv_vmbus&sektion=4

                  Steve

                    rgijsen

                    Coming back to this, I have to come clean. Next weekend is my monthly maintenance window, and to rule some things out I did a CARP failover to the second node tonight. At first, practically no CPU load on the second node. However, now that there is actually a lot of traffic going on, I see the same phenomenon on the second node as well. I guess I had run top -a on that second node, rather than top -aSH, which showed no kernel use at all. Now on the second node haproxy is consuming about 9-11% and kernel{hvevent0} about 15%.

                    Even if I wonder why there's so much kernel usage at a rather low overall load (about 10,000 states, that's peanuts), it seems to be a hypervisor integration issue, or maybe just regular kernel load. I can't try another hypervisor or a physical machine, as these pfSense boxes are in production. In my test setup I can't generate that amount of load, which is probably why I never saw the issue there.
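
                    The top -a versus top -aSH difference is easy to trip over; a quick sketch of the relevant flags (behavior as documented for FreeBSD's top):

```shell
# -a : show full command lines
# -S : include system (kernel) processes, e.g. [kernel] and [intr]
# -H : show each thread separately, so kernel{hvevent0} gets its
#      own row instead of being folded into the [kernel] process
top -aSH

# Without -S and -H, kernel threads are hidden, which is why plain
# `top -a` showed no kernel usage on the second node.
top -a
```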

                      stephenw10 Netgate Administrator

                      Ah, OK, so when you fail over you actually see about the same load on the other node?

                      That's probably just where the load for that traffic shows up. Unless you're actually seeing a problem there, I wouldn't worry too much.

                      Steve

                        rgijsen

                        I'll leave it as it is. Thanks for the insights!

                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.