Netgate Discussion Forum

    High CPU and load very high after updating to 2.7.1 and 2.7.2

    General pfSense Questions
    12 Posts 3 Posters 3.0k Views
    • stephenw10, Netgate Administrator

      What NICs are you using? What packages are you running?

      What does top -HaSP actually show in 2.7.2?

      Steve

      • cocojeff3

        For the test after clearing my config and reformatting the box, I was running a base config with no customizations or packages. My NICs are listed below. I did collect the top -HaSP output; however, the output and screenshots were lost due to a power outage after I had reformatted the box and restored my normal 2.7.0 config to get my systems back up and running. If that output is strictly required to make progress, I can upgrade the box to 2.7.2 again and collect it. Please advise whether that is needed. From memory, the only thing I recall is php-fpm: pool nginx, since that was the only thing that stood out as odd when I compared it to my VM running 2.7.0 at the same time.

        em0: <Intel(R) PRO/1000 PT 82571EB/82571GB (Quad Copper)> port 0xd880-0xd89f mem 0xfea80000-0xfea9ffff,0xfea60000-0xfea7ffff irq 16 at device 0.0 on pci3
        em1: <Intel(R) PRO/1000 PT 82571EB/82571GB (Quad Copper)> port 0xdc00-0xdc1f mem 0xfeae0000-0xfeafffff,0xfeac0000-0xfeadffff irq 17 at device 0.1 on pci3
        em2: <Intel(R) PRO/1000 PT 82571EB/82571GB (Quad Copper)> port 0xe880-0xe89f mem 0xfeb80000-0xfeb9ffff,0xfeb60000-0xfeb7ffff irq 17 at device 0.0 on pci4
        em3: <Intel(R) PRO/1000 PT 82571EB/82571GB (Quad Copper)> port 0xec00-0xec1f mem 0xfebe0000-0xfebfffff,0xfebc0000-0xfebdffff irq 18 at device 0.1 on pci4
        em4: <Intel(R) 82567LF-3 ICH10> port 0xc880-0xc89f mem 0xfe9c0000-0xfe9dffff,0xfe9fa000-0xfe9fafff irq 20 at device 25.0 on pci0
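The probe lines above show the legacy IRQ assigned to each port, and em1 and em2 both report irq 17. A minimal Python sketch that groups those exact lines by that field is below; it assumes the usual FreeBSD `... irq N at device ...` probe format. If the em(4) driver is using MSI or MSI-X, the shared legacy line may not matter at all; `vmstat -i` shows which interrupt sources are actually active.

```python
import re
from collections import defaultdict

# NIC probe lines exactly as posted in the thread.
DMESG = """\
em0: <Intel(R) PRO/1000 PT 82571EB/82571GB (Quad Copper)> port 0xd880-0xd89f mem 0xfea80000-0xfea9ffff,0xfea60000-0xfea7ffff irq 16 at device 0.0 on pci3
em1: <Intel(R) PRO/1000 PT 82571EB/82571GB (Quad Copper)> port 0xdc00-0xdc1f mem 0xfeae0000-0xfeafffff,0xfeac0000-0xfeadffff irq 17 at device 0.1 on pci3
em2: <Intel(R) PRO/1000 PT 82571EB/82571GB (Quad Copper)> port 0xe880-0xe89f mem 0xfeb80000-0xfeb9ffff,0xfeb60000-0xfeb7ffff irq 17 at device 0.0 on pci4
em3: <Intel(R) PRO/1000 PT 82571EB/82571GB (Quad Copper)> port 0xec00-0xec1f mem 0xfebe0000-0xfebfffff,0xfebc0000-0xfebdffff irq 18 at device 0.1 on pci4
em4: <Intel(R) 82567LF-3 ICH10> port 0xc880-0xc89f mem 0xfe9c0000-0xfe9dffff,0xfe9fa000-0xfe9fafff irq 20 at device 25.0 on pci0
"""

# Captures the device name and the "irq N" value from one probe line.
PROBE_RE = re.compile(r"^(\w+): .* irq (\d+) at device")

def nics_by_irq(text: str) -> dict[int, list[str]]:
    """Group device names by the legacy IRQ reported at probe time."""
    groups: dict[int, list[str]] = defaultdict(list)
    for line in text.splitlines():
        m = PROBE_RE.match(line.strip())
        if m:
            groups[int(m.group(2))].append(m.group(1))
    return dict(groups)

if __name__ == "__main__":
    for irq, devs in sorted(nics_by_irq(DMESG).items()):
        flag = "  <- shared" if len(devs) > 1 else ""
        print(f"irq {irq}: {', '.join(devs)}{flag}")
```

With the posted lines this prints one row per IRQ and flags irq 17 (em1, em2) as shared.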

        • stephenw10, Netgate Administrator

          Hmm, nothing very exotic there. I'd expect em NICs to work fine. I have a system here using a similar CPU and NICs that doesn't show that.

          The best way to solve something like this is if we can replicate it. The next best way is to get as much info as we can from the machine hitting the issue.

          • cocojeff3

            I completed the upgrade again, and the behavior is the same as after the last upgrade. The upgrade itself completed without issue. The boot time from start to finish went from 4:10.46 on 2.7.0 to 21:50.14 on 2.7.2. I have the following packages installed:
            pfSense-pkg-arpwatch
            pfSense-pkg-Avahi
            pfSense-pkg-Backup
            pfSense-pkg-darkstat
            pfSense-pkg-nmap
            pfSense-pkg-pfBlockerNG-devel
            pfSense-pkg-RRD_Summary
            pfSense-pkg-Service_Watchdog
            pfSense-pkg-Status_Traffic_Totals
            pfSense-pkg-suricata
            pfSense-pkg-System_Patches
            The load average one hour after boot is 2.34, which is almost four times what it was when running 2.7.0.
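To put the two reported boot times in proportion, a small Python sketch that converts them to seconds, assuming they are in minutes:seconds format:

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'M:SS.ss' duration to seconds (minutes:seconds is assumed)."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)

if __name__ == "__main__":
    boot_270 = to_seconds("4:10.46")   # boot time reported on 2.7.0
    boot_272 = to_seconds("21:50.14")  # boot time reported on 2.7.2
    print(f"2.7.0: {boot_270:.2f} s, 2.7.2: {boot_272:.2f} s")
    print(f"extra: {boot_272 - boot_270:.2f} s, ratio: {boot_272 / boot_270:.2f}x")
```

Under that assumption, 2.7.2 takes about 1059.68 s (roughly 17.7 minutes) longer to boot, a ratio of about 5.23x.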

            The attached zip file includes the OS boot information along with the following output, captured before and after the upgrade:
            top -HaSP
            systat -vmstat 1
            netstat -m
            systat -iostat 1

            Data.zip
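For a before/after comparison of captures like these, a sketch that pulls the 1-, 5- and 15-minute load averages out of two saved `top` outputs. It assumes FreeBSD top's `load averages:` header line; the script and file names in the usage comment are hypothetical.

```python
import re
import sys

# FreeBSD top's header contains e.g. "load averages:  0.52,  0.61,  0.58".
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")

def load_averages(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from a saved top capture."""
    m = LOAD_RE.search(text)
    if m is None:
        raise ValueError("no 'load averages:' line found")
    one, five, fifteen = (float(g) for g in m.groups())
    return one, five, fifteen

def compare(before: str, after: str) -> list[str]:
    """Describe how each load average changed between two captures."""
    rows = []
    labels = ("1m", "5m", "15m")
    for label, b, a in zip(labels, load_averages(before), load_averages(after)):
        ratio = f"{a / b:.1f}x" if b else "n/a"
        rows.append(f"{label}: {b:.2f} -> {a:.2f} ({ratio})")
    return rows

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python loadcmp.py top-2.7.0.txt top-2.7.2.txt
    with open(sys.argv[1]) as fb, open(sys.argv[2]) as fa:
        print("\n".join(compare(fb.read(), fa.read())))
```

Comparing ratios rather than raw values makes it easier to see whether all three averages moved together or only the short-term one spiked.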

            • Squish

              https://forum.netgate.com/topic/184245/high-interrupt-cpu-usage-in-v2-7-1/11

              This seems to be the same issue, where it was observed as high interrupt CPU usage on other hypervisors and on bare metal. I haven't been able to find an actual cause or solution. My best guess so far is that it is kernel-related, something to do with interrupt moderation or device polling. I haven't tried rebuilding the kernel yet.

              I did observe similar spikes when viewing the web UI as well.

              Since it seems it is not related to virtualization after all, I will wait to see where best to continue this and be sure to post any of my results.
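One way to test the interrupt theory is to save `vmstat -i` output on 2.7.0 and on 2.7.2 and compare the per-source rates. The sketch below assumes the usual layout of that command (a header line, one row per source ending in total and rate columns, and a final Total row). Source names such as MSI-X vector numbers may differ between versions, so some rows may need to be matched by hand. The script and file names in the usage comment are hypothetical.

```python
import sys

def parse_vmstat_i(text: str) -> dict[str, int]:
    """Map each interrupt source in saved `vmstat -i` output to its rate column."""
    rates: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the column header and the summary row.
        if not line or line.startswith(("interrupt", "Total")):
            continue
        # Source names may contain spaces ("irq16: em0"), so split from the right.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        name, _total, rate = parts
        try:
            rates[name] = int(rate)
        except ValueError:
            continue  # not a data row
    return rates

def rate_changes(before: str, after: str) -> list[tuple[str, int, int]]:
    """Return (source, rate_before, rate_after), largest increase first."""
    b, a = parse_vmstat_i(before), parse_vmstat_i(after)
    rows = [(name, b.get(name, 0), a.get(name, 0)) for name in set(b) | set(a)]
    rows.sort(key=lambda r: (-(r[2] - r[1]), r[0]))
    return rows

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python irqcmp.py vmstat-i-2.7.0.txt vmstat-i-2.7.2.txt
    with open(sys.argv[1]) as fb, open(sys.argv[2]) as fa:
        for name, rb, ra in rate_changes(fb.read(), fa.read()):
            print(f"{name:<30} {rb:>8} -> {ra:>8}")
```

If one NIC or timer source dominates the increase, that would point toward the driver or interrupt-moderation path rather than userland processes.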

              • stephenw10, Netgate Administrator

                Hmm, some of those things, like the ACB upload stalling, make it look like a connectivity issue. Can the firewall ping out as expected?

                • cocojeff3

                  Yes, the firewall's outbound connectivity works fine. I can ping out from the device as expected, and its services can connect to get their updates, etc.

                  • stephenw10, Netgate Administrator

                    Hmm, it could be the encryption part of the code, maybe something in the new OpenSSL version.

                    Does that firewall have any crypto hardware that could have been in use?

                    • cocojeff3

                      No, this box does not have any crypto hardware to use.

                      • stephenw10, Netgate Administrator

                        Hmm, OK well if I had that box here I would start by testing a clean install with a basic config and see if that still hits it. That would narrow the issue to either something unusual in the hardware or something specific to the config.

                        • cocojeff3

                          I did that when testing last weekend, and I can confirm that with a factory-default config the CPU usage and load were higher on 2.7.1 and 2.7.2. This is not an issue with the hardware or with any specific post-installation configuration; it is an issue with the base system running 2.7.1 and 2.7.2 on this hardware. Is there some log or debug level I can get you output from that might allow you to narrow down the issue, so that I can get this box back to normal utilization?

                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.