Netgate Discussion Forum

    100% CPU load after upgrading from 2.4.5-p1 to 2.6.0 on some firewalls

    General pfSense Questions
    • unico-dm

      Hi folks. Maybe you can help us with this one. We see 100% CPU load on some upgraded firewalls. This renders them unusable and affects our network.

      What are we doing? At the moment we are rolling out pfSense version 2.6.0 to all our firewall pairs. We do that as an in-place upgrade from 2.4.5-p1. Short version of the procedure: we set the target firewall to permanent maintenance mode, switch off any sync between the pairs and then start the upgrade.

      What's wrong? Half a dozen upgrades went well - these firewalls are working fine and are performing. But now we've encountered strange behaviour after upgrading two firewalls. Symptoms: The affected firewalls are not reachable most of the time and we see network issues.

      We suspect this is because of two things.

      1. Most of the time CPU load is 100%. The following processes use up all CPU resources:
      /usr/local/bin/dpinger
      /usr/sbin/syslogd
      [kernel{if_io_tqg_4}]
      [intr{swi1: pfsync}]

      Exception: Sometimes for 2-3h CPU load is OK but then everything starts again.

      2. Log analysis shows gateway errors. All gateways on all interfaces show high latency or even timeouts. I can also see CARP events (MASTER/BACKUP flapping even with maintenance mode on, hence the network issues). The gateways themselves are fine and reachable, so there is no reason why the firewall shouldn't be able to ping them.

      What have we done so far? We set up a firewall from scratch and then imported config.xml. With the fresh setup the firewall behaves normally, but as soon as the configuration is imported the issues start.

      Next steps: We are now setting the firewalls up bit by bit to see which part of the configuration triggers the problem. But as the configuration is quite large, this is very time-consuming.
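
A bit-by-bit search like this can be shortened by halving the candidate set on each test instead of adding one piece at a time. The following is only a minimal sketch of that idea, not a pfSense tool: the section names and the `reproduces` callback are hypothetical stand-ins for "import this subset on a test box and watch the CPU load". It also assumes that exactly one section triggers the problem, so it would not isolate a cause that depends on the combined size of the configuration.

```python
from typing import Callable, List, Sequence


def find_trigger(sections: Sequence[str],
                 reproduces: Callable[[List[str]], bool]) -> str:
    """Return the single section whose presence makes `reproduces` true.

    Assumes exactly one section is responsible and that the problem
    shows up whenever that section is part of the imported subset.
    """
    candidates = list(sections)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # Import only `half` on a test box and check whether the load spikes.
        if reproduces(half):
            candidates = half
        else:
            candidates = candidates[len(half):]
    return candidates[0]


# Hypothetical section names, for illustration only.
SECTIONS = ["interfaces", "gateways", "aliases", "filter",
            "nat", "ipsec", "syslog", "virtualip"]

# Stand-in for the manual test: pretend "filter" is the trigger.
culprit = find_trigger(SECTIONS, lambda subset: "filter" in subset)
print(culprit)
```

With eight sections this needs three test imports rather than up to eight, which matters when each test means waiting to see whether the load returns.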

      Maybe one of you folks can help us find the cause a bit faster. Is there any known issue that explains this behaviour? Any kind of configuration that could trigger high CPU load? Or how could we analyze further? Thanks in advance for your help! Much appreciated!

      • SteveITS Galactic Empire @unico-dm

        @unico-dm Re the upgrade process, yours sounds complicated. Netgate recommends upgrading the backup, failing over, then upgrading the primary, and that's what we’ve always done.
        https://docs.netgate.com/pfsense/en/latest/install/upgrade-guide-ha.html

        2.4 to 2.6 is a pretty big skip…

        If gzip was running I’d suggest turning off log compression but that’s not usually necessary unless it’s a slow CPU.

        Pre-2.7.2/23.09: Only install packages for your version, or risk breaking it. Select your branch in System/Update/Update Settings.
        When upgrading, allow 10-15 minutes to restart, or more depending on packages and device speed.
        Upvote 👍 helpful posts!

        • unico-dm @SteveITS

          @steveits

          The upgrade procedure is as you've described it. Sorry I didn't make that clear.

          The upgrade from 2.4.5-p1 to 2.6.0 is a big step, yep.

          The processors are Intel Xeon D-1518 and Intel Xeon E5-2600 v4. Our workload usually bores the CPU, so no issue here. (Similar upgraded hardware is still bored on 2.6.0.) We will try disabling log compression to ease analysis.

          I will gather more info according to https://docs.netgate.com/pfsense/en/latest/troubleshooting/high-cpu-load.html

          • stephenw10 Netgate Administrator

            Install the System Patches package. In the list of recommended patches apply the pfcounter patch for this bug.
            In some situations that can use all the CPU if it gets stuck in a reload loop.

            Steve

            • unico-dm @stephenw10

              @stephenw10 When we installed the patch, the symptoms were completely gone. So thanks a lot for pointing us in the right direction!

              So case solved ✅

              Additional info:

              • We think we know why this happens only on those two firewalls: they are the ones with by far the most rules and aliases in our environment.
              • We couldn't pinpoint it at first, because the 15-minute reload interval kept the load at maximum, so we couldn't see the interval (= the underlying mechanism) at all. But we could trigger the behavior in a "calm phase" by editing and applying a random rule (and thus triggering the reload). Then the load would be up for several minutes.
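
A rough way to see why the interval was invisible: if each reload keeps the CPU busy for about as long as the interval between reloads (or longer), the busy periods run into each other and the load never drops. The arithmetic below is a simplified sketch under that assumption. Only the 15-minute interval comes from the post; the reload durations are hypothetical values chosen for illustration, and the function name is made up here.

```python
def busy_fraction(reload_seconds: float, interval_seconds: float) -> float:
    """Fraction of each interval spent reloading, capped at 1.0."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return min(1.0, max(0.0, reload_seconds) / interval_seconds)


INTERVAL = 15 * 60  # the 15-minute reload interval mentioned above

# Reload durations are hypothetical, not measured values.
for minutes in (1, 5, 15, 20):
    frac = busy_fraction(minutes * 60, INTERVAL)
    print(f"reload {minutes:>2} min -> busy {frac:.0%} of each interval")
```

Once the fraction reaches 100%, a load graph shows a flat line with no visible period. A single manually triggered reload during a calm phase exposes the mechanism because the load has room to fall back afterwards.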
              Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.