Netgate Discussion Forum

Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5

Problems Installing or Upgrading pfSense Software
  • B
    bmeeks @A Former User
    last edited by bmeeks Apr 1, 2020, 7:20 PM Apr 1, 2020, 7:16 PM

    @jwj said in Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5:

    @bmeeks @BBcan177

    I turned off bogons. I deactivated pfblocker. I set the max table size to 60000. Rebooted. The reboot was quick again, like 2.4.4-p3. I ping6 google.com from a lan side device while doing a filter reload. The ping times don't change by any meaningful amount.

    Makes me wonder if the pfctl fix was overlooked when the release was built?

    Set max table size to 100000. Reboot. Reboot is longer again. Turn on pfblocker and some ip lists that add up to more than 60000 but less than 100000. Problem returns. Latency when reloading the filters. Bigger tables bigger problem.

    Edited to add: the max table size doesn't appear to make any difference, other than setting a hard limit on table size. The actual size of the aliases/tables is what triggers the problem.

    It sure looks to me that anything over that sixty some thousand mark causes the issue. I could be wrong, wouldn't be the first time, but this sure looks like the issue.

    See my later edit to my post here: https://forum.netgate.com/topic/151690/increased-memory-and-cpu-spikes-causing-latency-outage-with-2-4-5/91. I have a theory about why a value less than 65,536 seems to work better.

    • G
      getcom @bmeeks
      last edited by Apr 1, 2020, 7:27 PM

      This post is deleted!
      • ?
        A Former User @bmeeks
        last edited by Apr 1, 2020, 7:30 PM

        @bmeeks Makes sense, since they're all loaded with one call. Total number of all tables would have to be under that number.

        Also makes me take note of the fact that the "fork" of pfsense is still on 11.2 even as they have been more aggressive with pursuing freebsd upgrades historically.

        • N
          nzkiwi68 @getcom
          last edited by Apr 1, 2020, 7:32 PM

          @getcom Yep, that looks like my SiteA issue, if both firewalls are on, zero VPN traffic passes yet CARP is fine.

          • Power down the backup firewall and VPN traffic instantly starts passing, or

          • Power down the primary firewall and after failover stuff occurs, VPN traffic starts passing

          • B
            bmeeks @A Former User
            last edited by Apr 1, 2020, 7:32 PM

            @jwj said in Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5:

            @bmeeks Makes sense, since they're all loaded with one call. Total number of all tables would have to be under that number.

            Also makes me take note of the fact that the "fork" of pfsense is still on 11.2 even as they have been more aggressive with pursuing freebsd upgrades historically.

            A possible solution for loading very large numbers of tables or IP addresses would be to split them up into chunks of 65,000 or fewer and iterate through a loop, making a series of ioctl() function calls to create the tables or addresses.
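The chunking idea above can be sketched as follows. This is only an illustration of the batching logic, not how pfctl or pfSense actually load tables: `submit_batch` is a hypothetical stand-in for whatever would perform a single ioctl() request, and the 65,000 chunk size is simply the figure suggested in the post.

```python
from typing import Callable, Iterator, List, Sequence

# Chunk size suggested in the post; it stays below the old 65,535 cap.
CHUNK_LIMIT = 65_000


def chunked(addresses: Sequence[str], size: int = CHUNK_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` addresses."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(addresses), size):
        yield list(addresses[start:start + size])


def load_table(addresses: Sequence[str],
               submit_batch: Callable[[List[str]], None],
               size: int = CHUNK_LIMIT) -> int:
    """Feed `addresses` to `submit_batch` in bounded chunks.

    `submit_batch` is a hypothetical callback standing in for one
    ioctl() request; it is not a real pf or pfSense API.
    Returns the number of batches submitted.
    """
    batches = 0
    for batch in chunked(addresses, size):
        submit_batch(batch)
        batches += 1
    return batches


if __name__ == "__main__":
    # 150,000 synthetic entries, recorded batch by batch.
    fake = [f"10.{i >> 16 & 255}.{i >> 8 & 255}.{i & 255}" for i in range(150_000)]
    sizes: List[int] = []
    load_table(fake, lambda batch: sizes.append(len(batch)))
    print(sizes)
```

Whether several smaller requests would actually avoid the latency seen in 2.4.5 is untested here; the sketch only shows that no single request would exceed the chosen limit.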

            • G
              getcom @bmeeks
              last edited by Apr 1, 2020, 7:35 PM

              @bmeeks said in Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5:

              The actual patch file can be accessed here: https://security.FreeBSD.org/patches/EN-20:04/pfctl.patch. What the patch does is remove the former arbitrary hardcoded limit of 65,535 (defined as PF_TABLES_MAX_REQUEST) and allows the use of a sysctl parameter instead. Deeper research into the other pf related source code would be required to determine if allowing that larger PF_TABLES_MAX_REQUEST value has an adverse impact.

              Looking a bit further into what the patch actually does gives me a theory. The 65,535 number does not appear to be a limit on the number of IP addresses in a given table. It appears, instead, to be a limit on the number of tables or addresses you can add to the firewall during a single call to the corresponding ioctl() function. That limit was formerly hardcoded to 65,535. Now that a sysctl variable allows customizing this limit, I can envision a scenario where a very high value for it overloads the other pf areas. In particular, you would be requesting "too many tables and/or addresses in a single ioctl() call". So this may well be why lowering the value improves performance! You are no longer "overloading" the other ioctl routines that are actually creating the tables or addresses in RAM.

              This makes sense, but as a consequence, a low value makes pfBlockerNG unusable, while a larger value makes pfSense itself unusable. The choice is yours... if you currently need both, you will need a fix or a rollback in the near future.

              • B
                bmeeks @getcom
                last edited by bmeeks Apr 1, 2020, 8:28 PM Apr 1, 2020, 8:15 PM

                @getcom said in Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5:

                This makes sense, but as a consequence, a low value makes pfBlockerNG unusable, while a larger value makes pfSense itself unusable. The choice is yours... if you currently need both, you will need a fix or a rollback in the near future.

                I'm going to go back and research how this was coded in FreeBSD 11.2-RELEASE (which is what 2.4.4-p3 was based on). It will be interesting to see what may have changed from 11.2 to 11.3 of FreeBSD.

                LATER EDIT: The PF_TABLES_MAX_REQUEST limit was a hardcoded value (a defined constant, actually) in the 11.2-RELEASE of FreeBSD. The recent FreeBSD 11.3-STABLE patch I referenced in an earlier post above is titled "Missing Tuneable", but I think it really should have been titled "New Tuneable". Using "Missing" in the title could be taken by some to mean it was there in earlier versions, was erroneously removed, and is being restored by the patch. Instead, it appears to me the patch added this system tunable as a new feature and removed the former hardcoded limit. Anecdotal evidence from posters in this thread indicates that allowing that former hard limit to be essentially unlimited can produce undesirable side effects.

                • ?
                  A Former User @bmeeks
                  last edited by Apr 1, 2020, 9:13 PM

                  @bmeeks

                  If I'm not mistaken that "tunable" wasn't in 2.4.4-p3 in System->Advanced->Firewall tab. Could someone double check that please.

                  So, how did the big (~110k) bogonsv6 table get loaded in 2.4.4-p3? If I set the limit to 60000 and then turn on bogons, I get the "cannot allocate memory" error and an alert.

                  • G
                    getcom @bmeeks
                    last edited by Apr 1, 2020, 9:15 PM

                    @bmeeks said in Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5:

                    I'm going to go back and research how this was coded in FreeBSD 11.2-RELEASE (which is what 2.4.4-p3 was based on). It will be interesting to see what may have changed from 11.2 to 11.3 of FreeBSD.

                    LATER EDIT: The PF_TABLES_MAX_REQUEST limit was a hardcoded value (a defined constant, actually) in the 11.2-RELEASE of FreeBSD. The recent FreeBSD 11.3-STABLE patch I referenced in an earlier post above is titled "Missing Tuneable", but I think it really should have been titled "New Tuneable". Using "Missing" in the title could be taken by some to mean it was there in earlier versions, was erroneously removed, and is being restored by the patch. Instead, it appears to me the patch added this system tunable as a new feature and removed the former hardcoded limit. Anecdotal evidence from posters in this thread indicates that allowing that former hard limit to be essentially unlimited can produce undesirable side effects.

                    Agreed. I'm just wondering whether I should revert the patch and compile the "unleashed" FreeBSD 11.3 kernel for testing.

                    • T
                      tman222
                      last edited by Apr 1, 2020, 9:22 PM

                      @bmeeks - I think you are on the right track with the ioctl() calls and how this might have changed going from 11.2 to 11.3. Since the value was 65535 before, how were large tables loaded in 11.2 and prior? Did the kernel allow multiple ioctl() calls in succession or something similar? In 11.3 and beyond, are we then potentially limited to making just one very large ioctl() call instead (by setting the new net.pf.request_maxcount tunable) as a security precaution?

                      Another related issue on Redmine:

                      https://redmine.pfsense.org/issues/9356

                      I guess my natural question is, would the optimal value be something close to the largest single table (e.g. IP alias / block list)? That assumes that it is a separate ioctl() call for loading each set of entries vs. trying to load everything with one call. Something interesting to try might be to leave the max table size (pfSense > Advanced > Firewall & NAT > Firewall Maximum Table Entries) unchanged at the default value (or whatever value it was adjusted to if default was too low) and then adjust just the net.pf.request_maxcount tunable to be a little larger than the single largest table (IP alias / block list), by setting it in loader.conf.local. Does anyone see any improvement in performance in such a scenario?
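The experiment proposed above could be prepared with something like the following sketch. It assumes you already know the entry count of each table; the table names and the 10% headroom are made up for illustration, and the output line uses ordinary loader.conf assignment syntax for the net.pf.request_maxcount tunable named in the post.

```python
from typing import Mapping


def suggest_request_maxcount(table_sizes: Mapping[str, int],
                             headroom_percent: int = 10) -> int:
    """Return the largest table size plus a percentage of headroom.

    The headroom figure is an arbitrary illustrative choice, not a
    documented recommendation.
    """
    if not table_sizes:
        raise ValueError("no tables given")
    largest = max(table_sizes.values())
    # Integer arithmetic avoids floating-point rounding surprises.
    return largest + (largest * headroom_percent) // 100


def loader_conf_line(value: int) -> str:
    """Format the tunable as a loader.conf-style assignment."""
    return f'net.pf.request_maxcount="{value}"'


if __name__ == "__main__":
    # Hypothetical table names and entry counts, for illustration only.
    tables = {"bogonsv6": 110_000, "example_blocklist": 45_000}
    print(loader_conf_line(suggest_request_maxcount(tables)))
```

The printed line would go into loader.conf.local as the post describes. This only computes a candidate value; it says nothing about whether that value improves reload latency.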

                      • B
                        bmeeks @tman222
                        last edited by bmeeks Apr 1, 2020, 9:33 PM Apr 1, 2020, 9:33 PM

                        I'm not a kernel coding expert, so I don't know the answers to some of the questions here. I just did a quick scan of the 11.2 and 11.3 source files for some of the pf modules to see that the changes were in the listed patch. There may have been other changes to FreeBSD 11.3 that are not covered in that patch I linked above. If there were other changes, then those may also be in play. Really are going to have to wait on the pfSense developer team to figure it out. They have the required skills and knowledge.

                        • D
                          daNutz
                          last edited by Apr 1, 2020, 9:38 PM

                          It probably makes no difference, but I tried lowering Firewall Maximum Table Entries from my default of 1,000,000 to the lowest acceptable value of 400,000. After I rebooted, it wouldn't provide internet access, so I returned the value to the default and rebooted again. Since then things have seemed more stable. I will monitor.

                          • ?
                            A Former User
                            last edited by A Former User Apr 1, 2020, 10:45 PM Apr 1, 2020, 9:40 PM

                            Looking at the pfctl code, a "reload" is a flush followed by a load. It's the purge that violates the hard limit, as best I can tell. Tables are loaded one by one, so any one table larger than the limit would error. Why the purge errors is beyond my very rusty C skills, but that's why a total item count across all tables greater than the limit causes an error.

                            I'm not convinced that the limit has anything to do with the latency issue, other than that fewer total items in the tables cause less latency. Other things have changed in the way that is implemented, again as best I can tell.
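The reading of the reload path described above can be written down as a toy model. It encodes only the poster's hypothesis (the flush is checked against the total across all tables, each load against that one table's size) and is not derived from pfctl's actual code; the function name and message strings are invented for illustration.

```python
from typing import Dict, List


def predict_reload_errors(table_sizes: Dict[str, int], limit: int) -> List[str]:
    """Return the steps that would fail under the hypothesis above.

    Hypothesis, not verified against pfctl: the flush step fails when the
    total number of entries exceeds `limit`, and each per-table load fails
    when that single table exceeds `limit`.
    """
    errors: List[str] = []
    total = sum(table_sizes.values())
    if total > limit:
        errors.append(f"flush: {total} total entries exceed limit {limit}")
    for name, size in table_sizes.items():
        if size > limit:
            errors.append(f"load {name}: {size} entries exceed limit {limit}")
    return errors


if __name__ == "__main__":
    # Two tables that each fit, but together exceed the old 65,535 cap.
    for line in predict_reload_errors({"list_a": 40_000, "list_b": 40_000}, 65_535):
        print(line)
```

Under this model, two 40,000-entry tables load individually but trip the flush check, which matches the observation that the total, not the largest table, seems to matter.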

                            I'm entirely in over my head at this point and I don't have the time (days) to get back in the right mindset to figure out the code.

                            • T
                              tman222
                              last edited by Apr 1, 2020, 10:22 PM

                              Did a few tests and found that setting the net.pf.request_maxcount value independently of the Maximum Table Entries value made little difference - I still see latencies and load spike briefly. I tried 200000 for request_maxcount (against a Max Table Entries value of 2 million) as well as 131071 (2^17 - 1, to see if it helped to be just under a power of 2). No dice, unfortunately. As others have already alluded to, there seems to be something deeper here that changed.

                              • ?
                                A Former User
                                last edited by Apr 2, 2020, 2:04 PM

                                Later today I will downgrade from 2.4.5 to 2.4.4-p3. Is there anything anyone would like me to do before I downgrade and after to help mitigate your latency issues? I suppose this is an opportunity to compare like configurations other than the pfsense version.

                                • D
                                  digdug3 @A Former User
                                  last edited by Apr 2, 2020, 3:03 PM

                                  @muppet said in Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5:

                                  Redmine

                                  I don't see a Redmine ticket yet, please submit one.

                                  • G
                                    getcom
                                    last edited by Apr 2, 2020, 5:07 PM

                                    FYI: I just opened a Netgate ticket and reported the situation with 2.4.5

                                    The workaround for now:

                                    • disable "Block bogon networks" on WAN interfaces
                                    • In System => Advanced => Firewall & NAT: set "Firewall Maximum Table Entries" to a value < 65535

                                    With these changes, pfBlockerNG-devel can no longer load lists whose combined entry count exceeds "Firewall Maximum Table Entries" minus the count of custom rule sets.
                                    Unassigned or reserved IP addresses will no longer be blocked.
                                    CPU usage stays at ~1-5% without spikes.
                                    Memory usage in my setup is 20-30% higher compared to 2.4.4-P3, but the spikes are gone.

                                    I will now test the behavior when changing rule sets and interface/VLAN setups.

                                    • ?
                                      A Former User
                                      last edited by Apr 2, 2020, 7:19 PM

                                      I've reverted to 2.4.4-p3. I have tried but cannot get it to exhibit the same behavior. I can add or remove very large tables and there is no meaningful latency or any unexpected CPU spikes.

                                      What conclusion can be drawn from this adventure? To be honest I'm not entirely sure. Only an understanding of how pf has changed in the upgrade from 11.2 Release to 11.3 Stable will reveal what is going on.

                                      For now I'm happy to just get on with things...

                                      • J
                                        jdeloach @A Former User
                                        last edited by jdeloach Apr 2, 2020, 8:07 PM Apr 2, 2020, 8:00 PM

                                        @jwj said in Increased Memory and CPU Spikes (causing latency/outage) with 2.4.5:

                                        Only an understanding of how pf has changed in the upgrade from 11.2 Release to 11.3 Stable will reveal what is going on.

                                        For now I'm happy to just get on with things...

                                        I reverted to 2.4.4 p3 but discovered that I still couldn't run Snort due to the changes recently made, so I went back to 2.4.5. I have Snort and pfBlockerNG-devel installed.

                                        I ended up playing with the settings in System, Advanced, Power Savings. I'm not sure what the defaults were, but with PowerD enabled and AC Power, Battery Power, and Unknown Power set to Hiadaptive, Adaptive, and Adaptive respectively, I no longer see the CPU spikes. I do still see increases in latency, but they are less than 100 ms on my pfSense box. Out of curiosity, does anyone know what Unknown Power is? The setting for that one seemed to be the most sensitive with respect to CPU increases.

                                        I'm tired of messing with this and ready to move on. I can live with the way my box is working at the present time.

                                        Hopefully they can find what the issue is and fix it before we get to pfSense version 2.5 which is based on FreeBSD 12.x. Who knows what new problems may arise then...

                                        Edit: The reason I got to playing with the Power Savings settings was because the CPU temperature was running around 42 to 50 degrees C with 2.4.5 installed and it normally runs around 22 to 25 degrees C with 2.4.4 p3 installed. With the Power Savings settings that I specified above, the temperature is around 25 degrees C now.

                                        • ?
                                          A Former User @jdeloach
                                          last edited by Apr 2, 2020, 8:20 PM

                                          @jdeloach I don't recall the exact thread or procedure but you can get snort/suricata to work on 2.4.4-p3 by changing the repository to point to 2.4.4 and installing the "older" versions. Search around a bit if you think you'll have another go at downgrading.

                                          I'm back on 2.4.5 also, it's fine if you just don't touch anything that causes the thing to want to reload the filters... Life goes on such as it is...

                                          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.