Netgate Discussion Forum

    Latency spikes during Filter reload - CE 2.6.0

    • C
      cclarke69 @stephenw10
      last edited by

      @stephenw10 - Here is the ThinkBroadband Monitor showing pre and post upgrade

      c992bc11-674d-4561-b55c-aa807ac60967-image.png

      Stopping the rc.filter_configure_sync cron job from running stops the latency spikes.

      • A
        Averlon @stephenw10
        last edited by

        @stephenw10 said in Latency spikes during Filter reload - CE 2.6.0:

        Testing here with a 1700 line ruleset and not seeing this. Still digging....

        Maybe there is more to it than just rule count.

        @cclarke69

        Do you have any Rules with advanced Options like State Type != keep or Gateway override for policy based routing? Do you use Gateway Groups in some rules?

        • C
          cclarke69 @Averlon
          last edited by

          @averlon - From memory,

          • State Type != keep -> no

          • Gateway groups -> yes

          • Gateway override -> yes

          • Also Traffic shaping -> yes

          • A
            Averlon
            last edited by Averlon

            For Reference

            09e20f83-ed7d-43b3-8c96-bd675854f9ca-image.png

            Have currently only console access via IPMI. Gonna do some tests later, when I get in-band access to that machine.

            • C
              cclarke69 @stephenw10
              last edited by

              @stephenw10 @averlon - as a test I disabled SMP by adding kern.smp.disabled=1 to /boot/loader.conf.local . Early indications are that this mitigates the latency issue. There was apparently a similar issue in 2.4.5 - https://forum.netgate.com/topic/149595/2-4-5-a-20200110-1421-and-earlier-high-cpu-usage-from-pfctl

              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                Yes, though it isn't a regression of that issue directly as that was easy to replicate in the end.

                Just to confirm, are you seeing spikes when pinging to the firewall or through it? Or both?

                • C
                  cclarke69 @stephenw10
                  last edited by

                  @stephenw10 - I see both.

                  • stephenw10S
                    stephenw10 Netgate Administrator
                    last edited by

                    Ok, with a large generated ruleset I am able to see latency spikes for as long as the reload is running.
                    But disabling SMP does not help. In fact it makes it significantly worse, which is what I'd expect.

                    What values are you seeing there with only one CPU core?

                    Steve

                    • C
                      cclarke69 @stephenw10
                      last edited by

                      @stephenw10 - I've not tested it rigorously. It may be that the dashboard isn't showing increased latency while the reload is running due to the CPU spike. TBM is showing 15 min spikes, so I guess my early optimism was misplaced.

                      • C
                        cclarke69 @stephenw10
                        last edited by

                        @stephenw10 - Having re-enabled SMP, I ran a continuous ping test from my PC to the WAN address with a 1000 byte payload. Steady-state ping time is 4-5ms. When the reload is running, it results in 4 timeouts and 5-6 massively higher than normal ping times (1-5s vs 4-5ms), so the actual period of high latency is ~10s.
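
The arithmetic behind that ~10s estimate can be sketched as follows. This is only an illustration: the function name, the 10x threshold, the 1-second ping interval, and the sample round-trip times are all assumptions chosen to resemble the numbers in the post, not measurements from it.

```python
from typing import Optional, Sequence, Tuple


def degraded_window(
    rtts_ms: Sequence[Optional[float]],
    baseline_ms: float = 5.0,   # assumed steady-state RTT (post reports 4-5ms)
    factor: float = 10.0,       # assumed: "slow" means more than 10x baseline
    interval_s: float = 1.0,    # assumed: one ping per second
) -> Tuple[int, int, float]:
    """Count timeouts (None) and slow replies, and estimate the degraded period."""
    timeouts = sum(1 for r in rtts_ms if r is None)
    slow = sum(1 for r in rtts_ms if r is not None and r > baseline_ms * factor)
    return timeouts, slow, (timeouts + slow) * interval_s


# Hypothetical sample: normal pings, 4 timeouts, 6 replies in the 1-5s range.
samples = (
    [4.5] * 5
    + [None] * 4
    + [1200.0, 4800.0, 3100.0, 2500.0, 1900.0, 1000.0]
    + [4.5] * 5
)
print(degraded_window(samples))
```

With one ping per second, 4 timeouts plus 6 slow replies correspond to roughly 10 seconds of degraded service, which matches the estimate above.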

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Ok, I've replicated this here given enough rules and tables:

                          [22.01-RELEASE][root@5100.stevew.lan]/root: pfctl -sr | wc -l
                              1121
                          [22.01-RELEASE][root@5100.stevew.lan]/root: time pfctl -f /tmp/rules.debug
                          0.520u 2.295s 0:02.81 100.0%    203+182k 1+0io 0pf+0w
                          
                          [21.05.2-RELEASE][root@5100.stevew.lan]/root: pfctl -sr | wc -l
                              1116
                          [21.05.2-RELEASE][root@5100.stevew.lan]/root: time pfctl -f /tmp/rules.debug
                          0.302u 0.270s 0:00.57 100.0%    202+176k 0+0io 0pf+0w
                          

                          Try that test directly. I see latency to the firewall while the ruleset is reloading, and because the reload seems to take significantly longer, that becomes noticeable.
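
To compare the two `time pfctl -f` runs numerically, a small parser along these lines may help. It is a sketch that assumes the default csh/tcsh `time` layout seen above (user seconds, system seconds, elapsed time, CPU percentage). The function name `parse_csh_time` is made up for this example.

```python
import re

# Matches the leading "<user>u <system>s <elapsed>" fields of csh/tcsh `time` output.
_TIME_RE = re.compile(r"^\s*([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)\s")


def parse_csh_time(line: str) -> dict:
    """Return user, system and elapsed time in seconds from one `time` line."""
    m = _TIME_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised time output: {line!r}")
    user, system, elapsed = m.groups()
    seconds = 0.0
    for part in elapsed.split(":"):  # handles m:ss.ss and h:mm:ss
        seconds = seconds * 60 + float(part)
    return {"user": float(user), "system": float(system), "elapsed": seconds}


# The two lines quoted in the post (22.01 vs 21.05.2).
new = parse_csh_time("0.520u 2.295s 0:02.81 100.0%    203+182k 1+0io 0pf+0w")
old = parse_csh_time("0.302u 0.270s 0:00.57 100.0%    202+176k 0+0io 0pf+0w")

print("elapsed ratio:", round(new["elapsed"] / old["elapsed"], 1))
print("system ratio: ", round(new["system"] / old["system"], 1))
```

On those two lines, the reload takes roughly 4.9 times longer in wall-clock terms on 22.01, and the time spent in the kernel grows by about 8.5 times.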

                          There is a bug open for this here: https://redmine.pfsense.org/issues/12827

                          Steve

                          • T
                            tman222
                            last edited by

                            I wanted to chime in on this thread to note that I also see regular CPU spikes now ~15-30min apart after the upgrade to 2.6.0/22.01, that were not present in 2.5.2. I don't have as large of a rule set as the other posters, so the CPU only spikes to 2-3% and no noticeable impact to traffic / latency so far.

                            • A
                              Averlon @tman222
                              last edited by

                              @tman222
                              It's normal to have some sort of load on the firewall's CPU - all traffic on pfSense is processed at interrupt level. When you open and look at your dashboard, it consumes CPU time as well. 2 - 3% is a usual load profile for an idle system. In general the load may spike up to 100% for a period of time and not necessarily affect traffic latency.
                              For some reason, reloading large rulesets "tackles" the firewall, causing all traffic to stop for a few seconds.

                              • stephenw10S
                                stephenw10 Netgate Administrator
                                last edited by

                                Yeah, 2-3% is nothing to worry about.

                                If you hit this you will see something like:

                                last pid: 16601;  load averages:  0.46,  0.17,  0.10                                                                  up 1+16:21:09  16:44:08
                                599 threads:   7 running, 547 sleeping, 43 waiting, 2 lock
                                CPU:  1.6% user,  0.0% nice, 60.9% system,  0.0% interrupt, 37.5% idle
                                Mem: 29M Active, 201M Inact, 702M Wired, 2905M Free
                                ARC: 229M Total, 46M MFU, 176M MRU, 168K Anon, 1423K Header, 5909K Other
                                     109M Compressed, 315M Uncompressed, 2.90:1 Ratio
                                Swap: 1024M Total, 1024M Free
                                
                                  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
                                16601 root         20    0    16M  8132K CPU3     3   0:03 100.00% pfctl -f /tmp/rules.debug
                                56110 root         20    0    11M  2644K CPU2     2   0:22  70.96% /usr/local/bin/dpinger -S -r 0 -i WAN_DHCP -B 172.21.16.21 -p /var/run/dpi
                                59112 root         20    0    19M  6856K *pf ru   0   0:10  66.11% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}
                                   11 root        155 ki31     0B    64K RUN      3  39.9H  55.95% [idle{idle: cpu3}]
                                   11 root        155 ki31     0B    64K CPU1     1  39.9H  39.34% [idle{idle: cpu1}]
                                   11 root        155 ki31     0B    64K RUN      0  39.9H  33.16% [idle{idle: cpu0}]
                                   11 root        155 ki31     0B    64K RUN      2  39.9H  21.50% [idle{idle: cpu2}]
                                56372 root         20    0    14M  2696K *pf ru   3   0:21  12.56% /usr/local/bin/dpinger -S -r 0 -i VTI0_VTIV4 -B 10.45.13.2 -p /var/run/dpi
                                    0 root        -92    -     0B   736K -        1   4:16   0.17% [kernel{dummynet}]
                                

                                Even there though you'll notice no CPU core is pegged at 100%.
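
One way to pick the affected threads out of a `top` capture like the one above is to look at the STATE column. The sketch below assumes that a STATE beginning with `*` means the thread is blocked on a lock, with the lock name truncated (here `*pf ru`). The regular expression and the `lock_blocked` helper are illustrative and only tested against rows shaped like those in this post.

```python
import re

# One process row of FreeBSD `top`; STATE may contain a space (e.g. "*pf ru").
_ROW_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<user>\S+)\s+(?P<pri>-?\d+)\s+(?P<nice>\S+)\s+"
    r"(?P<size>\S+)\s+(?P<res>\S+)\s+(?P<state>.+?)\s+(?P<cpu>\d+)\s+"
    r"(?P<time>\S+)\s+(?P<wcpu>[\d.]+)%\s+(?P<command>.*)$"
)


def lock_blocked(top_lines):
    """Return (pid, state, command) for rows whose STATE starts with '*'."""
    rows = []
    for line in top_lines:
        m = _ROW_RE.match(line)
        if m and m.group("state").startswith("*"):
            rows.append((int(m.group("pid")), m.group("state"), m.group("command")))
    return rows


# Rows copied from the capture above (commands are truncated in the original).
sample = [
    "  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND",
    "16601 root         20    0    16M  8132K CPU3     3   0:03 100.00% pfctl -f /tmp/rules.debug",
    "59112 root         20    0    19M  6856K *pf ru   0   0:10  66.11% /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid{ntpd}",
    "   11 root        155 ki31     0B    64K RUN      3  39.9H  55.95% [idle{idle: cpu3}]",
    "56372 root         20    0    14M  2696K *pf ru   3   0:21  12.56% /usr/local/bin/dpinger -S -r 0 -i VTI0_VTIV4 -B 10.45.13.2 -p /var/run/dpi",
    "    0 root        -92    -     0B   736K -        1   4:16   0.17% [kernel{dummynet}]",
]

for pid, state, command in lock_blocked(sample):
    print(pid, state, command)
```

On this capture it flags the ntpd thread and one dpinger process, both in `*pf ru`, while `pfctl -f` itself is running on CPU3.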

                                Steve

                                • S
                                  SteveITS Galactic Empire @Averlon
                                  last edited by

                                  @averlon The new System Patches package lists recommended patches, and one is to Disable pf counter data preservation to temporarily work around latency when reloading large rulesets.


                                  • A
                                    Averlon
                                    last edited by

                                    @steveits

                                    Listing recommended patches via the System Patches package is a great enhancement.
                                    Meanwhile I've rolled back to 2.5. The 2.6 enhancement that I wanted from that release (IP fragmentation over IPsec) isn't working in my configuration. I'll probably have to change the IPsec Filter Mode to "VTI only" to get this to work. This also means I'll have to migrate a bunch of rules (~900) from the IPsec section to each VTI interface and give some thought to how I design rules for encrypted traffic between sites when a firewall has to play a transit role.

                                    I've encountered another issue with HAProxy on 2.6: the process randomly started consuming 100% CPU time (one core) permanently. After a restart of the service, everything went back to normal, at least for a while. I didn't look further into this; maybe it has something to do with OCSP preload being enabled on some front-ends.

                                    For now I'm back on 2.5.x. All in all it's running more stably in my configuration, even if 2.6 has some great updates in the IPsec code which make overall configuration smoother and reduce boot time by about 50%.

                                    Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.