Netgate Discussion Forum

    Latency spikes during Filter reload - CE 2.6.0

    General pfSense Questions
    36 Posts 6 Posters 4.2k Views
    • stephenw10S
      stephenw10 Netgate Administrator
      last edited by

      Yes, that. And those times look fine.

      You might also try:

      [22.01-RELEASE][admin@5100.stevew.lan]/root: time pfctl -f /tmp/rules.debug
      0.377u 0.329s 0:00.70 98.5%	208+187k 1+0io 0pf+0w
      

      Hardly any additional rules on that box though:

      [22.01-RELEASE][admin@5100.stevew.lan]/root: pfctl -sr | wc -l
           121
      

      Steve

      • C
        cclarke69 @stephenw10
        last edited by

        @stephenw10 - time pfctl -f /tmp/rules.debug -> 6.06 real 0.35 user 5.70 sys

        • C
          cclarke69 @stephenw10
          last edited by

          @stephenw10 - 0.370u 5.780s 0:06.15 100.0% 203+182k 5+0io 0pf+0w
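
The two timing lines above show where the time goes: in tcsh's default `time` output the first field is user CPU seconds, the second is system (kernel) CPU seconds, and the third is elapsed time. A minimal sketch for reading such a line, assuming that default format; the helper name `sys_share` is made up for illustration and is not a pfSense tool:

```shell
#!/bin/sh
# sys_share: hypothetical helper. Given one line of tcsh-style `time` output,
# print the user and system CPU seconds and the share spent in the kernel.
sys_share() {
    printf '%s\n' "$1" | awk '{
        user = $1; sub(/u$/, "", user)   # first field: user CPU seconds
        sys  = $2; sub(/s$/, "", sys)    # second field: system CPU seconds
        total = user + sys
        if (total > 0)
            printf "user=%.2fs sys=%.2fs sys_share=%.0f%%\n", user, sys, 100 * sys / total
    }'
}

# The slow reload reported in this thread, then the fast one from the 5100:
sys_share "0.370u 5.780s 0:06.15 100.0% 203+182k 5+0io 0pf+0w"
sys_share "0.377u 0.329s 0:00.70 98.5% 208+187k 1+0io 0pf+0w"
```

On the slow box nearly all of the reload is system time, which suggests the delay is in the kernel side of loading the ruleset rather than in parsing it.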

          • C
            cclarke69 @stephenw10
            last edited by

            @stephenw10 - If it helps, I've restarted pfSense and watched the stats. The WAN RTT was very high for ~50s after the GUI became available. The OpenVPN interfaces carried over the WAN connection gave normal RTT immediately.

            • C
              cclarke69 @stephenw10
              last edited by

              @stephenw10 - And WireGuard doesn't start after reboot. After I re-saved the WireGuard peers, the gateways looked like this:

              [Image: d04c5886-afff-434b-b5a8-e83fb0d52e76-image.png]

              All of them should be sub-10 ms.

              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                Hmm, are those WireGuard stats like that continually? Or only for the 50s after boot?

                6s to load the ruleset is pretty extreme too.

                Testing here with a 1700 line ruleset and not seeing this. Still digging....

                • C
                  cclarke69 @stephenw10
                  last edited by

                  @stephenw10 - The stats above are for WAN and 2 OpenVPN interfaces, during the ~50s after Wireguard starts. I assume the rules are reloaded at that time? The other point I was making is that Wireguard won't start after reboot, until the WG peers have been disabled and re-enabled. I believe there's another thread somewhere on that topic. Wireguard was fine on 2.5.2

                  • C
                    cclarke69 @stephenw10
                    last edited by

                    @stephenw10 - Here is the ThinkBroadband Monitor graph showing latency before and after the upgrade:

                    [Image: c992bc11-674d-4561-b55c-aa807ac60967-image.png]

                    Stopping the rc.filter_configure_sync cron job from running stops the latency spikes.

                    • A
                      Averlon @stephenw10
                      last edited by

                      @stephenw10 said in Latency spikes during Filter reload - CE 2.6.0:

                      Testing here with a 1700 line ruleset and not seeing this. Still digging....

                      Maybe there is more to it than just rule count.

                      @cclarke69

                      Do you have any rules with advanced options, such as State Type != keep, or a gateway override for policy-based routing? Do you use gateway groups in any rules?

                      • C
                        cclarke69 @Averlon
                        last edited by

                        @averlon - From memory,

                        • State Type != keep -> no

                        • Gateway groups -> yes

                        • Gateway override -> yes

                        • Also Traffic shaping -> yes

                        • A
                          Averlon
                          last edited by Averlon

                          For reference:

                          [Image: 09e20f83-ed7d-43b3-8c96-bd675854f9ca-image.png]

                          I currently only have console access via IPMI. I'll run some tests later, when I get in-band access to that machine.

                          • C
                            cclarke69 @stephenw10
                            last edited by

                            @stephenw10 @averlon - As a test I disabled SMP by adding kern.smp.disabled=1 to /boot/loader.conf.local. Early indications are that this mitigates the latency issue. There was apparently a similar issue in 2.4.5 - https://forum.netgate.com/topic/149595/2-4-5-a-20200110-1421-and-earlier-high-cpu-usage-from-pfctl

                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              Yes, though it isn't a regression of that issue directly as that was easy to replicate in the end.

                              Just to confirm: are you seeing spikes when pinging to the firewall or through it? Or both?

                              • C
                                cclarke69 @stephenw10
                                last edited by

                                @stephenw10 - I see both.

                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  Ok, with a large generated ruleset I am able to see latency spikes for as long as the reload takes.
                                  But disabling SMP does not help. In fact it makes it significantly worse, which is what I'd expect.

                                  What values are you seeing there with only one CPU core?

                                  Steve

                                  • C
                                    cclarke69 @stephenw10
                                    last edited by

                                    @stephenw10 - I haven't tested it rigorously. It may be that the dashboard isn't showing the increased latency while the reload is running because of the CPU spike. TBM is still showing spikes every 15 min, so I guess my early optimism was misplaced.

                                    • C
                                      cclarke69 @stephenw10
                                      last edited by

                                      @stephenw10 - Having re-enabled SMP, I ran a continuous ping test from my PC to the WAN address with a 1000 byte payload. Steady state ping time is 4-5ms. While the reload is running, there are 4 timeouts and 5-6 replies with massively higher than normal ping times (1-5s vs 4-5ms), so the actual period of high latency is ~10s.
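
A ping test like the one described above can be tallied automatically rather than read by eye. The following is a sketch only, assuming the Linux/FreeBSD-style `ping` reply format with `icmp_seq=` and `time=` fields (Windows `ping` prints a different format); the function name `summarize_ping`, the 100 ms threshold, and the address 192.0.2.1 are placeholders, not values from this thread. Lost packets are inferred from gaps in the sequence numbers, so replies arriving out of order would skew the count.

```shell
#!/bin/sh
# summarize_ping: hypothetical helper. Reads ping output on stdin and counts
# replies, replies slower than the threshold, and lost packets.
# $1: latency threshold in ms above which a reply counts as "slow"
summarize_ping() {
    awk -v limit="$1" '
        /icmp_seq=/ && /time=/ {
            # Extract the sequence number and the round-trip time in ms.
            seq = $0; sub(/.*icmp_seq=/, "", seq); sub(/[^0-9].*/, "", seq)
            rtt = $0; sub(/.*time=/, "", rtt); sub(/ .*/, "", rtt)
            # A jump in sequence numbers means the replies in between never arrived.
            if (seen && seq + 0 > last + 1) lost += seq - last - 1
            last = seq + 0; seen = 1; replies++
            if (rtt + 0 > limit + 0) slow++
        }
        END { printf "replies=%d slow=%d lost=%d\n", replies, slow, lost }
    '
}

# Example usage (placeholder address; run it across a filter reload):
#   ping -c 300 -s 1000 192.0.2.1 | summarize_ping 100
```

Using a fixed count with `-c` lets the pipeline finish on its own, so the summary line is always printed.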

                                      • stephenw10S
                                        stephenw10 Netgate Administrator
                                        last edited by

                                        Ok, I've replicated this here given enough rules and tables:

                                        [22.01-RELEASE][root@5100.stevew.lan]/root: pfctl -sr | wc -l
                                            1121
                                        [22.01-RELEASE][root@5100.stevew.lan]/root: time pfctl -f /tmp/rules.debug
                                        0.520u 2.295s 0:02.81 100.0%    203+182k 1+0io 0pf+0w
                                        
                                        [21.05.2-RELEASE][root@5100.stevew.lan]/root: pfctl -sr | wc -l
                                            1116
                                        [21.05.2-RELEASE][root@5100.stevew.lan]/root: time pfctl -f /tmp/rules.debug
                                        0.302u 0.270s 0:00.57 100.0%    202+176k 0+0io 0pf+0w
                                        

                                        Try that test directly. I see latency to the firewall while the ruleset is reloading, and because the reload seems to take significantly longer, that becomes noticeable.

                                        There is a bug open for this here: https://redmine.pfsense.org/issues/12827

                                        Steve

                                        • T
                                          tman222
                                          last edited by

                                          I wanted to chime in on this thread to note that I also see regular CPU spikes now ~15-30min apart after the upgrade to 2.6.0/22.01, which were not present in 2.5.2. I don't have as large a rule set as the other posters, so the CPU only spikes to 2-3% and there has been no noticeable impact on traffic / latency so far.

                                          • A
                                            Averlon @tman222
                                            last edited by

                                            @tman222
                                            It's normal to have some load on the firewall's CPU - all traffic on pfSense is processed at interrupt level. When you open and look at your dashboard, that consumes CPU time as well. A 2-3% load is a typical profile for an idle system. In general the load may spike up to 100% for a period of time and not necessarily affect the latency of traffic.
                                            For some reason, reloading large rulesets "tackles" the firewall, causing all traffic to stop for a few seconds.

                                            Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.