Latency spikes during Filter reload - CE 2.6.0

stephenw10

Hmm, well it doesn't appear to be a regression of that bug specifically:

[22.01-RELEASE][admin@5100.stevew.lan]/root: pfctl -t bogonsv6 -T flush
128188 addresses deleted.
[22.01-RELEASE][admin@5100.stevew.lan]/root: time pfctl -t bogonsv6 -T add -f /etc/bogonsv6
128188/128188 addresses added.
0.406u 0.234s 0:00.64 98.4%	205+185k 1+0io 0pf+0w

At least not on that hardware.

Steve

Averlon

The impact on the firewall is similar to Redmine 10414: not as heavy as described there, but bad enough to silence collaboration for 3 to 5 seconds. The cause lies somewhere else, though; table size seems to be irrelevant. It can be triggered by pfctl -f /tmp/rules.debug once the ruleset reaches a certain line count. It starts lightly somewhere between 400 and 600 lines and gets worse as more rules are present.

Cool_Corona

@averlon Same as the bug in 2.5.0 as I recall??

stephenw10

We are setting up testing to try to replicate locally.

cclarke69

I see the same symptom.

i7-5500U
8 GB RAM
6x Intel NICs
120 GB SSD

Home network for the family, so failing the WAF currently.

stephenw10

What sort of latency are you seeing? How many firewall lines?

cclarke69

@stephenw10 - around 300 active rules across 10 interfaces. WAN RTT goes from <7ms to >200 ms for around 50s. RTTd from 0.2ms to > 800ms for the same period.
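
A rough way to put numbers on spikes like these is to leave a ping running across a filter reload and count the slow replies afterwards. The sketch below assumes the usual "time=... ms" field in ping output; the function name count_spikes and the threshold are made up for illustration.

```shell
# count_spikes THRESHOLD_MS
# Reads ping output on stdin and prints "<spikes> <total>", where
# spikes is the number of replies slower than THRESHOLD_MS.
# (Hypothetical helper, not part of pfSense.)
count_spikes() {
    awk -v limit="$1" '
        /time=/ {
            # Drop everything up to "time=" so the RTT becomes field 1.
            sub(/.*time=/, "")
            rtt = $1 + 0
            total++
            if (rtt > limit) spikes++
        }
        END { printf "%d %d\n", spikes, total }
    '
}
```

For example, ping -c 120 "$target" | count_spikes 100 while a reload runs in another shell, where $target is whichever gateway or monitor address is being watched. Lost replies have no time= field, so they are not counted either way.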

stephenw10

Urgh, that's pretty bad.

How many actual ruleset lines though? As reported by: pfctl -sr | wc -l

If you run the table reload commands I showed above do those come back in reasonable time?

Steve

cclarke69

@stephenw10 - output of pfctl -sr | wc -l is 1987. That command returns in about 1s

cclarke69

@stephenw10 - Which table reload command do you mean?

cclarke69

@stephenw10 - 1 table created.
128188/128188 addresses added.
0.29 real 0.12 user 0.16 sys

for time pfctl -t bogonsv6 -T add -f /etc/bogonsv6

stephenw10

Yes, that. And those times look fine.

You might also try:

[22.01-RELEASE][admin@5100.stevew.lan]/root: time pfctl -f /tmp/rules.debug
0.377u 0.329s 0:00.70 98.5%	208+187k 1+0io 0pf+0w

Hardly any rules on that box though:

[22.01-RELEASE][admin@5100.stevew.lan]/root: pfctl -sr | wc -l
     121

Steve

cclarke69

@stephenw10 - time pfctl -f /tmp/rules.debug -> 6.06 real 0.35 user 5.70 sys

cclarke69

@stephenw10 - 0.370u 5.780s 0:06.15 100.0% 203+182k 5+0io 0pf+0w
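
In the csh-style time line, the first field is user CPU time and the second is system (kernel) CPU time, so almost all of the 6 seconds here is spent in the kernel rather than in pfctl itself. A small sketch to turn such a line into a percentage, with sys_share being a made-up helper name:

```shell
# sys_share "TIME_LINE"
# Takes one csh-style time line ("0.370u 5.780s 0:06.15 ...") and prints
# system time as a whole-number percentage of user + system time.
# (Hypothetical helper for illustration.)
sys_share() {
    echo "$1" | awk '{
        user = $1; sub(/u$/, "", user)   # strip trailing "u"
        sys  = $2; sub(/s$/, "", sys)    # strip trailing "s"
        if (user + sys > 0)
            printf "%.0f\n", 100 * sys / (user + sys)
    }'
}
```

Fed the line above it prints 94, against 47 for the 0.377u 0.329s result from the 121-line box.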

cclarke69

@stephenw10 - If it helps, I've restarted the pfSense and observed the stats. The WAN RTT was very high for ~50s after the GUI became available. The OpenVPN interfaces carried over the WAN connection gave normal RTT immediately.

cclarke69

@stephenw10 - And wireguard doesn't start after reboot. Having resaved the wireguard peers, the Gateways looked like this:

(gateway status screenshot not preserved)

When all should be sub 10ms.

stephenw10

Hmm, are those Wireguard stats like that continually? Or just for the 50s after boot?

6s to load the ruleset is pretty extreme too.

Testing here with a 1700 line ruleset and not seeing this. Still digging....

cclarke69

@stephenw10 - The stats above are for WAN and 2 OpenVPN interfaces, during the ~50s after Wireguard starts. I assume the rules are reloaded at that time? The other point I was making is that Wireguard won't start after reboot, until the WG peers have been disabled and re-enabled. I believe there's another thread somewhere on that topic. Wireguard was fine on 2.5.2

cclarke69

@stephenw10 - Here is the ThinkBroadband Monitor showing pre and post upgrade

Stopping the rc.filter_configure_sync cron job from running stops the latency spikes.
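
To see whether and how often that job is scheduled, the active cron lines that mention it can be listed. This is only a sketch: it assumes the job appears in a crontab file such as /etc/crontab, and filter_reload_jobs is a made-up helper name.

```shell
# filter_reload_jobs FILE
# Prints the non-comment lines in FILE that call rc.filter_configure_sync.
# (Hypothetical helper; FILE is assumed to be a crontab such as /etc/crontab.)
filter_reload_jobs() {
    grep -v '^[[:space:]]*#' "$1" | grep 'rc\.filter_configure_sync'
}
```

Run as filter_reload_jobs /etc/crontab. The schedule fields at the start of each matching line show how often a reload, and so a possible spike, is triggered.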

Averlon

@stephenw10 said in Latency spikes during Filter reload - CE 2.6.0:

Testing here with a 1700 line ruleset and not seeing this. Still digging....

Maybe there is more to it than just rule count.

@cclarke69

Do you have any rules with advanced options, such as a State Type other than keep or a gateway override for policy-based routing? Do you use gateway groups in any rules?
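
A quick way to check for policy routing is to count the route-to rules in the generated ruleset, since a gateway override ends up as route-to in pf syntax. A minimal sketch, where count_policy_routes is a made-up helper name and comment lines are skipped on the assumption that they could mention the keyword too:

```shell
# count_policy_routes FILE
# Prints the number of non-comment lines in FILE that contain "route-to".
# (Hypothetical helper; FILE is assumed to be a pf ruleset such as /tmp/rules.debug.)
count_policy_routes() {
    grep -v '^[[:space:]]*#' "$1" | grep -c 'route-to'
}
```

Run as count_policy_routes /tmp/rules.debug. A count of 0 would rule out policy routing as a factor on that box.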