Performance regression 2.7.2 to 2.8

fathead

@stephenw10 Thank you!
Looks like it affects both.
before:
scrub on $WAN inet all max-mss 1452 fragment reassemble
scrub on $WAN_STF inet6 all max-mss 1432 fragment reassemble
after setting MSS:
scrub on $WAN inet all max-mss 1432 fragment reassemble
scrub on $WAN_STF inet6 all max-mss 1412 fragment reassemble
What file creates the scrubs that are applied?

stephenw10

Oh, OK. That looks correct then. Except it's not including the 6RD overhead. What we want to see there is:

scrub on $WAN inet all max-mss 1452 fragment reassemble
scrub on $WAN_STF inet6 all max-mss 1412 fragment reassemble

Yeah OK let me dig into this. It's likely a simple patch....
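
For anyone following along, the expected values can be checked with some quick arithmetic. This is only a rough sketch: it assumes a PPPoE MTU of 1492 (not stated above, but consistent with the 1452 value) and 20 bytes of IPv4 encapsulation for 6RD. The function and constant names are purely illustrative and are not anything pfSense uses.

```python
# Header sizes in bytes (no IP or TCP options assumed).
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def max_mss(link_mtu, ipv6=False, tunnel_overhead=0):
    """Largest TCP payload that fits in one packet on the link."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return link_mtu - tunnel_overhead - ip_header - TCP_HEADER


PPPOE_MTU = 1492  # assumption: the usual 1500 minus 8 bytes of PPPoE overhead

# IPv4 directly on the PPPoE WAN
print(max_mss(PPPOE_MTU))  # 1452

# IPv6 without counting the 6RD encapsulation (the "before" value)
print(max_mss(PPPOE_MTU, ipv6=True))  # 1432

# IPv6 inside 6RD: each packet also carries an outer IPv4 header
print(max_mss(PPPOE_MTU, ipv6=True, tunnel_overhead=IPV4_HEADER))  # 1412
```

So 1452 for inet and 1412 for inet6 on the 6RD interface are the pair that accounts for the tunnel overhead.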

stephenw10

Can you test a patch?

mr1226.diff

That should allow if_pppoe to work as expected. mpd5/netgraph is a different matter!

fathead

@stephenw10 Thanks for the reply!
That worked instantly.

scrub on $WAN inet all   max-mss 1452 fragment reassemble
scrub on $WAN_STF inet6 all   max-mss 1412 fragment reassemble

stephenw10

Nice

And traffic passing as expected?

I imagine one of our devs might have a better patch than that but it proves the issue.

fathead

@stephenw10
99% of FIN_WAIT_2 are gone.
WAN side seems OK.
LAN side inconsistently gets NO_TRAFFIC:NO_TRAFFIC states with 64:ff9b::7f00:1

stephenw10

Hmm, that could be nothing if you're not seeing connection issues at the clients.

You could try setting a slightly lower MSS value and see if it changes anything.

fathead

ping6 -s56 64:ff9b::7f00:1
ping6 -s32 64:ff9b::7f00:1
Sometimes works, sometimes does not.

stephenw10

Hmm, just with different ping sizes?

MSS has no effect on pings, only TCP. So nothing should have changed there.

fathead

@stephenw10
ping6 from LAN side to pfSense itself and LAN to LAN.
I have only tested with small packets, however so far size does not matter.

Both fail sometimes:
ping6 -s56 64:ff9b::7f00:1
ping6 -s32 64:ff9b::7f00:1
Even the default address of 64:ff9b::c0a8:101 sometimes fails.
What I do not understand is why it comes and goes.
Setting LAN side MTU/MSS to 1.4k, 1.5k or 9k changes nothing.
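
For readers unfamiliar with these addresses: they sit in the 64:ff9b::/96 prefix, which carries an IPv4 address in its low 32 bits. Below is a minimal sketch for decoding them using only the Python standard library; the helper name is made up for illustration.

```python
import ipaddress

# Well-known prefix for IPv4-embedded IPv6 addresses (RFC 6052).
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embedded_ipv4(addr):
    """Return the IPv4 address held in the low 32 bits of a 64:ff9b::/96 address."""
    v6 = ipaddress.IPv6Address(addr)
    if v6 not in WELL_KNOWN_PREFIX:
        raise ValueError(f"{addr} is not inside {WELL_KNOWN_PREFIX}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


print(embedded_ipv4("64:ff9b::7f00:1"))    # 127.0.0.1
print(embedded_ipv4("64:ff9b::c0a8:101"))  # 192.168.1.1
```

In other words, the two targets being pinged map to the IPv4 loopback address and to 192.168.1.1.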

stephenw10

@fathead said in Performance regression 2.7.2 to 2.8:

ping6 from LAN side to pfSense itself and LAN to LAN.

Hmm, well that would have nothing to do with the pppoe change on WAN. Something local blocking traffic?

fathead

@stephenw10 Only package installed is System_Patches for that one patch and all lan firewall rules are pass except port 53.

stephenw10

And you only see this for ping6? Internal IPv4 traffic is unaffected?

firstofnine

I was having this issue as well and can confirm the diff has solved my 6rd issues in 2.8.0

If you'd like I have a pcap from when I was having issues I can provide.

fathead

@stephenw10 Unable to reproduce with v4, and the link-local address fe80::1:1 is always reachable.
It also affects Virtual IPs.
Is it expected behavior that CPU usage is high when a VIP (10.0.0.1/32) is on WAN?
10.0.0.1/32 has been reassigned to LAN and the CPU can idle.
Additional information: if a ping6 to 64:ff9b::a00:3 is started, it fails. If pfSense is then restarted while that ping6 keeps running, the ping6 succeeds for about 10 minutes after pfSense boots.
Restarted pfSense 3 times testing VIPs. ping6 64:ff9b::7f00:1 can work if it is just one ping; if two or more LAN IPs ping6 at the same time, it does not work. This may or may not be correct behavior if pfSense sees all pings as coming from the same IP.

ping6: Warning: time of day goes back (-16520us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=269 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16536us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=270 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16529us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=271 ttl=64 time=0.000 ms (DUP!)
64 bytes from 64:ff9b::7f00:1: icmp_seq=770 ttl=64 time=0.207 ms
ping6: Warning: time of day goes back (-16541us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=272 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16471us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=273 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16543us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=274 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16523us), taking countermeasures

stephenw10

@fathead said in Performance regression 2.7.2 to 2.8:

Is it expected behavior that CPU usage is high when a VIP (10.0.0.1/32) is on WAN?

No, not just that. It might if it's having to deal with a lot of traffic to that VIP that otherwise gets blocked.

Can I assume that applying that patch has not changed this new problem? Just that it too is new in 2.8?

fathead

@stephenw10
With or without patch mr1226.diff
No traffic on any VIPs and cpu is high.
kea-dhcp6 php-fpm.
kea-dhcp6 uses about 3% CPU when the 10.0.0.1/32 IP Alias is set on WAN; when it is set on LAN, kea-dhcp6 uses 0.00%.

fathead

"Block private networks and loopback addresses" is enabled on WAN. Turning that off makes no difference; the CPU is still high.

stephenw10

Oh this is on the PPPoE WAN?

That's a known issue: https://redmine.pfsense.org/issues/16235

Try the patch referenced there.

fathead

@stephenw10 Yes the PPPoE WAN.
Is this the patch?
This patch does fix the high CPU. However, when a VIP is set on WAN, it breaks the whole NAT 64:ff9b::/96 address space. Or is a reload/restart needed?