Performance regression 2.7.2 to 2.8
-
@stephenw10
MSS is set to 1472 on the only WAN interface; the only other interface is the LAN.
It was set on interfaces.php?if=wan. How does this not affect IPv4?
ifconfig shows wan_stf with mtu 1472.
How can I check the MSS? Are the pf scrub rules in a file, or only in running memory?
Also, does a doc exist on docs.netgate.com about ICMPv6 Packet Too Big being allowed by a default pass rule? Is creating a pass rule for this redundant? The kind of rule I mean is sketched below.
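(My guess at the pf syntax; icmp6-type 2 is Packet Too Big, and the $WAN macro is illustrative:)
pass in quick on $WAN inet6 proto ipv6-icmp icmp6-type 2 keep state
-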
Ah, OK, the 6RD tunnel is not exposed directly... Hmm. I don't use 6RD.
So, yes, applying that on the WAN will affect IPv4 traffic. And, more importantly, it may not apply to traffic inside the 6RD tunnel. Setting it on the LAN would, though.
The actual value required may vary but in the one other case I've seen it was 1472.
We are digging into this....
-
@fathead said in Performance regression 2.7.2 to 2.8:
How can I check the MSS? Are the pf scrub rules in a file, or only in running memory?
Look at the ruleset file: /tmp/rules.debug
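To compare against what pf actually has loaded, pfctl should show the scrub rules too (the grep patterns are just examples):
grep max-mss /tmp/rules.debug
pfctl -sr | grep scrub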
-
@stephenw10 Thank you!
Looks like it affects both.
before:
scrub on $WAN inet all max-mss 1452 fragment reassemble
scrub on $WAN_STF inet6 all max-mss 1432 fragment reassemble
after setting MSS:
scrub on $WAN inet all max-mss 1432 fragment reassemble
scrub on $WAN_STF inet6 all max-mss 1412 fragment reassemble
What file creates the scrub rules that are applied? -
Oh, OK. That looks correct then. Except it's not including the 6RD overhead. What we want to see there is:
scrub on $WAN inet all max-mss 1452 fragment reassemble
scrub on $WAN_STF inet6 all max-mss 1412 fragment reassemble
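(For reference, those values follow from the header overhead, assuming the 1492 byte PPPoE MTU that the 1452 figure implies:
1492 - 20 (IPv4 header) - 20 (TCP header) = 1452 for IPv4 on $WAN
1492 - 20 (6RD IPv4 encapsulation) - 40 (IPv6 header) - 20 (TCP header) = 1412 for IPv6 on $WAN_STF)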
Yeah OK let me dig into this. It's likely a simple patch....
-
Can you test a patch?
That should allow if_pppoe to work as expected. mpd5/netgraph is a different matter!
-
@stephenw10 Thanks for the reply!
That worked instantly.
scrub on $WAN inet all max-mss 1452 fragment reassemble
scrub on $WAN_STF inet6 all max-mss 1412 fragment reassemble
-
Nice
And traffic passing as expected?
I imagine one of our devs might have a better patch than that, but it proves the issue.
-
@stephenw10
99% of the FIN_WAIT_2 states are gone.
The WAN side seems OK.
The LAN side inconsistently gets NO_TRAFFIC:NO_TRAFFIC states with 64:ff9b::7f00:1 -
Hmm, that could be nothing if you're not seeing connection issues at the clients.
You could try setting a slightly lower MSS value and see if it changes anything.
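You can watch those states from a shell with pfctl, e.g.:
pfctl -ss | grep FIN_WAIT_2
pfctl -ss | grep NO_TRAFFIC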
-
ping6 -s56 64:ff9b::7f00:1
ping6 -s32 64:ff9b::7f00:1
Sometimes they work, sometimes they do not. -
Hmm, just with different ping sizes?
MSS has no effect on pings, only TCP. So nothing should have changed there.
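You can see where the clamped MSS actually lands by watching TCP SYN options with tcpdump. (The interface name igb1 here is just a placeholder, and the IPv6 filter assumes no extension headers, so the TCP flags byte sits at fixed offset 53:)
tcpdump -ni igb1 -v 'tcp[tcpflags] & tcp-syn != 0'
tcpdump -ni igb1 -v 'ip6 proto 6 and ip6[53] & 0x02 != 0'
ICMP echoes carry no MSS option, so the clamp can't change them.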
-
@stephenw10
ping6 from the LAN side to pfSense itself, and LAN to LAN.
I have only tested with small packets, but so far size does not matter. Both fail sometimes:
ping6 -s56 64:ff9b::7f00:1
ping6 -s32 64:ff9b::7f00:1
Even the default address of 64:ff9b::c0a8:101 sometimes fails.
What I do not understand is why it comes and goes.
Setting the LAN-side MTU/MSS to 1.4k, 1.5k, or 9k changes nothing. -
@fathead said in Performance regression 2.7.2 to 2.8:
ping6 from the LAN side to pfSense itself, and LAN to LAN.
Hmm, well that would have nothing to do with the PPPoE change on WAN. Something local blocking traffic?
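If pf is dropping it, you should see it on the log interface, e.g.:
tcpdump -ni pflog0 icmp6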
-
@stephenw10 The only package installed is System_Patches, for that one patch, and all LAN firewall rules are pass except port 53.
-
And you only see this for ping6? Internal IPv4 traffic is unaffected?
-
I was having this issue as well and can confirm the diff solved my 6RD issues in 2.8.0.
If you'd like, I can provide a pcap from when I was having the issues.
-
@stephenw10 Unable to reproduce with IPv4, and the link-local address fe80::1:1 is always reachable.
It also affects Virtual IPs.
Is it expected behavior that CPU usage is high when a VIP 10.0.0.1/32 is on WAN?
10.0.0.1/32 has been reassigned to the LAN and the CPU can idle.
For supplementary information: if a ping6 64:ff9b::a00:3 is started, it will fail. If pfSense is restarted while that ping6 keeps running undisturbed, the ping6 succeeds for about 10 minutes after pfSense boots up.
Restarted pfSense 3 times testing VIPs. ping6 64:ff9b::7f00:1 can work if it is just one ping; if two or more LAN IPs ping6 at the same time, it does not work. This all may or may not be correct behavior if pfSense is seeing all pings from the same IP.
ping6: Warning: time of day goes back (-16520us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=269 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16536us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=270 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16529us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=271 ttl=64 time=0.000 ms (DUP!)
64 bytes from 64:ff9b::7f00:1: icmp_seq=770 ttl=64 time=0.207 ms
ping6: Warning: time of day goes back (-16541us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=272 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16471us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=273 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16543us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=274 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16523us), taking countermeasures
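((DUP!) means ping6 received more than one reply for the same sequence number, so something may be answering twice for that address; if it helps, I can dump the IPv6 neighbor cache with:)
ndp -an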
-
@fathead said in Performance regression 2.7.2 to 2.8:
Is it expected behavior that CPU usage is high when a VIP 10.0.0.1/32 is on WAN?
No, not from that alone. It might be if it's having to deal with a lot of traffic to that VIP that would otherwise get blocked.
Can I assume that applying the patch has not changed this new problem? Just that it, too, is new in 2.8?