Performance regression 2.7.2 to 2.8

stephenw10

Nice

And traffic passing as expected?

I imagine one of our devs might have a better patch than that but it proves the issue.

fathead

@stephenw10
99% of FIN_WAIT_2 are gone.
WAN side seems OK.
LAN side intermittently gets NO_TRAFFIC:NO_TRAFFIC states with 64:ff9b::7f00:1

stephenw10

Hmm, that could be nothing if you're not seeing connection issues at the clients.

You could try setting a slightly lower MSS value and see if it changes anything.

fathead

ping6 -s56 64:ff9b::7f00:1
ping6 -s32 64:ff9b::7f00:1
Sometimes works, sometimes does not.

stephenw10

Hmm, just with different ping sizes?

MSS has no effect on pings, only TCP. So nothing should have changed there.
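As a rough illustration of why MSS only applies to TCP: the MSS is derived from the link MTU minus the IP and TCP headers, so an ICMP echo is never subject to it. The sketch below assumes standard header sizes with no IP or TCP options and the usual 8-byte PPPoE overhead on a 1500-byte Ethernet link. The function name `tcp_mss` is made up for the example.

```python
# Header sizes in bytes, assuming no IP options and no TCP options.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20
PPPOE_OVERHEAD = 8  # 6-byte PPPoE header + 2-byte PPP protocol field


def tcp_mss(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


pppoe_mtu = 1500 - PPPOE_OVERHEAD
print(pppoe_mtu, tcp_mss(pppoe_mtu), tcp_mss(pppoe_mtu, ipv6=True))
# prints: 1492 1452 1432
```

Pings only have to fit within the MTU (or get fragmented), which is why clamping MSS would not be expected to change ping6 results.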

fathead

@stephenw10
ping6 from the LAN side to pfSense itself and LAN to LAN.
I have only tested with small packets; so far, however, size does not matter.

Both fail sometimes:
ping6 -s56 64:ff9b::7f00:1
ping6 -s32 64:ff9b::7f00:1
Even the default address of 64:ff9b::c0a8:101 sometimes fails.
What I do not understand is why it comes and goes.
Setting the LAN side MTU/MSS to 1.4k, 1.5k or 9k changes nothing.
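For anyone following along, the NAT64 test addresses above are just IPv4 addresses embedded in the low 32 bits of the well-known 64:ff9b::/96 prefix (RFC 6052). A minimal sketch of the mapping, using only Python's standard `ipaddress` module; the helper names `embedded_ipv4` and `to_nat64` are made up for this example:

```python
import ipaddress

# Well-known NAT64 prefix from RFC 6052.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embedded_ipv4(addr: str) -> ipaddress.IPv4Address:
    """Return the IPv4 address carried in the low 32 bits of a NAT64 address."""
    v6 = ipaddress.IPv6Address(addr)
    if v6 not in NAT64_PREFIX:
        raise ValueError(f"{addr} is not inside {NAT64_PREFIX}")
    return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)


def to_nat64(addr: str) -> ipaddress.IPv6Address:
    """Embed an IPv4 address into the well-known NAT64 prefix."""
    v4 = ipaddress.IPv4Address(addr)
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | int(v4))


for a in ("64:ff9b::7f00:1", "64:ff9b::c0a8:101"):
    print(a, "->", embedded_ipv4(a))
# prints:
# 64:ff9b::7f00:1 -> 127.0.0.1
# 64:ff9b::c0a8:101 -> 192.168.1.1
```

So the two targets being pinged are the NAT64 forms of 127.0.0.1 and 192.168.1.1.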

stephenw10

@fathead said in Performance regression 2.7.2 to 2.8:

ping6 from the LAN side to pfSense itself and LAN to LAN.

Hmm, well that would have nothing to do with the pppoe change on WAN. Something local blocking traffic?

fathead

@stephenw10 Only package installed is System_Patches for that one patch and all lan firewall rules are pass except port 53.

stephenw10

And you only see this for ping6? Internal IPv4 traffic is unaffected?

firstofnine

I was having this issue as well and can confirm the diff has solved my 6rd issues in 2.8.0

If you'd like I have a pcap from when I was having issues I can provide.

fathead

@stephenw10 Unable to reproduce with v4, and the link-local address fe80::1:1 is always reachable.
It also affects Virtual IPs.
Is it expected behavior that CPU usage is high when a VIP, 10.0.0.1/32, is on WAN?
10.0.0.1/32 has been reassigned to LAN and the CPU can idle.
For supplementary information: if a ping6 to 64:ff9b::a00:3 is started, it fails. If pfSense is then restarted while the ping6 keeps running undisturbed, ping6 succeeds for about 10 minutes once pfSense has booted.
Restarted pfSense 3 times testing VIPs. ping6 64:ff9b::7f00:1 can work if it is just one ping; if two or more LAN IPs ping6 at the same time, it does not work. This may or may not be correct behavior if pfSense is seeing all pings as coming from the same IP.

ping6: Warning: time of day goes back (-16520us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=269 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16536us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=270 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16529us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=271 ttl=64 time=0.000 ms (DUP!)
64 bytes from 64:ff9b::7f00:1: icmp_seq=770 ttl=64 time=0.207 ms
ping6: Warning: time of day goes back (-16541us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=272 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16471us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=273 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16543us), taking countermeasures
64 bytes from 64:ff9b::7f00:1: icmp_seq=274 ttl=64 time=0.000 ms (DUP!)
ping6: Warning: time of day goes back (-16523us), taking countermeasures
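
To make output like the above easier to compare between runs, here is a small sketch that counts replies, duplicates and clock warnings. It assumes the Linux iputils line format shown in this post. The regular expressions and the `summarize` helper are made up for this example and are not part of any tool.

```python
import re

# Patterns assume the iputils ping6 output format pasted above.
REPLY = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=[\d.]+ ms(?P<dup> \(DUP!\))?")
CLOCK = re.compile(r"time of day goes back \((-?\d+)us\)")


def summarize(lines):
    """Collect reply sequence numbers, duplicate replies and clock warnings."""
    replies, dups, clock_jumps = [], [], []
    for line in lines:
        m = REPLY.search(line)
        if m:
            seq = int(m.group(1))
            replies.append(seq)
            if m.group("dup"):
                dups.append(seq)
            continue
        m = CLOCK.search(line)
        if m:
            clock_jumps.append(int(m.group(1)))
    return {"replies": replies, "duplicates": dups, "clock_jumps_us": clock_jumps}


sample = [
    "ping6: Warning: time of day goes back (-16529us), taking countermeasures",
    "64 bytes from 64:ff9b::7f00:1: icmp_seq=271 ttl=64 time=0.000 ms (DUP!)",
    "64 bytes from 64:ff9b::7f00:1: icmp_seq=770 ttl=64 time=0.207 ms",
]
print(summarize(sample))
```

Feeding it a saved capture (for example `summarize(open("ping.log"))`, where `ping.log` is a hypothetical file name) shows which sequence numbers came back flagged as duplicates.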

stephenw10

@fathead said in Performance regression 2.7.2 to 2.8:

Is it expected behavior that CPU usage is high when a VIP, 10.0.0.1/32, is on WAN?

No, not just that. It might if it's having to deal with a lot of traffic to that VIP that otherwise gets blocked.

Can I assume that applying that patch has not changed this new problem? Just that it too is new in 2.8?

fathead

@stephenw10
With or without patch mr1226.diff
No traffic on any VIPs and cpu is high.
kea-dhcp6 php-fpm.
kea-dhcp6 uses about 3% when the 10.0.0.1/32 IP Alias is set on WAN; set it to LAN and kea-dhcp6 uses 0.00%.

fathead

"Block private networks and loopback addresses" is enabled on WAN; turning it off makes no difference, the CPU is still high.

stephenw10

Oh this is on the PPPoE WAN?

That's a known issue: https://redmine.pfsense.org/issues/16235

Try the patch referenced there.

fathead

@stephenw10 Yes the PPPoE WAN.
Is this the patch?
This patch does fix the high CPU. However, when a VIP is set on WAN, it breaks the whole NAT64 64:ff9b::/96 address space. Or is a reload/restart needed?

stephenw10

Yup, without that, when you add a VIP on a PPPoE WAN and have if_pppoe enabled, the connection loops continually. The logs will have shown it reconnecting every few seconds, which obviously loads the CPU significantly.

So how exactly does NAT64 fail?

What are you using that VIP for?

fathead

@stephenw10

So how exactly does NAT64 fail?

Native v6 traffic is normal.
Outside NAT64 so far is not working with a VIP set on wan.
The VIPs themselves are reachable, for example 64:ff9b::10.0.0.3

WAN	ipv6-icmp	10.0.0.4:1 (fdbb::8[1]) -> 77.47.127.138:8 (64:ff9b::100:1[1])	NO_TRAFFIC:NO_TRAFFIC	2 / 2	160 B / 120 B

What are you using that VIP for?

v4 VIPs for ping, v6 VIPs for DNS.

stephenw10

@fathead said in Performance regression 2.7.2 to 2.8:

Outside NAT64 so far is not working with a VIP set on wan.

OK so you are setting that VIP just as something to ping from a V6 only client device?

Can I assume it still responds to ping from an internal IPv4 client?

fathead

@stephenw10

OK so you are setting that VIP just as something to ping from a V6 only client device?

Can I assume it still responds to ping from an internal IPv4 client?

Yes and yes.