if_pppoe problems with php-fpm causing loops. (resolved)

chrcoluk

My ISP is doing LNS maintenance over the coming week, so we will see how the reconnection goes.

I also patched MSS clamping in the scrub code to subtract 52 bytes instead of 40, to allow for the TCP timestamps overhead, and wow, things have never performed so well. Combined with this driver, it is working really well for me now.

stephenw10

Hmm, did I miss something about timestamps? What did you discover that required that?

w0w

So, are we talking about RFC 1323 and a modification to /etc/inc/filter.inc?

chrcoluk

I noticed that when MSS clamping is enabled, the pfSense code only subtracts the basic 40 bytes (IPv4) or 60 bytes (IPv6) of header overhead. For example, with a 1500-byte MTU it clamps IPv4 to an MSS of 1460 bytes; I think it should be 1448 in that scenario, so I changed the 40 bytes to 52. Currently I have only patched the IPv4 code. Timestamps are on by default in operating systems nowadays and are important at high bandwidth.
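The arithmetic behind the 1460 vs 1448 figures can be sketched as follows. This is a minimal illustration, assuming no IP options or IPv6 extension headers, and assuming the RFC 1323/7323 timestamp option occupies 12 bytes on the wire (10 bytes of option plus 2 bytes of NOP padding). The function name `tcp_payload_per_segment` is hypothetical and not part of pfSense.

```python
# Header sizes in bytes (assumption: no IP options, no IPv6 extension headers).
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20
# TCP timestamp option: 10 bytes, padded to 12 with two NOPs (assumption above).
TCP_TIMESTAMPS = 12


def tcp_payload_per_segment(mtu: int, ipv6: bool = False, timestamps: bool = True) -> int:
    """Largest TCP payload that fits in one packet of the given MTU (hypothetical helper)."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    overhead = ip_header + TCP_HEADER + (TCP_TIMESTAMPS if timestamps else 0)
    return mtu - overhead


if __name__ == "__main__":
    for ipv6 in (False, True):
        family = "IPv6" if ipv6 else "IPv4"
        print(family,
              "without timestamps:", tcp_payload_per_segment(1500, ipv6, timestamps=False),
              "with timestamps:", tcp_payload_per_segment(1500, ipv6))
```

For a 1500-byte MTU this gives 1460/1448 for IPv4 and 1440/1428 for IPv6, i.e. the extra 12 bytes is the difference between the stock 40-byte subtraction and the patched 52.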

I had noticed that YouTube videos stalled before playing; as soon as I made this adjustment that was fixed, and I am noticing a wider improvement as well.

It is a very basic patch, too: it modifies the scrub rules in the firewall rule generation script.

@w0w Yes to both.

                /* set up MSS clamping */
                if ($mss) {
                        /* different size of IPv4/IPv6 header, https://redmine.pfsense.org/issues/11409 */
-                        $mssclamp4 = "max-mss " . ($mss - 40);
+                        $mssclamp4 = "max-mss " . ($mss - 52);

$mssclamp6 is set right below that.
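A Python stand-in for the patched expressions, covering both stacks, might look like this. It is only a sketch: the real code is PHP in /etc/inc/filter.inc, `scrub_max_mss` is a hypothetical name, and it assumes (as the stock subtraction of 40/60 implies) that `$mss` holds the MTU-sized value entered in the interface's MSS field.

```python
from typing import Tuple

TIMESTAMP_OVERHEAD = 12  # assumed size of the TCP timestamp option incl. padding


def scrub_max_mss(mss_setting: int, timestamps: bool = True) -> Tuple[str, str]:
    """Return the IPv4 and IPv6 'max-mss' fragments for a pf scrub rule (hypothetical)."""
    extra = TIMESTAMP_OVERHEAD if timestamps else 0
    v4 = f"max-mss {mss_setting - (40 + extra)}"  # stock code subtracts 40 here
    v6 = f"max-mss {mss_setting - (60 + extra)}"  # stock code subtracts 60 here
    return v4, v6


print(scrub_max_mss(1500))
```

With a setting of 1500 this prints `('max-mss 1448', 'max-mss 1428')`, matching the IPv4 and IPv6 values reported later in the thread; `timestamps=False` reproduces the stock 1460/1440.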

w0w

@chrcoluk said in if_pppoe problems with php-fpm causing loops. (resolved):

I have only patched the IPv4 code

Why not both stacks?

chrcoluk

@w0w I will do both stacks. It was something I did very quickly, and I wanted to check whether it is still 12 bytes for IPv6 as well.

Patched it now. With my 1500-byte MTU, that gives an MSS of 1448 for IPv4 and 1428 for IPv6.

w0w

@chrcoluk
I did the same and even captured WAN packets to confirm. Looks like it's working; we'll see. By the way, I'm using gigabit PPPoE.

stephenw10

Hmm, interesting. I can't say I've noticed that. But also I wasn't looking for it specifically.

chrcoluk

After some more testing, it looks like if_pppoe resolves an issue related to fragments that existed in the legacy PPPoE code.

w0w

@chrcoluk said in if_pppoe problems with php-fpm causing loops. (resolved):

issue related to fragments

What issue?

chrcoluk

@w0w I had a device that had issues with small TCP packets. It still fails on the legacy code but passes on the new code. I didn't really consider it a PPPoE-side issue before, but the issue is gone with if_pppoe.

chrcoluk

The PPP session was down for a while in the late afternoon. I saw a notification from the ISP, and when I got to the firewall I could see repeated "received unexpected PADO" messages on the console; the logs looked like it was stuck in a loop retrying.

After confirming the ONT was OK, I disabled WAN, re-enabled it, clicked Apply, and it came back within seconds.

There was an external cause for the initial disconnection: an outage at CityFibre, the local FTTP provider. So the issue is that it failed to reconnect automatically, not the initial disconnection.
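To show what "unexpected" means for a PADO, here is a toy model of the PPPoE discovery sequence from RFC 2516 (PADI, PADO, PADR, PADS). It assumes, and this is an assumption rather than something confirmed in the thread, that the driver only accepts a PADO while it is waiting for offers after sending a PADI, as NetBSD-style if_pppoe code does; any PADO arriving in another state is logged and dropped. All class and method names are hypothetical.

```python
from enum import Enum, auto
from typing import List, Optional


class DiscoveryState(Enum):
    INITIAL = auto()
    PADI_SENT = auto()  # broadcast PADI, waiting for offers (PADO)
    PADR_SENT = auto()  # picked an access concentrator, waiting for PADS
    SESSION = auto()    # PADS received, session established


class PPPoEClient:
    """Toy discovery client: only the PADO state check is modelled."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = DiscoveryState.INITIAL
        self.chosen_ac: Optional[str] = None
        self.log: List[str] = []

    def send_padi(self) -> None:
        self.state = DiscoveryState.PADI_SENT

    def on_pado(self, ac_name: str) -> bool:
        # Offers are only meaningful while we are waiting for them.
        if self.state is not DiscoveryState.PADI_SENT:
            self.log.append(f"{self.name}: received unexpected PADO")
            return False
        # Accept this offer; a real client would now send PADR to this AC.
        self.chosen_ac = ac_name
        self.state = DiscoveryState.PADR_SENT
        return True

    def on_pads(self) -> None:
        if self.state is DiscoveryState.PADR_SENT:
            self.state = DiscoveryState.SESSION
```

In this model, if two access concentrators answer the same PADI, the second offer arrives after the client has moved to `PADR_SENT` and is logged as unexpected. That is only one way the message can arise; the thread does not establish what caused it here.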

stephenw10

Hmm, interesting. Did you try just disconnecting and reconnecting in Status > Interfaces?

ajtuk

I had the same issue today after CityFibre went down. The PPPoE connection does not restart; luckily I can SSH in over the FTTC line and reboot, after which it works fine. But if the ISP drops the connection, the only options are to open the GUI and click Connect, or reboot.

chrcoluk

There was a second outage later on, and that time it came back up by itself.

chrcoluk

@stephenw10 I didn't try that because, in my recent experience, that method could leave services down, whilst toggling the interface under Interfaces always seems to keep everything running cleanly.

There is a chance of more drops, as AAISP haven't closed the incident because CF are refusing to tell them what went wrong. If it gets stuck again, I will try cycling it on the Status > Interfaces page.

chrcoluk

@stephenw10 It went down again and didn't recover automatically. Status > Interfaces said it was still up (with no IPs); I took it down and back up again, and it came back right away.

w0w

@chrcoluk
When the PPPoE interface shows 'up', it means it's in the process of connecting. How long did you wait? Next time, look at what

pppcfg pppoe0

shows.

chrcoluk

@w0w It had already been down for 2 hours and I had people moaning at me, so I had no time to sit there watching it, I am afraid. I wasn't asked to sit and watch it either, just to restart it on the interfaces page.

But I will try and remember that command. Thank you.

Future testing at a more convenient time might be possible by yanking a cable.

This is what was logged during what seems to be a connection attempt before I intervened (I removed noise such as service restarts):

If I remember right, things like timeout settings used to be tunable?

Jul 26 06:11:28 	kernel 	222645 	if_pppoe: pppoe2: LCP keepalive timeout
Jul 26 06:10:40 	kernel 	222596 	if_pppoe: pppoe: GENERIC ERROR (errortag 1)
Jul 26 06:09:46 	kernel 	222542 	pppoe2: link state changed to UP
Jul 26 06:09:46 	kernel 	222542 	pppoe2: link state changed to DOWN
Jul 26 06:09:46 	kernel 	222542 	pppoe2: link state changed to UP
Jul 26 06:09:38 	kernel 	222535 	pppoe2: received unexpected PADO
Jul 26 06:09:38 	kernel 	222535 	pppoe2: received unexpected PADO
Jul 26 06:09:38 	kernel 	222535 	pppoe2: link state changed to DOWN
Jul 26 06:09:38 	kernel 	222535 	if_pppoe: pppoe2: LCP keepalive timeout
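The "LCP keepalive timeout" entries refer to LCP's Echo-Request/Echo-Reply mechanism (RFC 1661): the peer is probed periodically and the link is declared dead after several unanswered echoes. The sketch below models only that counting logic. The threshold of 3 and the class name are hypothetical; the thread does not say what values if_pppoe uses or whether they are tunable.

```python
class LcpKeepalive:
    """Toy LCP echo keepalive; max_missed is a hypothetical threshold."""

    def __init__(self, max_missed: int = 3) -> None:
        self.max_missed = max_missed
        self.outstanding = 0  # Echo-Requests sent without a matching Echo-Reply

    def tick(self) -> bool:
        """Call once per keepalive interval; returns False once the link is declared dead."""
        if self.outstanding >= self.max_missed:
            return False  # keepalive timeout: a real driver would tear the link down here
        self.outstanding += 1  # send another Echo-Request
        return True

    def on_echo_reply(self) -> None:
        self.outstanding = 0  # any reply proves the peer is alive


k = LcpKeepalive(max_missed=3)
print([k.tick() for _ in range(4)])
```

This prints `[True, True, True, False]`: three unanswered probes are tolerated and the fourth interval reports the timeout. What the driver does after that (restarting discovery, or not) is the behaviour being debugged in this thread and is not modelled here.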

chrcoluk

@w0w I am seeing indications in the log that it isn't going down as you suggested. For example, dpinger stayed running on the same PID, logging ping failures almost every second. Also, the gateway did not switch, even though the switch criterion is the "down" state (my phone was left plugged in all night as an internet uplink).
If it is failing to go to a down state, that would make some sense of why a manual restart works and leaving it alone doesn't. The obvious question is what differs between me restarting it manually and it trying to restart itself; there must be a difference.

When I intervened dpinger exited on signal 15.