LCP no reply to echo requests

Thondwe

@fireodo In my case pfsense drops the connection "nicely" after missing echoes, but the other end still has the connection up - and it takes some 10 mins for it to drop so that reconnection can occur :(

So do wonder if the pppoe server (I guess) is just going slow for a bit so the echos aren't timely - hence making my end more tolerant would help?

stephenw10

You might be able to set a custom mpd.conf file to ignore LCP echoes or set different values, but you probably don't want to. The other end should always reply there.
dpinger shows that error when it can no longer send packets at all because the ppp link has been brought down, but it might be failing to see ping responses before that.

fireodo

@thondwe said in LCP no reply to echo requests:

So do wonder if the pppoe server (I guess) is just going slow for a bit so the echos aren't timely - hence making my end more tolerant would help?

You could change the way the PPPoE client reacts to the loss of echo replies by editing /etc/inc/interfaces.inc at line 2565 - but be careful, you could ruin the functionality of your pfsense! Maybe this:
Redmine helps you!

stephenw10

You don't have to do that, you can use a custom mpd conf file and add whatever value you need.
Copy your existing conf file from /var/etc to /conf. So for example:
cp /var/etc/mpd_wan.conf /conf/mpd_wan.conf

pfSense uses a conf file there in preference to the generated one, so you can make any valid changes to it and they will be retained.

Steve
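
For anyone following along, here is a minimal sketch of what that edit could look like. It assumes the generated file contains a `set link keep-alive <seconds> <max>` line (in mpd5 the first value is the echo interval, the second is how long it waits without a reply before dropping the link). The function name `set_keepalive` and the values in the usage comment are made up for illustration, not pfSense defaults.

```shell
#!/bin/sh
# Sketch only: rewrite the "set link keep-alive" line in a custom mpd conf.
# set_keepalive is a hypothetical helper, not part of pfSense or mpd5.

set_keepalive() {
    conf="$1"      # custom conf file, e.g. /conf/mpd_wan.conf
    interval="$2"  # seconds between LCP echo requests
    max="$3"       # seconds without a reply before the link is dropped

    # Accept plain non-negative integers only.
    for n in "$interval" "$max"; do
        case "$n" in
            ''|*[!0-9]*) echo "interval and max must be integers" >&2; return 1 ;;
        esac
    done
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }

    # Keep the original indentation, replace only the two numbers.
    tmp="${conf}.tmp.$$"
    sed "s/^\([[:space:]]*set link keep-alive\)[[:space:]].*/\1 ${interval} ${max}/" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Usage on the firewall (example values only):
#   cp /var/etc/mpd_wan.conf /conf/mpd_wan.conf
#   set_keepalive /conf/mpd_wan.conf 10 600
```

The PPPoE session most likely has to be reconnected before the new values take effect.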

Thondwe

@stephenw10 Thanks for that - does that file/folder also get included in a backup?

I'm waiting to see if I get any joy from the ISP - still at the stage of pointing issue at my end. Meanwhile diving down the rabbit hole, leads me to

OpenReach etc - waiting 10 mins before the old connection is cleared is "normal"
PPPoE isn't maybe the best option for FTTP - but legacy...

If I get no joy from ISP end, I'll try decreasing the "sensitivity" to the timeouts.

Paul

stephenw10

It doesn't get backed up with the config by default, no. Though you could use the Filer package to do so.
The custom conf file is meant to be used for debugging really.

fireodo

@stephenw10 said in LCP no reply to echo requests:

You don't have to do that, you can use a custom mpd conf file and add whatever value you need.
Copy your existing conf file from /var/etc to /conf. So for example:
cp /var/etc/mpd_wan.conf /conf/mpd_wan.conf

pfSense uses a conf file there in preference to the generated one, so you can make any valid changes to it and they will be retained.

Steve

Indeed, that's the better solution!

Regards,
fireodo

Thondwe

@thondwe For now I've increased the time it waits before shutting down my connection to 600 seconds.

My working theory is that the far end pppoe server is just getting busy or there's some sort of server/router failover going on - with 600 secs I should see whether the far end resumes sending the echo responses at all...

Paul

stephenw10

It's possible. I'll be interested in that result, I've seen issues with PPPoE to Openreach myself. Though not for a long time.

Thondwe

@stephenw10 Update. Another drop last night...

May 16 21:26:45 ppp 33312 [wan_link0] LCP: no reply to 1 echo request(s)
May 12 23:47:30 ppp 92558 [wan_link0] LCP: no reply to 1 echo request(s)
May 12 13:24:13 ppp 32028 [wan_link0] LCP: no reply to 1 echo request(s)
May 11 20:39:27 ppp 32028 [wan_link0] LCP: no reply to 1 echo request(s)
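
A quick way to see how often this happens is to count the "no reply" lines per day. This is a rough sketch that reads log text on stdin; the function name `count_lcp_drops` is made up, and the `/var/log/ppp.log` path in the usage comment is an assumption about where a given pfSense version keeps the PPP log.

```shell
#!/bin/sh
# Sketch: count "LCP: no reply" events per day from PPP log text on stdin.
# count_lcp_drops is a hypothetical helper name.

count_lcp_drops() {
    # Fields 1 and 2 of each syslog line are the month and day.
    awk '/LCP: no reply to/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Usage (log path is an assumption):
#   count_lcp_drops < /var/log/ppp.log
```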

Gateway Ping started dropping just before this - and earlier in the evening, but apparently not sufficient to kill the PPPoE connection

May 16 21:34:41 dpinger 84988 WAN_PPPOE 8.8.8.8: sendto error: 65
May 16 21:26:17 dpinger 84988 WAN_PPPOE 8.8.8.8: Alarm latency 7974us stddev 148us loss 22%
May 16 18:03:42 dpinger 84988 WAN_PPPOE 8.8.8.8: Clear latency 7898us stddev 144us loss 6%
May 16 18:02:43 dpinger 84988 WAN_PPPOE 8.8.8.8: Alarm latency 7915us stddev 175us loss 22%
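
To line these up against the LCP failures, the alarm times and loss figures can be pulled out of the gateway log. Again only a sketch: `dpinger_alarms` is a made-up name, it assumes the line layout shown above (loss percentage as the last field), and the log path in the usage comment is an assumption.

```shell
#!/bin/sh
# Sketch: print date, time and loss for each dpinger Alarm line on stdin.
# dpinger_alarms is a hypothetical helper name.

dpinger_alarms() {
    # $1 $2 $3 = month, day, time; $NF = last field, the loss percentage.
    awk '/dpinger/ && / Alarm / { print $1, $2, $3, $NF }'
}

# Usage (log path is an assumption):
#   dpinger_alarms < /var/log/gateways.log
```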

Checked the ONT while the LCP echos were failing - Port Flashing Green, LOS off, PON Green, Power Green.

Firewall showed outgoing traffic on WAN connection, but nothing in.

Disabled/Reenabled WAN NIC and reconnected straight away.

Frustrated!

stephenw10

Hmm, that does seem like the upstream server really did stop responding then.

How long did you leave it? I assume it failed to recover by itself even after the extended time?

Thondwe

@stephenw10 The failure I had at midnight did recover after about 10 mins - so assume that the session timed out and allowed my side to reconnect.

Suspect it fits other patterns of people complaining of intermittent failures - just that pfsense (and/or) I have more evidence of the symptoms?

Thing is it's also been rock solid stable for most of the time I've had it, then it gets these odd wobbles.

Next time I'll grab all the logs off the box and see what else is happening - would something cause the firewall/pfblocker to block ALL incoming traffic for a time??

Paul

stephenw10

It could, but not if the state was initiated outbound as it is with the monitoring pings. And that would have no effect on anything at the LCP layer. pf doesn't see those at all.

Thondwe

@stephenw10 So been digging through this thread...

MPD Fix

Seems that something in how pfsense and the ONT interact changed between mpd versions - i.e. old works, new not so good. As I read it, when the link fails, pfsense doesn't trigger the ONT to close things down, so it has to wait to reconnect - which it does after about 10 mins (from my midnight failure).

If I can fix that, then at least recovery will be quicker and/or automatic?

Still doesn't explain why it's dying, but I'm still pointing the blame at the ISP end?

Not sure how much is done inside the ONT - assume it's mostly equivalent to a fibre-ethernet bridge - i.e. all the PPPoE stuff is on a server somewhere?

stephenw10

I wouldn't expect the ONT to be involved there. Or the immediate remote side. If the session incorrectly stays open it would be between pfSense and the remote PPPoE server which can be some way upstream from the fiber.

Thondwe

@stephenw10

Figured the ONT was dumb to the pppoe layer. I assume, from reading elsewhere, that this from the PPP logs...

Name: "acc-aln8.sx"

Is likely to be the name of the pppoe server in a data centre somewhere?

Plenty of threads on various forums out there with similar problems - i.e. dropped connections with openreach based FTTP and various vendors' kit. Plenty more go down the mpd5 rabbit hole, which suggests that there's a bug somewhere in the freebsd stack (mpd5/netgraph/drivers) which causes at least the bad shutdown and hence the wait for the far end to time out.

Very few fixes - but one of the easiest for me to try is to stuff a switch between the pfsense box and the ONT (I think this relates to the bug being in the ethernet card driver, though my intel cards aren't mentioned!). Anyway, I had some free ports on my switch so stuck them into their own VLAN world and will see what happens. Because of this I have a few other options to try: more extensive packet captures, alternative firewalls, etc.

stephenw10

Yes that server is probably at your ISP (or virtual ISP) and not at Openreach directly. And there's a good chance it varies between connections through some sort of load-balancing.
But, yes, try to solve the link failures first. That is certainly exacerbating whatever issues might be present in the PPP layer.

Thondwe

@stephenw10 agreed - the pfsense piece is frustrating and I suspect other (not bsd based mpd) devices mask the isp behaviour as they reconnect straight away and the isp considers a few reconnects a day as just noise - users resetting, local power outs etc.

Pfsense mpd not reconnecting nicely (seems same across other bsd platforms) is the killer with no obvious fix…. Very frustrating…

Thondwe

@thondwe Update: Put a switch in between ONT and pfsense box (Protectli) and touching wood/crossing fingers it's been running fine for 3 days so far.

Possible theories

Dodgy plug at pfsense end - seems unlikely for intermittent fault with no packets being dropped?
ONT and pfsense have an issue with EEE/802.3az power management? Digging through other threads and some official pfsense documentation, it seems disabling EEE for Intel igb driver cards is recommended. Most drop outs have occurred late evening (bed time for the kids) so possibly EEE is doing its thing enough to drop the connection? Anyway, the switch doesn't have EEE, so it disables EEE for both devices. Can't confirm that EEE is enabled by default for pfsense - there should be a command to check but I don't know what it is.
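
On the "command to check" point, one low-risk approach is to list whatever EEE-related sysctl entries the driver exposes and go from there. This is a sketch under assumptions: the exact OID names differ between driver versions, `filter_eee` is a made-up helper, and `dev.igb.0.eee_control` in the comment is only an example of what might show up, not something confirmed on this box.

```shell
#!/bin/sh
# Sketch: show only sysctl entries whose name mentions EEE.
# Reads "sysctl -a" style "name: value" lines on stdin.
# filter_eee is a hypothetical helper name.

filter_eee() {
    # Match on the OID name (text before the first colon), case-insensitive.
    awk -F: 'tolower($1) ~ /eee/'
}

# Usage on the firewall:
#   sysctl -a | filter_eee
# Possible output shape (OID name is an assumption and varies by driver):
#   dev.igb.0.eee_control: <value>
```

Check the driver documentation for what the value means before changing anything.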

Can't find any docs about the ONT (BT Nokia) to confirm it supports EEE but would not be surprised - but SmartHub 6 doesn't seem to have 802.3az as a feature and perhaps it's not that common for it to be enabled/available on home routers?

Waiting to see if things carry on working reliably!

stephenw10

It could also be that the switch-ONT link is still flapping but pfSense doesn't see that any longer so the problem is no longer causing the same sort of issues.