LCP no reply to echo requests

Thondwe

@stephenw10 Thanks for that - does that file/folder also get included in a backup?

I'm waiting to see if I get any joy from the ISP - they're still at the stage of pointing at an issue at my end. Meanwhile, diving down the rabbit hole leads me to:

OpenReach etc - waiting 10 mins before the old connection is cleared is "normal"
PPPoE isn't maybe the best option for FTTP - but legacy...

If I get no joy from ISP end, I'll try decreasing the "sensitivity" to the timeouts.

Paul

stephenw10

It doesn't get backed up with the config by default, no. Though you could use the Filer package to do so.
The custom conf file is meant to be used for debugging really.

fireodo

@stephenw10 said in LCP no reply to echo requests:

You don't have to do that, you can use a custom mpd conf file and add whatever value you need.
Copy your existing conf file from /var/etc to /conf. So for example:
cp /var/etc/mpd_wan.conf /conf/mpd_wan.conf

pfSense uses a conf file there in preference to the generated one, so you can make any valid changes to it and they will be retained.

Steve

Indeed, that's the better solution!

Regards,
fireodo

Thondwe

@thondwe For now I've increased the time it waits before shutting down my connection to 600 seconds.

My working theory is that the far end PPPoE server is just getting busy, or there's some sort of server/router failover going on. With 600 secs I should see whether the far end resumes sending echo responses at all...
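A rough sketch of how that change could be applied to the custom conf copy, for anyone following along. It assumes the generated file contains a `set link keep-alive <interval> <max>` line (mpd5's LCP echo setting) and that the copy lives at /conf/mpd_wan.conf as in the cp example above. `set_keepalive` is a made-up helper name and the 10 second interval is only a placeholder:

```shell
# set_keepalive SRC DST INTERVAL MAX  (hypothetical helper, not part of pfSense)
# Copies SRC to DST, rewriting any "set link keep-alive" line so that mpd
# sends an LCP echo every INTERVAL seconds and only drops the link after
# MAX seconds without a reply. All other lines pass through unchanged.
set_keepalive() {
    sed -E "s/^([[:space:]]*set link keep-alive)[[:space:]].*/\\1 $3 $4/" "$1" > "$2"
}

# Example (paths assumed from earlier in the thread):
# set_keepalive /var/etc/mpd_wan.conf /conf/mpd_wan.conf 10 600
```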

Paul

stephenw10

It's possible. I'll be interested in that result; I've seen issues with PPPoE to Openreach myself, though not for a long time.

Thondwe

@stephenw10 Update. Another drop last night...

May 16 21:26:45 ppp 33312 [wan_link0] LCP: no reply to 1 echo request(s)
May 12 23:47:30 ppp 92558 [wan_link0] LCP: no reply to 1 echo request(s)
May 12 13:24:13 ppp 32028 [wan_link0] LCP: no reply to 1 echo request(s)
May 11 20:39:27 ppp 32028 [wan_link0] LCP: no reply to 1 echo request(s)
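
To see how often this is happening, the matching lines can be counted per day. A minimal sketch, assuming the PPP log is plain text (the /var/log/ppp.log path in the comment is an assumption, adjust for your version). `count_lcp_drops` is just an illustrative name:

```shell
# count_lcp_drops: read ppp log lines on stdin and print one line per day
# in the form "<Mon> <day> <count>" for "LCP: no reply to" events.
count_lcp_drops() {
    awk '/LCP: no reply to/ { n[$1 " " $2]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Example (log path is an assumption):
# count_lcp_drops < /var/log/ppp.log
```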

Gateway ping started dropping just before this, and earlier in the evening too, but apparently not enough to kill the PPPoE connection:

May 16 21:34:41 dpinger 84988 WAN_PPPOE 8.8.8.8: sendto error: 65
May 16 21:26:17 dpinger 84988 WAN_PPPOE 8.8.8.8: Alarm latency 7974us stddev 148us loss 22%
May 16 18:03:42 dpinger 84988 WAN_PPPOE 8.8.8.8: Clear latency 7898us stddev 144us loss 6%
May 16 18:02:43 dpinger 84988 WAN_PPPOE 8.8.8.8: Alarm latency 7915us stddev 175us loss 22%
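
For lining these up against the PPP log, a small filter that keeps only the dpinger alarm events and their loss figure may help. This is a sketch only: the gateway log path in the comment is an assumption, and `dpinger_alarms` is a made-up name:

```shell
# dpinger_alarms: read gateway log lines on stdin and print
# "<Mon> <day> <time> loss <pct>" for each dpinger Alarm line.
# Clear and sendto-error lines are skipped.
dpinger_alarms() {
    awk '/dpinger/ && / Alarm / { print $1, $2, $3, "loss", $NF }'
}

# Example (log path is an assumption):
# dpinger_alarms < /var/log/gateways.log
```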

Checked the ONT while the LCP echoes were failing - Port flashing green, LOS off, PON green, Power green.

Firewall showed outgoing traffic on WAN connection, but nothing in.

Disabled/Reenabled WAN NIC and reconnected straight away.

Frustrated!

stephenw10

Hmm, that does seem like the upstream server really did stop responding then.

How long did you leave it? I assume it failed to recover by itself even after the extended time?

Thondwe

@stephenw10 The failure I had at midnight did recover after about 10 mins - so assume that the session timed out and allowed my side to reconnect.

Suspect it fits other patterns of people complaining of intermittent failures - just that pfsense and/or I have more evidence of the symptoms?

Thing is it's also been rock solid stable for most of the time I've had it, then it gets these odd wobbles.

Next time I'll grab all the logs off the box and see what else is happening - could something cause the firewall/pfblocker to block ALL incoming traffic for a time?

Paul

stephenw10

It could, but not if the state was initiated outbound, as it is with the monitoring pings. And that would have no effect on anything at the LCP layer. pf doesn't see those at all.

Thondwe

@stephenw10 So been digging through this thread...

MPD Fix

It seems that something in the interaction between pfsense and the ONT changed between mpd versions - i.e. old works, new not so good. As I read it, when the link fails pfsense doesn't trigger the ONT to close things down, so it has to wait to reconnect - which it does after about 10 mins (going by my midnight failure).

If I can fix that, then at least recovery will be quicker and/or automatic?

Still doesn't explain why it's dying, but I'm still pointing the blame at the ISP end?

Not sure how much is done inside the ONT - assume it's mostly equivalent to a fibre-ethernet bridge - i.e. all the PPPoE stuff is on a server somewhere?

stephenw10

I wouldn't expect the ONT to be involved there. Or the immediate remote side. If the session incorrectly stays open it would be between pfSense and the remote PPPoE server which can be some way upstream from the fiber.

Thondwe

@stephenw10

Figured the ONT was dumb to the PPPoE layer. From reading elsewhere, I assume that this, from the PPP logs...

Name: "acc-aln8.sx"

Is likely to be the name of the pppoe server in a data centre somewhere?

Plenty of threads on various forums out there with similar problems - i.e. dropped connections with Openreach based FTTP and various vendors' kit. Plenty more go down the mpd5 rabbit hole, which suggests that there's a bug somewhere in the FreeBSD stack (mpd5/netgraph/drivers) which causes at least the bad shutdown and hence the wait for the far end to time out.

Very few fixes - but one of the easiest for me to try is to stuff a switch between the pfsense box and the ONT (I think this is related to the bug being in the ethernet card driver, though my Intel cards aren't mentioned!). Anyway, I had some free ports on my switch, so I stuck them into their own VLAN world and will see what happens. Because of this I have a few other options to try: more extensive packet captures, alternative firewalls, etc.

stephenw10

Yes that server is probably at your ISP (or virtual ISP) and not at Openreach directly. And there's a good chance it varies between connections through some sort of load-balancing.
But, yes, try to solve the link failures first. That is certainly exacerbating whatever issues might be present in the PPP layer.

Thondwe

@stephenw10 agreed - the pfsense piece is frustrating and I suspect other (not bsd based mpd) devices mask the isp behaviour as they reconnect straight away and the isp considers a few reconnects a day as just noise - users resetting, local power outs etc.

Pfsense mpd not reconnecting nicely (seems same across other bsd platforms) is the killer with no obvious fix…. Very frustrating…

Thondwe

@thondwe Update: Put a switch in between ONT and pfsense box (Protectli) and touching wood/crossing fingers it's been running fine for 3 days so far.

Possible theories

Dodgy plug at pfsense end - seems unlikely for intermittent fault with no packets being dropped?
ONT and pfsense have an issue with EEE/802.3az power management? Digging through other threads and some official pfsense documentation, it seems disabling EEE for the Intel igb driver cards is recommended. Most drop outs have occurred late evening (bed time for the kids), so possibly EEE is doing its thing enough to drop the connection? Anyway, the switch doesn't have EEE, so EEE ends up disabled for both devices. Can't confirm that EEE is enabled by default for pfsense - there should be a command to check but I don't know what it is.

Can't find any docs about the ONT (BT Nokia) to confirm it supports EEE but would not be surprised - but SmartHub 6 doesn't seem to have 802.3az as a feature and perhaps it's not that common for it to be enabled/available on home routers?
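
On the "command to check" question, one hedged starting point is to dump the sysctl tree and look for anything EEE-related. This is only a sketch: which OIDs appear (if any) depends on the NIC driver and FreeBSD version, so no particular name is assumed here, and `eee_oids` is a made-up helper:

```shell
# eee_oids: filter "sysctl -a" style output on stdin down to lines that
# mention EEE (case-insensitive). Prints nothing if there is no match.
eee_oids() {
    grep -i 'eee' || true
}

# On the firewall itself:
# sysctl -a | eee_oids
```

An empty result would not prove EEE is off, only that the driver exposes no such setting under that name.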

Waiting to see if things carry on working reliably!

stephenw10

It could also be that the switch-ONT link is still flapping but pfSense doesn't see that any longer so the problem is no longer causing the same sort of issues.

Thondwe

@stephenw10 Guess I need to prove it - assuming I get a few more days without incident, there are some settings to disable EEE which I'll try with the old wiring setup and see if that's stable.

Problem is that I've had many months of stability in the past - so unless it shows an incident pretty quick I may not know.

FYI - seems that this site may describe the innards of the ONTs - it's in a BT Openreach branded box but otherwise all the lights and sockets match.

Nokia ONT chipsets

Paul